Article

Markov-Embedded Affinity Learning with Connectivity Constraints for Subspace Clustering

School of Computer Science and Technology, Qingdao University, Qingdao 266071, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(11), 4617; https://doi.org/10.3390/app14114617
Submission received: 30 March 2024 / Revised: 23 May 2024 / Accepted: 24 May 2024 / Published: 27 May 2024

Abstract

Subspace clustering algorithms have demonstrated remarkable success across diverse fields, including object segmentation, gene clustering, and recommendation systems. However, they often face challenges such as the omission of cluster information and the neglect of higher-order neighbor relationships within the data. To address these issues, a novel subspace clustering method named Markov-Embedded Affinity Learning with Connectivity Constraints for Subspace Clustering is proposed. This method embeds Markov transition probability information into the self-expression framework, leveraging a fine-grained neighbor matrix to uncover latent data structures. This matrix preserves crucial high-order local information and complementary details, ensuring a comprehensive understanding of the data. To handle complex nonlinear relationships effectively, the method learns the underlying manifold structure from a cross-order local neighbor graph. Additionally, connectivity constraints are applied to the affinity matrix, enhancing the group structure and further improving the clustering performance. Extensive experiments demonstrate the superiority of the proposed method over baseline approaches, validating its effectiveness and practical utility.

1. Introduction

In the era of the data explosion, the demand for high-dimensional data is increasing in various fields, such as computer vision [1,2], gene expression analysis [3], and hyperspectral image processing [4,5]. The effective handling and representation of high-dimensional data have therefore received considerable attention. In fact, high-dimensional data often have underlying low-dimensional structures [6], which motivates the effective representation of high-dimensional data in low-dimensional subspaces [6]. To retrieve these low-dimensional subspaces, it is typically necessary to cluster the data into distinct groups. Each of these groups can be fitted with a subspace, and this procedure is referred to as subspace clustering or subspace segmentation [7]. As an unsupervised learning approach, subspace clustering can be applied in numerous cases. For example, in supervised learning, data labeling is an important and costly operation; as a clustering method, subspace clustering can assist in data labeling, thereby reducing the learning costs. Over the past few years, various subspace clustering algorithms have been developed [8,9,10]. Among them, methods based on spectral clustering have attracted a great deal of attention due to their excellent clustering performance and theoretical guarantees. Typically, the key to spectral clustering methods lies in finding an accurate affinity matrix and utilizing it to derive the clustering results. To obtain an affinity matrix that accurately characterizes the relationships among samples, researchers have proposed numerous innovative methods.
The low-rank representation clustering algorithm (LRR) and sparse subspace clustering algorithm (SSC) are classic subspace clustering methods [8,9]. The SSC employs the L-1 norm as a regularization term to achieve sparse representation while preserving local data information. The LRR utilizes the nuclear norm as a regularization term to achieve low-rank representation and preserve the global structure of the data.
The classic algorithms mentioned above have achieved excellent clustering performance and have been widely applied. However, there are still some shortcomings. First, the approaches to retaining the clustering information are limited. For example, both LRR and SSC solely utilize the self-expressive property, while other properties of the data can also be explored to extract clustering information [11,12]. Second, it is difficult for these methods to capture the high-order neighbor information. These methods usually focus on the first-order neighbor information, whereas higher-order information plays a crucial role in clustering [13]. Meanwhile, existing methods learn the manifold structure based on the first-order neighbor graph and neglect the high-order nonlinear information. Third, the algorithms do not impose a direct constraint on the block-diagonal representation of the affinity matrix.
To this end, this paper proposes a new clustering algorithm, Markov-Embedded Affinity Learning with Connectivity Constraints for Subspace Clustering (MKSC). We briefly summarize the key contributions of this paper as follows. (1) Leveraging the Markov transition probability matrix, Markov transition probability information is embedded into the self-expression framework. This enables the algorithm to preserve more comprehensive clustering information. (2) We use the Markov transition probability matrix to construct a fine-grained neighbor matrix embedded in the MKSC algorithm, fully exploiting the high-order neighbor information and complementary neighbor information. (3) We incorporate the high-order local neighbor relationship and complementary information of the data into manifold learning, thereby learning the underlying manifold structure from a cross-order local neighbor graph. This strategy is starkly different from traditional manifold learning. (4) We enforce a connectivity constraint on the representation matrix, which is more suitable for clustering.
The remainder of this paper is structured as follows. First, we provide the research background and related work in Section 2 and Section 3, respectively. Then, we present the details of the proposed method in Section 4. Afterwards, we outline the corresponding optimization strategies in Section 5. Subsequently, we conduct comprehensive experiments and present the detailed results as well as discussions in Section 6. Finally, we conclude the paper in Section 7.

2. Background

In this section, we introduce some key terms related to MKSC.

2.1. Self-Expression Representation

Let $X = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^{d \times n}$ denote the data matrix, where each column $x_i$ represents a sample. The self-expressive property of the data can be represented by $x_j = X Z_j$, where $Z$ is the representation matrix and $Z_j$ is the $j$-th column of $Z$ [9]. Specifically, $x_j = \sum_{i=1}^{n} x_i Z_{ij}$, which means that a sample in the data can be represented by a linear combination of the other samples. According to the self-expression property of the data, the following optimization problem can be obtained:
$$\min_Z \; \|X - XZ\|_F^2 + f(Z), \qquad (1)$$
where $Z$ is the coefficient matrix to be solved, $\|\cdot\|_F$ is the Frobenius norm, and $f(\cdot)$ is the regularization term. In fact, $Z_{ij}$ reveals the importance of $x_i$ in the new expression of $x_j$. The larger the value of $Z_{ij}$, the more similar $x_i$ and $x_j$ are and the more likely they are to be assigned to the same cluster. Therefore, $Z$ contains rich clustering information about the data.
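For concreteness, the following minimal sketch solves Equation (1) with the common choice $f(Z) = \gamma \|Z\|_F^2$, which admits the closed-form solution $Z = (X^{\mathsf{T}}X + \gamma I)^{-1} X^{\mathsf{T}}X$. The choice of regularizer, the parameter value, and the function name are illustrative assumptions, not part of the model proposed later.

```python
import numpy as np

def self_expression(X, gamma=1.0):
    """Solve min_Z ||X - X Z||_F^2 + gamma * ||Z||_F^2 in closed form.

    X is d x n with samples as columns; the solution is
    Z = (X^T X + gamma I)^{-1} X^T X.
    """
    n = X.shape[1]
    G = X.T @ X                                   # Gram matrix, n x n
    Z = np.linalg.solve(G + gamma * np.eye(n), G)
    return Z
```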

2.2. Markov Transition Probability Matrix

First, the similarity matrix of the data matrix $X$ is defined as $S$, where $S_{ij}$ represents the similarity between $x_i$ and $x_j$. The matrix $S$ is constructed as follows:
$$S_{ij} = \begin{cases} 1, & x_j \in \eta_K(x_i), \\ 0, & x_j \notin \eta_K(x_i), \end{cases} \qquad (2)$$
where $\eta_K(x_i)$ represents the set of $K$ nearest neighbors of $x_i$. Throughout this paper, without loss of generality, we use the Euclidean distance to measure the distances between samples and set $K = 5$.
Then, we define $P = D^{-1}S$ as the Markov transition probability matrix based on $S$, where $P_{ij}$ is the probability of data point $x_i$ randomly walking to data point $x_j$ and $D$ is a diagonal matrix with $D_{ii} = \sum_{j=1}^{n} S_{ij}$. Research has found that there is a close connection between the Markov transition probability matrix and the self-expression [12]. Markov random walks tend to occur primarily within clusters and rarely cross over to other clusters, which indicates that the Markov transition matrix can also capture the neighbor information of the data. This property is an essential advantage when measuring the relationships among the data. In fact, the Markov transition probability matrix defined above preserves only the one-step neighbor relationships of the data. However, there are not only one-step neighbor relationships between samples but also higher-order neighbor relationships [13]. These relationships reveal the latent neighbor structure of the samples, which is essential for fully exploiting the comprehensive structures of the data. All types of relationships are visualized in Figure 1.
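The construction of $S$ and $P$ can be sketched as follows. Excluding each sample from its own neighbor set and the function name are assumptions made here for illustration.

```python
import numpy as np

def markov_transition_matrix(X, K=5):
    """Build the binary K-NN similarity S and the row-stochastic P = D^{-1} S.

    X is d x n with samples as columns; Euclidean distance and K = 5 as in the paper.
    """
    n = X.shape[1]
    # pairwise squared Euclidean distances between columns of X
    sq = np.sum(X**2, axis=0)
    dist = sq[:, None] + sq[None, :] - 2 * (X.T @ X)
    np.fill_diagonal(dist, np.inf)            # exclude each sample from its own neighbor set
    S = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(dist[i])[:K]        # indices of the K nearest neighbors of x_i
        S[i, nbrs] = 1.0
    D_inv = 1.0 / S.sum(axis=1)               # each row has exactly K ones
    P = D_inv[:, None] * S                    # P_ij = S_ij / D_ii
    return S, P
```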

3. Related Work

In this section, we briefly review several classic subspace clustering methods.

3.1. Low-Rank Representation

LRR seeks a low-rank representation of the data with the following minimization problem [8]:
$$\min_Z \; \|X - XZ\|_{2,1} + \gamma \|Z\|_*, \qquad (3)$$
where $\|\cdot\|_{2,1}$ is the sum of the column-wise $\ell_2$ norms of a matrix, $\|\cdot\|_*$ is the nuclear norm, and $\gamma$ is a balancing parameter.

3.2. Sparse Subspace Clustering

SSC assumes a sparse representation of the data, which leads to the following [9]:
$$\min_{Z, S, E} \; \|E\|_F^2 + \gamma \|Z\|_1 + \omega \|S\|_1, \quad \text{s.t. } X = XZ + S + E, \; \mathrm{diag}(Z) = 0, \qquad (4)$$
where $\gamma$ and $\omega$ are balancing parameters. $\mathrm{diag}(\cdot)$ converts the diagonal elements of the input matrix into a column vector; $\mathrm{diag}(Z) = 0$ prevents the model from obtaining a trivial solution.

3.3. Robust and Efficient Subspace Segmentation via Least Squares Regression

Robust and Efficient Subspace Segmentation via Least Squares Regression (LSR) utilizes the least squares method to process the expression matrix and introduces the Frobenius norm to seek a low-rank representation of the data [14]. The basic model of LSR is
$$\min_Z \; \|X - XZ\|_F^2 + \gamma \|Z\|_F^2, \quad \text{s.t. } \mathrm{diag}(Z) = 0. \qquad (5)$$
Many studies have expanded on these methods, such as block diagonal representation, nonlinear extension, and so on [15,16,17,18].

4. Proposed Method

In Figure 2, we illustrate the flowchart of the proposed method. Firstly, the data matrix and parameters are provided as inputs. Subsequently, the similarity matrix is derived from the data, which is used to compute the Markov transition probability matrix and the fine-grained neighborhood matrix. After this, leveraging the optimization results from Section 5, all variables are iteratively solved. Finally, spectral clustering as described in Section 5 is performed on affinity matrix Z, resulting in the final clustering outcome.
As described in Section 2, the representation matrix obtained from the self-expressive property preserves the similarity information of the data. Meanwhile, the Markov transition probability matrix retains the neighbor information. Generally, samples that are neighbors to each other tend to have higher similarity. Inspired by this, in order to retain the discriminative similarity and neighbor information of the data, we embed the Markov transition probability matrix into the self-expression framework and obtain the following model:
$$\min_Z \; \frac{1}{2}\|X - XZ\|_F^2 + \frac{\alpha}{2}\|P - Z\|_F^2 + \gamma \|Z\|_F^2, \quad \text{s.t. } \mathrm{diag}(Z) = 0, \qquad (6)$$
where $\alpha \geq 0$ is the balance parameter and $P$ is the Markov transition probability matrix. In Equation (6), $P$ reveals the one-step random walk probability of the samples, which directly fuses such information with subspace clustering. However, it is insufficient to adopt only the first-order neighbor information. Besides the first-order neighbor information, high-order neighbor information is also important: this type of relationship preserves more potential structures between the samples and improves the clustering capability of the algorithm. The high-order neighbor relationships of samples can be measured by the probability of a multi-step random walk. In order to fully mine the high-order neighbor information, we define $P^{[a]} \in \mathbb{R}^{n \times n}$ as the high-order Markov transition probability matrix, which reveals the probability of establishing a neighbor relationship between any two samples through a random walk of $a$ steps and is defined as
$$P^{[a]} = \underbrace{P \cdot P \cdots P}_{a\ \text{times}} = P^{a}. \qquad (7)$$
Using $P^{[a]}$ to preserve the $a$-th-order neighbor information of the data, we may recover the latent structure of the data in the representation matrix. To fully utilize the diverse-order neighbor information of the samples, we propose a fine-grained neighbor matrix:
$$P_N = \sum_{a=1}^{N} \frac{P^{[a]} + (P^{[a]})^{\mathsf{T}}}{2}, \qquad (8)$$
where $N$ is the highest order of the sample neighbor information. Here, the matrix transpose ensures the symmetry of the neighbor relation, which is natural and necessary. Since $P_N$ preserves the high-order neighbor relationships and complementary information, we adopt $P_N$ to further extend our model as
$$\min_Z \; \frac{1}{2}\|X - XZ\|_F^2 + \frac{\alpha}{2}\|P_N - Z\|_F^2 + \gamma \|Z\|_F^2, \quad \text{s.t. } \mathrm{diag}(Z) = 0. \qquad (9)$$
In Equation (9), we construct a fine-grained neighbor matrix through the Markov transition probability matrix. At the same time, this fine-grained neighbor matrix is embedded into the self-expression framework of subspace clustering.
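A small sketch of Equation (8), assuming the one-step matrix $P$ from Section 2.2 and the order $N = 3$ used later in the experiments (the function name is hypothetical):

```python
import numpy as np

def fine_grained_neighbor_matrix(P, N=3):
    """Compute P_N = sum_{a=1}^{N} (P^a + (P^a)^T) / 2, as in Equation (8)."""
    n = P.shape[0]
    P_a = np.eye(n)
    P_N = np.zeros((n, n))
    for _ in range(N):
        P_a = P_a @ P                      # P^a for a = 1, ..., N
        P_N += 0.5 * (P_a + P_a.T)         # symmetrize each order and accumulate
    return P_N
```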
However, nonlinear structures commonly exist in real-world data and are crucial in revealing the underlying true structural information. In order to better deal with the nonlinear relationships, manifold learning is often integrated into the learning model to capture the latent structures of the data on a manifold. Differing from the traditional manifold learning used in [16], we propose to use the cross-order neighbor graph $P_N$ for manifold learning, which enables our model to capture the comprehensive latent structures of the data. Thus, our model is developed as
$$\min_Z \; \frac{1}{2}\|X - XZ\|_F^2 + \frac{\alpha}{2}\|P_N - Z\|_F^2 + \gamma \|Z\|_F^2 + \beta\, \mathrm{Tr}(Z L_N Z^{\mathsf{T}}), \quad \text{s.t. } \mathrm{diag}(Z) = 0, \qquad (10)$$
where $\beta \geq 0$ and $L_N$ is the graph Laplacian matrix of the fine-grained neighbor matrix $P_N$, i.e., $L_N = \mathrm{Diag}(P_N \mathbf{1}) - P_N \in \mathbb{R}^{n \times n}$, where $\mathbf{1}$ is a vector with all elements equal to 1 and $\mathrm{Diag}(\cdot)$ converts the input vector into a diagonal matrix. The affinity matrix $Z$ in Equation (10) essentially reveals the correlation of the data and is expected to be non-negative and symmetric in our model. Thus, $Z$ can be treated as a connected graph, which is ideal if it contains $k$ connected components corresponding to the $k$ clusters of the data. According to the Ky Fan theorem, ideally, the number of zero eigenvalues of the Laplacian matrix of $Z$ equals the number of connected components in $Z$. In order to better ensure this property, we further impose a connectivity constraint in our model. We define the graph Laplacian matrix of $Z$ as $L_Z = \mathrm{Diag}(Z \mathbf{1}) - Z$; then, the connectivity constraint can be expressed as $\min \sum_{i=1}^{k} \lambda_i(L_Z)$ [15], where $\lambda_i(\cdot)$ is the $i$-th smallest eigenvalue of the input matrix. Since the connectivity constraint is more efficient in preserving the data structure information and also serves as a penalty, we replace the Frobenius norm with the connectivity constraint and further develop our model as
$$\min_Z \; \frac{1}{2}\|X - XZ\|_F^2 + \frac{\alpha}{2}\|P_N - Z\|_F^2 + \gamma \sum_{i=1}^{k}\lambda_i(L_Z) + \beta\, \mathrm{Tr}(Z L_N Z^{\mathsf{T}}), \quad \text{s.t. } \mathrm{diag}(Z) = 0, \; Z \geq 0, \; Z = Z^{\mathsf{T}}, \qquad (11)$$
where $\gamma \geq 0$. In Equation (11), $\sum_{i=1}^{k}\lambda_i(L_Z)$, as a connectivity constraint, can effectively partition the connected components on the graph and connect the samples within the clusters. This characteristic of the connectivity constraint can improve the algorithm's ability to handle the group structure of the data, thereby enhancing the clustering performance.
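For intuition, the sketch below evaluates the connectivity term $\sum_{i=1}^{k}\lambda_i(L_Z)$ for a given affinity matrix; this value is (close to) zero exactly when the graph defined by $Z$ has at least $k$ connected components. The symmetrization step and the function name are illustrative additions.

```python
import numpy as np

def connectivity_penalty(Z, k):
    """Evaluate sum_{i=1}^{k} lambda_i(L_Z), where L_Z = Diag(Z 1) - Z."""
    # The model already constrains Z >= 0 and Z = Z^T; the symmetrization
    # below is only a safeguard for arbitrary inputs.
    Zs = 0.5 * (np.abs(Z) + np.abs(Z).T)
    L = np.diag(Zs.sum(axis=1)) - Zs           # graph Laplacian of the affinity graph
    eigvals = np.linalg.eigvalsh(L)            # eigenvalues in ascending order
    return eigvals[:k].sum()
```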
It is noted that there are excessive restrictions on the affinity matrix Z in the model. These restrictions may limit the expression of Z. Thus, we introduce an intermediate variable B to alleviate this problem in a way similar to [15]:
$$\min_{Z, B} \; \frac{1}{2}\|X - XZ\|_F^2 + \frac{\alpha}{2}\|P_N - Z\|_F^2 + \gamma \sum_{i=1}^{k}\lambda_i(L_B) + \beta\, \mathrm{Tr}(Z L_N Z^{\mathsf{T}}) + \frac{\zeta}{2}\|Z - B\|_F^2, \quad \text{s.t. } \mathrm{diag}(B) = 0, \; B \geq 0, \; B = B^{\mathsf{T}}, \qquad (12)$$
where $L_B = \mathrm{Diag}(B \mathbf{1}) - B$; as $\zeta$ grows large, minimizing Equation (12) drives $B$ toward $Z$ (with $B = Z$ in the limit). Moreover, introducing $\|Z - B\|_F^2$ ensures that the subproblems related to $Z$ and $B$ are strongly convex, which is more convenient for optimization. Equation (12) is named Markov-Embedded Affinity Learning with Connectivity Constraints for Subspace Clustering (MKSC). MKSC has four balancing parameters: $\{\zeta, \alpha, \beta, \gamma\}$. In Equation (12), $\zeta$ controls the speed at which $Z$ and $B$ become similar: the larger the value of $\zeta$, the faster $Z$ and $B$ become similar, and vice versa. Similarly, $\alpha$ controls the speed of approximation between $P_N$ and $Z$: the larger the value of $\alpha$, the faster $P_N$ and $Z$ approximate each other, and vice versa. For $\gamma$, the larger $\gamma$ is, the stronger the connectivity constraint on the affinity matrix. $\beta$ controls the manifold term, which affects the model's ability to handle nonlinear data.

5. Optimization

In this section, we give the complete optimization process of Equation (12). Firstly, we reformulate Equation (12) in the following manner [15]:
$$\min_{Z, B, W} \; \frac{1}{2}\|X - XZ\|_F^2 + \frac{\alpha}{2}\|P_N - Z\|_F^2 + \gamma\, \mathrm{Tr}\!\left(W^{\mathsf{T}}\big(\mathrm{Diag}(B\mathbf{1}) - B\big)\right) + \beta\, \mathrm{Tr}(Z L_N Z^{\mathsf{T}}) + \frac{\zeta}{2}\|Z - B\|_F^2, \quad \text{s.t. } \mathrm{diag}(B) = 0, \; B \geq 0, \; B = B^{\mathsf{T}}, \; 0 \preceq W \preceq I, \; \mathrm{Tr}(W) = k, \qquad (13)$$
where I is an identity matrix.

5.1. Optimization of Variable W

Fixing the variables Z and B, we can obtain the subproblem with respect to W:
$$\min_W \; \gamma\, \mathrm{Tr}\!\left(W^{\mathsf{T}}\big(\mathrm{Diag}(B\mathbf{1}) - B\big)\right), \quad \text{s.t. } 0 \preceq W \preceq I, \; \mathrm{Tr}(W) = k. \qquad (14)$$
According to [19], the closed-form solution for W is
$$W = H H^{\mathsf{T}}, \qquad (15)$$
where $H$ is the matrix composed of the eigenvectors corresponding to the $k$ smallest eigenvalues of $L_B = \mathrm{Diag}(B\mathbf{1}) - B$.
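A minimal sketch of this update, assuming $B$ is the current non-negative symmetric iterate (the function name is illustrative):

```python
import numpy as np

def update_W(B, k):
    """W-update of Equation (15): W = H H^T, where the columns of H are the
    eigenvectors of L_B = Diag(B 1) - B for its k smallest eigenvalues."""
    L_B = np.diag(B.sum(axis=1)) - B
    eigvals, eigvecs = np.linalg.eigh(L_B)   # eigenvalues returned in ascending order
    H = eigvecs[:, :k]                       # n x k
    return H @ H.T
```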

5.2. Optimization of Variable B

Fixing the variables Z and W, we can obtain the subproblem with respect to B:
$$\min_B \; \gamma\, \mathrm{Tr}\!\left(W^{\mathsf{T}}\big(\mathrm{Diag}(B\mathbf{1}) - B\big)\right) + \frac{\zeta}{2}\|Z - B\|_F^2, \quad \text{s.t. } \mathrm{diag}(B) = 0, \; B \geq 0, \; B = B^{\mathsf{T}}. \qquad (16)$$
By calculation, Equation (16) is equivalent to
$$\min_B \; \frac{1}{2}\Big\|B - Z + \frac{\gamma}{\zeta}\big(\mathrm{diag}(W)\mathbf{1}^{\mathsf{T}} - W\big)\Big\|_F^2, \quad \text{s.t. } \mathrm{diag}(B) = 0, \; B \geq 0, \; B = B^{\mathsf{T}}. \qquad (17)$$
For convenience in the calculation, we define $A \in \mathbb{R}^{n \times n}$ with $A = Z - \frac{\gamma}{\zeta}\big(\mathrm{diag}(W)\mathbf{1}^{\mathsf{T}} - W\big)$ and substitute $A$ into Equation (17):
$$\min_B \; \frac{1}{2}\|B - A\|_F^2, \quad \text{s.t. } \mathrm{diag}(B) = 0, \; B \geq 0, \; B = B^{\mathsf{T}}. \qquad (18)$$
The closed-form solution for Equation (18) is [15]
$$B = \left[\frac{\Lambda + \Lambda^{\mathsf{T}}}{2}\right]_+, \qquad (19)$$
where $\Lambda = A - \mathrm{Diag}(\mathrm{diag}(A))$, $[\cdot]_+$ denotes the non-negative projection of a matrix, and $[L]_+ = \max(0, L)$.
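The whole B-step can be written compactly as follows, assuming $Z$, $W$, $\gamma$, and $\zeta$ are the current iterates and parameters (the function name is illustrative):

```python
import numpy as np

def update_B(Z, W, gamma, zeta):
    """B-update of Equations (17)-(19)."""
    A = Z - (gamma / zeta) * (np.diag(W)[:, None] - W)   # diag(W) 1^T - W
    Lam = A - np.diag(np.diag(A))                        # zero out the diagonal of A
    return np.maximum(0.5 * (Lam + Lam.T), 0)            # symmetrize, project onto B >= 0
```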

5.3. Optimization of Variable Z

Fixing the variables W and B, we can obtain the subproblem with respect to Z:
$$\min_Z \; \frac{1}{2}\|X - XZ\|_F^2 + \frac{\alpha}{2}\|P_N - Z\|_F^2 + \frac{\zeta}{2}\|Z - B\|_F^2 + \frac{\beta}{2}\mathrm{Tr}(Z L_N Z^{\mathsf{T}}). \qquad (20)$$
Equation (20) is convex; setting its derivative with respect to $Z$ to zero, we obtain
$$Z = \big[X^{\mathsf{T}}X + (\zeta + \alpha) I + \beta L_N\big]^{-1}\big(X^{\mathsf{T}}X + \zeta B + \alpha P_N\big). \qquad (21)$$
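A sketch of this closed-form update, assuming $P_N$ and its Laplacian $L_N$ have been precomputed (the function name is illustrative):

```python
import numpy as np

def update_Z(X, B, P_N, L_N, alpha, beta, zeta):
    """Z-update of Equation (21)."""
    n = X.shape[1]
    G = X.T @ X
    lhs = G + (zeta + alpha) * np.eye(n) + beta * L_N
    rhs = G + zeta * B + alpha * P_N
    return np.linalg.solve(lhs, rhs)
```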
We repeat the above steps until convergence. For clarity, we summarize the above optimization process in Algorithm 1.
Algorithm 1 Markov-Embedded Affinity Learning with Connectivity Constraints for Subspace Clustering
1. Input: $X$, $\alpha$, $\zeta$, $\gamma$, $\beta$, $t_{max}$, $toler$.
2. Output: $Z$, $B$.
3. Initialize: $W^{0} = 0$, $B^{0} = 0$, $Z^{0} = 0$, $t = 0$.
4. Construct the fine-grained neighbor matrix $P_N$.
5. repeat:
6.    Update $W^{t+1}$ by Equation (15),
7.    Update $B^{t+1}$ by Equation (19),
8.    Update $Z^{t+1}$ by Equation (21),
9.    $t = t + 1$.
10. until $t \geq t_{max}$ or $\|Z^{t+1} - Z^{t}\|_F^2 \leq toler$ or $\|B^{t+1} - B^{t}\|_F^2 \leq toler$.
After solving Equation (13), we use $Z$ to construct the affinity matrix $A$ in a standard post-processing manner, as in [20]. Following [20], we construct $A$ with the following steps. (1) We compute the skinny SVD of $Z$ as $Z = U \Sigma V^{\mathsf{T}}$ and define $\bar{Z} = U \Sigma^{1/2}$ as the weighted column space of $Z$. (2) We obtain the normalized matrix $\bar{U}$ by normalizing each row of $\bar{Z}$. (3) The affinity matrix $A$ is constructed as $[A]_{ij} = \big(|[\bar{U}\bar{U}^{\mathsf{T}}]_{ij}|\big)^{\Phi}$, where the parameter $\Phi \geq 1$ controls the sharpness of the affinity between samples. (4) Finally, we perform Normalized Cut (NCut) [21] on $A$ in a manner similar to [20,22].
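A rough sketch of this post-processing pipeline is given below. The exponent value and the use of scikit-learn's SpectralClustering with a precomputed affinity as a stand-in for NCut are assumptions here, not the authors' exact implementation.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def postprocess_and_cluster(Z, n_clusters, phi=2):
    """Build the sharpened affinity A from Z and cluster it spectrally."""
    U, s, _ = np.linalg.svd(Z, full_matrices=False)      # skinny SVD: Z = U diag(s) V^T
    Z_bar = U * np.sqrt(s)[None, :]                      # weighted column space U Sigma^{1/2}
    norms = np.linalg.norm(Z_bar, axis=1, keepdims=True)
    U_bar = Z_bar / np.maximum(norms, 1e-12)             # row-normalize
    A = np.abs(U_bar @ U_bar.T) ** phi                   # sharpened affinity, Phi >= 1
    labels = SpectralClustering(n_clusters=n_clusters,
                                affinity='precomputed').fit_predict(A)
    return labels
```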

5.4. Complexity Analysis

In this subsection, we analyze the computational complexity as follows. In Equation (15), the size of $H$ is $n \times k$; therefore, updating $W$ requires $O(kn^2)$ operations. For $B$, the computational complexity is $O(n^2)$. In Equation (21), the complexity of $X^{\mathsf{T}}X$ is $O(dn^2)$. Due to the inversion operation, the complexity of $\big[X^{\mathsf{T}}X + (\zeta + \alpha)I + \beta L_N\big]^{-1}$ is $O(n^3)$. Simultaneously, the complexity of $\big(X^{\mathsf{T}}X + \zeta B + \alpha P_N\big)$ is $O(dn^2)$. Therefore, for $Z$, the computational complexity is $O(n^3 + dn^2)$. Overall, the total complexity per iteration can be written as $O(n^3 + kn^2 + n^2)$. The dominant cost arises from the inversion and multiplication of $n \times n$ matrices, both of which have complexity $O(n^3)$. The multiplication can be accelerated through parallel computation. For the inversion, we note its special structure: this part is composed of the sum of a diagonal matrix and a low-rank matrix, and Ref. [23] provides a potential method for acceleration.

6. Experiments

To verify the effectiveness of MKSC, extensive experiments are conducted in this section.

6.1. Research Methodology

In this subsection, we introduce the details of the research methodology. We seek to verify the effectiveness of the proposed method by raising the following questions.
  • Can the proposed method achieve excellent clustering performance in various real application scenarios?
  • Can high-order neighbor information improve the clustering performance of the proposed method?
  • Does the high-order neighbor manifold structure enable the method to effectively handle nonlinear data?
  • Can the connectivity constraint guarantee the group structure of the affinity matrix?
  • Can the learned affinity matrix preserve the key information of the original data?
To answer these questions, we design a comprehensive series of experiments. First, we validate the clustering performance of the algorithm, including its accuracy, speed, sensitivity to parameters, and the ability to handle nonlinear data. In particular, the proposed method is compared with several state-of-the-art baseline methods regarding their clustering performance on three types of real application data sets using five clustering evaluation metrics. The speed of the proposed method is validated using a convergence analysis and the clustering time cost in Section 6.3.3 and Section 6.3.5. An algorithm that is insensitive to the parameters can be applied to more potential scenarios. Therefore, we conduct parameter sensitivity experiments in Section 6.3.7. In Section 6.3.2, the proposed method is compared with the baseline methods on a nonlinear data set. Second, we verify the impact of the high-order neighborhood information, the connectivity constraints, and the novel manifold term on the clustering performance of the proposed algorithm in Section 6.3.6. Third, we validate the ability of the learned affinity matrix of the proposed algorithm to capture key information. Generally, the number of block diagonals in the affinity matrix is equal to the number of clusters in the data. Therefore, we visualize the learned representation matrix in Section 6.3.4. Meanwhile, to validate the ability of the affinity matrix to preserve neighbor information, we demonstrate the changes in the distribution of the original data and clustered data in Section 6.3.4. In fact, by identifying key features, the data can be effectively partitioned into different clusters. In Section 6.3.8, by utilizing the self-expressive property, we visually compare the reconstructed images of X and X Z in order to demonstrate the ability of the affinity matrix to preserve key features.
The details of the five evaluation metrics are as follows [24,25,26].
Accuracy (ACC) is defined as
$$\mathrm{ACC} = \frac{\sum_{i=1}^{n}\delta\big(\mathrm{map}(r_i), l_i\big)}{n}, \qquad (22)$$
where $r_i$ and $l_i$ represent the predicted and true labels of the data point $x_i$, respectively. $\mathrm{map}(r_i)$ maps each cluster label $r_i$ to the equivalent label from the data set by permutation such that (22) is maximized, and $\delta(x, y)$ is the delta function, which returns 1 when $x = y$ and 0 otherwise.
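In practice, the permutation $\mathrm{map}(\cdot)$ is usually found with the Hungarian algorithm; the sketch below illustrates this with SciPy's linear_sum_assignment (the function name is illustrative).

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(pred, true):
    """ACC of Equation (22): best cluster-to-class permutation via the Hungarian algorithm."""
    pred, true = np.asarray(pred), np.asarray(true)
    labels = np.unique(np.concatenate([pred, true]))
    cost = np.zeros((labels.size, labels.size), dtype=int)
    for i, p in enumerate(labels):
        for j, t in enumerate(labels):
            cost[i, j] = np.sum((pred == p) & (true == t))
    row, col = linear_sum_assignment(-cost)              # maximize total matches
    return cost[row, col].sum() / pred.size
```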
Normalized mutual information (NMI) is defined as
$$\mathrm{NMI} = \frac{\sum_{i=1}^{M}\sum_{j=1}^{M} n_{i,j}\log\frac{n\, n_{i,j}}{n_i \hat{n}_j}}{\sqrt{\Big(\sum_{i=1}^{M} n_i \log\frac{n_i}{n}\Big)\Big(\sum_{j=1}^{M} \hat{n}_j \log\frac{\hat{n}_j}{n}\Big)}}, \qquad (23)$$
where $n_i$ and $\hat{n}_j$ denote the sizes of the $i$-th cluster and the $j$-th class, respectively, $n_{i,j}$ denotes the number of data points in their intersection, and $M$ is the number of clusters.
Purity (PUR) is defined as
$$\mathrm{PUR} = \frac{1}{n}\sum_{i=1}^{M}\max_j\big(n_{ij}\big), \qquad (24)$$
where $n_{ij}$ is the number of data points in the $i$-th cluster that belong to the $j$-th class. PUR measures the extent to which each cluster contains data points from primarily one class. More details about these measures can be found in [27].
The F-score (FS) is defined as
$$\mathrm{FS} = \frac{(1 + \rho^2) \times \mathrm{Precision} \times \mathrm{Recall}}{\rho^2 \times \mathrm{Precision} + \mathrm{Recall}}, \qquad (25)$$
where $\mathrm{Precision} = TP/(TP + FP)$ and $\mathrm{Recall} = TP/(TP + FN)$. Here, $TP$ (True Positive) counts the pairs of samples that belong to the same class and are grouped into the same cluster; $FP$ (False Positive) counts the pairs that belong to different classes but are grouped into the same cluster; $FN$ (False Negative) counts the pairs that belong to the same class but are grouped into different clusters. Precision and Recall often exhibit a trade-off relationship, where improving one metric may come at the cost of reducing the other; FS is a comprehensive balance between the two. In (25), $\rho$ is the balance parameter. When $\rho = 1$, Precision and Recall are equally important. Without loss of generality, we set $\rho$ to 1.
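The pair counts can be accumulated by brute force over all sample pairs, as in the following sketch (suitable only for small $n$; the function name is illustrative):

```python
from itertools import combinations

def pairwise_f_score(pred, true, rho=1.0):
    """Pair-counting F-score of Equation (25), with rho = 1 by default."""
    TP = FP = FN = 0
    for i, j in combinations(range(len(pred)), 2):
        same_cluster = pred[i] == pred[j]
        same_class = true[i] == true[j]
        if same_cluster and same_class:
            TP += 1
        elif same_cluster:
            FP += 1
        elif same_class:
            FN += 1
    precision = TP / (TP + FP) if TP + FP else 0.0
    recall = TP / (TP + FN) if TP + FN else 0.0
    if precision + recall == 0:
        return 0.0
    return (1 + rho**2) * precision * recall / (rho**2 * precision + recall)
```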
The Adjusted Rand Index (ARI) is defined as
$$\mathrm{ARI} = \frac{\mathrm{RI} - E(\mathrm{RI})}{\max(\mathrm{RI}) - E(\mathrm{RI})}, \qquad (26)$$
where $\mathrm{RI}$ quantifies the ratio of correct decisions, $\mathrm{RI} = (TP + TN)/C_n^2$, and $C_n^2$ indicates the total number of pairs that can be formed from the samples. $E(\cdot)$ is the expectation operator, and the ARI is an adjusted version of RI that takes values in $[-1, 1]$. The larger the ARI, the more consistent the clustering results are with the true clustering structure.

6.2. Experimental Setup

In this subsection, we introduce the details of the experimental setup. The comparison experiments are conducted in Matlab 2018a on a machine with an Intel(R) Core(TM) i7-8700 CPU @ 3.20 GHz and 32.0 GB memory. The time cost study is executed in Matlab 2018a on a machine with an 11th-Gen Intel Core i7-11700T CPU @ 1.40 GHz and 16 GB memory.
In certain scenarios (e.g., classification), the label information of the data is crucial. However, it is generally expensive to manually label unlabeled data. Clustering can directly and effectively produce fairly accurate labeling results. This can greatly enhance the work efficiency and guarantee the operational performance. To verify the universality of the proposed method’s labeling capability, we enumerate three real application scenarios to conduct the experiments, including faces, handwritten digits and letters, and actions. Figure 3 shows some examples of these eight data sets.
Firstly, on the face real application data sets, we perform clustering based on the facial features of different individuals, with all images of the same individual grouped into one cluster. The details of these data sets are as follows. The Jaffe data set contains 213 images, collected from 10 Japanese female students making 7 facial expressions, and each image’s size is 26 × 26 pixels. The PIX data set comprises 100 images of 10 objects, with each image sized at 100 × 100 pixels. The YALEB data set contains 165 grayscale face images of 15 individuals; all images are 32 × 32 pixels. The ORL data set includes face images of 40 different individuals, with each individual represented by 10 images. Each image measures 32 × 32 pixels. Secondly, on the real application datasets of handwritten digits and letters, we obtain the final clusters based on the categories of handwritten digits 0–9 and letters A–Z. This allows all images of the same digit or letter to be grouped into the same cluster. The details of these data sets are as follows. The Binary Alphadigits (Alphadigit) data set contains 36 categories of digits 0–9 and letters A–Z, each category has 39 images, and each image is 20 × 16 pixels. The Semeion Handwritten Digit (Semeion) data set contains 1593 handwritten digit images from 80 people, encompassing 10 categories. Each image is 16 × 16 pixels and features 256 gray levels. The Modified National Institute of Standards and Technology (MNIST) data set comprises 7000 images of digits 0–9. For this experiment, we randomly select 400 images; each image is 28 × 28 pixels. Among these images, 50% are collected from high school students and 50% from the Census Bureau. Finally, on the Still Images (StillDB) action real application data set [28], we group the images into different clusters based on different actions. The StillDB data set comprises 467 images, encompassing six distinct actions: running, walking, catching, throwing, crouching, and kicking.
We compare the proposed method with ten baseline methods, namely Non-Negative Sparse Hyper-Laplacian Regularized LRR (NSLLRR) [29], SSC [9], Subspace Clustering Using Log-Determinant Rank Approximation (SCLA) [20], Scaled Simplex Representation for Subspace Clustering (SSRSC) [30], S-TLRR [31], R-TLRR [31], Tensor Low-Rank Sparse Representation for Tensor Subspace Learning (TLRSR) [32], Enhanced Tensor Low-Rank Representation for Clustering and Denoising (ETLRR) [33], Efficient Deep Embedded Subspace Clustering (EDESC) [34], and Pseudo-Supervised Deep Subspace Clustering (PSSC) [35]. Among them, the first four are traditional subspace clustering methods, the last two are deep subspace clustering methods, and the remaining four are tensor-based subspace clustering methods. For all methods, the range of the balance parameter is { 0.001 , 0.01 , 0.1 , 10 , 100 , 1000 } , the binary kernel is used in this experiment, and 5 neighbors are retained to construct the similarity matrix. The order of the fine-grained neighbor matrix is set to 3. For EDESC, we set the parameters according to [34], with β set within the range { 0.001 , 0.005 , 0.1 , 0.5 , 1 , 5 , 10 } and d and η set within the range { 1 , 2 , 3 , 4 , 5 , 6 , 7 } . For the parameters of PSSC, they are set according to [35]. A three-layer convolutional network with a depth set to { 20 , 10 , 5 } and a kernel size set to { 5 , 3 , 3 } is employed. For each method, the experiment considers all combinations of parameters.
Brief descriptions of each method are provided in the following. (1) NSLLRR extends LRR on manifolds. We construct the similarity matrix using a binary weighting strategy and retain 5 neighbors on the graph. (2) SSC is a classic method that uses the L-1 norm as a regularization term to achieve sparse representation on the coefficient matrix. (3) SCLA is another type of extension of LRR that approximates the rank with a non-convex LDRA rather than the NN. (4) SSRSC recovers physically meaningful representations. For its parameters, we follow the original paper. (5) S-TLRR and R-TLRR are different forms of TLRR. TLRR is an approach that can exactly recover the clean data of an intrinsic low-rank structure and accurately cluster them as well. S-TLRR and R-TLRR handle slightly and severely corrupted data, respectively. (6) TLRSR can directly perform subspace learning on three-dimensional tensors. (7) ETLRR decomposes the original data tensor into three parts: a low-rank structure tensor, a sparse noise tensor, and clustering. (8) EDESC learns a set of subspace bases from the latent features extracted by a deep auto-encoder, where the bases and network parameters are iteratively refined. (9) PSSC integrates the deep SC and pseudo-supervision to obtain the similarity matrix. By utilizing pseudo-supervision, it enables the model to have better feature extraction and similarity learning capabilities.

6.3. Results

We compare our method with the ten baseline methods on three different real application data sets and demonstrate the effectiveness of the method through a series of visualization methods in this subsection.

6.3.1. Clustering Results of Real Application Data Sets

In this subsection, we assess the clustering performance of the baseline methods and MKSC. For each individual method, we tune its parameters as described in Section 6.2.
In Table 1, Table 2, Table 3, Table 4 and Table 5, we present the clustering performance of all methods on the face real application data sets. It can be observed that on the face real application data sets, the proposed method is the best clustering method, achieving the best clustering performance in 16 out of 20 cases. Several baseline methods also perform well in certain scenarios. For example, PSSC achieves top-two performance in the FS metric. However, the clustering performance of PSSC still lags behind that of the proposed method.
In Table 6, Table 7, Table 8, Table 9 and Table 10, we present the clustering performance of all methods on the handwritten digit and letter real application data sets. In all cases, the proposed method achieves the best performance. Among the baseline methods, the most competitive methods are SCLA and SSRSC. However, their clustering performance still lags significantly behind that of the proposed method. For example, in terms of the NMI evaluation metric, the MKSC algorithm outperforms SCLA by 5.96%, 11.29%, and 12.17%, respectively, on three different real application data sets.
In Table 11, we present the clustering performance of all methods on the StillDB data set. Compared to the baseline methods, the proposed method achieves the overall best performance across five metrics. Although the proposed algorithm takes second place in the PUR metric, it only lags behind the first-place SCLA by approximately 1%.
Specifically, we analyze the reasons for the superiority of the proposed method as follows. The proposed method is compared with three types of algorithms. First, compared with traditional subspace clustering, the proposed method performs better, mainly due to the utilization of the fine-grained neighborhood matrix, which significantly improves the ability of the algorithm to preserve latent information and handle nonlinear data. At the same time, the introduction of connectivity constraints also enhances the group structure of the affinity matrix. Second, compared with tensor subspace clustering, the proposed method does not involve flattening or folding operations on the matrix, thus reducing the potential for information loss. Finally, deep subspace clustering algorithms often require a large number of samples for learning. In this paper, the size of the data sets is relatively small, potentially resulting in these models not being fully utilized. In particular, the EDESC algorithm, which deviates from the self-expressive framework, is more suitable for large data sets.

6.3.2. Clustering Results on Toy Data Set

In this subsection, we use the nonlinear moon data set to validate the ability of the proposed algorithm to handle nonlinear data. The moon data set intuitively demonstrates the nonlinear structures among data. Specifically, it consists of two parts, shown in purple and yellow in Figure 4. Although some samples of the purple and yellow parts are close to each other, they actually belong to different clusters, whereas the two ends of the purple or yellow part, which are far apart, still belong to the same cluster. The moon data set is generated as follows. First, a data matrix of size $N \times 2$ is randomly generated, where $N$ is the number of samples, set to 400. Subsequently, data labels corresponding to the $N$ samples are generated, with a total of two classes. Finally, noise is added to the data, with the noise level set to 0.1. An example of the moon data set is provided in Figure 4.
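A data set with these characteristics can be generated, for example, with scikit-learn's make_moons; the random seed below is an illustrative choice for reproducibility.

```python
import numpy as np
from sklearn.datasets import make_moons

# 400 samples from two interleaving half-circles with Gaussian noise level 0.1
X_moon, y_moon = make_moons(n_samples=400, noise=0.1, random_state=0)
print(X_moon.shape, np.bincount(y_moon))   # (400, 2) and two balanced classes
```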
In Table 12, we demonstrate the clustering results of the baseline method and the proposed method on the moon data set. It can be observed that the proposed algorithm exhibits significant advantages on the moon data set. This indicates that the proposed algorithm can effectively handle complex nonlinear data relationships, thereby improving the clustering performance.

6.3.3. Time Cost

In this test, we conduct experiments to verify the efficiency of the proposed method. Because the clustering accuracy is the key focus, we only compare our method with the most competitive baseline methods, i.e., SCLA and NSLLRR. In particular, the experiments are conducted in Matlab 2018a on a machine with an 11th-Gen Intel Core i7-11700T CPU @ 1.40 GHz and 16 GB memory (Intel, Santa Clara, CA, USA), and the results are reported in Figure 5.
From the results, it can be seen that MKSC is much faster than NSLLRR and comparable to SCLA. Considering that MKSC has superior clustering performance, the time cost of MKSC is fairly acceptable.

6.3.4. Learned Representation

To better investigate the learning capabilities of MKSC, we visually show the group structure of the representation matrix $Z$ learned by MKSC. In particular, we show the results on the Jaffe, MNIST, ORL, and PIX data sets in Figure 6. It can be seen that the representation matrices have clear block-diagonal structures, which correspond to the clusters of the data sets. Moreover, we visualize the data points of Jaffe and PIX in Figure 7 using t-distributed stochastic neighbor embedding (t-SNE) [36]. In Figure 7, the first and second rows show the t-SNE visualization of the original data and of the representation matrix $Z$, respectively. It can be seen that the samples overlap when embedded with the original features but are well separated when embedded with the learned similarity graph. These observations visually confirm the effectiveness of MKSC in recovering the group structures of the data.

6.3.5. Convergence Analysis

To empirically verify the convergence of the proposed algorithm, we show some experimental results for a clearer illustration. For the sake of generality, we show the results on the Jaffe, MNIST, ORL, and PIX data sets. To show the convergence properties of $\{Z^t\}$ and $\{B^t\}$, we plot the values of $\|Z^{t+1} - Z^t\|_F^2$ and $\|B^{t+1} - B^t\|_F^2$ for the first 50 iterations in Figure 8. We can observe that both quantities converge to zero within about 50 iterations, which confirms the efficiency and convergence of the MKSC algorithm.

6.3.6. Ablation Study

In this subsection, we validate the effectiveness of the embedded Markov transition probability information. First, we show some empirical results to confirm the effectiveness of the Markov transition probability information and the high-order neighbor information. Without loss of generality, we conduct ablation experiments on the MNIST and ORL data sets. In particular, we set α = 0 and treat the model as the baseline, while the other parameters are tuned in the same way as in Section 6.2. Specifically, we tune the other balancing parameters within { 0.001 , 0.01 , 0.1 , 10 , 100 , 1000 } and report the highest clustering performance in Figure 9.
It is observed that MKSC has better performance than the baseline, which confirms the effectiveness of the Markov transition probability information and high-order neighbor information used in the MKSC model. Then, we validate the effect of the connectivity constraints on the group structure. The clustering performance is closely related to the group structure. Generally speaking, excellent clustering performance indicates a clear group structure. Therefore, we conduct an ablation study by setting γ = 0 to verify the impact of the connectivity constraints on the group structure. As shown in Figure 10, the connectivity constraints improve the clustering performance, indicating that the connectivity constraints make the group structure clearer.
Furthermore, we conduct an ablation study to discuss the complexity of the nonlinear relationships that can be handled by the proposed method. We validate the capability of the manifold term on three data sets, covering faces, handwritten digits, and actions, whose nonlinear relationships have different levels of complexity. We set $\beta$ to 0 as the baseline in order to validate the ability of the proposed cross-order manifold term to handle nonlinear relationships of different complexity levels. Figure 11 demonstrates that, on the three types of data sets, the proposed manifold term consistently enhances the clustering performance of the algorithm, enabling it to tackle nonlinear relationships of varying complexity effectively.

6.3.7. Parameter Sensitivity

It is crucial for an unsupervised learning method to be insensitive to the parameters to ensure its potential applicability in real-world scenarios. In this test, we conduct experiments to show how the parameters affect the learning performance of MKSC, where, without loss of generality, we show the results on the MNIST data set. Firstly, we show the effects of different combinations of { ζ , α } on the MNIST data set. Both ζ and α vary within the range of { 0.001 , 0.01 , 0.1 , 10 , 100 , 1000 } , while γ and β are fixed to their optimal values. Secondly, we investigate the impact of various combinations of { β , γ } on the MNIST data set. Both β and γ are varied within the range of { 0.001 , 0.01 , 0.1 , 10 , 100 , 1000 } while keeping ζ and α constant at their respective optimal values. This analysis aims to assess the performance sensitivity of our approach with respect to these parameters.
From Figure 12 and Figure 13, it can be observed that the MKSC algorithm is quite insensitive to the parameters on the MNIST data set in all metrics. These observations indicate that the MKSC algorithm is insensitive to the parameters, suggesting its potential suitability for unsupervised learning problems.
In particular, we explain the impact of the parameters on the algorithm's performance. In Figure 12, the impact of $\alpha$ is small; when $\alpha$ is set to $\{10, 100, 1000\}$, MKSC performs well. The impact of $\zeta$ is also slight, and the overall performance remains stable. In Figure 13, when $\beta$ is set to $\{10, 100, 1000\}$, the overall performance remains good. Similar to $\zeta$, the variation in $\gamma$ has a small impact on the algorithm's performance.

6.3.8. Data Reconstruction

To validate the capability of the affinity matrix in preserving key features, we present a comparison of the images reconstructed from $X$ and $XZ$ in Figure 14. Without loss of generality, we perform experiments on the Alphadigit, ORL, and Semeion data sets. All parameter settings are the same as in Section 6.2. It can be seen that the reconstructed images retain the key features of the original samples. For example, in Figure 14, the original Semeion images of digit 4 appear distinct from one another, but, in the reconstructed Semeion images of $XZ$, the key features of the original images of digit 4 are preserved and they share very high similarity. Meanwhile, in Figure 14, although the original ORL images of the same person have different facial expressions or orientations, the reconstructed ORL images of $XZ$ still share a high degree of similarity. Therefore, the affinity matrix $Z$ effectively retains the key information from the original data.

6.4. Discussion

It can be observed that the proposed method achieves the best clustering performance on the three real application data sets compared to the baseline methods. While the baseline methods may perform exceptionally well in some cases, they do not consistently do so. In terms of computational speed, the proposed algorithm converges quickly and the time cost of clustering is also highly competitive. In data reconstruction, it can be seen that the proposed method is able to retain the key features of the original data. It is observed that the learned representation matrix has clear block diagonal structures. Simultaneously, as can be seen from the t-SNE visualization, the clustered sample points transition from a dispersed state to a compact formation. The ablation study verifies the effectiveness of embedding Markov transition probability information into the self-expression framework. Simultaneously, the ability of the proposed manifold term to handle nonlinear data is also demonstrated in the ablation study. The impact of connectivity constraints on the group structure is also validated in the ablation study. In the parameter sensitivity study, the proposed method maintains good insensitivity across the five clustering evaluation metrics. Overall, the proposed algorithm exhibits excellent capabilities in terms of clustering performance, key feature capture, and computational speed.

7. Conclusions

In this paper, we propose a novel subspace clustering algorithm named MKSC. In MKSC, the Markov transition probability information is embedded into a self-expression framework, preserving the diverse clustering characteristics. Additionally, a fine-grained neighbor matrix is employed to comprehensively capture the cross-order neighbor relationships and complementary information among samples. To better handle nonlinear data, MKSC leverages the manifold structure of cross-order local neighbor graphs. Furthermore, connectivity constraints are imposed on the affinity matrix to enhance the grouping structure. The experiments show that the proposed method is better than the current popular methods on real application cases.
It should be noted that noise and outliers are widely present. However, MKSC mainly considers clean data. Therefore, when dealing with data that are heavily influenced by noise, it may become less robust. Moreover, with the increasing diversity of information media, the processing of multi-view data is also a key issue. However, the MKSC algorithm only focuses on single-view data, necessitating the conversion of multi-view data into a single-view format for its application. In the future, we will further refine the algorithm to effectively handle multi-view data and achieve greater robustness.

Author Contributions

Conceptualization, methodology, software and validation, W.S. and X.Z.; formal analysis, investigation, data curation, writing—original draft preparation and visualization, W.S.; resources, writing—review and editing, supervision, project administration and funding acquisition, X.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Peng, C.; Chen, Y.; Kang, Z.; Chen, C.; Cheng, Q. Robust principal component analysis: A factorization-based approach with linear complexity. Inf. Sci. 2020, 513, 581–599. [Google Scholar] [CrossRef]
  2. Zhang, X.; Cheng, L.; Li, B.; Hu, H.M. Too Far to See? Not Really!—Pedestrian Detection with Scale-Aware Localization Policy. IEEE Trans. Image Process. 2018, 27, 3703–3715. [Google Scholar] [CrossRef] [PubMed]
  3. Peng, C.; Cheng, Q. Discriminative Ridge Machine: A Classifier for High-Dimensional Data or Imbalanced Data. IEEE Trans. Neural Netw. Learn. Syst. 2021, 32, 2595–2609. [Google Scholar] [CrossRef] [PubMed]
  4. Huang, B.; Ge, L.; Chen, G.; Radenkovic, M.; Wang, X.; Duan, J.; Pan, Z. Nonlocal graph theory based transductive learning for hyperspectral image classification. Pattern Recognit. 2021, 116, 107967. [Google Scholar] [CrossRef]
  5. Peng, C.; Liu, Y.; Kang, K.; Chen, Y.; Wu, X.; Cheng, A.; Kang, Z.; Chen, C.; Cheng, Q. Hyperspectral Image Denoising Using Nonconvex Local Low-Rank and Sparse Separation With Spatial-Spectral Total Variation Regularization. IEEE Trans. Geosci. Remote. Sens. 2022, 60, 1–17. [Google Scholar] [CrossRef]
  6. Peng, C.; Zhang, J.; Chen, Y.; Xing, X.; Chen, C.; Kang, Z.; Guo, L.; Cheng, Q. Preserving bilateral view structural information for subspace clustering. Knowl. Based Syst. 2022, 258, 109915. [Google Scholar] [CrossRef]
  7. Du, Y.; Lu, G.F.; Ji, G.; Liu, J. Robust subspace clustering via multi-affinity matrices fusion. Knowl. Based Syst. 2023, 278, 110874. [Google Scholar] [CrossRef]
  8. Liu, G.; Lin, Z.; Yan, S.; Sun, J.; Yu, Y.; Ma, Y. Robust Recovery of Subspace Structures by Low-Rank Representation. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 171–184. [Google Scholar] [CrossRef] [PubMed]
  9. Elhamifar, E.; Vidal, R. Sparse Subspace Clustering: Algorithm, Theory, and Applications. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 2765–2781. [Google Scholar] [CrossRef]
  10. Wang, J.; Shi, D.; Cheng, D.; Zhang, Y.; Gao, J. LRSR: Low-Rank-Sparse representation for subspace clustering. Neurocomputing 2016, 214, 1026–1037. [Google Scholar] [CrossRef]
  11. Xia, R.; Pan, Y.; Du, L.; Yin, J. Robust Multi-View Spectral Clustering via Low-Rank and Sparse Decomposition. In Proceedings of the AAAI Conference on Artificial Intelligence, Québec City, QC, Canada, 27–31 July 2014; Volume 28. [Google Scholar] [CrossRef]
  12. Wu, J.; Lin, Z.; Zha, H. Essential Tensor Learning for Multi-View Spectral Clustering. IEEE Trans. Image Process. 2019, 28, 5910–5922. [Google Scholar] [CrossRef] [PubMed]
  13. Tang, J.; Qu, M.; Wang, M.; Zhang, M.; Yan, J.; Mei, Q. LINE: Large-scale Information Network Embedding. In Proceedings of the 24th International Conference on World Wide Web, Florence, Italy, 18–22 May 2015. [Google Scholar]
  14. Lu, C.Y.; Min, H.; Zhao, Z.Q.; Zhu, L.; Huang, D.S.; Yan, S. Robust and Efficient Subspace Segmentation via Least Squares Regression. In Proceedings of the Computer Vision—ECCV 2012, Florence, Italy, 7–13 October 2012; Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C., Eds.; Springer: Berlin/Heidelberg, Germany, 2012; pp. 347–360. [Google Scholar]
  15. Lu, C.; Feng, J.; Lin, Z.; Mei, T.; Yan, S. Subspace Clustering by Block Diagonal Representation. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 41, 487–501. [Google Scholar] [CrossRef] [PubMed]
  16. Liu, J.; Chen, Y.; Zhang, J.; Xu, Z. Enhancing Low-Rank Subspace Clustering by Manifold Regularization. IEEE Trans. Image Process. 2014, 23, 4022–4030. [Google Scholar] [CrossRef] [PubMed]
  17. Xiao, S.; Tan, M.; Xu, D.; Dong, Z.Y. Robust Kernel Low-Rank Representation. IEEE Trans. Neural Netw. Learn. Syst. 2016, 27, 2268–2281. [Google Scholar] [CrossRef] [PubMed]
  18. Ji, P.; Zhang, T.; Li, H.; Salzmann, M.; Reid, I. Deep Subspace Clustering Networks. In Proceedings of the Advances in Neural Information Processing Systems; Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R., Eds.; Curran Associates, Inc.: Long Beach, CA, USA, 2017; Volume 30. [Google Scholar]
  19. Xie, X.; Guo, X.; Liu, G.; Wang, J. Implicit Block Diagonal Low-Rank Representation. IEEE Trans. Image Process. 2018, 27, 477–489. [Google Scholar] [CrossRef] [PubMed]
  20. Peng, C.; Kang, Z.; Li, H.; Cheng, Q. Subspace Clustering Using Log-Determinant Rank Approximation. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, 10–13 August 2015; pp. 925–934. [Google Scholar] [CrossRef]
  21. Shi, J.; Malik, J. Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2000, 22, 888–905. [Google Scholar] [CrossRef]
  22. Agarwal, P.K.; Mustafa, N.H. k-means Projective Clustering. In Proceedings of the Twenty-Third ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, New York, NY, USA, 14–16 June 2004; pp. 155–165. [Google Scholar] [CrossRef]
  23. Xu, G.; Kailath, T. Fast subspace decomposition. IEEE Trans. Signal Process. 1994, 42, 539–551. [Google Scholar] [CrossRef]
  24. Zhong, G.; Pun, C.M. Subspace clustering by simultaneously feature selection and similarity learning. Knowl. Based Syst. 2020, 193, 105512. [Google Scholar] [CrossRef]
  25. Larson, R. Introduction to Information Retrieval; Cambridge University Press: Cambridge, UK, 2010. [Google Scholar] [CrossRef]
  26. Hubert, L.; Arabie, P. On comparing partitions. J. Classif. 1985, 2, 193–218. [Google Scholar] [CrossRef]
  27. Peng, C.; Kang, Z.; Hu, Y.; Cheng, J.; Cheng, Q. Nonnegative Matrix Factorization with Integrated Graph and Feature Learning. ACM Trans. Intell. Syst. Technol. 2017, 8, 1–29. [Google Scholar] [CrossRef]
  28. Ikizler, N.; Cinbis, R.G.; Pehlivan, S.; Duygulu, P. Recognizing Actions from Still Images. In Proceedings of the 2008 19th International Conference on Pattern Recognition, Tampa, FL, USA, 8–11 December 2008; pp. 1–4. [Google Scholar] [CrossRef]
  29. Yin, M.; Gao, J.; Lin, Z. Laplacian Regularized Low-Rank Representation and Its Applications. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 38, 504–517. [Google Scholar] [CrossRef] [PubMed]
  30. Xu, J.; Yu, M.; Shao, L.; Zuo, W.; Meng, D.; Zhang, L.; Zhang, D. Scaled Simplex Representation for Subspace Clustering. IEEE Trans. Cybern. 2021, 51, 1493–1505. [Google Scholar] [CrossRef] [PubMed]
  31. Zhou, P.; Lu, C.; Feng, J.; Lin, Z.; Yan, S. Tensor Low-Rank Representation for Data Recovery and Clustering. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 1718–1732. [Google Scholar] [CrossRef] [PubMed]
  32. Du, S.; Shi, Y.; Shan, G.; Wang, W.; Ma, Y. Tensor low-rank sparse representation for tensor subspace learning. Neurocomputing 2021, 440, 351–364. [Google Scholar] [CrossRef]
  33. Du, S.; Liu, B.; Shan, G.; Shi, Y.; Wang, W. Enhanced tensor low-rank representation for clustering and denoising. Knowl. Based Syst. 2022, 243, 108468. [Google Scholar] [CrossRef]
  34. Cai, J.; Fan, J.; Guo, W.; Wang, S.; Zhang, Y.; Zhang, Z. Efficient Deep Embedded Subspace Clustering. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 21–30. [Google Scholar] [CrossRef]
  35. Lv, J.; Kang, Z.; Lu, X.; Xu, Z. Pseudo-Supervised Deep Subspace Clustering. IEEE Trans. Image Process. 2021, 30, 5252–5263. [Google Scholar] [CrossRef]
  36. van der Maaten, L.; Hinton, G.E. Visualizing Data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605. [Google Scholar]
Figure 1. Visual illustration of neighbor relationships of different orders.
Figure 1. Visual illustration of neighbor relationships of different orders.
Applsci 14 04617 g001
Figure 2. The flowchart of Markov-Embedded Affinity Learning with Connectivity Constraints for Subspace Clustering (MKSC).
Figure 2. The flowchart of Markov-Embedded Affinity Learning with Connectivity Constraints for Subspace Clustering (MKSC).
Applsci 14 04617 g002
Figure 3. Examples of images selected from three different real application data sets.
Figure 3. Examples of images selected from three different real application data sets.
Applsci 14 04617 g003
Figure 4. Examples of moon data set.
Figure 4. Examples of moon data set.
Applsci 14 04617 g004
Figure 5. Time costs of different methods on various data sets.
Figure 5. Time costs of different methods on various data sets.
Applsci 14 04617 g005
Figure 6. Examples of the learned representation matrix Z.
Figure 6. Examples of the learned representation matrix Z.
Applsci 14 04617 g006
Figure 7. The t-SNE visualization of the Jaffe and PIX data sets, respectively. The first and second rows correspond to the t-SNE visualization of the original data and the reconstructed representation on Jaffe and PIX, respectively.
Figure 7. The t-SNE visualization of the Jaffe and PIX data sets, respectively. The first and second rows correspond to the t-SNE visualization of the original data and the reconstructed representation on Jaffe and PIX, respectively.
Applsci 14 04617 g007
Figure 8. Examples to show the convergence of { Z t } , { B t } on different data sets.
Figure 8. Examples to show the convergence of { Z t } , { B t } on different data sets.
Applsci 14 04617 g008
Figure 9. Comparison results between MKSC and the compared methods on MNIST and Semeion.
Figure 10. Comparison results between MKSC and the compared methods on ORL and Alphadigit.
Figure 11. Comparison results between MKSC and the compared methods on ORL, MNIST, and StillDB.
Figure 12. Clustering performance of MKSC under different combinations of {α, ζ} on the MNIST data set.
Figure 13. Clustering performance of MKSC under different combinations of {β, γ} on the MNIST data set.
Figure 14. Comparison of data reconstruction effects. The first and second rows show the original data and the reconstructed data, respectively.
Table 1. Clustering ACC on face real application data sets.
Data      | NSL-LRR | SSC     | SCLA    | SSRSC   | S-TLRR  | R-TLRR  | TLRSR   | ETLRR   | EDESC   | PSSC    | MKSC
Jaffe     | 1.0000  | 0.7653  | 1.0000  | 1.0000  | 0.9531  | 0.9432  | 0.9202  | 0.9507  | 0.8925  | 0.9812  | 1.0000
PIX       | 0.9700  | 0.8500  | 0.9700  | 0.8400  | 0.9770  | 0.9770  | 0.9800  | 0.9800  | 0.6280  | 0.9800  | 0.9900
YALEB     | 0.5697  | 0.4121  | 0.5515  | 0.5697  | 0.4770  | 0.4952  | 0.5303  | 0.3915  | 0.5030  | 0.4964  | 0.5697
ORL       | 0.7000  | 0.6750  | 0.7475  | 0.7055  | 0.5975  | 0.5872  | 0.6545  | 0.6440  | 0.1927  | 0.7125  | 0.7750
Table 2. Clustering NMI on face real application data sets.
Data      | NSL-LRR | SSC     | SCLA    | SSRSC   | S-TLRR  | R-TLRR  | TLRSR   | ETLRR   | EDESC   | PSSC    | MKSC
Jaffe     | 1.0000  | 0.8133  | 1.0000  | 1.0000  | 0.9359  | 0.9236  | 0.9010  | 0.9364  | 0.8825  | 0.9697  | 1.0000
PIX       | 0.9620  | 0.8661  | 0.9620  | 0.8989  | 0.9667  | 0.9667  | 0.9765  | 0.9765  | 0.6549  | 0.9713  | 0.9900
YALEB     | 0.5659  | 0.4495  | 0.5716  | 0.5773  | 0.5249  | 0.5313  | 0.5738  | 0.5380  | 0.4515  | 0.5270  | 0.5943
ORL       | 0.8071  | 0.8246  | 0.8550  | 0.8273  | 0.7560  | 0.7354  | 0.7916  | 0.7902  | 0.3981  | 0.8298  | 0.8774
Table 3. Clustering PUR on face real application data sets.
Data      | NSL-LRR | SSC     | SCLA    | SSRSC   | S-TLRR  | R-TLRR  | TLRSR   | ETLRR   | EDESC   | PSSC    | MKSC
Jaffe     | 1.0000  | 0.7934  | 1.0000  | 1.0000  | 0.9531  | 0.9432  | 0.9202  | 0.7967  | 0.9812  | 0.9697  | 1.0000
PIX       | 0.9700  | 0.8600  | 0.9700  | 0.8700  | 0.9770  | 0.9770  | 0.9800  | 0.9800  | 0.5540  | 0.9800  | 0.9854
YALEB     | 0.5758  | 0.4364  | 0.5515  | 0.5697  | 0.4976  | 0.5055  | 0.5485  | 0.5133  | 0.3861  | 0.5212  | 0.5818
ORL       | 0.7150  | 0.7175  | 0.7825  | 0.7350  | 0.6278  | 0.6172  | 0.6813  | 0.6728  | 0.0313  | 0.7325  | 0.8125
Table 4. Clustering FS on face real application data sets.
Data      | NSL-LRR | SSC     | SCLA    | SSRSC   | S-TLRR  | R-TLRR  | TLRSR   | ETLRR   | EDESC   | PSSC    | MKSC
Jaffe     | 1.0000  | 0.7099  | 1.0000  | 1.0000  | 0.9111  | 0.8924  | 0.8554  | 0.9048  | 0.8369  | 0.9656  | 1.0000
PIX       | 0.9392  | 0.7902  | 0.9392  | 0.8200  | 0.9520  | 0.9520  | 0.9599  | 0.9602  | 0.5567  | 0.9655  | 0.9789
YALEB     | 0.3701  | 0.2457  | 0.3544  | 0.3846  | 0.3041  | 0.3180  | 0.3599  | 0.3156  | 0.3504  | 0.4432  | 0.3784
ORL       | 0.5557  | 0.5725  | 0.6498  | 0.5666  | 0.4585  | 0.4305  | 0.4981  | 0.5155  | 0.1628  | 0.6595  | 0.6252
Table 5. Clustering ARI on face real application data sets.
Data      | NSL-LRR | SSC     | SCLA    | SSRSC   | S-TLRR  | R-TLRR  | TLRSR   | ETLRR   | EDESC   | PSSC    | MKSC
Jaffe     | 1.0000  | 0.6732  | 1.0000  | 1.0000  | 0.9016  | 0.8808  | 0.8399  | 0.8945  | 0.8018  | 0.9582  | 1.0000
PIX       | 0.9331  | 0.7673  | 0.9331  | 0.8000  | 0.9472  | 0.9472  | 0.9559  | 0.9562  | 0.4126  | 0.9536  | 0.9789
YALEB     | 0.3369  | 0.1922  | 0.3094  | 0.3418  | 0.2567  | 0.2721  | 0.3163  | 0.2689  | 0.1795  | 0.2952  | 0.3362
ORL       | 0.5449  | 0.5614  | 0.6411  | 0.5558  | 0.4453  | 0.4166  | 0.5037  | 0.5155  | 0.0768  | 0.5775  | 0.6154
Table 6. Clustering ACC on handwritten digit and letter real application data sets.
Data       | NSL-LRR | SSC     | SCLA    | SSRSC   | S-TLRR  | R-TLRR  | TLRSR   | ETLRR   | EDESC   | PSSC    | MKSC
Alphadigit | 0.4580  | 0.0349  | 0.4900  | 0.3796  | 0.3425  | 0.2184  | 0.3891  | 0.3711  | 0.3655  | 0.3141  | 0.5812
Semeion    | 0.6095  | 0.5267  | 0.6918  | 0.7219  | 0.4242  | 0.4242  | 0.4500  | 0.4313  | 0.4751  | 0.5286  | 0.8242
MNIST      | 0.5773  | 0.1547  | 0.5801  | 0.5994  | 0.4036  | 0.3329  | 0.4329  | 0.4166  | 0.4393  | 0.4650  | 0.6906
Table 7. Clustering NMI on handwritten digit and letter real application data sets.
Data       | NSL-LRR | SSC     | SCLA    | SSRSC   | S-TLRR  | R-TLRR  | TLRSR   | ETLRR   | EDESC   | PSSC    | MKSC
Alphadigit | 0.6004  | 0.0587  | 0.6178  | 0.5240  | 0.4923  | 0.3560  | 0.5331  | 0.5205  | 0.5184  | 0.4437  | 0.6747
Semeion    | 0.5719  | 0.4129  | 0.6138  | 0.6024  | 0.3032  | 0.3032  | 0.3711  | 0.3490  | 0.3895  | 0.3744  | 0.7267
MNIST      | 0.5199  | 0.0468  | 0.5367  | 0.5136  | 0.3156  | 0.2606  | 0.3822  | 0.3529  | 0.3869  | 0.3810  | 0.6584
Table 8. Clustering PUR on handwritten digit and letter real application data sets.
Data       | NSL-LRR | SSC     | SCLA    | SSRSC   | S-TLRR  | R-TLRR  | TLRSR   | ETLRR   | EDESC   | PSSC    | MKSC
Alphadigit | 0.4950  | 0.0598  | 0.5192  | 0.4038  | 0.3645  | 0.2357  | 0.4114  | 0.3915  | 0.3878  | 0.3440  | 0.5962
Semeion    | 0.6585  | 0.5355  | 0.7094  | 0.7219  | 0.4395  | 0.4395  | 0.4570  | 0.4453  | 0.4480  | 0.5480  | 0.8242
MNIST      | 0.6326  | 0.1685  | 0.6354  | 0.6271  | 0.4511  | 0.3649  | 0.4881  | 0.4494  | 0.4775  | 0.5125  | 0.7293
Table 9. Clustering FS on handwritten digit and letter real application data sets.
Data       | NSL-LRR | SSC     | SCLA    | SSRSC   | S-TLRR  | R-TLRR  | TLRSR   | ETLRR   | EDESC   | PSSC    | MKSC
Alphadigit | 0.3262  | 0.0556  | 0.3510  | 0.2436  | 0.1137  | 0.1137  | 0.2443  | 0.2384  | 0.2748  | 0.2288  | 0.4041
Semeion    | 0.4841  | 0.3886  | 0.5684  | 0.5753  | 0.2947  | 0.2947  | 0.3465  | 0.3313  | 0.3554  | 0.3685  | 0.7042
MNIST      | 0.4719  | 0.1971  | 0.4574  | 0.4921  | 0.2801  | 0.2248  | 0.2957  | 0.2943  | 0.3470  | 0.3395  | 0.4940
Table 10. Clustering ARI on handwritten digit and letter real application data sets.
Data       | NSL-LRR | SSC     | SCLA    | SSRSC   | S-TLRR  | R-TLRR  | TLRSR   | ETLRR   | EDESC   | PSSC    | MKSC
Alphadigit | 0.3071  | 0.0032  | 0.3326  | 0.2220  | 0.0884  | 0.0884  | 0.2228  | 0.2168  | 0.2161  | 0.1538  | 0.3858
Semeion    | 0.4184  | 0.3183  | 0.5171  | 0.5269  | 0.2154  | 0.2154  | 0.2705  | 0.2534  | 0.2732  | 0.2283  | 0.6711
MNIST      | 0.4068  | 0.0019  | 0.3937  | 0.4401  | 0.1900  | 0.1272  | 0.2058  | 0.2068  | 0.2371  | 0.2249  | 0.4746
Table 11. Clustering results on action real application data sets.
Method  | ACC     | NMI     | PUR     | FS      | ARI
NSLLRR  | 0.4423  | 0.1643  | 0.3945  | 0.3621  | 0.4465
SSC     | 0.3123  | 0.1498  | 0.3544  | 0.2943  | 0.1165
SCLA    | 0.4531  | 0.2365  | 0.4674  | 0.3411  | 0.3026
SSRSC   | 0.4120  | 0.2540  | 0.4206  | 0.3368  | 0.4032
S-TLRR  | 0.3754  | 0.2144  | 0.3598  | 0.2864  | 0.3742
R-TLRR  | 0.3754  | 0.2144  | 0.3326  | 0.2742  | 0.3651
TLRSR   | 0.3868  | 0.3095  | 0.3654  | 0.3421  | 0.4013
ETLRR   | 0.3875  | 0.2412  | 0.4469  | 0.3018  | 0.2874
EDESC   | 0.4032  | 0.2946  | 0.3587  | 0.3522  | 0.3746
PSSC    | 0.3956  | 0.1361  | 0.3496  | 0.3136  | 0.3534
MKSC    | 0.4769  | 0.3203  | 0.4532  | 0.4432  | 0.4963
Table 12. Clustering results on moon data set.
Method  | ACC     | NMI     | PUR     | FS      | ARI
NSLLRR  | 0.8900  | 0.5448  | 0.8900  | 0.8035  | 0.6045
SSC     | 0.6538  | 0.1837  | 0.6538  | 0.5618  | 0.1361
SCLA    | 0.6925  | 0.1121  | 0.6925  | 0.5759  | 0.1461
SSRSC   | 0.8475  | 0.3841  | 0.8475  | 0.7402  | 0.4817
S-TLRR  | 0.6575  | 0.0729  | 0.6575  | 0.5476  | 0.0970
R-TLRR  | 0.6575  | 0.0729  | 0.6575  | 0.5476  | 0.0970
TLRSR   | 0.5025  | 0.1394  | 0.5025  | 0.3618  | 0.1029
ETLRR   | 0.6817  | 0.1539  | 0.6817  | 0.5537  | 0.2639
EDESC   | 0.7716  | 0.3384  | 0.7716  | 0.6071  | 0.3382
PSSC    | 0.8637  | 0.7034  | 0.8637  | 0.7418  | 0.5539
MKSC    | 0.9975  | 0.9773  | 0.9975  | 0.8134  | 0.7361
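The tables above report ACC, NMI, PUR, FS (the pairwise F-score), and ARI. As a point of reference, and not the authors' evaluation code, ACC and PUR can be computed as sketched below with SciPy/scikit-learn, assuming integer-encoded labels; NMI and ARI are available directly from scikit-learn:

    import numpy as np
    from scipy.optimize import linear_sum_assignment
    from sklearn import metrics

    def clustering_accuracy(y_true, y_pred):
        # ACC: best one-to-one matching between clusters and classes (Hungarian algorithm).
        y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
        k = max(y_true.max(), y_pred.max()) + 1
        count = np.zeros((k, k), dtype=np.int64)
        for t, p in zip(y_true, y_pred):
            count[t, p] += 1
        rows, cols = linear_sum_assignment(-count)  # negate to maximize matched counts
        return count[rows, cols].sum() / y_true.size

    def purity(y_true, y_pred):
        # PUR: each cluster is credited with its majority true class.
        cm = metrics.cluster.contingency_matrix(y_true, y_pred)
        return cm.max(axis=0).sum() / cm.sum()

    # NMI: metrics.normalized_mutual_info_score(y_true, y_pred)
    # ARI: metrics.adjusted_rand_score(y_true, y_pred)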