Article

Tensorized Discrete Multi-View Spectral Clustering

1 School of Software Engineering, Shenzhen Institute of Information Technology, Shenzhen 518172, China
2 School of Telecommunications Engineering, Xidian University, Xi’an 710071, China
3 Department of Computing, The Hong Kong Polytechnic University, Hong Kong, China
* Author to whom correspondence should be addressed.
Electronics 2024, 13(3), 491; https://doi.org/10.3390/electronics13030491
Submission received: 23 November 2023 / Revised: 28 December 2023 / Accepted: 3 January 2024 / Published: 24 January 2024

Abstract:
Discrete spectral clustering directly obtains the discrete labels of data, but existing clustering methods assume that the real-valued indicator matrices of different views are identical, which is unreasonable in practical applications. Moreover, they do not effectively exploit the spatial structure and complementary information embedded in views. To overcome this disadvantage, we propose a tensorized discrete multi-view spectral clustering model that integrates spectral embedding and spectral rotation into a unified framework. Specifically, we leverage the weighted tensor nuclear-norm regularizer on the third-order tensor, which consists of the real-valued indicator matrices of views, to exploit the complementary information embedded in the indicator matrices of different views. Furthermore, we present an adaptively weighted scheme that takes into account the relationship between views for clustering. Finally, discrete labels are obtained by spectral rotation. Experiments show the effectiveness of our proposed method.

1. Introduction

Multi-view clustering is attracting increasing attention in artificial intelligence and pattern recognition because multi-view data, which are ubiquitous in practice, carry useful complementary information for clustering [1,2,3,4,5,6]. It aims to partition multi-view data into several clusters by rationally leveraging this complementary information, such that data points in the same cluster have high overall similarity to each other. One of the most representative multi-view clustering techniques is graph-based multi-view clustering, which performs well on arbitrarily shaped clusters.
Graph-based multi-view clustering aims to obtain a view-consensus adjacency matrix or embedding, i.e., an indicator matrix, by using different diffusion strategies. As is well known, optimizing a graph-based clustering model with a discrete constraint imposed on the labels is NP-hard. To sidestep this problem, an intuitive approach is to learn an approximate real-valued indicator matrix instead of discrete labels. Co-regularized spectral clustering (Co-reg) [7] is one of the most classical methods. It uses the minimum mean squared error to minimize the divergence between the soft indicator matrices of different views. However, it implicitly assumes that all views are equally important for clustering, which is unreasonable in practical applications. To this end, many related multi-view clustering methods have been developed, such as adaptively weighted multi-view spectral clustering methods [8,9] and adaptive graph learning clustering methods [10,11,12,13]. Despite their promising clustering results, all of the aforementioned methods need a post-processing step, e.g., K-means, to obtain the clusters of the data, i.e., the discrete labels. This cannot guarantee that the learned real-valued indicator matrix is optimal for K-means, which leads to suboptimal performance. Furthermore, the initialization of the cluster centroids has a great influence on the clustering performance of K-means, which leads to unstable results.
To get rid of this disadvantage, a frequently used technique is to learn a view-consensus graph with a connected-component constraint [14,15]. Although these methods directly obtain the clusters of the data from the number of connected components and achieve good experimental results, it is difficult in practice to learn or manually select a parameter that ensures the learned graph has the exact number of connected components. Moreover, these methods cannot effectively utilize the complementary information embedded in the adjacency matrices of different views. Another commonly used method obtains the discrete labels by spectral rotation [16]. However, spectral embedding and spectral rotation are two separate processes in that model, which results in sub-optimal performance, and it cannot effectively utilize the complementary information hidden in the indicator matrices of different views. To achieve a rational solution, a unified framework was proposed [17,18,19] for discrete clustering. This framework simultaneously optimizes spectral embedding and spectral rotation to obtain the discrete labels, but, although it works for single-view data, it cannot be directly applied to multi-view data, which are ubiquitous in artificial intelligence and pattern recognition. Building on this framework, some works extended it to multi-view clustering [20,21,22]. However, all of them minimize the divergence between the adjacency matrices of views by the minimum mean square error, which is a one-dimensional, pixel-by-pixel measurement model. Thus, they cannot exploit the spatial structure and complementary information of the views. Moreover, they assume that the real-valued indicator matrices of different views are identical, which does not hold in practical applications.
Drawing inspiration from the fact that the differences between the soft indicator matrices of different views contain useful complementary information for clustering [9], and from the ability of the weighted tensor nuclear norm [23,24] to leverage the complementary information hidden in views, we apply a weighted tensor nuclear-norm regularizer to the third-order tensor composed of the real-valued indicator matrices of the views, thereby minimizing the divergence between them. The learned view-consensus real-valued indicator matrix thus encodes the complementary information of the indicator matrices of the views. Meanwhile, considering that different views play different roles in the final clustering, we present an adaptive weighting strategy that explicitly takes the relationship between views into account. The highlights of our paper are as follows:
  • Our model makes spectral clustering collaborate with spectral rotation in a unified framework for multi-view clustering. Thus, it directly obtains a discrete label matrix without post-processing.
  • Our method effectively encodes both the complementary information and discriminative information of indicator matrices of the views by using a weighted tensor nuclear norm regularizer.
  • Our weighted scheme directly considers the relationship between views for clustering. Thus, the learned indicator matrix effectively encodes discriminative information. Extensive experimental results indicate the effectiveness and efficiency of our proposed algorithm.

2. Related Works

Spectral clustering has attracted intensive attention in the literature due to its good performance on arbitrarily shaped clusters and its solid spectral theory. Since multi-view data can provide important complementary information for clustering, many multi-view spectral clustering methods have been proposed [7,22,25], among which Co-reg [7] is one of the most classical. It obtains a consensus indicator matrix shared by all views by leveraging the minimum mean squared error, and it achieves good clustering results. Wen et al. [26,27] applied this idea to incomplete multi-view clustering and proposed new methods, but they neglected the differing contributions of views to clustering. To exploit the effect of views on clustering, Nie et al. [8] developed auto-weighted multiple graph learning spectral clustering; however, while it adaptively learns weights, it ignores the interaction between views. To tackle this problem, Zong et al. [28] leveraged spectral perturbation theory to adaptively update the weights of all views and proposed weighted multi-view spectral clustering (WMSC). Still, none of these methods effectively utilizes the complementary information. Inspired by the advantage of the tensor nuclear norm based on tensor singular value decomposition (t-SVD) [1,24], Xu et al. [9] used a weighted tensor nuclear norm regularizer to minimize the divergence between the indicator matrices of views and presented tensor low-rank constrained co-regularized spectral clustering, which adaptively updates the weights of different views. Li et al. [29] applied a tensor nuclear norm regularizer to the tensor composed of the normalized indicator matrices of views. Tan et al. [30] proposed to employ the topology of the data to capture the data manifold while exploring the consistency information between different views at the sample level. However, the performance of the aforementioned spectral clustering methods depends on the quality of the pre-defined adjacency matrices of the views.
To adaptively learn adjacency matrices that faithfully represent the relationships within each view's data, many methods have been developed. For example, Zhan et al. [10] developed graph learning-based multi-view clustering (MVGL), which simultaneously learns a graph shared by all views and the spectral embedding, but it ignores the spatial geometric structure of the adjacency matrix. To effectively exploit the spatial structure embedded in each adjacency matrix, Xia et al. [31] presented robust multi-view spectral clustering (RMSC), which learns the view-consensus graph under a low-rank constraint. Tang et al. [32] leveraged a rank constraint to learn the view-consensus adjacency matrix and then obtained an indicator matrix by classical spectral clustering. To effectively exploit the relationship between the adjacency matrices of views, Tang et al. [33] developed a parameter-free graph learning model based on cross-view graph diffusion; the learned adjacency matrices effectively encode the cluster structure and discriminative information of the views. Wu et al. [34] imposed a t-SVD-based tensor nuclear norm constraint on the third-order tensor composed of the adjacency matrices of views to obtain the adjacency matrix, which effectively encodes the complementary and discriminative information of the data.
Although they achieve promising clustering results, all of the aforementioned methods need post-processing, such as K-means, to obtain the clusters of the data, which leads to sub-optimal and unstable clustering results. To overcome this disadvantage, many methods have been developed to directly solve for the discrete labels. For example, Nie et al. [14] resorted to manifold learning to learn a view-consensus adjacency matrix with K connected components and presented multi-view learning with adaptive neighbors (MLAN). However, MLAN assumes that each point has the same neighbors in different views, which does not hold in reality because each view captures unique properties of the object that other views do not. To improve the quality of the adjacency matrix, Nie et al. [15] learned the view-shared adjacency matrix from the pre-defined adjacency matrices of the views. Zhan et al. [35] leveraged the indicator matrices of different views to learn a view-shared graph with K connected components and presented the multi-view consensus graph clustering (MCGC) method. These methods can directly obtain the clusters of the data from the number of connected components, but it is difficult to ensure that the learned graph has the exact number of connected components in practice. To avoid this problem, based on the relationship between non-negative matrix factorization and spectral clustering [36], Shi et al. [37] integrated a non-negative constraint into the graph learning model. In addition, Yang et al. [38] proposed a concise multi-view clustering model that avoids post-processing by directly representing the cluster structure of the data through the learned common shared graph. The motivations of these graph-learning methods are sound, but they do not effectively mine the complementary information embedded in different views.
Another commonly used technique leverages spectral rotation to obtain the clusters of the data. For example, Yu and Shi [16] leveraged the spectral rotation technique to obtain the discrete labels of a single view once the spectral embedding was obtained. Motivated by this work, Nie et al. [39] extended it to multi-view clustering. But in these two methods, spectral embedding and spectral rotation are two separate processes, resulting in sub-optimal clustering performance. To achieve good clustering results, spectral embedding and spectral rotation have been integrated into a unified framework for discrete clustering, and many algorithms have been developed to solve this framework [17,18,19]. Recently, some methods integrated a prediction function into this framework to solve the out-of-sample problem [40]. This framework directly obtains the discrete labels of the data and achieves good performance, but, although it works for single-view data, it cannot be directly applied to multi-view data, which are ubiquitous in artificial intelligence and pattern recognition. To remove this limitation, some works extended the framework to multi-view clustering [20,21,22]. However, they used the minimum mean squared error, a one-dimensional, pixel-by-pixel measurement, to learn the view-shared adjacency matrix; thus, they cannot exploit the spatial structure and complementary information of the views. Moreover, they all assume that the real-valued indicator matrices of different views are identical, which does not hold in reality. To address these limitations, inspired by the ability of tensors to fully mine complementary information between views for better performance [1,23,41], we integrate multi-view spectral embedding and spectral rotation into a unified framework and employ the weighted tensor nuclear norm to uncover the complementary information and spatial structural relationships within the embedding matrices across different views. Simultaneously, we adopt a judicious weighting scheme that thoroughly considers the relationship between clustering and views, offering an effective algorithm for solving the discrete label matrix. By effectively integrating complementary information from different views, our approach enables the model to better capture the diversity and complexity of the data. It holds the potential to provide robust support for intelligent annotation systems, particularly in domains such as image recognition, bioinformatics, and social network analysis, contributing to the advancement of artificial intelligence in practical scenarios.

3. Notations

Following [23,42], in this paper a third-order tensor is denoted by a bold calligraphic letter, e.g., $\mathcal{H} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$, and $\mathbf{H}^{(i)} \in \mathbb{R}^{n_1 \times n_2}$ denotes the $i$-th frontal slice of tensor $\mathcal{H}$. Bold upper-case letters denote matrices, e.g., $\mathbf{H}$; bold lower-case letters denote vectors, e.g., the $j$-th column $\mathbf{h}_j$ of $\mathbf{H}$; and lower-case letters denote elements, e.g., the element $h_{ijk}$ of tensor $\mathcal{H}$. The discrete fast Fourier transform (FFT) of tensor $\mathcal{H}$ along the third dimension is $\bar{\mathcal{H}} = \mathrm{fft}(\mathcal{H}, [\,], 3)$, and $\mathcal{H} = \mathrm{ifft}(\bar{\mathcal{H}}, [\,], 3)$, where $\mathrm{ifft}(\cdot)$ is the inverse fast Fourier transform. $\mathbf{I}$ denotes an identity matrix. The trace of matrix $\mathbf{H}$ is represented by $\operatorname{tr}(\mathbf{H})$, and $\mathbf{H}^T$ is the transpose of $\mathbf{H}$.
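The following minimal NumPy sketch illustrates the tensor notation above, i.e., frontal slices and the FFT pair along the third dimension; the variable names are illustrative, not taken from any released implementation.

```python
import numpy as np

n1, n2, n3 = 4, 3, 5
H = np.random.randn(n1, n2, n3)            # third-order tensor H in R^{n1 x n2 x n3}
H_1 = H[:, :, 0]                           # a frontal slice H^(1) in R^{n1 x n2}

H_bar = np.fft.fft(H, axis=2)              # H_bar = fft(H, [], 3)
H_rec = np.fft.ifft(H_bar, axis=2).real    # H = ifft(H_bar, [], 3); real since H is real
assert np.allclose(H, H_rec)               # the transform pair is lossless
```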

4. Proposed Method

4.1. Problem Formulation and Objective Function

Given a data matrix $\mathbf{X} \in \mathbb{R}^{d \times N}$ containing $K$ clusters, where $d$ is the dimensionality of each sample and $N$ is the number of samples, let $\mathbf{L}$ denote the Laplacian matrix. Spectral clustering aims to partition $\mathbf{X}$ into $K$ clusters by solving the following objective function:

$$\min_{\mathbf{F}^T \mathbf{F} = \mathbf{I}} \operatorname{tr}\left(\mathbf{F}^T \mathbf{L} \mathbf{F}\right) \tag{1}$$

where $\mathbf{F} \in \mathbb{R}^{N \times K}$ denotes the indicator matrix of the data.
After obtaining the indicator matrix $\mathbf{F}$, a discrete solution can be obtained by K-means; however, the instability of K-means makes this discrete solution sub-optimal. To solve this problem, discrete spectral clustering was presented [16,17,18,19], with the objective function

$$\min_{\mathbf{F}^T \mathbf{F} = \mathbf{I}} \operatorname{tr}\left(\mathbf{F}^T \mathbf{L} \mathbf{F}\right) + \beta \left\| \mathbf{F}\mathbf{R} - \mathbf{Y} \right\|_F^2 \quad \text{s.t.} \quad \mathbf{R}^T\mathbf{R} = \mathbf{I},\ \mathbf{Y} \in \mathrm{Ind} \tag{2}$$

where $\mathbf{R} \in \mathbb{R}^{K \times K}$ is a rotation matrix, $\mathbf{Y} \in \mathbb{R}^{N \times K}$ denotes the discrete label matrix, and $\beta \geq 0$ is a balance parameter.
Although model (2) obtains a good discrete solution for clustering, it only handles single-view clustering, and a similar investigation for multi-view spectral clustering has so far been lacking. Consider multi-view data $\mathbf{X}^{(v)} \in \mathbb{R}^{d_v \times N}$ ($v = 1, 2, \ldots, m$), where $m$ is the number of views and $d_v$ is the dimensionality of the $v$-th view data matrix $\mathbf{X}^{(v)}$. Each view usually characterizes properties of the same object that cannot be observed in the other views, and different properties have different effects on clustering. Hence, the differences between the indicator matrices of different views can provide useful complementary information for clustering. Moreover, good clustering requires the learned soft indicator matrices of different views to be highly similar to one another and close to the ideal solution. Combining the aforementioned insights, we have
$$\min_{\mathbf{F}^{(v)}, \mathbf{R}^{(v)}, \mathbf{Y}} \left\| \mathcal{F} \right\|_{\omega,*} + \sum_{v=1}^{m} \left\{ \operatorname{tr}\left(\mathbf{F}^{(v)T} \mathbf{L}^{(v)} \mathbf{F}^{(v)}\right) + \beta \left\| \mathbf{F}^{(v)}\mathbf{R}^{(v)} - \mathbf{Y} \right\|_F^2 \right\} \quad \text{s.t.} \quad \mathbf{F}^{(v)T}\mathbf{F}^{(v)} = \mathbf{I},\ \mathbf{R}^{(v)T}\mathbf{R}^{(v)} = \mathbf{I},\ \mathbf{Y} \in \mathrm{Ind} \tag{3}$$

where $\mathbf{F}^{(v)}$ denotes the indicator matrix of the $v$-th view and $\mathbf{L}^{(v)}$ is the Laplacian matrix of the $v$-th view. The lateral slices of $\mathcal{F}$ are composed of the $\mathbf{F}^{(v)}$, i.e., $\mathcal{F}(:, v, :) = \mathbf{F}^{(v)}$; thus, the size of $\mathcal{F}$ is $N \times m \times K$, and $\|\mathcal{F}\|_{\omega,*}$ denotes the weighted tensor nuclear norm of $\mathcal{F}$ [23], which is defined as
$$\left\| \mathcal{F} \right\|_{\omega,*} = \sum_{i=1}^{K} \sum_{j=1}^{\min(N, m)} \omega_j\, \sigma_j\!\left(\bar{\mathcal{F}}^{(i)}\right) \tag{4}$$

where $\bar{\mathcal{F}}^{(i)}$ represents the $i$-th frontal slice of $\bar{\mathcal{F}} = \mathrm{fft}(\mathcal{F}, [\,], 3)$, $\sigma_j(\bar{\mathcal{F}}^{(i)})$ is the $j$-th singular value of matrix $\bar{\mathcal{F}}^{(i)}$, and $\omega_j$ denotes the $j$-th element of the weight vector $\omega$. By concatenating the clustering indicator matrices of the views into a third-order tensor, information from different views can be integrated; applying a weighted tensor nuclear norm to this tensor, slice by slice along the third mode, better exploits the complementary information in multi-view data.
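As a concrete illustration, the following NumPy sketch evaluates the weighted tensor nuclear norm of Equation (4); it assumes a real $N \times m \times K$ tensor and a weight vector of length $\min(N, m)$, and the function name is ours, not from a released implementation.

```python
import numpy as np

def weighted_tensor_nuclear_norm(F, w):
    """Eq. (4): sum of weighted singular values over the frontal slices of fft(F, [], 3)."""
    F_bar = np.fft.fft(F, axis=2)                                # transform along the third mode
    total = 0.0
    for i in range(F.shape[2]):                                  # loop over the K frontal slices
        sigma = np.linalg.svd(F_bar[:, :, i], compute_uv=False)  # singular values, descending
        total += np.sum(w * sigma)                               # weight sigma_j by omega_j
    return total
```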
From Equation (3), it can be seen that all indicator matrices $\mathbf{F}^{(v)}$ ($v = 1, 2, \ldots, m$) are treated equally, meaning that all views contribute equally to clustering, which is unreasonable in practical applications. As analyzed above, different views characterize different aspects of the same object, and different aspects usually have distinct effects on clustering; unfortunately, model (3) neglects this. Although some weighting schemes have been proposed to account for the effect of different views, in most existing schemes either the hyperparameters must be selected manually or the adaptive weights are independent of each other, which affects the stability of the algorithm. To overcome this problem, we introduce a new weighting scheme that takes into account the effect of the different indicator matrices. Thus, we revise model (3) as
$$\min_{\mathbf{F}^{(v)}, \alpha^{(v)}, \mathbf{R}^{(v)}, \mathbf{Y}, \lambda^{(v)}} \left\| \mathcal{F} \right\|_{\omega,*} + \sum_{v=1}^{m} \left\{ \alpha^{(v)} \operatorname{tr}\left(\mathbf{F}^{(v)T} \mathbf{L}^{(v)} \mathbf{F}^{(v)}\right) + \beta \lambda^{(v)} \left\| \mathbf{F}^{(v)}\mathbf{R}^{(v)} - \mathbf{Y} \right\|_F^2 \right\}$$
$$\text{s.t.} \quad \mathbf{F}^{(v)T}\mathbf{F}^{(v)} = \mathbf{I},\quad 0 \leq \alpha^{(v)} \leq 1,\quad \sum_{v=1}^{m} \alpha^{(v)} = 1,\quad \mathbf{R}^{(v)T}\mathbf{R}^{(v)} = \mathbf{I},\quad \mathbf{Y} \in \mathrm{Ind},\quad 0 \leq \lambda^{(v)} \leq 1,\quad \sum_{v=1}^{m} \lambda^{(v)} = 1 \tag{5}$$
where $\lambda^{(v)}$ and $\alpha^{(v)}$ reflect the importance of the $v$-th view for clustering. Through a meaningful optimization process, our method assigns higher weights to beneficial views, further enhancing the algorithm's performance. From Equation (5), it can be observed that the second term is the adaptive relaxed spectral embedding model and the third term is the adaptive spectral rotation model. We integrate multi-view spectral embedding and spectral rotation into a unified framework, eliminating the need for post-processing and directly obtaining a discrete label matrix.
In Equation (5), $\mathbf{F}^{(v)}\mathbf{R}^{(v)}$ is an orthonormal matrix, i.e., $(\mathbf{F}^{(v)}\mathbf{R}^{(v)})^T(\mathbf{F}^{(v)}\mathbf{R}^{(v)}) = \mathbf{I}$, while $\mathbf{Y}$ is only a column-orthogonal matrix, i.e., $\mathbf{Y}^T\mathbf{Y} = \operatorname{diag}(n_1, n_2, \ldots, n_K)$, where $n_i$ ($i = 1, \ldots, K$) denotes the number of samples in the $i$-th cluster. Thus, it is unreasonable to force a column-orthogonal matrix to approximate an orthonormal one. To avoid this problem and obtain a good discrete solution, inspired by spectral clustering [16], we replace $\mathbf{Y}$ in Equation (5) with $\mathbf{H} = \mathbf{D}^{\frac{1}{2}}\mathbf{Y}\left(\mathbf{Y}^T\mathbf{D}\mathbf{Y}\right)^{-\frac{1}{2}}$, where $\mathbf{D} = \sum_{v=1}^{m}\frac{1}{v}\mathbf{D}^{(v)}$ and $\mathbf{D}^{(v)}$ is the degree matrix corresponding to $\mathbf{L}^{(v)}$. It is easy to verify that $\mathbf{H}$ is an orthonormal matrix, i.e., $\mathbf{H}^T\mathbf{H} = \mathbf{I}$. Thus, our final objective function is
$$\min_{\mathbf{F}^{(v)}, \alpha^{(v)}, \mathbf{R}^{(v)}, \mathbf{H}, \lambda^{(v)}} \left\| \mathcal{F} \right\|_{\omega,*} + \sum_{v=1}^{m} \left\{ \alpha^{(v)} \operatorname{tr}\left(\mathbf{F}^{(v)T} \mathbf{L}^{(v)} \mathbf{F}^{(v)}\right) + \beta \lambda^{(v)} \left\| \mathbf{F}^{(v)}\mathbf{R}^{(v)} - \mathbf{H} \right\|_F^2 \right\}$$
$$\text{s.t.} \quad \mathbf{F}^{(v)T}\mathbf{F}^{(v)} = \mathbf{I},\quad 0 \leq \alpha^{(v)} \leq 1,\quad \sum_{v=1}^{m} \alpha^{(v)} = 1,\quad \mathbf{R}^{(v)T}\mathbf{R}^{(v)} = \mathbf{I},\quad 0 \leq \lambda^{(v)} \leq 1,\quad \sum_{v=1}^{m} \lambda^{(v)} = 1 \tag{6}$$
Remark 1.
According to the construction of tensor $\mathcal{F}$ shown in Figure 1, the $i$-th frontal slice $\Delta^{(i)}$ of $\mathcal{F}$ is a matrix whose columns are the vectors $\mathbf{F}^{(v)}_{:,i}$ ($v = 1, 2, \ldots, m$), where $\mathbf{F}^{(v)}_{:,i}$ denotes the $i$-th column of the indicator matrix $\mathbf{F}^{(v)}$ and characterizes the relationship between $\mathbf{X}^{(v)}$ and the $i$-th cluster. The purpose of multi-view clustering is for $\mathbf{F}^{(1)}_{:,i}, \mathbf{F}^{(2)}_{:,i}, \ldots, \mathbf{F}^{(m)}_{:,i}$ to be as similar as possible; ideally, they are exactly equal. Moreover, in practical applications there can be large differences between the cluster structures of different views. Thus, the first term in model (6), i.e., the tensor multi-rank minimization constraint on $\mathcal{F}$, ensures that $\Delta^{(i)}$ has a spatially low-rank structure, which helps exploit the complementary information embedded across views and obtain the view-consensus indicator matrix.

4.2. Optimization

Drawing inspiration from the augmented Lagrange multiplier (ALM) method [43], we introduce an auxiliary tensor $\mathcal{J}$ as a surrogate for $\mathcal{F}$ in model (6) and rewrite model (6) as the following minimization problem:

$$\mathcal{L}\left(\mathcal{F}, \mathcal{J}, \alpha^{(v)}, \mathbf{R}^{(v)}, \mathbf{H}, \lambda^{(v)}\right) = \sum_{v=1}^{m} \left\{ \alpha^{(v)} \operatorname{tr}\left(\mathbf{F}^{(v)T} \mathbf{L}^{(v)} \mathbf{F}^{(v)}\right) + \beta \lambda^{(v)} \left\| \mathbf{F}^{(v)}\mathbf{R}^{(v)} - \mathbf{H} \right\|_F^2 \right\} + \left\| \mathcal{J} \right\|_{\omega,*} + \frac{\rho}{2} \left\| \mathcal{J} - \left( \mathcal{F} + \frac{\mathcal{G}}{\rho} \right) \right\|_F^2 \tag{7}$$

where tensor $\mathcal{G}$ denotes the Lagrange multiplier and the parameter $\rho > 0$ is an adaptive penalty factor.
Solving model (7) involves the following subproblems, which can be optimized alternately.
  • Solving $\mathbf{R}^{(v)}$ with fixed $\mathbf{F}^{(v)}$ and $\mathbf{H}$. In this case, each $\mathbf{R}^{(v)}$ can be solved independently. Thus, for the $v$-th rotation matrix $\mathbf{R}^{(v)}$, Equation (7) becomes

$$\min_{\mathbf{R}^{(v)T}\mathbf{R}^{(v)} = \mathbf{I}} \left\| \mathbf{F}^{(v)}\mathbf{R}^{(v)} - \mathbf{H} \right\|_F^2 \tag{8}$$
By simple algebra, we have

$$\left\| \mathbf{F}^{(v)}\mathbf{R}^{(v)} - \mathbf{H} \right\|_F^2 = \operatorname{tr}\left(\left(\mathbf{F}^{(v)}\mathbf{R}^{(v)}\right)^T \mathbf{F}^{(v)}\mathbf{R}^{(v)}\right) + \operatorname{tr}\left(\mathbf{H}^T\mathbf{H}\right) - 2\operatorname{tr}\left(\left(\mathbf{F}^{(v)}\mathbf{R}^{(v)}\right)^T \mathbf{H}\right) = \text{Constant} - 2\operatorname{tr}\left(\left(\mathbf{F}^{(v)}\mathbf{R}^{(v)}\right)^T \mathbf{H}\right) \tag{9}$$
Then, the optimal solution of Equation (8) can be obtained by solving model (10):

$$\max_{\mathbf{R}^{(v)T}\mathbf{R}^{(v)} = \mathbf{I}} \operatorname{tr}\left(\mathbf{R}^{(v)T}\mathbf{F}^{(v)T}\mathbf{H}\right) \tag{10}$$
Let the singular value decomposition (SVD) of $\mathbf{F}^{(v)T}\mathbf{H}$ be $\mathbf{F}^{(v)T}\mathbf{H} = \mathbf{U}^{(v)}\mathbf{S}^{(v)}\mathbf{V}^{(v)T}$. According to Theorem 1, the optimal solution of model (10) is

$$\mathbf{R}^{(v)*} = \mathbf{U}^{(v)}\mathbf{V}^{(v)T} \tag{11}$$
Theorem 1
([44]). Let $\mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^T$ denote the compact singular value decomposition (SVD) of $\mathbf{H}$; then $\mathbf{W} = \mathbf{U}\mathbf{V}^T$ is the optimal solution of the following objective function:

$$\max_{\mathbf{W}^T\mathbf{W} = \mathbf{I}} \operatorname{tr}\left(\mathbf{W}^T\mathbf{H}\right) \tag{12}$$
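A minimal sketch of the rotation update, combining Equation (11) with Theorem 1; the function and variable names are illustrative assumptions, not the authors' released code.

```python
import numpy as np

def update_rotation(F_v, H):
    """Solve max_{R^T R = I} tr(R^T F_v^T H): orthogonal Procrustes via the SVD of F_v^T H."""
    U, _, Vt = np.linalg.svd(F_v.T @ H, full_matrices=False)  # compact SVD of a K x K matrix
    return U @ Vt                                             # R^(v)* = U V^T, Eq. (11)
```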
  • Solving $\mathcal{F}$ with fixed $\mathcal{J}$, $\alpha^{(v)}$, $\mathbf{R}^{(v)}$, $\mathbf{H}$, $\lambda^{(v)}$. Equation (7) becomes

$$\min_{\mathbf{F}^{(v)T}\mathbf{F}^{(v)} = \mathbf{I}} \sum_{v=1}^{m} \left\{ \alpha^{(v)} \operatorname{tr}\left(\mathbf{F}^{(v)T}\mathbf{L}^{(v)}\mathbf{F}^{(v)}\right) + \beta\lambda^{(v)} \left\| \mathbf{F}^{(v)}\mathbf{R}^{(v)} - \mathbf{H} \right\|_F^2 \right\} + \frac{\rho}{2} \left\| \mathcal{J} - \left( \mathcal{F} + \frac{\mathcal{G}}{\rho} \right) \right\|_F^2 = \min_{\mathbf{F}^{(v)T}\mathbf{F}^{(v)} = \mathbf{I}} \sum_{v=1}^{m} \left( \alpha^{(v)} \operatorname{tr}\left(\mathbf{F}^{(v)T}\mathbf{L}^{(v)}\mathbf{F}^{(v)}\right) + \beta\lambda^{(v)} \left\| \mathbf{F}^{(v)}\mathbf{R}^{(v)} - \mathbf{H} \right\|_F^2 + \frac{\rho}{2} \left\| \mathbf{A}^{(v)} - \mathbf{F}^{(v)} \right\|_F^2 \right) \tag{13}$$

where $\mathbf{G}^{(v)}$ denotes the $v$-th frontal slice of tensor $\mathcal{G}$ and $\mathbf{A}^{(v)} = \mathbf{J}^{(v)} - \frac{\mathbf{G}^{(v)}}{\rho}$.
In Equation (13), we can rewrite the third term as

$$\left\| \mathbf{A}^{(v)} - \mathbf{F}^{(v)} \right\|_F^2 = \operatorname{tr}\left(\mathbf{A}^{(v)T}\mathbf{A}^{(v)}\right) + \operatorname{tr}\left(\mathbf{F}^{(v)T}\mathbf{F}^{(v)}\right) - 2\operatorname{tr}\left(\mathbf{F}^{(v)T}\mathbf{A}^{(v)}\right) = -2\operatorname{tr}\left(\mathbf{F}^{(v)T}\mathbf{A}^{(v)}\right) + \text{Constant} \tag{14}$$
In Equation (13), each $\mathbf{F}^{(v)}$ can be solved independently. Substituting Equations (14) and (9) into Equation (13), for the $v$-th view, simple algebra gives

$$\min_{\mathbf{F}^{(v)T}\mathbf{F}^{(v)} = \mathbf{I}} \alpha^{(v)}\operatorname{tr}\left(\mathbf{F}^{(v)T}\mathbf{L}^{(v)}\mathbf{F}^{(v)}\right) - 2\beta\lambda^{(v)}\operatorname{tr}\left(\left(\mathbf{F}^{(v)}\mathbf{R}^{(v)}\right)^T\mathbf{H}\right) - \rho\operatorname{tr}\left(\mathbf{F}^{(v)T}\mathbf{A}^{(v)}\right) = \min_{\mathbf{F}^{(v)T}\mathbf{F}^{(v)} = \mathbf{I}} \operatorname{tr}\left(\mathbf{F}^{(v)T}\alpha^{(v)}\mathbf{L}^{(v)}\mathbf{F}^{(v)}\right) - 2\operatorname{tr}\left(\mathbf{F}^{(v)T}\left(\beta\lambda^{(v)}\mathbf{H}\mathbf{R}^{(v)T} + \frac{\rho}{2}\mathbf{A}^{(v)}\right)\right) \tag{15}$$
Theorem 2.
Given model (16),

$$\min_{\mathbf{F}^T\mathbf{F} = \mathbf{I}} \operatorname{tr}\left(\mathbf{F}^T\mathbf{A}\mathbf{F}\right) - 2\operatorname{tr}\left(\mathbf{F}^T\mathbf{P}\right) \tag{16}$$

its optimal solution can be obtained by solving model (17),

$$\max_{\mathbf{F}^T\mathbf{F} = \mathbf{I}} \operatorname{tr}\left(\mathbf{F}^T\tilde{\mathbf{A}}\mathbf{F}\right) + 2\operatorname{tr}\left(\mathbf{F}^T\mathbf{P}\right) \tag{17}$$

where $\tilde{\mathbf{A}} = \lambda\mathbf{I} - \mathbf{A}$ is a positive definite matrix.
Proof. 
Multiplying model (16) by $-1$ turns it into a maximization of $-\operatorname{tr}(\mathbf{F}^T\mathbf{A}\mathbf{F}) + 2\operatorname{tr}(\mathbf{F}^T\mathbf{P})$; since $\mathbf{F}^T\mathbf{F} = \mathbf{I}$ makes $\operatorname{tr}(\mathbf{F}^T\lambda\mathbf{I}\mathbf{F}) = \lambda K$ a constant, adding it does not change the maximizer, and model (17) follows.    □
According to Theorem 2, we can rewrite model (15) as model (18):

$$\max_{\mathbf{F}^{(v)T}\mathbf{F}^{(v)} = \mathbf{I}} \operatorname{tr}\left(\mathbf{F}^{(v)T}\left(\lambda\mathbf{I} - \alpha^{(v)}\mathbf{L}^{(v)}\right)\mathbf{F}^{(v)}\right) + 2\operatorname{tr}\left(\mathbf{F}^{(v)T}\left(\beta\lambda^{(v)}\mathbf{H}\mathbf{R}^{(v)T} + \frac{\rho}{2}\mathbf{A}^{(v)}\right)\right) \tag{18}$$
The problem of Equation (18) can be further written as

$$\max_{\mathbf{F}^{(v)T}\mathbf{F}^{(v)} = \mathbf{I}} \sum_{v=1}^{m} \operatorname{tr}\left(\mathbf{F}^{(v)T}\left(\mathbf{B}^{(v)}\mathbf{F}^{(v)} + 2\mathbf{C}^{(v)}\right)\right) \tag{19}$$

where $\mathbf{B}^{(v)} = \lambda\mathbf{I} - \alpha^{(v)}\mathbf{L}^{(v)}$, $\lambda$ is a constant chosen so that $\mathbf{B}^{(v)}$ is positive definite, and $\mathbf{C}^{(v)} = \beta\lambda^{(v)}\mathbf{H}\mathbf{R}^{(v)T} + \frac{\rho}{2}\mathbf{A}^{(v)}$.
In Equation (19), $\mathbf{B}^{(v)}\mathbf{F}^{(v)}$ is coupled with the target variable $\mathbf{F}^{(v)}$, so Equation (19) cannot be solved directly. However, if $\mathbf{B}^{(v)}\mathbf{F}^{(v)}$ is held fixed, Equation (19) can easily be solved by Theorem 1. Let $\mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^T$ denote the SVD of $\mathbf{B}^{(v)}\mathbf{F}^{(v)} + \mathbf{C}^{(v)}$; according to Theorem 1, the solution for $\mathbf{F}^{(v)}$ is

$$\mathbf{F}^{(v)*} = \mathbf{U}^{(v)}\mathbf{V}^{(v)T} \tag{20}$$
Algorithm 1 lists the pseudocode for solving $\mathbf{F}^{(v)}$.
Algorithm 1 Solve $\mathbf{F}^{(v)}$
Input: The matrices $\mathbf{L}^{(v)}$, $\mathbf{F}^{(v)}$, $\mathbf{H}$, $\mathbf{R}^{(v)}$, and $\mathbf{A}^{(v)}$.
Output: $\mathbf{F}^{(v)}$.
1: Initialize: compute $\mathbf{B}^{(v)} = \lambda\mathbf{I} - \alpha^{(v)}\mathbf{L}^{(v)}$ and $\mathbf{C}^{(v)} = \beta\lambda^{(v)}\mathbf{H}\mathbf{R}^{(v)T} + \frac{\rho}{2}\mathbf{A}^{(v)}$.
2: while not converged do
3:    Update $\mathbf{E}^{(v)} = \mathbf{B}^{(v)}\mathbf{F}^{(v)} + \mathbf{C}^{(v)}$;
4:    Calculate the compact SVD $\mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^T = \mathbf{E}^{(v)}$;
5:    Update $\mathbf{F}^{(v)} = \mathbf{U}\mathbf{V}^T$;
6: end while
7: Return the matrix $\mathbf{F}^{(v)}$.
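The following NumPy sketch mirrors Algorithm 1; it assumes the inputs are available as dense arrays and that suitable values of $\lambda$, $\beta$, $\rho$, $\alpha^{(v)}$ and $\lambda^{(v)}$ are supplied, and the names are illustrative.

```python
import numpy as np

def solve_F_v(L_v, F_v, H, R_v, A_v, alpha_v, lam_v, beta, lam, rho,
              max_iter=50, tol=1e-6):
    """Algorithm 1: fixed-point iteration for F^(v) in model (19)."""
    B = lam * np.eye(L_v.shape[0]) - alpha_v * L_v        # B^(v) = lambda*I - alpha_v*L^(v)
    C = beta * lam_v * (H @ R_v.T) + 0.5 * rho * A_v      # C^(v)
    for _ in range(max_iter):
        E = B @ F_v + C                                   # E^(v) = B^(v) F^(v) + C^(v)
        U, _, Vt = np.linalg.svd(E, full_matrices=False)  # compact SVD of E^(v)
        F_new = U @ Vt                                    # F^(v) = U V^T (Theorem 1)
        if np.linalg.norm(F_new - F_v) < tol:             # stop when the iterate is stationary
            return F_new
        F_v = F_new
    return F_v
```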
  • Solving $\mathcal{J}$ with the other variables fixed. Now, $\mathcal{J}$ can be optimized via subproblem (21):

$$\min_{\mathcal{J}} \left\|\mathcal{J}\right\|_{\omega,*} + \frac{\rho}{2}\left\|\mathcal{J} - \left(\mathcal{F} + \frac{\mathcal{G}}{\rho}\right)\right\|_F^2 \tag{21}$$
To optimize model (21), we first introduce Theorem 3.
Theorem 3
([23]). For $\mathcal{A} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ and $l = \min(n_1, n_2)$, let $\mathcal{A} = \mathcal{U} * \mathcal{S} * \mathcal{V}^T$ be its t-SVD. For

$$\operatorname*{argmin}_{\mathcal{X}} \frac{1}{2}\left\|\mathcal{X} - \mathcal{A}\right\|_F^2 + \tau\left\|\mathcal{X}\right\|_{\omega,*}, \tag{22}$$

the optimal solution is

$$\mathcal{X}^* = \Gamma_{\tau\omega}(\mathcal{A}) = \mathcal{U} * \mathrm{ifft}\left(\mathcal{P}_{\tau\omega}(\bar{\mathcal{A}})\right) * \mathcal{V}^T, \tag{23}$$

where $\bar{\mathcal{A}} = \mathrm{fft}(\mathcal{A}, [\,], 3)$ and $\mathcal{P}_{\tau\omega}(\bar{\mathcal{A}})$ is a tensor whose $i$-th frontal slice is $\mathcal{P}_{\tau\omega}(\bar{\mathcal{A}}^{(i)}) = \operatorname{diag}\left(\gamma_1, \gamma_2, \ldots, \gamma_l\right)$ with $\gamma_j = \max\left(\sigma_j\!\left(\bar{\mathcal{A}}^{(i)}\right) - \tau\omega_j,\ 0\right)$.
According to Theorem 3, the solution of model (21) is

$$\mathcal{J}^* = \Gamma_{\frac{1}{\rho}\omega}\left(\mathcal{F} + \frac{1}{\rho}\mathcal{G}\right). \tag{24}$$
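A sketch of the weighted tensor singular value thresholding operator of Theorem 3, used to update $\mathcal{J}$ in Equation (24). It assumes a real input tensor and a weight vector of length $\min(n_1, n_2)$; the names are illustrative, with the per-slice shrinkage done in the Fourier domain as the theorem prescribes.

```python
import numpy as np

def weighted_tensor_svt(A, tau, w):
    """Theorem 3: shrink the singular values of each frontal slice of fft(A, [], 3)."""
    A_bar = np.fft.fft(A, axis=2)
    X_bar = np.empty_like(A_bar)
    for i in range(A.shape[2]):
        U, s, Vt = np.linalg.svd(A_bar[:, :, i], full_matrices=False)
        s_shrunk = np.maximum(s - tau * w, 0.0)   # gamma_j = max(sigma_j - tau*omega_j, 0)
        X_bar[:, :, i] = (U * s_shrunk) @ Vt      # rebuild the slice with the shrunk spectrum
    return np.fft.ifft(X_bar, axis=2).real        # back to the original domain

# J update of Eq. (24): J = weighted_tensor_svt(F + G / rho, 1.0 / rho, w)
```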
  • Solving $\mathcal{G}$ and $\rho$. $\mathcal{G}$ is updated by $\mathcal{G} = \mathcal{G} + \rho(\mathcal{F} - \mathcal{J})$, and $\rho$ is updated by $\rho = \rho\mu$, where $\mu$ is a positive number larger than 1.
  • Solving $\mathbf{Y}$ with fixed $\lambda^{(v)}$, $\mathbf{F}^{(v)}$ and $\mathbf{R}^{(v)}$. In this case, problem (7) becomes

$$\min_{\mathbf{Y} \in \mathrm{Ind}} \sum_{v=1}^{m} \beta\lambda^{(v)} \left\|\mathbf{F}^{(v)}\mathbf{R}^{(v)} - \mathbf{H}\right\|_F^2 \tag{25}$$
Let $\mathbf{K}^{(v)} = \mathbf{F}^{(v)}\mathbf{R}^{(v)}$; then

$$\left\|\mathbf{F}^{(v)}\mathbf{R}^{(v)} - \mathbf{H}\right\|_F^2 = \operatorname{tr}\left(\mathbf{K}^{(v)T}\mathbf{K}^{(v)}\right) + \operatorname{tr}\left(\mathbf{H}^T\mathbf{H}\right) - 2\operatorname{tr}\left(\mathbf{H}^T\mathbf{K}^{(v)}\right) \tag{26}$$
The first term in Equation (26) is unrelated to the target variable $\mathbf{Y}$, and so is the second term, since $\mathbf{H}^T\mathbf{H} = \mathbf{I}$. Thus, the optimal solution of model (25) can be obtained by solving

$$\max_{\mathbf{Y} \in \mathrm{Ind}} \sum_{v=1}^{m} \operatorname{tr}\left(\beta\lambda^{(v)}\mathbf{H}^T\mathbf{K}^{(v)}\right) = \max_{\mathbf{Y} \in \mathrm{Ind}} \operatorname{tr}\left(\mathbf{H}\mathbf{P}^T\right) \tag{27}$$

where $\mathbf{P} = \sum_{v=1}^{m} \beta\lambda^{(v)}\mathbf{K}^{(v)}$.
Equation (27) carries an expensive time burden due to the calculation of $\mathbf{H} = \mathbf{D}^{\frac{1}{2}}\mathbf{Y}\left(\mathbf{Y}^T\mathbf{D}\mathbf{Y}\right)^{-\frac{1}{2}}$. To reduce the computational complexity, we propose a fast algorithm based on Theorem 4.
Theorem 4.
$\mathbf{H}$ and $\mathbf{Y}$ have their non-zero elements in the same positions in each row. The non-zero element in the $i$-th row of $\mathbf{H}$ is $\sqrt{d_i / \left(\mathbf{d}^T\mathbf{y}_j\right)}$, where $\mathbf{y}_j$ is the column of $\mathbf{Y}$ containing that row's non-zero entry.
Proof. 
For the degree matrix $\mathbf{D}$, its degree vector is $\mathbf{d} = \mathbf{D}\mathbf{1}$, where $\mathbf{1}$ is the all-ones vector. Thus, the $i$-th row of $\mathbf{D}^{\frac{1}{2}}\mathbf{Y}$ is just the $i$-th row of $\mathbf{Y}$ multiplied by $\sqrt{d_i}$, i.e.,

$$\left(\mathbf{D}^{\frac{1}{2}}\mathbf{Y}\right)_{ij} = \begin{cases} \sqrt{d_i}, & y_{ij} = 1 \\ 0, & \text{else} \end{cases} \tag{28}$$

where $d_i$ denotes the $i$-th element of vector $\mathbf{d}$.
Thus, $\left(\mathbf{Y}^T\mathbf{D}\mathbf{Y}\right)^{-\frac{1}{2}} = \left(\left(\mathbf{D}^{\frac{1}{2}}\mathbf{Y}\right)^T\mathbf{D}^{\frac{1}{2}}\mathbf{Y}\right)^{-\frac{1}{2}}$ is diagonal, and the $k$-th column of $\mathbf{Y}\left(\mathbf{Y}^T\mathbf{D}\mathbf{Y}\right)^{-\frac{1}{2}}$ is just the $k$-th column of $\mathbf{Y}$ multiplied by $\left(\mathbf{d}^T\mathbf{y}_k\right)^{-\frac{1}{2}}$, where $\mathbf{y}_k$ is the $k$-th column of the label matrix $\mathbf{Y}$.
According to the definition of the label matrix $\mathbf{Y}$, and combining the above analysis, the element of $\mathbf{H}$ in the $i$-th row and $j$-th column is

$$h_{ij} = \begin{cases} \sqrt{\dfrac{d_i}{\mathbf{d}^T\mathbf{y}_j}}, & y_{ij} = 1 \\ 0, & \text{else} \end{cases} \tag{29}$$

From Equation (29), $\mathbf{H}$ and $\mathbf{Y}$ have their non-zero elements in the same positions in each row.    □
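Theorem 4 allows $\mathbf{H}$ to be assembled directly from $\mathbf{Y}$ and the degree vector $\mathbf{d}$ without any matrix inversion, as in the following sketch (illustrative names, assuming a 0/1 label matrix with one non-zero per row):

```python
import numpy as np

def H_from_Y(Y, d):
    """Eq. (29): H has Y's sparsity pattern with entries sqrt(d_i / d^T y_j)."""
    cluster_mass = Y.T @ d                  # d^T y_j for each cluster j
    H = np.zeros(Y.shape)
    rows, cols = np.nonzero(Y)              # the single non-zero position in each row of Y
    H[rows, cols] = np.sqrt(d[rows] / cluster_mass[cols])
    return H
```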
Now, we consider how to optimize model (27). Since each row of the label matrix $\mathbf{Y}$ is independent and has exactly one non-zero element, we can update the rows of $\mathbf{Y}$ one by one. To update the $i$-th row of $\mathbf{Y}$, according to Theorem 4, we have

$$Y_{ij} = \begin{cases} 1, & j = \arg\max_k \operatorname{tr}\left(\mathbf{H}_{x_i \to k}\mathbf{P}^T\right) \\ 0, & \text{else} \end{cases} \tag{30}$$

where

$$\mathbf{H}_{x_i \to k} = \mathbf{D}^{\frac{1}{2}}\mathbf{Y}_{x_i \to k}\left(\mathbf{Y}_{x_i \to k}^T\mathbf{D}\,\mathbf{Y}_{x_i \to k}\right)^{-\frac{1}{2}} \tag{31}$$

and $\mathbf{Y}_{x_i \to k}$ denotes the label matrix obtained by assigning the $i$-th sample to the $k$-th cluster while keeping the other rows unchanged.
Note that, according to Theorem 4, the $i$-th row of $\mathbf{H}_{x_i \to k}$ is just the $i$-th row of $\mathbf{Y}_{x_i \to k}$ multiplied by a scalar factor; thus, $\mathbf{H}_{x_i \to k}$ can be calculated cheaply via Equation (29). Algorithm 2 lists the pseudocode for solving the label matrix $\mathbf{Y}$.
Algorithm 2 Solve $\mathbf{Y}$
Input: $\mathbf{d} = \mathbf{D}\mathbf{1} = [d_1, d_2, \ldots, d_n]$.
Output: $\mathbf{Y}$.
1: Initialize: $\mathbf{H} = \mathbf{0}$; label matrix $\mathbf{Y}$, each row of which has exactly one non-zero element equal to 1;
2: for $i = 1, 2, \ldots, n$ do
3:    Calculate $\mathbf{Y}_{x_i \to k}$;
4:    Calculate the vector $\mathbf{g}$: $g_i = j$ if $y_{ij} = 1$;
5:    Calculate $\mathbf{H}_{x_i \to k}$: update the non-zero element of the $i$-th row of $\mathbf{H}$ by $h_{i, g_i} = \sqrt{d_i / \left(\mathbf{d}^T\mathbf{y}_{g_i}\right)}$, where $\mathbf{y}_{g_i}$ denotes the $g_i$-th column of $\mathbf{Y}_{x_i \to k}$;
6:    Update the $i$-th row of $\mathbf{Y}$ according to Equation (30);
7: end for
8: Return the matrix $\mathbf{Y}$.
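For reference, the following unoptimized NumPy sketch of Algorithm 2 reassigns each row of $\mathbf{Y}$ to the cluster maximizing $\operatorname{tr}(\mathbf{H}_{x_i \to k}\mathbf{P}^T)$; it rebuilds $\mathbf{H}$ in full via Equation (29) for clarity, whereas Theorem 4 enables the cheaper row-wise update used in the algorithm. $\mathbf{P} = \sum_v \beta\lambda^{(v)}\mathbf{F}^{(v)}\mathbf{R}^{(v)}$ is assumed precomputed, and the names are illustrative.

```python
import numpy as np

def update_Y(Y, d, P):
    """Algorithm 2 (reference version): greedy row-wise update of the label matrix."""
    N, K = Y.shape
    for i in range(N):
        scores = np.full(K, -np.inf)
        for k in range(K):
            Y_trial = Y.copy()
            Y_trial[i, :] = 0
            Y_trial[i, k] = 1                     # tentatively assign sample i to cluster k
            mass = Y_trial.T @ d                  # d^T y_j for each cluster j
            if np.any(mass == 0):                 # skip assignments that empty a cluster
                continue
            rows, cols = np.nonzero(Y_trial)
            H_trial = np.zeros(Y_trial.shape)     # H_{x_i -> k} via Theorem 4 / Eq. (29)
            H_trial[rows, cols] = np.sqrt(d[rows] / mass[cols])
            scores[k] = np.sum(H_trial * P)       # tr(H P^T) = <H, P>
        Y[i, :] = 0
        Y[i, np.argmax(scores)] = 1               # Eq. (30)
    return Y
```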
  • Solving $\lambda^{(v)}$. For convenience, let $\zeta_v = \left\|\mathbf{F}^{(v)}\mathbf{R}^{(v)} - \mathbf{H}\right\|_F$, which is known at this point. Then $\lambda^{(v)}$ can be solved by

$$\min_{\lambda^{(v)}} \sum_{v=1}^{m} \frac{\zeta_v^2}{\lambda^{(v)}}, \quad \text{s.t.} \quad \sum_{v=1}^{m} \lambda^{(v)} = 1,\ \lambda^{(v)} \geq 0 \tag{32}$$
Since $\sum_{v=1}^{m} \lambda^{(v)} = 1$, the Cauchy–Schwarz inequality gives

$$\sum_{v=1}^{m} \frac{\zeta_v^2}{\lambda^{(v)}} = \left(\sum_{v=1}^{m} \frac{\zeta_v^2}{\lambda^{(v)}}\right)\left(\sum_{v=1}^{m} \lambda^{(v)}\right) \geq \left(\sum_{v=1}^{m} \zeta_v\right)^2 \tag{33}$$
Equality in Equation (33) holds if and only if $\lambda^{(v)} \propto \zeta_v$. Moreover, since the right-hand side of Equation (33) is a constant, the optimal solution $\lambda^{(v)}$ ($v = 1, 2, \ldots, m$) is

$$\lambda^{(v)} = \zeta_v \Big/ \sum_{v=1}^{m} \zeta_v \tag{34}$$
  • Solving $\alpha^{(v)}$. According to [8], the optimal $\alpha^{(v)}$ is

$$\alpha^{(v)} = \frac{1}{2\sqrt{\operatorname{tr}\left(\mathbf{F}^{(v)T}\mathbf{L}^{(v)}\mathbf{F}^{(v)}\right)}} \tag{35}$$
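Both weight updates have simple closed forms, as in this sketch; `zeta` collects the residuals $\zeta_v$ and `tr_v` the values $\operatorname{tr}(\mathbf{F}^{(v)T}\mathbf{L}^{(v)}\mathbf{F}^{(v)})$, both assumed precomputed, and the names are illustrative.

```python
import numpy as np

def update_weights(zeta, tr_v):
    """Closed-form weight updates for the view weights."""
    lam = zeta / zeta.sum()               # Eq. (34): lambda_v proportional to zeta_v
    alpha = 1.0 / (2.0 * np.sqrt(tr_v))   # Eq. (35): adaptive weight from [8]
    return lam, alpha
```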
Finally, we summarize the aforementioned optimization procedure in Algorithm 3.
Algorithm 3 Tensorized discrete multi-view spectral clustering
Input: Data matrices $\{\mathbf{X}^{(v)}\}_{v=1}^{m}$; hyperparameters $K$, $\rho$, $\omega$ and $\beta$.
Output: The label matrix $\mathbf{Y}$ of the data.
1: Initialize: $\alpha^{(v)} = \frac{1}{m}$, $\lambda^{(v)} = \frac{1}{m}$, $\mathbf{S}^{(v)}$ according to [14];
2: Calculate the degree matrices $\mathbf{D}^{(v)}$, $\mathbf{D} = \sum_{v=1}^{m}\frac{1}{v}\mathbf{D}^{(v)}$, $\mathbf{L}^{(v)} = \mathbf{I} - \mathbf{D}^{(v)-\frac{1}{2}}\mathbf{S}^{(v)}\mathbf{D}^{(v)-\frac{1}{2}}$, and $\mathbf{F}^{(v)}$ by standard spectral clustering; randomly initialize $\mathbf{Y}$;
3: while not converged do
4:    for $v = 1 : m$ do
5:      Calculate $\mathbf{R}^{(v)}$ by Equation (11);
6:      Calculate $\mathbf{F}^{(v)}$ by Algorithm 1;
7:      Calculate $\mathbf{J}^{(v)}$ by Equation (24);
8:    end for
9:    Calculate $\mathbf{Y}$ by Algorithm 2;
10:   Update $\mathcal{G}$ by $\mathcal{G} = \mathcal{G} + \rho(\mathcal{F} - \mathcal{J})$;
11:   for $v = 1 : m$ do
12:     Calculate $\lambda^{(v)}$ by Equation (34);
13:     Update $\alpha^{(v)}$ by Equation (35);
14:   end for
15:   Update $\rho$ by $\rho = \rho\mu$;
16: end while
17: Return the label matrix $\mathbf{Y}$.

4.3. Complexity Analysis

The computational complexity of the proposed method mainly involves four variables ($\mathbf{R}^{(v)}$, $\mathbf{F}^{(v)}$, $\mathbf{Y}$, and $\mathcal{J}$). First, solving the $\mathbf{R}^{(v)}$-subproblem requires the SVD of a $K \times K$ matrix, with complexity $O(K^3)$. Second, solving for $\mathbf{F}^{(v)}$ costs $O(t_1 K^2 N)$, since it involves the SVD of an $N \times K$ matrix, where $t_1$ denotes the number of iterations of Algorithm 1. Third, updating $\mathbf{Y}$ costs $O(K^2 N^2)$. Fourth, solving the $\mathcal{J}$-subproblem involves the 3D FFT and 3D inverse FFT of an $N \times m \times K$ tensor and $N$ SVDs of $K \times m$ matrices in the Fourier domain, with complexities $O(2NmK\log(mK))$ and $O(NKm^2)$, respectively. Since in multi-view scenarios $m \ll N$, and $K$ and $m$ are small constants, the overall complexity of our method is approximately $O\left(K^2 N^2 + t_1 K^2 N + NK\left(2m\log(mK) + m^2\right)\right)$ per iteration. Despite a certain increase in complexity over some non-tensor algorithms, we pursue more comprehensive clustering performance by incorporating the complementary information and spatial structure of multi-view data, coupled with an adaptive weighting scheme.

5. Convergence Analysis

Theorem 5.
Algorithm 1 converges monotonically, i.e.,

$$\sum_{v=1}^{m}\left( \alpha_{t+1}^{(v)}\operatorname{tr}\left(\mathbf{F}_{t+1}^{(v)T}\mathbf{L}^{(v)}\mathbf{F}_{t+1}^{(v)}\right) + \frac{\rho}{2}\left\|\mathbf{A}^{(v)} - \mathbf{F}_{t+1}^{(v)}\right\|_F^2 + \beta\lambda_{t+1}^{(v)}\left\|\mathbf{F}_{t+1}^{(v)}\mathbf{R}^{(v)} - \mathbf{H}\right\|_F^2 \right) \leq \sum_{v=1}^{m}\left( \alpha_{t+1}^{(v)}\operatorname{tr}\left(\mathbf{F}_{t}^{(v)T}\mathbf{L}^{(v)}\mathbf{F}_{t}^{(v)}\right) + \frac{\rho}{2}\left\|\mathbf{A}^{(v)} - \mathbf{F}_{t}^{(v)}\right\|_F^2 + \beta\lambda_{t+1}^{(v)}\left\|\mathbf{F}_{t}^{(v)}\mathbf{R}^{(v)} - \mathbf{H}\right\|_F^2 \right) \tag{36}$$

where $\mathbf{F}_{t+1}^{(v)}$ and $\mathbf{F}_{t}^{(v)}$ denote the optimal solutions of Equation (15) at the $(t+1)$-th and $t$-th iterations of Algorithm 1, respectively.
Proof. 
For the $v$-th view, $\mathbf{B}^{(v)} = \lambda\mathbf{I} - \alpha^{(v)}\mathbf{L}^{(v)}$ is positive definite, so we can write $\mathbf{B}^{(v)} = \mathbf{U}^T\mathbf{U}$ via Cholesky factorization. Moreover, we have

$$\left\|\mathbf{U}\mathbf{F}_{t+1}^{(v)} - \mathbf{U}\mathbf{F}_{t}^{(v)}\right\|_F^2 = \operatorname{tr}\left(\mathbf{F}_{t+1}^{(v)T}\mathbf{B}^{(v)}\mathbf{F}_{t+1}^{(v)}\right) - 2\operatorname{tr}\left(\mathbf{F}_{t+1}^{(v)T}\mathbf{B}^{(v)}\mathbf{F}_{t}^{(v)}\right) + \operatorname{tr}\left(\mathbf{F}_{t}^{(v)T}\mathbf{B}^{(v)}\mathbf{F}_{t}^{(v)}\right) \geq 0 \tag{37}$$
According to Algorithm 1, we have

$$\operatorname{tr}\left(\mathbf{F}_{t+1}^{(v)T}\mathbf{B}^{(v)}\mathbf{F}_{t}^{(v)}\right) + \operatorname{tr}\left(\mathbf{F}_{t+1}^{(v)T}\mathbf{C}^{(v)}\right) \geq \operatorname{tr}\left(\mathbf{F}_{t}^{(v)T}\mathbf{B}^{(v)}\mathbf{F}_{t}^{(v)}\right) + \operatorname{tr}\left(\mathbf{F}_{t}^{(v)T}\mathbf{C}^{(v)}\right) \tag{38}$$
where C ( v ) = β λ ( v ) H R ( v ) T + ρ 2 A ( v ) .
Combining Equations (37) and (38), we have

$$\operatorname{tr}\left(\mathbf{F}_{t+1}^{(v)T}\mathbf{B}^{(v)}\mathbf{F}_{t+1}^{(v)}\right) + 2\operatorname{tr}\left(\mathbf{F}_{t+1}^{(v)T}\mathbf{C}^{(v)}\right) \geq \operatorname{tr}\left(\mathbf{F}_{t}^{(v)T}\mathbf{B}^{(v)}\mathbf{F}_{t}^{(v)}\right) + 2\operatorname{tr}\left(\mathbf{F}_{t}^{(v)T}\mathbf{C}^{(v)}\right) \tag{39}$$
Substituting $\mathbf{B}^{(v)} = \lambda\mathbf{I} - \alpha^{(v)}\mathbf{L}^{(v)}$ and $\mathbf{F}^{(v)T}\mathbf{F}^{(v)} = \mathbf{I}$ into Equation (39), simple algebra gives

$$\operatorname{tr}\left(\mathbf{F}_{t+1}^{(v)T}\alpha_{t+1}^{(v)}\mathbf{L}^{(v)}\mathbf{F}_{t+1}^{(v)}\right) - 2\operatorname{tr}\left(\mathbf{F}_{t+1}^{(v)T}\mathbf{C}^{(v)}\right) \leq \operatorname{tr}\left(\mathbf{F}_{t}^{(v)T}\alpha_{t+1}^{(v)}\mathbf{L}^{(v)}\mathbf{F}_{t}^{(v)}\right) - 2\operatorname{tr}\left(\mathbf{F}_{t}^{(v)T}\mathbf{C}^{(v)}\right) \tag{40}$$
Since all views are independent, we have

$$\sum_{v=1}^{m}\left(\operatorname{tr}\left(\mathbf{F}_{t+1}^{(v)T}\alpha_{t+1}^{(v)}\mathbf{L}^{(v)}\mathbf{F}_{t+1}^{(v)}\right) - 2\operatorname{tr}\left(\mathbf{F}_{t+1}^{(v)T}\mathbf{C}^{(v)}\right)\right) \leq \sum_{v=1}^{m}\left(\operatorname{tr}\left(\mathbf{F}_{t}^{(v)T}\alpha_{t+1}^{(v)}\mathbf{L}^{(v)}\mathbf{F}_{t}^{(v)}\right) - 2\operatorname{tr}\left(\mathbf{F}_{t}^{(v)T}\mathbf{C}^{(v)}\right)\right) \tag{41}$$
Substituting $\mathbf{C}^{(v)} = \beta\lambda^{(v)}\mathbf{H}\mathbf{R}^{(v)T} + \frac{\rho}{2}\mathbf{A}^{(v)}$ into Equation (41), simple algebra gives

$$\sum_{v=1}^{m}\operatorname{tr}\left(\mathbf{F}_{t+1}^{(v)T}\alpha_{t+1}^{(v)}\mathbf{L}^{(v)}\mathbf{F}_{t+1}^{(v)}\right) - 2\sum_{v=1}^{m}\operatorname{tr}\left(\mathbf{F}_{t+1}^{(v)T}\left(\beta\lambda_{t+1}^{(v)}\mathbf{H}\mathbf{R}^{(v)T} + \frac{\rho}{2}\mathbf{A}^{(v)}\right)\right) \leq \sum_{v=1}^{m}\operatorname{tr}\left(\mathbf{F}_{t}^{(v)T}\alpha_{t+1}^{(v)}\mathbf{L}^{(v)}\mathbf{F}_{t}^{(v)}\right) - 2\sum_{v=1}^{m}\operatorname{tr}\left(\mathbf{F}_{t}^{(v)T}\left(\beta\lambda_{t+1}^{(v)}\mathbf{H}\mathbf{R}^{(v)T} + \frac{\rho}{2}\mathbf{A}^{(v)}\right)\right) \tag{42}$$
Combining Equation (42), Equation (14) and Equation (9), by simple algebra, we have that Equation (36) holds. □
Moreover, existing analyses have demonstrated [45] that when the number of blocks is greater than or equal to three, proving the convergence of ALM, which our algorithm relies on, remains an open problem. Thus, we cannot prove the convergence of our proposed algorithm in theory. Fortunately, empirical evidence on real datasets shows that our algorithm has a stable convergence property. We analyze the convergence on the Yale and MSRC-V1 datasets. Figure 2 plots the error $\left\|\mathbf{F}_{k+1}^{(v)} - \mathbf{J}_{k+1}^{(v)}\right\|$ versus the iteration number, where the x-axis and y-axis denote the iteration number and the corresponding error, respectively. The error decreases sharply and then becomes relatively stable within 20 iterations, which indicates that our method converges well in real applications.

6. Experiment

We herein evaluate the performance of our proposed method on six widely used multi-view databases using four standard clustering evaluation metrics: Accuracy (ACC), Normalized Mutual Information (NMI), Purity, and the Adjusted Rand Index (ARI).

6.1. Experimental Setup

6.1.1. Datasets

In this subsection, we introduce the six databases used in the subsequent experiments:
  • The Caltech101 (Cal-101) (https://tensorflow.google.cn/datasets/catalog/caltech101 (accessed on 10 September 2023)) [46] dataset has 101 classes. Following [9], we chose 441 samples from seven categories in our experiments. These samples have three views, i.e., a 2560-dimensional (D) SIFT feature, a 1160-D LBP feature and a 620-D HOG feature.
  • The MSRC-v1 (https://mldta.com/dataset/msrc-v1/ (accessed on 10 September 2023)) [47] dataset includes eight categories. Following [9], we chose 210 samples from seven categories in our experiments. These samples have five views, i.e., a 24-D color moment, a 512-D GIST feature, a 576-D Histogram of Oriented Gradients feature, a 254-D CENTRIST feature and a 256-D LBP feature.
  • The Yale (http://vision.ucsd.edu/content/yale-face-database (accessed on 10 September 2023)) dataset has 165 samples with 15 people. These samples have different conditions, such as occlusion changes, facial expression and with or without glasses. In the experiment, it has three different views, i.e., 3304-D LBP feature, 6750-D Gabor feature and 4096-D intensity feature.
  • The ORL (http://www.uk.research.att.com/facedatabase.html (accessed on 10 September 2023)) dataset consists of 400 samples with 40 different people. These samples have different conditions, such as facial expression and occlusion changes. In the experiment, we use three views, i.e., 6750-D Gabor feature, 4096-D intensity feature, and 3340-D LBP feature.
  • Scene-15 (https://www.kaggle.com/datasets/zaiyankhan/15scene-dataset (accessed on 10 September 2023)) [48] is a scene dataset with 15 different natural scenes. It contains 4485 samples with three different views, i.e., a 1800-D PHOW feature, a 1240-D CENTRIST feature and a 1180-D PRI-CoLBP feature.
  • The ESP-GAME (https://www.kaggle.com/datasets/parhamsalar/espgame (accessed on 10 September 2023)) dataset contains 11,032 samples over seven categories. It has two different views, each of which is 100-dimensional.

6.1.2. Comparisons

We herein compare our proposed method with 11 methods, including two representative tensor-based clustering methods (t-SVD-MSC [1] and ETLMSC [34]), three spectral clustering methods (Co-reg [7], LTCSPC [9] and spectral clustering (SC) [49]), and six multi-graph fusion spectral clustering methods (AMGL [8], consistent and specific multi-view subspace clustering (CSMSC) [50], MLAN [14], MCGC [35], RMSC [31], and MVGL [10]). For our method, we tune the number of neighbors $k$ over $[8, 9, 10]$, the penalty factor $\rho$ over $[0.0001, 0.001, 0.01, 0.1]$, and $\beta$ over $[0.0001, 0.001, 0.01, 0.1]$, and the weight vector $\omega$ is set within the range $(0, 100]$. Table 1 reports the parameter values for each dataset. To fairly compare the performance of our method with the aforementioned 11 methods, we repeat each experiment 50 times on each database.

6.2. Experimental Results and Analysis

6.2.1. Performance Comparison

Table 2 lists the ACC, NMI, Purity and ARI of all compared methods on the aforementioned six datasets, where SC-best refers to the best result of SC among all views. From these results, the following observations can be made:
  • Multi-view clustering methods are overall superior to SC with the best single view. The reason may be that, compared with a single-view representation, a multi-view representation provides useful complementary information, and these multi-view methods make good use of it. On the Scene-15 database, except for CSMSC, the non-tensor multi-view clustering methods are overall inferior to SC. The reason may be that the views differ greatly in their usefulness for clustering, and in these methods the weights of different views are independent and neglect the relationship between views. This also indicates that designing a suitable weighting scheme is very important for improving multi-view clustering.
  • Among the non-tensor clustering methods, the adaptively weighted multi-view spectral clustering method AMGL and the co-regularized spectral clustering method Co-reg are overall inferior to the other methods. The reason may be that the performance of AMGL and Co-reg depends heavily on the predefined graph, and it is difficult to manually select a suitable graph in real applications due to the complex distribution of the data.
  • Tensor-based clustering methods, such as t-SVD-MSC, ETLMSC, LTCSPC and our proposed method, are superior to the other methods. This is probably because tensor-based clustering methods effectively exploit the complementary information and spatial structure embedded in the graphs or affinity matrices of different views. It also indicates that the differences between the graphs or affinity matrices of different views help provide useful complementary information for clustering.
  • Our proposed method is superior to the other clustering methods. This is probably because our proposed method explicitly considers the effect of different views for clustering, and the assigned weights for different views are related. Moreover, our method directly obtains a discrete label matrix for data, while the other methods require extra post-processing, which results in a sub-optimal discrete solution. It is worth noting that after adjusting the parameters to achieve optimal results, we observed that the standard deviation of our algorithm across all datasets is 0. This indicates that our algorithm exhibits excellent robustness.
  • LTCSPC utilizes weighted tensor nuclear norm regularization to minimize the discrepancy between the view indicator matrices, making full use of the complementary information hidden in the views. Its clustering performance is therefore better than that of the adaptively weighted multi-view spectral clustering method (AMGL) and the adaptive graph learning clustering method (MVGL). However, LTCSPC is inferior to our proposed method, indicating that spectral rotation further improves the clustering performance and is a reasonable scheme for obtaining a discrete solution in spectral clustering.
  • Our model achieved relatively good clustering accuracy on various types of multi-view datasets, including Caltech101, MSRC-v1, Yale, ORL, Scene-15, and ESP-GAME. This consistent high performance demonstrates the strong generalization ability of our model, enabling it to adapt to the characteristics of different datasets and produce robust clustering results.

6.2.2. Parameter Analysis

We analyze the effect of the weight vector $\omega$ on our method. Figure 3 plots the performance (ACC, NMI, and Purity) of our proposed method with varying weight vector $\omega$ on the Cal101 and Yale datasets, where the x-axis denotes $\omega$ and the y-axis the clustering performance. On the Cal101 database, our algorithm achieved optimal performance with $\omega = [5, 0.1, 1]$; on the Yale database, the optimum was observed with $\omega = [1, 5, 0.1]$, both significantly outperforming the equal-weight configuration $\omega = [1, 1, 1]$. This is attributed to the noticeable differences between the singular values of tensor $\mathcal{F}$, indicating that they should not be treated as equally important. However, it is very difficult to manually select a suitable weight vector $\omega$ that exploits these salient differences in real applications; we will study this in future work.
We also analyze the effect of the parameter $\beta$ on our method. Figure 4 shows the clustering performance versus $\beta$ on the Yale and MSRC-V1 databases. When $\beta = 0$, our method reduces to multi-view spectral clustering with a tensor low-rank constraint. From Figure 4, our method fluctuates noticeably with varying $\beta$, and at $\beta = 0$ it is inferior to the best performance, attained at $\beta = 0.01$, on both the Yale and MSRC-V1 databases. This also indicates that spectral rotation helps improve the clustering performance.

6.2.3. Ablation and Visualization

We conduct a parameter ablation with $\alpha^{(v)} = 1$, $\lambda^{(v)} = 1$ and $\omega = [1, 1, 1]$ in Table 3, Table 4 and Table 5. Setting the parameters to 1 corresponds to a regular non-weighted scheme that treats each view's indicator matrix $\mathbf{F}^{(v)}$ as equally important, and the results are worse. This emphasizes the importance of considering the distinctiveness of each view and assigning different weight values, underscoring the necessity of an adaptive weighting scheme. Furthermore, Figure 5 shows the t-SNE visualization of Scene-15: as the iterations progress and the results approach the optimum, the data points exhibit an increasingly compact distribution.

7. Conclusions

In this paper, we present tensorized discrete multi-view spectral clustering, which makes spectral embedding collaborate with spectral rotation in a unified framework. It leverages the weighted tensor nuclear norm regularizer to exploit the complementary information embedded in the indicator matrices of different views. Moreover, we present an adaptive weighting scheme that takes the relationship between views into consideration for clustering. Finally, an effective and efficient algorithm is proposed to solve for the discrete label matrix. Extensive experimental results on different real-world datasets show that the proposed model outperforms several multi-view methods. The weighted nuclear norm we employ relies on the significant differences between the singular values of the slices of the tensor formed by the indicator matrices, as well as on the manual selection of weights; however, manually choosing an appropriate weight vector is challenging in practice. We will study this in our future work.

Author Contributions

Conceptualization, Q.L.; methodology, Q.L. and G.Y.; writing—original draft preparation, Q.L.; writing—review and editing, Y.Y. and Y.L.; supervision, Q.L. and J.Y.; funding acquisition, Q.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by Natural Science Foundation of Guangdong Province under Grant 2023A1515011845, and in part by 2022 Project of Shenzhen Education Science “14th Five Year Plan” under Grant zdzz22004.

Data Availability Statement

Publicly available datasets were analyzed in this study. These data can be found here: Caltech101, https://tensorflow.google.cn/datasets/catalog/caltech101 (accessed on 10 September 2023); MSRC-v1, https://mldta.com/dataset/msrc-v1/ (accessed on 10 September 2023); Yale, http://vision.ucsd.edu/content/yale-face-database (accessed on 10 September 2023); ORL, http://www.uk.research.att.com/facedatabase.html (accessed on 10 September 2023); Scene-15, https://www.kaggle.com/datasets/zaiyankhan/15scene-dataset (accessed on 10 September 2023); ESP-GAME, https://www.kaggle.com/datasets/parhamsalar/espgame (accessed on 10 September 2023).

Conflicts of Interest

The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

References

  1. Xie, Y.; Tao, D.; Zhang, W.; Yan, L.; Lei, Z.; Qu, Y. On Unifying Multi-view Self-Representations for Clustering by Tensor Multi-rank Minimization. Int. J. Comput. Vis. 2018, 126, 1157–1179. [Google Scholar] [CrossRef]
  2. Xie, D.; Zhang, X.; Gao, Q.; Han, J.; Xiao, S.; Gao, X. Multiview Clustering by Joint Latent Representation and Similarity Learning. IEEE Trans. Cybern. 2020, 50, 4848–4854. [Google Scholar] [CrossRef]
  3. Liu, X.; Zhu, X.; Li, M.; Wang, L.; Tang, C.; Yin, J.; Shen, D.; Wang, H.; Gao, W. Late Fusion Incomplete Multi-View Clustering. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 41, 2410–2423. [Google Scholar] [CrossRef]
  4. Hu, P.; Peng, X.; Zhu, H.; Zhen, L.; Lin, J.; Yan, H.; Peng, D. Deep Semisupervised Multiview Learning with Increasing Views. IEEE Trans. Cybern. 2021, 52, 12954–12965. [Google Scholar] [CrossRef]
  5. Wang, Q.; Lian, H.; Sun, G.; Gao, Q.; Jiao, L. iCmSC: Incomplete Cross-Modal Subspace Clustering. IEEE Trans. Image Process. 2021, 30, 305–317. [Google Scholar] [CrossRef]
  6. Lu, H.; Xu, H.; Wang, Q.; Gao, Q.; Yang, M.; Gao, X. Efficient Multi-View K-Means for Image Clustering. IEEE Trans. Image Process. 2024, 33, 273–284. [Google Scholar] [CrossRef]
  7. Kumar, A.; Rai, P. Co-regularized multi-view spectral clustering. In Proceedings of the 24th International Conference on Neural Information Processing Systems (NIPS’11), Granada, Spain, 12–15 December 2011; Curran Associates Inc.: Red Hook, NY, USA, 2011; Volume 24, pp. 1413–1421. [Google Scholar]
  8. Nie, F.; Li, J.; Li, X. Parameter-Free Auto-Weighted Multiple Graph Learning: A Framework for Multiview Clustering and Semi-Supervised Classification. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence (IJCAI-16), New York, NY, USA, 9–15 July 2016; pp. 1881–1887. [Google Scholar]
  9. Xu, H.; Zhang, X.; Xia, W.; Gao, Q.; Gao, X. Low-rank tensor constrained co-regularized multi-view spectral clustering. Neural Netw. 2020, 132, 245–252. [Google Scholar] [CrossRef]
  10. Zhan, K.; Zhang, C.; Guan, J.; Wang, J. Graph Learning for Multiview Clustering. IEEE Trans. Cybern. 2018, 48, 2887–2895. [Google Scholar] [CrossRef]
  11. Yun, Y.; Li, J.; Gao, Q.; Yang, M.; Gao, X. Low-rank discrete multi-view spectral clustering. Neural Netw. 2023, 166, 137–147. [Google Scholar] [CrossRef]
  12. Liu, X.; Zhu, X.; Li, M.; Tang, C.; Gao, W. Efficient and Effective Incomplete Multi-View Clustering. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 4392–4399. [Google Scholar]
  13. Jiang, T.; Gao, Q. Fast multiple graphs learning for multi-view clustering. Neural Netw. 2022, 155, 348–359. [Google Scholar] [CrossRef]
  14. Nie, F.; Cai, G.; Li, X. Multi-View Clustering and Semi-Supervised Classification with Adaptive Neighbours. In Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; pp. 2408–2414. [Google Scholar]
  15. Nie, F.; Li, J.; Li, X. Self-weighted Multiview Clustering with Multiple Graphs. In Proceedings of the International Joint Conference on Artificial Intelligence, Melbourne, Australia, 19–25 August 2017; pp. 2564–2570. [Google Scholar]
  16. Yu, S.X.; Shi, J. Multiclass Spectral Clustering. In Proceedings of the Ninth IEEE International Conference on Computer Vision, Nice, France, 13–16 October 2003; Volume 1, pp. 313–319. [Google Scholar]
  17. Pang, Y.; Xie, J.; Nie, F.; Li, X. Spectral Clustering by Joint Spectral Embedding and Spectral Rotation. IEEE Trans. Cybern. 2020, 50, 247–258. [Google Scholar] [CrossRef]
  18. Wang, Z.; Li, Z.; Wang, R.; Nie, F.; Li, X. Large Graph Clustering with Simultaneous Spectral Embedding and Discretization. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 43, 4426–4440. [Google Scholar] [CrossRef]
  19. Yang, Y.; Shen, F.; Huang, Z.; Shen, H.T. A Unified Framework for Discrete Spectral Clustering. In Proceedings of the International Joint Conference on Artificial Intelligence, New York, NY, USA, 9–15 July 2016; pp. 2273–2279. [Google Scholar]
  20. Zhong, G.; Pun, C. A Unified Framework for Multi-view Spectral Clustering. In Proceedings of the IEEE International Conference on Data Engineering (ICDE), Dallas, TX, USA, 20–24 April 2020; pp. 1854–1857. [Google Scholar]
  21. Wan, Z.; Xu, H.; Gao, Q. Multi-view clustering by joint spectral embedding and spectral rotation. Neurocomputing 2021, 462, 123–131. [Google Scholar] [CrossRef]
  22. Shi, S.; Nie, F.; Wang, R.; Li, X. Auto-weighted multi-view clustering via spectral embedding. Neurocomputing 2020, 399, 369–379. [Google Scholar] [CrossRef]
23. Gao, Q.; Zhang, P.; Xia, W.; Xie, D.; Gao, X.; Tao, D. Enhanced Tensor RPCA and Its Application. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 2133–2140.
24. Xia, W.; Zhang, X.; Gao, Q.; Shu, X.; Han, J.; Gao, X. Multi-View Subspace Clustering by an Enhanced Tensor Nuclear Norm. IEEE Trans. Cybern. 2021, 52, 8962–8975.
25. Xie, Y.; Zhang, W.; Qu, Y.; Dai, L.; Tao, D. Hyper-Laplacian Regularized Multilinear Multiview Self-Representations for Clustering and Semisupervised Learning. IEEE Trans. Cybern. 2020, 50, 572–586.
26. Wen, J.; Xu, Y.; Liu, H. Incomplete Multiview Spectral Clustering With Adaptive Graph Learning. IEEE Trans. Cybern. 2020, 50, 1418–1429.
27. Wen, J.; Zhang, Z.; Zhang, Z.; Fei, L.; Wang, M. Generalized Incomplete Multiview Clustering with Flexible Locality Structure Diffusion. IEEE Trans. Cybern. 2020, 51, 101–114.
28. Zong, L.; Zhang, X.; Liu, X.; Yu, H. Weighted Multi-View Spectral Clustering Based on Spectral Perturbation. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; pp. 4621–4628.
29. Li, Z.; Tang, C.; Liu, X.; Zheng, X.; Yue, G.; Zhang, W.; Zhu, E. Consensus Graph Learning for Multi-View Clustering. IEEE Trans. Multimed. 2021, 24, 2461–2472.
30. Tan, Y.; Liu, Y.; Huang, S.; Feng, W.; Lv, J. Sample-Level Multi-View Graph Clustering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 18–22 June 2023; pp. 23966–23975.
31. Xia, R.; Pan, Y.; Du, L.; Yin, J. Robust Multi-View Spectral Clustering via Low-Rank and Sparse Decomposition. In Proceedings of the AAAI Conference on Artificial Intelligence, Québec City, QC, Canada, 27–31 July 2014; pp. 2149–2155.
32. Tang, C.; Zhu, X.; Liu, X.; Li, M.; Wang, P.; Zhang, C.; Wang, L. Learning a Joint Affinity Graph for Multiview Subspace Clustering. IEEE Trans. Multimed. 2019, 21, 1724–1736.
33. Tang, C.; Liu, X.; Zhu, X.; Zhu, E.; Gao, W. CGD: Multi-View Clustering via Cross-View Graph Diffusion. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 5924–5931.
34. Wu, J.; Lin, Z.; Zha, H. Essential Tensor Learning for Multi-View Spectral Clustering. IEEE Trans. Image Process. 2019, 28, 5910–5922.
35. Zhan, K.; Nie, F.; Wang, J.; Yang, Y. Multiview Consensus Graph Clustering. IEEE Trans. Image Process. 2019, 28, 1261–1270.
36. Ding, C.H.Q.; Li, T.; Jordan, M.I. Convex and Semi-Nonnegative Matrix Factorizations. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 32, 45–55.
37. Shi, S.; Nie, F.; Wang, R.; Li, X. Multi-View Clustering via Nonnegative and Orthogonal Graph Reconstruction. IEEE Trans. Neural Netw. Learn. Syst. 2021, 34, 201–214.
38. Yang, H.; Gao, Q.; Xia, W.; Yang, M.; Gao, X. Multiview Spectral Clustering With Bipartite Graph. IEEE Trans. Image Process. 2022, 31, 3591–3605.
39. Nie, F.; Tian, L.; Li, X. Multiview Clustering via Adaptively Weighted Procrustes. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD 2018), London, UK, 19–23 August 2018; pp. 2022–2030.
40. Han, Y.; Zhu, L.; Cheng, Z.; Li, J.; Liu, X. Discrete Optimal Graph Clustering. IEEE Trans. Cybern. 2020, 50, 1697–1710.
41. Xia, W.; Gao, Q.; Wang, Q.; Gao, X.; Ding, C.; Tao, D. Tensorized Bipartite Graph Learning for Multi-View Clustering. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 5187–5202.
42. Kilmer, M.E.; Martin, C.D. Factorization Strategies for Third-Order Tensors. Linear Algebra Appl. 2011, 435, 641–658.
43. Lin, Z.; Chen, M.; Ma, Y. The Augmented Lagrange Multiplier Method for Exact Recovery of Corrupted Low-Rank Matrices. arXiv 2010, arXiv:1009.5055.
44. Gao, Q.; Xu, S.; Chen, F.; Ding, C.; Gao, X.; Li, Y. R1-2-DPCA and Face Recognition. IEEE Trans. Cybern. 2019, 49, 1212–1223.
45. Lin, T.; Ma, S.; Zhang, S. Iteration Complexity Analysis of Multi-Block ADMM for a Family of Convex Minimization Without Strong Convexity. J. Sci. Comput. 2016, 69, 52–81.
46. Fei-Fei, L.; Fergus, R.; Perona, P. Learning Generative Visual Models from Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories. Comput. Vis. Image Underst. 2007, 106, 59–70.
47. Winn, J.; Jojic, N. LOCUS: Learning Object Classes with Unsupervised Segmentation. In Proceedings of the Tenth IEEE International Conference on Computer Vision, Beijing, China, 17–21 October 2005; Volume 1, pp. 756–763.
48. Oliva, A.; Torralba, A. Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope. Int. J. Comput. Vis. 2001, 42, 145–175.
49. Ng, A.Y.; Jordan, M.I.; Weiss, Y. On Spectral Clustering: Analysis and an Algorithm. In Proceedings of the 14th International Conference on Neural Information Processing Systems (NIPS 2001), Vancouver, BC, Canada, 3–8 December 2001; Volume 14, pp. 849–856.
50. Luo, S.; Zhang, C.; Zhang, W.; Cao, X. Consistent and Specific Multi-View Subspace Clustering. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; pp. 3730–3737.
Figure 1. Construction of the tensor $\mathcal{F} \in \mathbb{R}^{N \times m \times K}$. $\Delta^{(i)}$ denotes the $i$-th frontal slice of $\mathcal{F}$ ($i \in \{1, 2, \ldots, K\}$).
Figure 2. The convergence curves of our method on the MSRC-V1 and Yale datasets. (a) MSRC-V1. (b) Yale.
Figure 3. Analysis of the parameter ω on the Cal101 and Yale datasets. (a) Cal101. (b) Yale.
Figure 4. Analysis of the parameter β on the MSRC-V1 and Yale datasets. (a) MSRC-V1. (b) Yale.
Figure 5. t-SNE visualizations on the Scene-15 dataset, where different colors represent different categories: (a) epoch 3 (ACC = 0.369); (b) epoch 6 (ACC = 0.592); (c) epoch 9 (ACC = 0.836); (d) epoch 12 (ACC = 0.909).
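Plots like Figure 5 are typically produced by projecting the learned real-valued embedding into two dimensions with t-SNE and coloring each point by its cluster label. The snippet below is a minimal sketch under that assumption; the random matrix F and the labels are stand-ins for the embedding and labels learned on Scene-15, not the authors' data or code.

```python
# Illustrative t-SNE visualization; F and labels are random stand-ins for
# the learned embedding and the predicted cluster labels.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
F = rng.normal(size=(300, 10))          # stand-in for the N x m embedding
labels = rng.integers(0, 15, size=300)  # stand-in for 15 predicted clusters

emb = TSNE(n_components=2, init="pca", random_state=0).fit_transform(F)
plt.scatter(emb[:, 0], emb[:, 1], c=labels, cmap="tab20", s=8)
plt.title("t-SNE of the learned embedding (illustrative)")
plt.tight_layout()
plt.savefig("tsne_epoch.png", dpi=200)
```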
Table 1. Parameters under the best results on the six datasets.

Dataset   Yale         MSRC                  Cal101       ORL           Scene-15     ESP-GAME
k         8            8                     8            8             8            80
β         0.01         0.01                  0.01         0.01          0.001        0.01
ω         [1, 5, 0.1]  [1, 100, 10, 0.1, 5]  [5, 0.1, 1]  [1, 10, 100]  [1, 5, 100]  [10, 1]
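Settings like those in Table 1 are usually found by a grid search over k, β, and the per-view weights collected in ω. Below is a hedged sketch of such a search; `run_model`, `num_views`, and the candidate grids are hypothetical placeholders, not the paper's actual training pipeline.

```python
# Hypothetical grid search over the hyper-parameters reported in Table 1.
# `run_model` is an illustrative stand-in for one training/evaluation run.
from itertools import product
import random

def run_model(k, beta, omega):
    # Stand-in: a real implementation would train the clustering model with
    # these hyper-parameters and return the resulting accuracy (ACC).
    random.seed(hash((k, beta, tuple(omega))) % 2**32)
    return random.random()

num_views = 3                     # e.g., three feature types per sample
betas = [1e-3, 1e-2, 1e-1, 1]     # candidate values for beta
weights = [0.1, 1, 5, 10, 100]    # candidate per-view weights for omega

best_cfg, best_acc = None, -1.0
for beta, omega in product(betas, product(weights, repeat=num_views)):
    acc = run_model(k=8, beta=beta, omega=list(omega))
    if acc > best_acc:
        best_cfg, best_acc = (beta, list(omega)), acc
print("best (beta, omega):", best_cfg, "ACC:", best_acc)
```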
Table 2. The clustering performance results on the Caltech101, MSRC-V1, Yale, ORL, Scene-15 and ESP-GAME datasets.

Dataset     Cal-101                                                 MSRC-V1
Metric      ACC           NMI           Purity        ARI           ACC           NMI           Purity        ARI
SC-best     0.545 ± 0.03  0.431 ± 0.02  0.624 ± 0.02  0.390 ± 0.02  0.663 ± 0.04  0.534 ± 0.02  0.674 ± 0.03  0.441 ± 0.03
AMGL        0.481 ± 0.03  0.345 ± 0.02  0.540 ± 0.02  0.184 ± 0.03  0.732 ± 0.04  0.669 ± 0.02  0.740 ± 0.02  0.543 ± 0.02
CSMSC       0.567 ± 0.00  0.480 ± 0.00  0.633 ± 0.00  0.395 ± 0.00  0.742 ± 0.01  0.597 ± 0.01  0.742 ± 0.01  0.532 ± 0.01
MLAN        0.587 ± 0.00  0.462 ± 0.00  0.655 ± 0.00  0.347 ± 0.00  0.743 ± 0.00  0.746 ± 0.00  0.805 ± 0.00  0.661 ± 0.00
Co-Reg      0.452 ± 0.01  0.283 ± 0.01  0.502 ± 0.01  0.234 ± 0.01  0.744 ± 0.02  0.632 ± 0.01  0.750 ± 0.01  0.635 ± 0.01
RMSC        0.529 ± 0.03  0.286 ± 0.02  0.565 ± 0.01  0.313 ± 0.02  0.742 ± 0.05  0.644 ± 0.03  0.764 ± 0.04  0.624 ± 0.02
MCGC        0.501 ± 0.00  0.376 ± 0.02  0.587 ± 0.00  0.301 ± 0.01  0.852 ± 0.00  0.724 ± 0.00  0.852 ± 0.00  0.749 ± 0.00
MVGL        0.483 ± 0.00  0.372 ± 0.00  0.571 ± 0.00  0.279 ± 0.00  0.852 ± 0.00  0.754 ± 0.00  0.852 ± 0.00  0.738 ± 0.02
T-SVD-MSC   0.828 ± 0.00  0.859 ± 0.00  0.868 ± 0.00  0.636 ± 0.00  0.999 ± 0.18  0.998 ± 0.40  0.999 ± 0.02  0.987 ± 0.02
LTCSPC      0.829 ± 0.01  0.822 ± 0.01  0.882 ± 0.01  0.643 ± 0.01  0.999 ± 0.00  0.999 ± 0.00  0.999 ± 0.00  0.989 ± 0.01
ETLMSC      0.642 ± 0.00  0.607 ± 0.00  0.739 ± 0.00  0.539 ± 0.01  0.995 ± 0.00  0.989 ± 0.00  0.995 ± 0.00  0.988 ± 0.01
Ours        0.832 ± 0.00  0.881 ± 0.00  0.912 ± 0.00  0.839 ± 0.00  1.00 ± 0.00   1.00 ± 0.00   1.00 ± 0.00   1.00 ± 0.00

Dataset     Yale                                                    ORL
Metric      ACC           NMI           Purity        ARI           ACC           NMI           Purity        ARI
SC-best     0.556 ± 0.04  0.586 ± 0.04  0.567 ± 0.04  0.361 ± 0.04  0.727 ± 0.02  0.868 ± 0.01  0.762 ± 0.02  0.645 ± 0.03
AMGL        0.655 ± 0.02  0.654 ± 0.01  0.657 ± 0.02  0.394 ± 0.00  0.777 ± 0.02  0.883 ± 0.01  0.820 ± 0.02  0.633 ± 0.05
CSMSC       0.750 ± 0.00  0.776 ± 0.00  0.750 ± 0.00  0.615 ± 0.00  0.857 ± 0.02  0.935 ± 0.01  0.882 ± 0.01  0.813 ± 0.02
MLAN        0.641 ± 0.00  0.682 ± 0.00  0.641 ± 0.00  0.413 ± 0.00  0.684 ± 0.00  0.786 ± 0.01  0.735 ± 0.01  0.557 ± 0.01
Co-Reg      0.628 ± 0.01  0.660 ± 0.01  0.637 ± 0.02  0.521 ± 0.01  0.668 ± 0.01  0.824 ± 0.00  0.713 ± 0.01  0.600 ± 0.00
RMSC        0.703 ± 0.04  0.717 ± 0.02  0.710 ± 0.04  0.533 ± 0.03  0.747 ± 0.02  0.866 ± 0.01  0.760 ± 0.01  0.662 ± 0.02
MCGC        0.715 ± 0.00  0.677 ± 0.00  0.667 ± 0.00  0.534 ± 0.00  0.800 ± 0.00  0.895 ± 0.00  0.823 ± 0.01  0.679 ± 0.00
MVGL        0.709 ± 0.00  0.692 ± 0.00  0.709 ± 0.00  0.650 ± 0.00  0.765 ± 0.00  0.871 ± 0.00  0.815 ± 0.00  0.663 ± 0.00
T-SVD-MSC   0.932 ± 0.06  0.942 ± 0.05  0.932 ± 0.06  0.946 ± 0.03  0.962 ± 0.01  0.988 ± 0.00  0.973 ± 0.01  0.958 ± 0.01
LTCSPC      0.976 ± 0.01  0.982 ± 0.45  0.979 ± 0.70  0.965 ± 0.00  0.989 ± 0.01  0.994 ± 0.00  0.983 ± 0.01  0.978 ± 0.00
ETLMSC      0.659 ± 0.04  0.693 ± 0.04  0.659 ± 0.04  0.500 ± 0.05  0.958 ± 0.02  0.988 ± 0.01  0.970 ± 0.02  0.959 ± 0.02
Ours        1.00 ± 0.00   1.00 ± 0.00   1.00 ± 0.00   1.00 ± 0.00   1.00 ± 0.00   1.00 ± 0.00   1.00 ± 0.00   1.00 ± 0.00

Dataset     Scene-15                                                ESP-GAME
Metric      ACC           NMI           Purity        ARI           ACC           NMI           Purity        ARI
SC-best     0.483 ± 0.03  0.456 ± 0.01  0.534 ± 0.02  0.328 ± 0.06  0.512 ± 0.01  0.367 ± 0.01  0.539 ± 0.01  0.245 ± 0.00
AMGL        0.417 ± 0.03  0.473 ± 0.04  0.438 ± 0.03  0.285 ± 0.03  0.526 ± 0.00  0.354 ± 0.00  0.526 ± 0.00  0.264 ± 0.00
CSMSC       0.597 ± 0.00  0.573 ± 0.00  0.641 ± 0.00  0.439 ± 0.00  0.437 ± 0.00  0.284 ± 0.00  0.445 ± 0.00  0.221 ± 0.00
MLAN        0.340 ± 0.03  0.381 ± 0.04  0.351 ± 0.03  0.167 ± 0.03  0.476 ± 0.01  0.384 ± 0.00  0.496 ± 0.00  0.200 ± 0.01
Co-Reg      0.487 ± 0.00  0.466 ± 0.00  0.530 ± 0.00  0.324 ± 0.00  0.466 ± 0.01  0.375 ± 0.01  0.469 ± 0.01  0.181 ± 0.01
RMSC        0.451 ± 0.02  0.451 ± 0.01  0.490 ± 0.02  0.292 ± 0.02  0.446 ± 0.01  0.309 ± 0.01  0.468 ± 0.01  0.221 ± 0.01
MCGC        0.424 ± 0.00  0.406 ± 0.00  0.483 ± 0.00  0.378 ± 0.00  0.419 ± 0.00  0.240 ± 0.00  0.414 ± 0.00  0.125 ± 0.00
MVGL        0.449 ± 0.00  0.443 ± 0.00  0.464 ± 0.00  0.356 ± 0.00  0.473 ± 0.00  0.322 ± 0.00  0.478 ± 0.00  0.214 ± 0.00
T-SVD-MSC   0.816 ± 0.02  0.848 ± 0.01  0.867 ± 0.01  0.783 ± 0.01  0.569 ± 0.00  0.409 ± 0.00  0.579 ± 0.00  0.356 ± 0.00
LTCSPC      0.869 ± 0.01  0.863 ± 0.00  0.879 ± 0.01  0.813 ± 0.00  0.987 ± 0.00  0.963 ± 0.01  0.987 ± 0.00  0.971 ± 0.01
ETLMSC      0.873 ± 0.00  0.887 ± 0.00  0.905 ± 0.00  0.842 ± 0.00  0.730 ± 0.02  0.744 ± 0.01  0.682 ± 0.02  0.640 ± 0.02
Ours        0.909 ± 0.00  0.924 ± 0.00  0.912 ± 0.00  0.887 ± 0.00  0.993 ± 0.00  0.979 ± 0.00  0.993 ± 0.00  0.983 ± 0.00
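ACC, NMI, Purity, and ARI in Tables 2–5 are standard external clustering metrics. As a reference, the following is a minimal sketch of how they can be computed (NMI and ARI via scikit-learn; ACC via Hungarian matching); the helper names are ours, not the paper's evaluation code.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score

def clustering_accuracy(y_true, y_pred):
    """ACC: accuracy under the best one-to-one label matching (Hungarian)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    k = int(max(y_true.max(), y_pred.max())) + 1
    cost = np.zeros((k, k), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        cost[t, p] += 1
    rows, cols = linear_sum_assignment(-cost)  # maximize matched counts
    return cost[rows, cols].sum() / y_true.size

def purity(y_true, y_pred):
    """Purity: each predicted cluster is credited with its majority class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    hits = 0
    for c in np.unique(y_pred):
        _, counts = np.unique(y_true[y_pred == c], return_counts=True)
        hits += counts.max()
    return hits / y_true.size

y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([1, 1, 0, 0, 2, 2])                # permuted but perfect
print(clustering_accuracy(y_true, y_pred))           # 1.0
print(purity(y_true, y_pred))                        # 1.0
print(normalized_mutual_info_score(y_true, y_pred))  # 1.0
print(adjusted_rand_score(y_true, y_pred))           # 1.0
```

The Hungarian matching in ACC removes the arbitrary permutation of cluster indices, which is why a relabeled but otherwise perfect partition still scores 1.0.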
Table 3. Clustering results with/without the adaptive weighting strategy $\alpha^{(v)}$, where ✗ means without the adaptive weighting strategy and ✓ means with it. ↑ indicates that a higher value of the metric reflects better performance.

Dataset: Yale
Method   ACC (↑)   NMI (↑)   Purity (↑)   ARI (↑)
✗        0.994     0.993     0.994        0.987
✓        1.00      1.00      1.00         1.00

Dataset: Cal-101
Method   ACC (↑)   NMI (↑)   Purity (↑)   ARI (↑)
✗        0.658     0.741     0.764        0.601
✓        0.832     0.881     0.912        0.839
Table 4. Clustering results with/without the adaptive weighting strategy $\lambda^{(v)}$, where ✗ means without the adaptive weighting strategy and ✓ means with it. ↑ indicates that a higher value of the metric reflects better performance.

Dataset: Yale
Method   ACC (↑)   NMI (↑)   Purity (↑)   ARI (↑)
✗        0.994     0.993     0.994        0.987
✓        1.00      1.00      1.00         1.00

Dataset: Cal-101
Method   ACC (↑)   NMI (↑)   Purity (↑)   ARI (↑)
✗        0.773     0.791     0.848        0.698
✓        0.832     0.881     0.912        0.839
Table 5. Clustering results with/without the weighting strategy ω, where ✗ means without the weighting strategy and ✓ means with it. ↑ indicates that a higher value of the metric reflects better performance.

Dataset: Yale
Method   ACC (↑)   NMI (↑)   Purity (↑)   ARI (↑)
✗        0.715     0.749     0.715        0.559
✓        1.00      1.00      1.00         1.00

Dataset: Cal-101
Method   ACC (↑)   NMI (↑)   Purity (↑)   ARI (↑)
✗        0.646     0.655     0.746        0.512
✓        0.832     0.881     0.912        0.839