Article

Discriminative Nonnegative Tucker Decomposition for Tensor Data Representation

1 School of Mathematical Sciences, Guizhou Normal University, Guiyang 550025, China
2 School of Mathematical Sciences, Xiamen University, Xiamen 361005, China
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Mathematics 2022, 10(24), 4723; https://doi.org/10.3390/math10244723
Submission received: 7 November 2022 / Revised: 5 December 2022 / Accepted: 8 December 2022 / Published: 12 December 2022

Abstract

Nonnegative Tucker decomposition (NTD) is an unsupervised method that has been extended to many applied fields. However, NTD does not make use of the label information of sample data, even when such label information is available. To remedy this defect, in this paper we propose a label-constrained NTD method, namely Discriminative NTD (DNTD), which treats a fraction of the label information of the sample data as a discriminative constraint. Differing from other label-based methods, the proposed method enforces that sample data with the same label are aligned on the same axis or line. By combining NTD with the label-discriminative constraint term, DNTD can not only extract the part-based representation of the data tensor but also boost the discriminative ability of NTD. An iterative updating algorithm is provided to solve the objective function of DNTD. Finally, the proposed DNTD method is applied to image clustering. Experimental results on the ORL, COIL20, and Yale datasets show that the clustering accuracy of DNTD is improved by 8.47–32.17% and the normalized mutual information by 10.43–29.64% compared with state-of-the-art approaches.

1. Introduction

In recent years, data analysis has attracted increasing attention in many application fields, such as machine learning, artificial intelligence, and computer vision. For instance, in Ref. [1], cluster learning of data is analysed via tensor low-rank representation. In Ref. [2], routing recommendations for heterogeneous data are implemented with tensor-based frameworks. Long et al. [3] recover the missing entries of visual data by tensor completion. Bernardi et al. [4] provide a hitchhiker's guide to secant varieties and tensor decomposition. Real-world sample data are often high dimensional, while the important information and structures lie in a low-dimensional representation space of the sample data. Thus, many dimensionality reduction methods have been proposed to seek an appropriate low-dimensional representation of the original sample data.
Nonnegative matrix factorization (NMF) [5], as a traditional dimensionality reduction method, has been widely used for different purposes, such as image processing [6], community detection [7], clustering [8], etc. The NMF of a nonnegative data matrix $\mathbf{X} \in \mathbb{R}_+^{m \times n}$ seeks two low-rank nonnegative matrices, namely $\mathbf{U} \in \mathbb{R}_+^{m \times s}$ and $\mathbf{V} \in \mathbb{R}_+^{n \times s}$, such that $\mathbf{X} \approx \mathbf{U}\mathbf{V}^\top$. For example, if $\mathbf{X}$ is an image sample data matrix whose columns contain the pixel data of the image samples, then $m$ represents the number of pixels per image sample, $n$ represents the number of image samples, and $s$ denotes the dimension of the low-dimensional representation of the image sample data matrix; $\mathbf{U}$ is called the basis matrix, and $\mathbf{V}$ is called the encoding matrix and is also regarded as the low-dimensional representation of the image sample data matrix. Generally, $s$ is chosen to be much smaller than $m$ or $n$. However, a weakness of NMF is that it fails to preserve the geometrical information of the sample data. Thus, Cai et al. proposed a graph-regularized NMF (GNMF), which encodes the geometrical information of the sample data in the low-dimensional representation space by constructing a nearest-neighbor graph [9]. Using hypergraph regularization and the correntropy instead of the Euclidean norm in the loss term of NMF, Yu et al. proposed a correntropy-based hypergraph-regularized NMF (CHNMF) [10]. CHNMF considers the high-order geometric relationships inherent in the sample data and reduces the influence of noise and outliers.
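As a concrete illustration of the factorization $\mathbf{X} \approx \mathbf{U}\mathbf{V}^\top$, the following minimal NumPy sketch runs the classical multiplicative updates of Lee and Seung [5] on a small random nonnegative matrix; the sizes, iteration count, and the small constant added to the denominators are illustrative choices, not values taken from this paper.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, s = 100, 60, 5          # pixels, samples, latent dimension (illustrative sizes)
X = rng.random((m, n))        # nonnegative data matrix

U = rng.random((m, s))        # basis matrix
V = rng.random((n, s))        # encoding matrix (low-dimensional representation)

eps = 1e-10                   # guard against division by zero
for _ in range(200):
    # Multiplicative updates for the Frobenius-norm NMF (Lee & Seung, 2000)
    U *= (X @ V) / (U @ V.T @ V + eps)
    V *= (X.T @ U) / (V @ U.T @ U + eps)

print("relative error:", np.linalg.norm(X - U @ V.T) / np.linalg.norm(X))
```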
When the label information of sample data is available, it can naturally also be used to construct a graph. Generally speaking, sample data can be separated into fully labeled, partially labeled, and unlabeled data, and correspondingly, the algorithms using these data are categorized as supervised, semi-supervised, and unsupervised algorithms [11]. On the one hand, in practice, a large amount of labeled sample data is hard to obtain, whereas a small amount of labeled sample data is readily available, so it is natural to develop semi-supervised methods. On the other hand, a semi-supervised method can utilize all sample data and simultaneously use the labels of part of the sample data, so that it retains the capability of an unsupervised method while propagating labels under the guidance of the supervisory information. Restricting the data to satisfy the prior label information is a common way of using label information to develop a semi-supervised method [12]. For example, Liu et al. introduced a constrained NMF (CNMF) by incorporating the label information of some sample data into the objective function of NMF [13]. In CNMF, sample data with the same labels are merged into a single point. Babaee et al. studied a discriminative NMF (DNMF) that utilizes a fraction of the label information of sample data in a regularization term [14]. Differing from CNMF, DNMF enforces that sample data with the same label are aligned on the same axis: it constructs a label matrix and then produces a label-discriminative regularizer by bridging the label matrix and the labeled sample data. However, when these NMF-based methods deal with image sample data, the data are first vectorized, forming a matrix like the $\mathbf{X}$ of the example mentioned above, and are then represented by a low-rank approximation, which often destroys the internal structure of the sample data. The study in Ref. [15] indicates that tensors can partly solve this problem.
In fact, in real-world examples, there are many sample data represented by a tensor (i.e., multiway array), e.g., color images, video clips, multichannel electroencephalography (EEG), etc. For these reasons, tensor factorization techniques were proposed to deal with data tensors. The Tucker decomposition [16], which is one of the tensor factorization methods, has been widely applied to face recognition [17], image processing [18], signal processing [15], etc. In Ref. [19], Kim et al. introduced a nonnegative Tucker decomposition (NTD) by combining Tucker decomposition with nonnegativity constraints on the core tensor and factor matrices. Since then, NTD has been extended in many applications, such as clustering [20], pattern extraction [21], and signal analysis [22], and several variants of NTD from diverse perspectives have been proposed. For example, Qiu et al. studied a graph-regularized nonnegative Tucker Decomposition (GNTD), which incorporates graph regularization into NTD to preserve the geometrical information from a data tensor [23]. Pan et al. introduced an orthogonal nonnegative Tucker decomposition (ONTD) by considering the orthogonality on each factor matrix [24]. However, NTD and its variants mentioned above are unsupervised algorithms, meaning they do not use the available label information of sample data.
Recently, many researchers have considered the case when the label information of sample data is available. To make better use of this label information, many works have incorporated label information constraints into the framework of NMF or NTD, such as CNMF [13], DNMF [14], graph-based discriminative NMF (GDNMF) [25], graph-regularized and sparse NMF with hard constraints (GSNMFC) [26], semi-supervised robust distribution-based NMF (SRDNMF) [27], and semi-supervised NTD (SNTD) [28]. Among these methods, CNMF and DNMF both incorporate the label information into NMF; however, they do not consider the geometric structure of the sample data. Thus, GDNMF was proposed, which incorporates graph regularization and label information into NMF. Following this idea, GSNMFC was proposed by jointly incorporating a graph regularizer, label information, and a sparseness constraint. However, while these methods utilize label information, they do not address robustness. To improve robustness, SRDNMF introduced a Kullback–Leibler divergence and a label-discriminative constraint into NMF. Nevertheless, these semi-supervised methods are NMF-based learning algorithms, which may destroy the inherent structure of the sample data. To alleviate these drawbacks, SNTD, based on NTD, was introduced to jointly propagate the limited label information and learn the nonnegative tensor representation [28]. Although SNTD uses a label constraint, it fails to consider that sample data with the same label could be aligned on the same axis or line.
Motivated by recent progress [14,19], in this paper, we propose a label-constrained NTD, called Discriminative NTD, or DNTD for short. We incorporate a fraction of the label information of the sample data into NTD, and the key idea is that the sample data that belong to the same class should be aligned on the same axis or line in the low-dimensional representation. We also discuss how to efficiently solve the corresponding optimization problem, and we provide the optimization scheme and its convergence proof. The main contributions of the proposed approach are:
  • By constructing the label matrix and coupling the label-discriminative regularizer to the objective function of NTD, the DNTD method can not only extract the part-based representation from the data tensor but also boost the discriminative ability of the NTD. Furthermore, the key idea of the label-discriminative term is that the sample data belonging to the same label are very close or aligned on the same axis or line in the low-dimensional representation;
  • An efficient updating algorithm is developed to solve the optimization problem and the convergence proof is provided;
  • Numerical examples from real-world applications are provided to demonstrate the effectiveness of the proposed method.
The rest of the paper is organized as follows: In Section 2, we briefly review the NTD. In Section 3, the DNTD method is proposed, and the detailed algorithm and proof of convergence of the algorithm are provided. In Section 4, experiments for clustering tasks are presented. Finally, in Section 5, conclusions are drawn.

2. Nonnegative Tucker Decomposition

Given a nonnegative data tensor $\mathcal{X} \in \mathbb{R}_+^{I_1 \times I_2 \times \cdots \times I_{N-1} \times I_N}$, where $\mathcal{X}_j \in \mathbb{R}_+^{I_1 \times I_2 \times \cdots \times I_{N-1}}$ ($j = 1, 2, \ldots, I_N$) is a data sample, nonnegative Tucker decomposition (NTD) aims at decomposing the nonnegative tensor $\mathcal{X}$ into a nonnegative core tensor $\mathcal{G} \in \mathbb{R}_+^{J_1 \times J_2 \times \cdots \times J_N}$ multiplied by $N$ nonnegative factor matrices $\mathbf{A}^{(r)} \in \mathbb{R}_+^{I_r \times J_r}$ ($r = 1, 2, \ldots, N$) along each mode [19]. To achieve this goal, NTD minimizes the sum of squared residues between the data tensor $\mathcal{X}$ and the multilinear product of the core tensor $\mathcal{G}$ and the factor matrices $\mathbf{A}^{(r)}$, which can be formulated as
$$\min_{\mathcal{G}, \mathbf{A}^{(1)}, \ldots, \mathbf{A}^{(N)}} O_1 = \left\| \mathcal{X} - \mathcal{G} \times_1 \mathbf{A}^{(1)} \times_2 \mathbf{A}^{(2)} \cdots \times_N \mathbf{A}^{(N)} \right\|^2 \quad \text{s.t.} \quad \mathcal{G} \geq 0,\ \mathbf{A}^{(r)} \geq 0,\ r = 1, 2, \ldots, N, \qquad (1)$$
where, and in the following, the operator $\times_r$ denotes the $r$-mode product [16]. For example, the $r$-mode product of a tensor $\mathcal{Y} \in \mathbb{R}^{J_1 \times J_2 \times \cdots \times J_N}$ and a matrix $\mathbf{U} \in \mathbb{R}^{I_r \times J_r}$, denoted by $\mathcal{Y} \times_r \mathbf{U}$, is of size $J_1 \times \cdots \times J_{r-1} \times I_r \times J_{r+1} \times \cdots \times J_N$ and $(\mathcal{Y} \times_r \mathbf{U})_{j_1 \cdots j_{r-1} i_r j_{r+1} \cdots j_N} = \sum_{j_r = 1}^{J_r} y_{j_1 \cdots j_{r-1} j_r j_{r+1} \cdots j_N} u_{i_r j_r}$.
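To make the $r$-mode product concrete, here is a small NumPy sketch that implements it through matrix unfolding and verifies it against the element-wise definition above. The column-major unfolding convention and the helper names (`unfold`, `fold`, `mode_product`) are our own illustrative choices, not notation from the paper.

```python
import numpy as np

def unfold(T, mode):
    """Mode unfolding with column-major column ordering (as in Ref. [16])."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1, order='F')

def fold(M, mode, shape):
    """Inverse of `unfold` for a tensor with the given full shape."""
    full = [shape[mode]] + [s for i, s in enumerate(shape) if i != mode]
    return np.moveaxis(M.reshape(full, order='F'), 0, mode)

def mode_product(Y, U, mode):
    """r-mode product Y x_r U: contracts dimension J_r of Y with U in R^{I_r x J_r}."""
    new_shape = list(Y.shape)
    new_shape[mode] = U.shape[0]
    return fold(U @ unfold(Y, mode), mode, new_shape)

rng = np.random.default_rng(0)
Y = rng.random((4, 5, 6))                 # J1 x J2 x J3
U = rng.random((7, 5))                    # I2 x J2 (r = 2, i.e. Python mode=1)
Z = mode_product(Y, U, mode=1)            # resulting shape: (4, 7, 6)
# element-wise definition: (Y x_2 U)[j1, i2, j3] = sum_{j2} Y[j1, j2, j3] * U[i2, j2]
print(Z.shape, np.allclose(Z, np.einsum('abc,ib->aic', Y, U)))
```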
Equation (1) can be represented as a matrix form:
$$\min_{\mathbf{G}_{(N)}, \mathbf{A}^{(1)}, \ldots, \mathbf{A}^{(N)}} O_1 = \left\| \mathbf{X}_{(N)} - \mathbf{A}^{(N)} \mathbf{G}_{(N)} \big( \otimes_{p \neq N} \mathbf{A}^{(p)} \big)^\top \right\|_F^2 \quad \text{s.t.} \quad \mathbf{G}_{(N)} \geq 0,\ \mathbf{A}^{(r)} \geq 0,\ r = 1, 2, \ldots, N,$$
where, and in the following, $\mathbf{X}_{(N)} \in \mathbb{R}_+^{I_N \times I_1 I_2 \cdots I_{N-1}}$ and $\mathbf{G}_{(N)} \in \mathbb{R}_+^{J_N \times J_1 J_2 \cdots J_{N-1}}$ are the mode-$N$ unfolding matrices of the data tensor $\mathcal{X}$ and the core tensor $\mathcal{G}$ [16], respectively, $\mathbf{A}^{(N)} = [\mathbf{a}_1^\top, \mathbf{a}_2^\top, \ldots, \mathbf{a}_{I_N}^\top]^\top \in \mathbb{R}_+^{I_N \times J_N}$ ($\mathbf{a}_j$ denotes the $j$-th row of $\mathbf{A}^{(N)}$), $\otimes_{p \neq N} \mathbf{A}^{(p)} = \mathbf{A}^{(N-1)} \otimes \mathbf{A}^{(N-2)} \otimes \cdots \otimes \mathbf{A}^{(1)}$, and $\otimes$ denotes the Kronecker product. Therefore, NTD can be transformed into NMF with the encoding matrix $\mathbf{A}^{(N)} \in \mathbb{R}_+^{I_N \times J_N}$, where $I_N$ and $J_N$ can be regarded as the number of samples and the dimension of the low-dimensional representation of the data tensor $\mathcal{X}$ with respect to the basis matrix $\mathbf{G}_{(N)} (\otimes_{p \neq N} \mathbf{A}^{(p)})^\top$, respectively.
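The following short NumPy sketch checks the mode-$N$ unfolding identity used above, $\mathbf{X}_{(N)} = \mathbf{A}^{(N)} \mathbf{G}_{(N)} (\otimes_{p \neq N} \mathbf{A}^{(p)})^\top$, on a small random Tucker model; all sizes are illustrative, and the column-major unfolding matches the Kronecker ordering stated in the text.

```python
import numpy as np

rng = np.random.default_rng(1)
I, J = (5, 6, 8), (2, 3, 4)                          # tensor and core sizes (illustrative)
G = rng.random(J)                                     # core tensor
A = [rng.random((I[r], J[r])) for r in range(3)]      # factor matrices A^(1), A^(2), A^(3)

# X = G x_1 A^(1) x_2 A^(2) x_3 A^(3)
X = np.einsum('abc,ia,jb,kc->ijk', G, A[0], A[1], A[2])

def unfold(T, mode):                                  # column-major unfolding, as in [16]
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1, order='F')

# Mode-N unfolding identity: X_(3) = A^(3) G_(3) (A^(2) kron A^(1))^T
lhs = unfold(X, 2)
rhs = A[2] @ unfold(G, 2) @ np.kron(A[1], A[0]).T
print(np.allclose(lhs, rhs))                          # True
```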

3. Discriminative Nonnegative Tucker Decomposition

NTD is an unsupervised method that fails to take the label information of the sample data into account even when such information is available. In practice, however, label information has been widely used to increase performance by combining supervisory signals with unsupervised priors [14,25,26,28].
Given partial labels of the sample data, it is natural to assume that sample data belonging to the same class should be very close or aligned on the same axis or line. In NTD, therefore, we would expect each class of sample data to be placed in a clearly separated cluster in the low-dimensional representation matrix $\mathbf{A}^{(N)}$. To achieve these properties, based on the available label information, we introduce the label matrix $\mathbf{Q} \in \mathbb{R}^{k \times I_N}$ (where $k$ is the number of data classes) as follows [14]:
$$Q_{ij} = \begin{cases} 1 & \text{if sample data } \mathcal{X}_j \text{ is labeled and belongs to the } i\text{-th category}, \\ 0 & \text{otherwise}. \end{cases}$$
For example, consider the case of $I_N = 10$ sample data, out of which $q = 7$ are labeled with the categories $l_1 = 1$, $l_2 = 3$, $l_3 = 1$, $l_4 = 2$, $l_5 = 2$, $l_6 = 1$, and $l_7 = 4$; then, the label matrix $\mathbf{Q}$ would be defined as
$$\mathbf{Q} = \begin{pmatrix} 1 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \end{pmatrix}.$$
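A small sketch of how such a label matrix can be built from partial labels; the helper name `label_matrix` is our own illustrative choice, and the example reproduces the $\mathbf{Q}$ above.

```python
import numpy as np

def label_matrix(labels, n_samples, n_classes):
    """Build Q in R^{k x I_N}: Q[i, j] = 1 if sample j is labeled with class i+1."""
    Q = np.zeros((n_classes, n_samples))
    for j, l in enumerate(labels):        # only the first len(labels) samples carry labels
        Q[l - 1, j] = 1.0
    return Q

# The example from the text: I_N = 10 samples, the first q = 7 of which are labeled
labels = [1, 3, 1, 2, 2, 1, 4]
Q = label_matrix(labels, n_samples=10, n_classes=4)
print(Q)
```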
Based on the introduced label matrix $\mathbf{Q}$, we assume that there are $l$ labeled sample data. Without loss of generality, if the first $l$ sample data are labeled, then the label-discriminative constraint term can be introduced as
$$R = \left\| \mathbf{Q} - \mathbf{B} \mathbf{A}_l^{(N)\top} \right\|_F^2,$$
where $\mathbf{A}_l^{(N)} = [\mathbf{a}_1^\top, \ldots, \mathbf{a}_l^\top, \mathbf{0}, \ldots, \mathbf{0}]^\top \in \mathbb{R}^{I_N \times J_N}$, and the matrix $\mathbf{B} \in \mathbb{R}^{k \times J_N}$ transforms and scales the vectors in the part-based low-dimensional representation to obtain the best fit to the matrix $\mathbf{Q}$. The matrix $\mathbf{B}$ is allowed to take negative values. Combining the label-discriminative constraint term with the objective function (1) of NTD, DNTD is obtained by minimizing the following objective function:
$$\min_{\mathcal{G}, \mathbf{A}^{(1)}, \ldots, \mathbf{A}^{(N)}, \mathbf{B}} O_2 = \left\| \mathcal{X} - \mathcal{G} \times_1 \mathbf{A}^{(1)} \times_2 \mathbf{A}^{(2)} \cdots \times_N \mathbf{A}^{(N)} \right\|^2 + \lambda \left\| \mathbf{Q} - \mathbf{B} \mathbf{A}_l^{(N)\top} \right\|_F^2 \quad \text{s.t.} \quad \mathcal{G} \geq 0,\ \mathbf{A}^{(r)} \geq 0,\ r = 1, 2, \ldots, N, \qquad (2)$$
where $\lambda$ is a nonnegative parameter balancing the regularization term. Equivalently, Equation (2) can be rewritten in matrix form:
$$\min_{\mathbf{G}_{(n)}, \mathbf{A}^{(1)}, \ldots, \mathbf{A}^{(N)}, \mathbf{B}} O_2 = \left\| \mathbf{X}_{(n)} - \mathbf{A}^{(n)} \mathbf{G}_{(n)} \big( \otimes_{p \neq n} \mathbf{A}^{(p)} \big)^\top \right\|_F^2 + \lambda \left\| \mathbf{Q} - \mathbf{B} \mathbf{A}_l^{(N)\top} \right\|_F^2 \quad \text{s.t.} \quad \mathbf{G}_{(n)} \geq 0,\ \mathbf{A}^{(r)} \geq 0,\ r = 1, 2, \ldots, N, \qquad (3)$$
where, and in the following, $\otimes_{p \neq n} \mathbf{A}^{(p)} = \mathbf{A}^{(N)} \otimes \cdots \otimes \mathbf{A}^{(n+1)} \otimes \mathbf{A}^{(n-1)} \otimes \cdots \otimes \mathbf{A}^{(1)}$ and $n$ is any element of the set $\{1, 2, \ldots, N\}$.
By using Lagrange multipliers and considering (3), we turn the objective function in (2) into the following Lagrangian:
$$L = \left\| \mathbf{X}_{(n)} - \mathbf{A}^{(n)} \mathbf{G}_{(n)} \big( \otimes_{p \neq n} \mathbf{A}^{(p)} \big)^\top \right\|_F^2 + \lambda \left\| \mathbf{Q} - \mathbf{B} \mathbf{A}_l^{(N)\top} \right\|_F^2 + \mathrm{Tr}\big( \mathbf{\Phi}_n \mathbf{G}_{(n)}^\top \big) + \sum_{r=1}^{N} \mathrm{Tr}\big( \mathbf{\Psi}_r \mathbf{A}^{(r)\top} \big), \qquad (4)$$
where $\mathbf{\Phi}_n$ and $\mathbf{\Psi}_r$ are the Lagrange multiplier matrices of $\mathbf{G}_{(n)}$ and $\mathbf{A}^{(r)}$, respectively, and $\mathrm{Tr}(\cdot)$ denotes the trace of a matrix. The function (4) can be rewritten as
$$\begin{aligned} L ={} & \mathrm{Tr}\big( \mathbf{X}_{(n)} \mathbf{X}_{(n)}^\top \big) - 2 \mathrm{Tr}\big( \mathbf{X}_{(n)} \big( \otimes_{p \neq n} \mathbf{A}^{(p)} \big) \mathbf{G}_{(n)}^\top \mathbf{A}^{(n)\top} \big) + \mathrm{Tr}\big( \mathbf{A}^{(n)} \mathbf{G}_{(n)} \big( \otimes_{p \neq n} \mathbf{A}^{(p)\top} \mathbf{A}^{(p)} \big) \mathbf{G}_{(n)}^\top \mathbf{A}^{(n)\top} \big) \\ & + \lambda \mathrm{Tr}\big( \mathbf{Q} \mathbf{Q}^\top \big) - 2 \lambda \mathrm{Tr}\big( \mathbf{Q} \mathbf{A}_l^{(N)} \mathbf{B}^\top \big) + \lambda \mathrm{Tr}\big( \mathbf{B} \mathbf{A}_l^{(N)\top} \mathbf{A}_l^{(N)} \mathbf{B}^\top \big) + \mathrm{Tr}\big( \mathbf{\Phi}_n \mathbf{G}_{(n)}^\top \big) + \sum_{r=1}^{N} \mathrm{Tr}\big( \mathbf{\Psi}_r \mathbf{A}^{(r)\top} \big). \end{aligned} \qquad (5)$$
Obviously, the objective function in (2) is not convex in all the variables jointly, so it is very difficult to find the global optimal solution. In the following, we develop an iterative updating algorithm, which updates one of the core tensor, the factor matrices, and $\mathbf{B}$ at a time while fixing the others, to obtain a local minimum.

3.1. Updating Rules

3.1.1. Solutions of Factor Matrices $\mathbf{A}^{(n)}$ ($n = 1, 2, \ldots, N-1$)

The partial derivatives of $L$ in (5) with respect to $\mathbf{A}^{(n)}$ ($n = 1, 2, \ldots, N-1$) are
$$\frac{\partial L}{\partial \mathbf{A}^{(n)}} = -2 \mathbf{X}_{(n)} \big( \otimes_{p \neq n} \mathbf{A}^{(p)} \big) \mathbf{G}_{(n)}^\top + 2 \mathbf{A}^{(n)} \mathbf{G}_{(n)} \big( \otimes_{p \neq n} \mathbf{A}^{(p)\top} \mathbf{A}^{(p)} \big) \mathbf{G}_{(n)}^\top + \mathbf{\Psi}_n.$$
By using the Karush–Kuhn–Tucker (KKT) conditions, i.e., $\partial L / \partial \mathbf{A}^{(n)} = 0$ and $\mathbf{A}^{(n)} \odot \mathbf{\Psi}_n = 0$, where, and in the following, $\odot$ denotes the Hadamard product, we obtain from $\partial L / \partial \mathbf{A}^{(n)} = 0$ that
$$\mathbf{\Psi}_n = 2 \mathbf{X}_{(n)} \big( \otimes_{p \neq n} \mathbf{A}^{(p)} \big) \mathbf{G}_{(n)}^\top - 2 \mathbf{A}^{(n)} \mathbf{G}_{(n)} \big( \otimes_{p \neq n} \mathbf{A}^{(p)\top} \mathbf{A}^{(p)} \big) \mathbf{G}_{(n)}^\top.$$
By calculating
$$\mathbf{A}^{(n)} \odot \mathbf{\Psi}_n = \mathbf{A}^{(n)} \odot \big( 2 \mathbf{X}_{(n)} \big( \otimes_{p \neq n} \mathbf{A}^{(p)} \big) \mathbf{G}_{(n)}^\top \big) - \mathbf{A}^{(n)} \odot \big( 2 \mathbf{A}^{(n)} \mathbf{G}_{(n)} \big( \otimes_{p \neq n} \mathbf{A}^{(p)\top} \mathbf{A}^{(p)} \big) \mathbf{G}_{(n)}^\top \big),$$
which together with $\mathbf{A}^{(n)} \odot \mathbf{\Psi}_n = 0$ yields the following updating rule for $\mathbf{A}^{(n)}$ ($n = 1, 2, \ldots, N-1$):
$$A_{ij}^{(n)} \leftarrow A_{ij}^{(n)} \frac{\big[ \mathbf{X}_{(n)} \big( \otimes_{p \neq n} \mathbf{A}^{(p)} \big) \mathbf{G}_{(n)}^\top \big]_{ij}}{\big[ \mathbf{A}^{(n)} \mathbf{G}_{(n)} \big( \otimes_{p \neq n} \mathbf{A}^{(p)\top} \mathbf{A}^{(p)} \big) \mathbf{G}_{(n)}^\top \big]_{ij}}. \qquad (6)$$
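The updating rule (6) translates directly into a few lines of NumPy. The sketch below is an illustrative, unoptimized implementation for a 3-way tensor; the helper names, random sizes, and the small constant guarding the denominator are our own assumptions.

```python
import numpy as np
from functools import reduce

rng = np.random.default_rng(0)
I, J = (5, 6, 8), (2, 3, 4)                         # illustrative sizes
X = rng.random(I)                                    # nonnegative data tensor
G = rng.random(J)                                    # core tensor
A = [rng.random((I[r], J[r])) for r in range(3)]     # factor matrices

def unfold(T, mode):
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1, order='F')

def kron_others(A, n):
    # A^(N) kron ... kron A^(n+1) kron A^(n-1) kron ... kron A^(1) (paper's ordering)
    return reduce(np.kron, [A[p] for p in reversed(range(len(A))) if p != n])

def update_An(n, X, G, A, eps=1e-10):
    """Multiplicative update (6) for A^(n), n = 1, ..., N-1 (0-indexed here)."""
    K, Gn = kron_others(A, n), unfold(G, n)
    num = unfold(X, n) @ K @ Gn.T
    den = A[n] @ Gn @ (K.T @ K) @ Gn.T + eps
    return A[n] * num / den

A[0] = update_An(0, X, G, A)    # one update of the first factor matrix
print(A[0].shape)               # (5, 2)
```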

3.1.2. Solution of the Factor Matrix $\mathbf{A}^{(N)}$

The partial derivative of $L$ in (5) with respect to $\mathbf{A}^{(N)}$ is
$$\frac{\partial L}{\partial \mathbf{A}^{(N)}} = -2 \mathbf{X}_{(N)} \big( \otimes_{p \neq N} \mathbf{A}^{(p)} \big) \mathbf{G}_{(N)}^\top + 2 \mathbf{A}^{(N)} \mathbf{G}_{(N)} \big( \otimes_{p \neq N} \mathbf{A}^{(p)\top} \mathbf{A}^{(p)} \big) \mathbf{G}_{(N)}^\top - 2 \lambda \mathbf{Q}^\top \mathbf{B} + 2 \lambda \mathbf{A}_l^{(N)} \mathbf{B}^\top \mathbf{B} + \mathbf{\Psi}_N.$$
Similarly, we consider the KKT conditions $\partial L / \partial \mathbf{A}^{(N)} = 0$ and $\mathbf{A}^{(N)} \odot \mathbf{\Psi}_N = 0$. As a result, we obtain the following updating rule for $\mathbf{A}^{(N)}$:
$$A_{ij}^{(N)} \leftarrow A_{ij}^{(N)} \frac{\big[ \mathbf{X}_{(N)} \big( \otimes_{p \neq N} \mathbf{A}^{(p)} \big) \mathbf{G}_{(N)}^\top + \lambda (\mathbf{Q}^\top \mathbf{B})^+ + \lambda (\mathbf{A}_l^{(N)} \mathbf{B}^\top \mathbf{B})^- \big]_{ij}}{\big[ \mathbf{A}^{(N)} \mathbf{G}_{(N)} \big( \otimes_{p \neq N} \mathbf{A}^{(p)\top} \mathbf{A}^{(p)} \big) \mathbf{G}_{(N)}^\top + \lambda (\mathbf{Q}^\top \mathbf{B})^- + \lambda (\mathbf{A}_l^{(N)} \mathbf{B}^\top \mathbf{B})^+ \big]_{ij}}, \qquad (7)$$
where, and in the following, for a matrix $\mathbf{W} = (W_{ij})$, we let $|\mathbf{W}| = (|W_{ij}|)$ and define $\mathbf{W}^+ = (|\mathbf{W}| + \mathbf{W})/2$ and $\mathbf{W}^- = (|\mathbf{W}| - \mathbf{W})/2$, so that $\mathbf{W} = \mathbf{W}^+ - \mathbf{W}^-$.
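A corresponding sketch of the updating rule (7), including the positive/negative split $\mathbf{W} = \mathbf{W}^+ - \mathbf{W}^-$. The random label matrix, the value of $\lambda$, and the helper names are illustrative assumptions.

```python
import numpy as np
from functools import reduce

rng = np.random.default_rng(0)
I, J, k, l, lam = (5, 6, 20), (2, 3, 4), 3, 12, 0.5   # 20 samples, 12 labeled (illustrative)
X = rng.random(I)
G = rng.random(J)
A = [rng.random((I[r], J[r])) for r in range(3)]
Q = np.zeros((k, I[2]))
Q[rng.integers(0, k, l), np.arange(l)] = 1.0          # label matrix for the first l samples
B = rng.standard_normal((k, J[2]))                    # B may take negative values

def unfold(T, mode):
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1, order='F')

def pos(W): return (np.abs(W) + W) / 2
def neg(W): return (np.abs(W) - W) / 2

def update_AN(X, G, A, Q, B, l, lam, eps=1e-10):
    """Multiplicative update (7) for the last factor matrix A^(N)."""
    N = len(A) - 1
    K, GN = reduce(np.kron, [A[p] for p in reversed(range(N))]), unfold(G, N)
    Al = A[N].copy()
    Al[l:, :] = 0.0                                   # rows of unlabeled samples set to zero
    QtB, ABtB = Q.T @ B, Al @ B.T @ B
    num = unfold(X, N) @ K @ GN.T + lam * pos(QtB) + lam * neg(ABtB)
    den = A[N] @ GN @ (K.T @ K) @ GN.T + lam * neg(QtB) + lam * pos(ABtB) + eps
    return A[N] * num / den

A[2] = update_AN(X, G, A, Q, B, l, lam)
print(A[2].shape)    # (20, 4)
```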

3.1.3. Solutions of Core Tensor G

The objective function in (4) can be rewritten as
$$L = \left\| \mathrm{vec}(\mathcal{X}) - \mathbf{F}\, \mathrm{vec}(\mathcal{G}) \right\|_2^2 + \lambda \left\| \mathbf{Q} - \mathbf{B} \mathbf{A}_l^{(N)\top} \right\|_F^2 + \mathrm{vec}(\mathcal{G})^\top \mathrm{vec}(\mathbf{\Phi}) + \sum_{r=1}^{N} \mathrm{Tr}\big( \mathbf{\Psi}_r \mathbf{A}^{(r)\top} \big), \qquad (8)$$
where, and in the following, $\mathbf{F} = \mathbf{A}^{(N)} \otimes \mathbf{A}^{(N-1)} \otimes \cdots \otimes \mathbf{A}^{(1)} \in \mathbb{R}^{I_1 I_2 \cdots I_N \times J_1 J_2 \cdots J_N}$ and $\mathrm{vec}(\mathbf{\Phi})$ represents the Lagrange multiplier of $\mathrm{vec}(\mathcal{G})$. The partial derivative of $L$ in (8) with respect to $\mathrm{vec}(\mathcal{G})$ is
$$\frac{\partial L}{\partial\, \mathrm{vec}(\mathcal{G})} = 2 \mathbf{F}^\top \mathbf{F}\, \mathrm{vec}(\mathcal{G}) - 2 \mathbf{F}^\top \mathrm{vec}(\mathcal{X}) + \mathrm{vec}(\mathbf{\Phi}).$$
Similarly, by applying the KKT conditions $\partial L / \partial\, \mathrm{vec}(\mathcal{G}) = 0$ and $(\mathrm{vec}(\mathcal{G}))_i (\mathrm{vec}(\mathbf{\Phi}))_i = 0$, we obtain the following updating rule:
$$(\mathrm{vec}(\mathcal{G}))_i \leftarrow (\mathrm{vec}(\mathcal{G}))_i \frac{\big( \mathbf{F}^\top \mathrm{vec}(\mathcal{X}) \big)_i}{\big( \mathbf{F}^\top \mathbf{F}\, \mathrm{vec}(\mathcal{G}) \big)_i}. \qquad (9)$$
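The core-tensor update (9) can be sketched as follows. Forming $\mathbf{F}$ explicitly is only practical for small sizes, which suffices for this illustration; the column-major `vec` matches the Kronecker structure of $\mathbf{F}$, and the guard constant is an illustrative choice.

```python
import numpy as np
from functools import reduce

rng = np.random.default_rng(0)
I, J = (5, 6, 8), (2, 3, 4)
X = rng.random(I)
G = rng.random(J)
A = [rng.random((I[r], J[r])) for r in range(3)]

vec = lambda T: T.reshape(-1, order='F')             # column-major vectorization

def update_core(X, G, A, eps=1e-10):
    """Multiplicative update (9) for vec(G); F = A^(N) kron ... kron A^(1)."""
    F = reduce(np.kron, list(reversed(A)))            # (I1*I2*I3) x (J1*J2*J3), small sizes only
    g = vec(G) * (F.T @ vec(X)) / (F.T @ F @ vec(G) + eps)
    return g.reshape(G.shape, order='F')

G = update_core(X, G, A)
print(G.shape)    # (2, 3, 4)
```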

3.1.4. Solutions of Matrix B

The partial derivative of $L$ in (5) with respect to $\mathbf{B}$ is
$$\frac{\partial L}{\partial \mathbf{B}} = -2 \lambda \mathbf{Q} \mathbf{A}_l^{(N)} + 2 \lambda \mathbf{B} \mathbf{A}_l^{(N)\top} \mathbf{A}_l^{(N)}.$$
Since there is no Lagrange multiplier for $\mathbf{B}$, we directly set $\partial L / \partial \mathbf{B} = 0$ and solve for $\mathbf{B}$ to obtain the following updating rule:
$$B_{ij} \leftarrow \big[ \mathbf{Q} \mathbf{A}_l^{(N)} \big( \mathbf{A}_l^{(N)\top} \mathbf{A}_l^{(N)} \big)^{-1} \big]_{ij}. \qquad (10)$$
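A sketch of the closed-form update (10). Using a pseudoinverse instead of a plain inverse is our own illustrative safeguard for the case where $\mathbf{A}_l^{(N)\top}\mathbf{A}_l^{(N)}$ is singular; it is not prescribed by the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
k, IN, JN, l = 3, 20, 4, 12                    # classes, samples, latent dim, labeled count
AN = rng.random((IN, JN))                      # current A^(N)
Q = np.zeros((k, IN))
Q[rng.integers(0, k, l), np.arange(l)] = 1.0   # label matrix for the first l samples

def update_B(Q, AN, l):
    """Closed-form update (10): B = Q A_l^(N) (A_l^(N)T A_l^(N))^{-1}."""
    Al = AN.copy()
    Al[l:, :] = 0.0
    # pinv guards against a singular Gram matrix (illustrative choice)
    return Q @ Al @ np.linalg.pinv(Al.T @ Al)

B = update_B(Q, AN, l)
print(B.shape)    # (3, 4)
```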
Theorem 1. 
The objective function in (2) is nonincreasing under the updating rules (6), (7), (9), and (10). The objective function is invariant under these updates if and only if $\mathbf{A}^{(r)}$ ($r = 1, 2, \ldots, N$), $\mathcal{G}$, and $\mathbf{B}$ are at a stationary point.

3.2. Proof of Convergence

To prove Theorem 1, we first give a definition and several lemmas.
Definition 1 
([5]). $G(u, u')$ is an auxiliary function for $F(u)$ if the conditions $G(u, u') \geq F(u)$ and $G(u, u) = F(u)$ are satisfied.
Lemma 1 
([5]). If $G(u, u')$ is an auxiliary function for $F(u)$, then $F(u)$ is nonincreasing under the updating rule
$$u^{t+1} = \arg\min_{u} G(u, u^t). \qquad (11)$$
Proof. 
$F(u^{t+1}) \leq G(u^{t+1}, u^t) \leq G(u^t, u^t) = F(u^t)$. □
The equality $F(u^{t+1}) = F(u^t)$ holds only if $u^t$ is a local minimum of $G(u, u^t)$. By iterating the update rule (11), the sequence $u^t$ converges to a local minimum.
For any element $A_{ij}^{(n)}$ of $\mathbf{A}^{(n)}$ ($n = 1, 2, \ldots, N-1$), let $F_{ij}(A_{ij}^{(n)})$ denote the part of the objective function $O_2$ in (3) relevant to $A_{ij}^{(n)}$. Since the updating rule is essentially element-wise, it is sufficient to prove that each $F_{ij}(A_{ij}^{(n)})$ is nonincreasing under the update rule. The first derivative of $F_{ij}(A_{ij}^{(n)})$ with respect to $A_{ij}^{(n)}$ is
$$F'_{ij}(A_{ij}^{(n)}) = \Big[ -2 \mathbf{X}_{(n)} \big( \otimes_{p \neq n} \mathbf{A}^{(p)} \big) \mathbf{G}_{(n)}^\top + 2 \mathbf{A}^{(n)} \mathbf{G}_{(n)} \big( \otimes_{p \neq n} \mathbf{A}^{(p)\top} \mathbf{A}^{(p)} \big) \mathbf{G}_{(n)}^\top \Big]_{ij}.$$
Now, we have
Lemma 2. 
The function
$$G(A_{ij}^{(n)}, A_{ij}^{(n)t}) = F_{ij}(A_{ij}^{(n)t}) + F'_{ij}(A_{ij}^{(n)t}) (A_{ij}^{(n)} - A_{ij}^{(n)t}) + \frac{\big[ \mathbf{A}^{(n)t} \mathbf{G}_{(n)} \big( \otimes_{p \neq n} \mathbf{A}^{(p)\top} \mathbf{A}^{(p)} \big) \mathbf{G}_{(n)}^\top \big]_{ij}}{A_{ij}^{(n)t}} (A_{ij}^{(n)} - A_{ij}^{(n)t})^2 \qquad (12)$$
is an auxiliary function for $F_{ij}(A_{ij}^{(n)})$, where the matrix $\mathbf{A}^{(n)t} = (A_{ij}^{(n)t})$.
Proof. 
Obviously, $G(A_{ij}^{(n)t}, A_{ij}^{(n)t}) = F_{ij}(A_{ij}^{(n)t})$. According to Definition 1, we only need to show that $G(A_{ij}^{(n)}, A_{ij}^{(n)t}) \geq F_{ij}(A_{ij}^{(n)})$. The Taylor series expansion of $F_{ij}(A_{ij}^{(n)})$ at $A_{ij}^{(n)t}$ is
$$F_{ij}(A_{ij}^{(n)}) = F_{ij}(A_{ij}^{(n)t}) + F'_{ij}(A_{ij}^{(n)t}) (A_{ij}^{(n)} - A_{ij}^{(n)t}) + \frac{1}{2} F''_{ij}(A_{ij}^{(n)t}) (A_{ij}^{(n)} - A_{ij}^{(n)t})^2, \qquad (13)$$
where $F''_{ij}(A_{ij}^{(n)t}) = \big[ 2 \mathbf{G}_{(n)} \big( \otimes_{p \neq n} \mathbf{A}^{(p)\top} \mathbf{A}^{(p)} \big) \mathbf{G}_{(n)}^\top \big]_{jj}$ is the second-order derivative of $F_{ij}(A_{ij}^{(n)})$ at $A_{ij}^{(n)t}$. Comparing (12) with (13), we can see that $G(A_{ij}^{(n)}, A_{ij}^{(n)t}) \geq F_{ij}(A_{ij}^{(n)})$ is equivalent to
$$\frac{\big[ \mathbf{A}^{(n)t} \mathbf{G}_{(n)} \big( \otimes_{p \neq n} \mathbf{A}^{(p)\top} \mathbf{A}^{(p)} \big) \mathbf{G}_{(n)}^\top \big]_{ij}}{A_{ij}^{(n)t}} \geq \big[ \mathbf{G}_{(n)} \big( \otimes_{p \neq n} \mathbf{A}^{(p)\top} \mathbf{A}^{(p)} \big) \mathbf{G}_{(n)}^\top \big]_{jj}.$$
To prove this inequality, we have
$$\big[ \mathbf{A}^{(n)t} \mathbf{G}_{(n)} \big( \otimes_{p \neq n} \mathbf{A}^{(p)\top} \mathbf{A}^{(p)} \big) \mathbf{G}_{(n)}^\top \big]_{ij} = \sum_{l=1}^{J_n} A_{il}^{(n)t} \big[ \mathbf{G}_{(n)} \big( \otimes_{p \neq n} \mathbf{A}^{(p)\top} \mathbf{A}^{(p)} \big) \mathbf{G}_{(n)}^\top \big]_{lj} \geq A_{ij}^{(n)t} \big[ \mathbf{G}_{(n)} \big( \otimes_{p \neq n} \mathbf{A}^{(p)\top} \mathbf{A}^{(p)} \big) \mathbf{G}_{(n)}^\top \big]_{jj};$$
therefore, the inequality $G(A_{ij}^{(n)}, A_{ij}^{(n)t}) \geq F_{ij}(A_{ij}^{(n)})$ holds. □
Similarly, let $F_{ij}(A_{ij}^{(N)})$ denote the part of the objective function $O_2$ in (3) relevant to $A_{ij}^{(N)}$ in $\mathbf{A}^{(N)}$. Then we have
Lemma 3. 
The function
$$G(A_{ij}^{(N)}, A_{ij}^{(N)t}) = F_{ij}(A_{ij}^{(N)t}) + F'_{ij}(A_{ij}^{(N)t}) (A_{ij}^{(N)} - A_{ij}^{(N)t}) + \frac{\big[ \mathbf{A}^{(N)t} \mathbf{G}_{(N)} \big( \otimes_{p \neq N} \mathbf{A}^{(p)\top} \mathbf{A}^{(p)} \big) \mathbf{G}_{(N)}^\top + \lambda (\mathbf{Q}^\top \mathbf{B})^- + \lambda (\mathbf{A}_l^{(N)t} \mathbf{B}^\top \mathbf{B})^+ \big]_{ij}}{A_{ij}^{(N)t}} (A_{ij}^{(N)} - A_{ij}^{(N)t})^2 \qquad (14)$$
is an auxiliary function for $F_{ij}(A_{ij}^{(N)})$.
Lemma 4. 
Let $g_i$ denote the $i$-th element of $\mathrm{vec}(\mathcal{G})$ and $F_i(g_i)$ denote the part of the objective function $O_2$ in (3) relevant to $g_i$. The function
$$G(g_i, g_i^t) = F_i(g_i^t) + F'_i(g_i^t) (g_i - g_i^t) + \frac{\big( \mathbf{F}^\top \mathbf{F}\, \mathrm{vec}(\mathcal{G}^t) \big)_i}{g_i^t} (g_i - g_i^t)^2 \qquad (15)$$
is an auxiliary function for $F_i(g_i)$.
Since the proofs of Lemmas 3 and 4 are essentially similar to the proof of Lemma 2, they are omitted here.
Proof 
(Proof of Theorem 1). Replacing $G(u, u^t)$ in (11) by (12), the minimum is obtained by setting
$$\frac{\partial G(A_{ij}^{(n)}, A_{ij}^{(n)t})}{\partial A_{ij}^{(n)}} = F'_{ij}(A_{ij}^{(n)t}) + 2 \frac{\big[ \mathbf{A}^{(n)t} \mathbf{G}_{(n)} \big( \otimes_{p \neq n} \mathbf{A}^{(p)\top} \mathbf{A}^{(p)} \big) \mathbf{G}_{(n)}^\top \big]_{ij}}{A_{ij}^{(n)t}} (A_{ij}^{(n)} - A_{ij}^{(n)t}) = 0,$$
which yields
$$A_{ij}^{(n)t+1} = A_{ij}^{(n)t} \frac{\big[ \mathbf{X}_{(n)} \big( \otimes_{p \neq n} \mathbf{A}^{(p)} \big) \mathbf{G}_{(n)}^\top \big]_{ij}}{\big[ \mathbf{A}^{(n)t} \mathbf{G}_{(n)} \big( \otimes_{p \neq n} \mathbf{A}^{(p)\top} \mathbf{A}^{(p)} \big) \mathbf{G}_{(n)}^\top \big]_{ij}}.$$
According to Lemma 2, $F_{ij}(A_{ij}^{(n)})$ is nonincreasing under the updating rule (6) for $\mathbf{A}^{(n)}$.
Similarly, substituting $G(A_{ij}^{(N)}, A_{ij}^{(N)t})$ of (14) into (11), we obtain
$$A_{ij}^{(N)t+1} = A_{ij}^{(N)t} \frac{\big[ \mathbf{X}_{(N)} \big( \otimes_{p \neq N} \mathbf{A}^{(p)} \big) \mathbf{G}_{(N)}^\top + \lambda (\mathbf{Q}^\top \mathbf{B})^+ + \lambda (\mathbf{A}_l^{(N)t} \mathbf{B}^\top \mathbf{B})^- \big]_{ij}}{\big[ \mathbf{A}^{(N)t} \mathbf{G}_{(N)} \big( \otimes_{p \neq N} \mathbf{A}^{(p)\top} \mathbf{A}^{(p)} \big) \mathbf{G}_{(N)}^\top + \lambda (\mathbf{Q}^\top \mathbf{B})^- + \lambda (\mathbf{A}_l^{(N)t} \mathbf{B}^\top \mathbf{B})^+ \big]_{ij}}.$$
Note that the above equation is the same as (7). By Lemma 3, (14) is an auxiliary function of $F_{ij}(A_{ij}^{(N)})$, which means that $F_{ij}(A_{ij}^{(N)})$ is nonincreasing under the updating rule (7) for $\mathbf{A}^{(N)}$.
Furthermore, substituting $G(g_i, g_i^t)$ of (15) into (11), we obtain
$$g_i^{t+1} = g_i^t \frac{\big( \mathbf{F}^\top \mathrm{vec}(\mathcal{X}) \big)_i}{\big( \mathbf{F}^\top \mathbf{F}\, \mathrm{vec}(\mathcal{G}^t) \big)_i}.$$
Using the same argument as for (7), we conclude that $F_i(g_i)$ is nonincreasing under the updating rule (9).
Finally, we prove the convergence of the updating rule (10) for $\mathbf{B}$. Since $\mathbf{B}$ is unconstrained and the objective function in (2) is convex with respect to $\mathbf{B}$, the updating rule (10), obtained by setting the derivative of the Lagrangian with respect to $\mathbf{B}$ to zero, exactly minimizes the objective function with respect to $\mathbf{B}$ in each iteration. Therefore, Theorem 1 holds. □

4. Experiments Section

In this section, we apply the proposed DNTD scheme to clustering on three datasets and use two metrics, namely accuracy (AC) and normalized mutual information (NMI) [29], to evaluate its performance.
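For reference, a common way to compute these two metrics is sketched below: AC matches predicted clusters to true classes with the Hungarian algorithm, and NMI is taken from scikit-learn. The helper name `clustering_accuracy` and the toy labels are illustrative; the paper itself follows Ref. [29].

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score

def clustering_accuracy(y_true, y_pred):
    """AC: best one-to-one mapping between predicted clusters and true classes."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    classes, clusters = np.unique(y_true), np.unique(y_pred)
    cost = np.zeros((clusters.size, classes.size))
    for i, c in enumerate(clusters):
        for j, t in enumerate(classes):
            cost[i, j] = np.sum((y_pred == c) & (y_true == t))
    row, col = linear_sum_assignment(-cost)          # maximize the matched counts
    return cost[row, col].sum() / y_true.size

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [1, 1, 0, 0, 2, 0]
print(clustering_accuracy(y_true, y_pred))           # 5/6
print(normalized_mutual_info_score(y_true, y_pred))
```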

4.1. Datasets

To evaluate the effectiveness of the proposed DNTD method, three datasets are adopted for the experiments in the following subsections. The descriptions of these datasets are as follows:

4.1.1. ORL Dataset

The ORL (https://github.com/saeid436/Face-Recognition-MLP/tree/main/ORL, accessed on 11 December 2022) dataset collects 400 grayscale 112 × 92 face images of 40 different subjects, with 10 distinct images per subject. For some subjects, the images were taken at different times, varying the lighting, facial expressions, and facial details. All images were taken against a dark homogeneous background, with the subjects in an upright, frontal position. In our experiment, each image was resized to 32 × 32 pixels, and the images were stacked into a tensor $\mathcal{X} \in \mathbb{R}^{32 \times 32 \times 400}$.

4.1.2. COIL20 Dataset

The COIL20 (http://www.cad.zju.edu.cn/home/dengcai/Data/MLData.html, accessed on 5 April 2022) dataset contains a total of 1440 grayscale images of 20 subjects, each of which has 72 images taken from diverse poses. Specifically, the subjects were placed on a motorized turntable rotating through 360 degrees to change the subject pose with respect to a fixed camera, and the images of each subject were taken 5 degrees apart. In this experiment, each image was resized to 32 × 32 pixels and all images were stacked into a tensor $\mathcal{X} \in \mathbb{R}^{32 \times 32 \times 1440}$.

4.1.3. Yale Dataset

The Yale (http://cvc.cs.yale.edu/cvc/projects/yalefaces/yalefaces.html, accessed on 5 April 2022) dataset consists of 165 images of 15 individuals. Each individual has 11 images, taken with different facial expressions or configurations, such as sad, sleepy, surprised, left-light, right-light, with glasses, without glasses, and so on. Each image was resized to 32 × 32 pixels, and all images formed a tensor $\mathcal{X} \in \mathbb{R}^{32 \times 32 \times 165}$.

4.2. Compared Algorithms and Experimental Setting

To verify that the proposed algorithm is efficient and can enhance the clustering performance on these datasets, we compared it with the following algorithms:
  • K-means [30]: The K-means algorithm aims to divide n sample data with m dimensions into K clusters so that the within-cluster sum of squares is minimized. It is a traditional clustering method. In our experiments, we used the command "kmeans" in Matlab 2016b to execute the K-means algorithm;
  • Agglomerative hierarchical clustering (AHC) [31]: AHC is a bottom-up clustering method. It initially treats each sample as its own class and then successively merges classes so that their number gradually decreases until the required number of classes is reached;
  • NMF [5]: NMF algorithm is one of the typical clustering algorithms. In our experiment, we adopted the Frobenius-norm formulation;
  • NTD [19]: NTD algorithm is considered a generalization of NMF;
  • DNMF [14]: The DNMF algorithm incorporates the label information of sample data into the objective function of NMF and enforces the samples with the same label to be aligned on the same axis.
In each experiment, we randomly selected k categories as the evaluated data. Each experiment was repeated 10 times, and in each run we applied K-means 10 times to the low-dimensional representation. For the semi-supervised methods, namely DNMF and DNTD, we randomly selected 30% of the sample data of each category as the labeled data. For all compared methods, we set the dimension of the encoding matrix in the low-dimensional representation space to the number of selected categories in each run. For NTD and the proposed method, we tested the clustering performance when the size of the core tensor is $\frac{1}{4}I_1 \times \frac{1}{4}I_2 \times k$, $\frac{1}{2}I_1 \times \frac{1}{2}I_2 \times k$, and $\frac{3}{4}I_1 \times \frac{3}{4}I_2 \times k$ on subdatasets of the three datasets. The clustering performance is better in most cases when the core tensor size is $\frac{1}{2}I_1 \times \frac{1}{2}I_2 \times k$ on the COIL20 dataset and $\frac{3}{4}I_1 \times \frac{3}{4}I_2 \times k$ on the ORL and Yale datasets; therefore, the sizes of the core tensors of the NTD method and the proposed DNTD method are set to $\frac{1}{2}I_1 \times \frac{1}{2}I_2 \times k$ on the COIL20 dataset and $\frac{3}{4}I_1 \times \frac{3}{4}I_2 \times k$ on the others, respectively. Regarding the regularization parameter of DNMF and the proposed method, empirically, we set $\lambda = 10^{-1}$ on the COIL20 dataset and $\lambda = 10^{6}$ on the others.
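The evaluation protocol described above can be summarized by the following illustrative loop. The functions `dntd` and `clustering_accuracy` are hypothetical placeholders for the method of Section 3 and the AC metric sketched earlier, and the uniform 30% labeling mask is a simplification of the per-category selection used in the paper.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score

def evaluate(X, y, k, dntd, clustering_accuracy, n_trials=10, n_kmeans=10,
             label_ratio=0.3, lam=1e-1, seed=0):
    """Average AC/NMI over random k-category subsets (dntd and clustering_accuracy
    are hypothetical stand-ins for the method of Section 3 and the AC metric)."""
    rng = np.random.default_rng(seed)
    acs, nmis = [], []
    for _ in range(n_trials):
        cats = rng.choice(np.unique(y), size=k, replace=False)   # pick k categories at random
        idx = np.flatnonzero(np.isin(y, cats))
        Xs, ys = X[..., idx], y[idx]                             # samples lie on the last mode
        labeled = rng.random(ys.size) < label_ratio              # ~30% labeled (simplified)
        AN = dntd(Xs, ys, labeled, n_clusters=k, lam=lam)        # rows of A^(N) as representation
        for _ in range(n_kmeans):
            pred = KMeans(n_clusters=k, n_init=10).fit_predict(AN)
            acs.append(clustering_accuracy(ys, pred))
            nmis.append(normalized_mutual_info_score(ys, pred))
    return float(np.mean(acs)), float(np.mean(nmis))
```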

4.3. Clustering Results

Table 1, Table 2, Table 3, Table 4, Table 5 and Table 6 present the average clustering performance and standard deviation on the ORL, COIL20, and Yale datasets, respectively. They show that the proposed algorithm achieves better clustering performance on all three datasets. On the ORL dataset, the proposed algorithm achieves improvements of 14.41%, 32.17%, 8.86%, 6.88%, and 4.81% in AC and 12.09%, 27.50%, 8.33%, 5.60%, and 4.22% in NMI compared with K-means, AHC, NMF, NTD, and DNMF, respectively. For the COIL20 dataset, the proposed algorithm attains improvements of 7.32%, 8.47%, 4.02%, and 6.11% in AC and 5.47%, 10.43%, 4.35%, and 8.10% in NMI in comparison with K-means, NMF, NTD, and DNMF, respectively. Compared with the AHC algorithm, the AC of the proposed algorithm is improved by 1.59%; however, the NMI is reduced by 2.54%. On the Yale dataset, the proposed algorithm gains 6.53%, 26.86%, 4.55%, 4.66%, and 1.29% in AC and 6.40%, 29.64%, 5.30%, 5.56%, and 1.27% in NMI in contrast to K-means, AHC, NMF, NTD, and DNMF, respectively. Furthermore, we also applied the AHC algorithm to the new representation generated by DNTD while keeping the other parameter settings unaltered; the average clustering performance in terms of AC and NMI is 62.58% and 69.93% on the ORL dataset, 74.32% and 73.33% on the COIL20 dataset, and 26.41% and 21.00% on the Yale dataset, respectively. Overall, the clustering performance of DNTD with the AHC algorithm is lower than that with the K-means algorithm in most cases; therefore, we used the K-means algorithm together with the proposed DNTD method to test the clustering performance.
As can be seen, the semi-supervised DNTD and DNMF methods are superior to the unsupervised methods, such as NTD, NMF, K-means, and AHC, on the ORL and Yale datasets, which means that taking the label information of sample data into account is useful for improving the performance of NTD and NMF. In particular, the improvement in clustering performance of the DNTD method is obvious and attractive on the ORL dataset. Moreover, the gains from NTD to DNTD in AC and NMI exceed those from NMF to DNMF on all datasets, which indicates that incorporating the label-discriminative constraint term into NTD is more effective than combining the label-discriminative term with NMF. Furthermore, DNTD outperforms all the compared methods. DNTD is more powerful than DNMF at boosting the discriminative ability via the available label information of sample data and is capable of effectively extracting the low-dimensional representation of the data tensor. All in all, the proposed DNTD method attains the best average clustering performance among the compared methods, thanks to the label-discriminative information of the sample data in conjunction with the NTD of the data tensor representation.
Since our proposed method is a semi-supervised algorithm, there is a close relationship between the clustering performance and the number of labeled sample data. Figure 1 shows the clustering performance with a varying number of labeled sample data on the ORL and Yale datasets. From Figure 1, we can see that the clustering performance of the DNTD method is enhanced as the number of labeled sample data increases; furthermore, the DNTD method surpasses the other compared methods as well.
In addition, we also observed that the clustering performance of the proposed method is better on the ORL and COIL20 datasets than on the Yale dataset. In our view, two main factors lower the performance of the proposed method on the Yale dataset. Firstly, some of the images in the Yale dataset were taken in poor light, which leads to dark and blurred backgrounds in these images. Secondly, some individuals are wearing glasses, so that their faces are partly occluded. These two factors result in poor image quality, which causes the obtained data to lose many true values while simultaneously generating many incorrect values. Consequently, the proposed algorithm is strongly interfered with, and its performance is lowered.

4.4. Parameter Selection

In our experiments, there is one parameter λ to be decided. The parameter λ measures the degree of discrimination contributed by the label information of the sample data. In this subsection, we study the importance of the parameter λ in the DNTD method through the clustering performance on the above three datasets. The parameter is selected from the set $\{10^{-2}, 10^{-1}, 10^{0}, 10^{1}, 10^{2}, 10^{3}, 10^{4}, 10^{5}, 10^{6}, 10^{7}\}$. On each dataset, we fix the number of categories at 10 for simplicity. First, we randomly select 10 categories to perform the experiment and apply K-means 10 times to the low-dimensional representation; we then repeat the above operations 10 times and calculate the average. The effect of the parameter is displayed in Figure 2. From Figure 2, we find that the DNTD scheme displays a stable performance when λ ranges from $10^{-2}$ to $10^{5}$ on all datasets; that is, it is not sensitive to the parameter λ, which helps to promote the stability of DNTD. Furthermore, we observe that the DNTD scheme attains the best performance when $\lambda = 10^{6}$ on the ORL and Yale datasets; therefore, λ is empirically selected to be $10^{6}$ on the ORL and Yale datasets and $10^{-1}$ on the COIL20 dataset.

5. Conclusions and Future Work

In this paper, we have introduced a label-constrained nonnegative Tucker decomposition method for tensor data representation, called discriminative nonnegative Tucker decomposition (DNTD for short), which takes the label information of the sample data into account by constructing a label matrix and, further, a label-discriminative constraint term. Moreover, updating rules and a proof of their convergence have been provided. Numerical experiments on three datasets have demonstrated the effectiveness of the proposed method in terms of clustering performance.
Although the proposed DNTD method has shown good clustering performance, it still has limitations from a geometric perspective. In future work, we will investigate how to preserve the geometric structure of sample data on a manifold. Furthermore, as an important clustering algorithm, NTD does not always show excellent and robust performance when dealing with outliers; developing a robust and efficient learning method will therefore also be worthwhile future work.

Author Contributions

Conceptualization, W.J. and L.L.; methodology, L.L. and Q.L.; software, W.J. and Q.L.; writing—original draft preparation, W.J., L.L. and Q.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was partially funded by the National Natural Science Foundation of China under Grant 12161020, 12061025, and partially funded by the Natural Science Foundation of Educational Commission of Guizhou Province under Grant Qian-Jiao-He KY Zi [2021]298.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
NTD  Nonnegative Tucker decomposition
DNTD  Discriminative NTD
NMF  Nonnegative matrix factorization
GNMF  Graph-regularized NMF
CNMF  Constrained NMF
EEG  Multichannel electroencephalography
GNTD  Graph-regularized NTD
ONTD  Orthogonal NTD
GDNMF  Graph-based discriminative NMF
GSNMFC  Graph-regularized and sparse NMF with hard constraints
SRDNMF  Semi-supervised robust distribution-based NMF
SNTD  Semi-supervised NTD
KKT  Karush–Kuhn–Tucker
AC  Accuracy
NMI  Normalized mutual information

References

  1. Cai, B.; Lu, G.F. Tensor subspace clustering using consensus tensor low-rank representation. Inform. Sci. 2022, 609, 46–59. [Google Scholar] [CrossRef]
  2. Wang, X.; Yang, L.T.; Kuang, L.; Liu, X.; Zhang, Q.; Deen, M.J. A tensor-based big-data-driven routing recommendation approach for heterogeneous networks. IEEE Netw. 2019, 33, 64–69. [Google Scholar] [CrossRef]
  3. Long, Z.; Liu, Y.; Chen, L.; Zhu, C. Low rank tensor completion for multiway visual data. Signal Process. 2019, 155, 301–316. [Google Scholar] [CrossRef] [Green Version]
  4. Bernardi, A.; Carlini, E.; Catalisano, M.V.; Gimigliano, A.; Oneto, A. The hitchhiker guide to: Secant varieties and tensor decomposition. Mathematics 2018, 6, 314. [Google Scholar] [CrossRef] [Green Version]
  5. Lee, D.; Seung, H.S. Algorithms for non-negative matrix factorization. Adv. Neural Inf. Process. Syst. 2000, 13, 556–562. [Google Scholar]
  6. Li, X.; Wang, L.; Cheng, Q.; Wu, P.; Gan, W.; Fang, L. Cloud removal in remote sensing images using nonnegative matrix factorization and error correction. ISPRS J. Photogramm. Remote Sens. 2019, 148, 103–113. [Google Scholar] [CrossRef]
  7. Ye, F.; Chen, C.; Zheng, Z. Deep autoencoder-like nonnegative matrix factorization for community detection. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, Turin, Italy, 22–26 October 2018; pp. 1393–1402. [Google Scholar]
  8. Yang, Z.; Liang, N.; Yan, W.; Li, Z.; Xie, S. Uniform distribution non-negative matrix factorization for multiview clustering. IEEE Trans. Cybern. 2020, 51, 3249–3262. [Google Scholar] [CrossRef]
  9. Cai, D.; He, X.; Han, J.; Huang, T.S. Graph regularized nonnegative matrix factorization for data representation. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 33, 1548–1560. [Google Scholar]
  10. Yu, N.; Wu, M.J.; Liu, J.X.; Zheng, C.H.; Xu, Y. Correntropy-based hypergraph regularized NMF for clustering and feature selection on multi-cancer integrated data. IEEE Trans. Cybern. 2020, 51, 3952–3963. [Google Scholar] [CrossRef]
  11. Chong, Y.; Ding, Y.; Yan, Q.; Pan, S. Graph-based semi-supervised learning: A review. Neurocomputing 2020, 408, 216–230. [Google Scholar] [CrossRef]
  12. Song, Z.; Yang, X.; Xu, Z.; King, I. Graph-based semi-supervised learning: A comprehensive review. IEEE Trans. Neural Netw. Learn. Syst. 2022, 1–21. [Google Scholar] [CrossRef] [PubMed]
  13. Liu, H.; Wu, Z.; Li, X.; Cai, D.; Huang, T.S. Constrained nonnegative matrix factorization for image representation. IEEE Trans. Pattern Anal. Mach. Intell. 2011, 34, 1299–1311. [Google Scholar] [CrossRef] [PubMed]
  14. Babaee, M.; Tsoukalas, S.; Babaee, M.; Rigoll, G.; Datcu, M. Discriminative nonnegative matrix factorization for dimensionality reduction. Neurocomputing 2016, 173, 212–223. [Google Scholar] [CrossRef]
  15. Cichocki, A.; Mandic, D.; De Lathauwer, L.; Zhou, G.; Zhao, Q.; Caiafa, C.; Phan, H.A. Tensor decompositions for signal processing applications: From two-way to multiway component analysis. IEEE Signal Process. Mag. 2015, 32, 145–163. [Google Scholar] [CrossRef] [Green Version]
  16. Kolda, T.G.; Bader, B.W. Tensor decompositions and applications. SIAM Rev. 2009, 51, 455–500. [Google Scholar] [CrossRef]
  17. Zaorálek, L.; Prílepok, M.; Snášel, V. Recognition of face images with noise based on Tucker decomposition. In Proceedings of the 2015 IEEE International Conference on Systems, Man, and Cybernetics, Hong Kong, China, 9–12 October 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 2649–2653. [Google Scholar]
  18. Yang, L.; Zhou, J.; Jing, J.; Wei, L.; Li, Y.; He, X.; Feng, L.; Nie, B. Compression of hyperspectral images based on Tucker decomposition and CP decomposition. J. Opt. Soc. Am. A 2022, 39, 1815–1822. [Google Scholar] [CrossRef] [PubMed]
  19. Kim, Y.D.; Choi, S. Nonnegative Tucker decomposition. In Proceedings of the 2007 IEEE Conference on Computer Vision and Pattern Recognition, Minneapolis, MN, USA, 17–22 June 2007; IEEE: Piscataway, NJ, USA, 2007; pp. 1–8. [Google Scholar]
  20. Qiu, Y.; Zhou, G.; Wang, Y.; Zhang, Y.; Xie, S. A generalized graph regularized non-negative Tucker decomposition framework for tensor data representation. IEEE Trans. Cybern. 2020, 594–607. [Google Scholar] [CrossRef]
  21. Marmoret, A.; Cohen, J.E.; Bertin, N.; Bimbot, F. Uncovering audio patterns in music with nonnegative Tucker decomposition for structural segmentation. arXiv 2021, arXiv:2104.08580. [Google Scholar]
  22. Cohen, J.E.; Comon, P.; Gillis, N. Some theory on non-negative Tucker decomposition. In Proceedings of the International Conference on Latent Variable Analysis and Signal Separation, Grenoble, France, 21–23 February 2017; Springer: Berlin/Heidelberg, Germany, 2017; pp. 152–161. [Google Scholar]
  23. Qiu, Y.; Zhou, G.; Zhang, Y.; Xie, S. Graph regularized nonnegative Tucker decomposition for tensor data representation. In Proceedings of the ICASSP 2019—2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 12–17 May 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 8613–8617. [Google Scholar]
  24. Pan, J.; Ng, M.K.; Liu, Y.; Zhang, X.; Yan, H. Orthogonal nonnegative Tucker decomposition. SIAM J. Sci. Comput. 2021, 43, B55–B81. [Google Scholar] [CrossRef]
  25. Li, H.; Zhang, J.; Shi, G.; Liu, J. Graph-based discriminative nonnegative matrix factorization with label information. Neurocomputing 2017, 266, 91–100. [Google Scholar] [CrossRef]
  26. Sun, F.; Xu, M.; Hu, X.; Jiang, X. Graph regularized and sparse nonnegative matrix factorization with hard constraints for data representation. Neurocomputing 2016, 173, 233–244. [Google Scholar] [CrossRef]
  27. Peng, X.; Xu, D.; Chen, D. Robust distribution-based nonnegative matrix factorizations for dimensionality reduction. Inform. Sci. 2021, 552, 244–260. [Google Scholar] [CrossRef]
  28. Qiu, Y.; Zhou, G.; Chen, X.; Zhang, D.; Zhao, X.; Zhao, Q. Semi-supervised non-negative Tucker decomposition for tensor data representation. Sci. China Tech. Sci. 2021, 64, 1881–1892. [Google Scholar] [CrossRef]
  29. Xu, W.; Liu, X.; Gong, Y. Document clustering based on non-negative matrix factorization. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Informaion Retrieval, Toronto, ON, Canada, 28 July–1 August 2003; pp. 267–273. [Google Scholar]
  30. MacQueen, J. Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability; University of California Press: Oakland, CA, USA, 1967; Volume 1, pp. 281–297. [Google Scholar]
  31. Sasirekha, K.; Baby, P. Agglomerative hierarchical clustering algorithm-a review. Int. J. Sci. Res. Publ. 2013, 83, 83. [Google Scholar]
Figure 1. The clustering performance with varied numbers of labeled sample data on the ORL and Yale datasets.
Figure 2. The clustering performance of the DNTD method in terms of AC and NMI with varied parameter λ on the ORL, COIL20, and Yale datasets.
Table 1. AC (%) ± standard deviation (%) of different algorithms on ORL dataset.
k | K-Means | AHC | NMF | NTD | DNMF | DNTD
3 | 80.43 ± 14.65 | 74.67 ± 16.63 | 77.93 ± 16.06 | 82.70 ± 14.32 | 80.97 ± 18.22 | 85.43 ± 14.20
5 | 70.80 ± 11.83 | 58.60 ± 17.20 | 71.56 ± 12.13 | 77.22 ± 12.99 | 72.94 ± 13.83 | 84.28 ± 12.25
7 | 65.67 ± 9.84 | 49.43 ± 8.99 | 73.40 ± 9.89 | 72.23 ± 9.92 | 74.29 ± 10.01 | 76.61 ± 8.78
9 | 62.96 ± 9.01 | 46.11 ± 12.15 | 69.66 ± 8.83 | 71.23 ± 6.54 | 74.57 ± 8.48 | 75.23 ± 8.07
11 | 60.40 ± 7.48 | 36.27 ± 6.89 | 66.83 ± 5.95 | 70.90 ± 6.34 | 71.78 ± 7.49 | 74.89 ± 7.58
13 | 59.22 ± 7.62 | 36.62 ± 8.36 | 65.50 ± 7.18 | 68.16 ± 7.31 | 69.82 ± 7.34 | 77.46 ± 6.03
15 | 56.54 ± 7.20 | 37.27 ± 11.11 | 65.81 ± 7.66 | 64.81 ± 7.03 | 69.29 ± 7.08 | 74.47 ± 6.12
17 | 56.03 ± 5.18 | 36.06 ± 8.47 | 65.38 ± 6.33 | 65.05 ± 5.91 | 70.18 ± 6.12 | 75.02 ± 6.07
19 | 55.30 ± 4.91 | 32.47 ± 6.51 | 61.25 ± 4.90 | 62.86 ± 5.76 | 69.89 ± 5.25 | 73.67 ± 6.19
Avg. | 63.04 | 45.28 | 68.59 | 70.57 | 72.64 | 77.45
Table 2. NMI (%) ± standard deviation (%) of different algorithms on ORL dataset.
k | K-Means | AHC | NMF | NTD | DNMF | DNTD
3 | 69.99 ± 18.07 | 69.55 ± 15.74 | 64.55 ± 20.21 | 71.89 ± 19.65 | 70.26 ± 25.55 | 77.17 ± 18.14
5 | 70.01 ± 11.60 | 65.40 ± 19.06 | 68.39 ± 12.40 | 77.80 ± 10.80 | 70.72 ± 12.80 | 82.51 ± 10.12
7 | 69.07 ± 9.54 | 58.85 ± 14.88 | 75.07 ± 7.48 | 75.77 ± 7.72 | 77.23 ± 8.39 | 78.44 ± 7.57
9 | 70.16 ± 8.18 | 54.92 ± 12.95 | 74.89 ± 7.16 | 75.83 ± 4.21 | 79.54 ± 5.37 | 78.97 ± 6.34
11 | 68.99 ± 6.85 | 47.84 ± 10.79 | 74.13 ± 4.15 | 78.09 ± 5.24 | 78.89 ± 5.63 | 81.25 ± 5.71
13 | 68.89 ± 6.65 | 46.77 ± 11.38 | 74.02 ± 5.68 | 76.91 ± 6.11 | 78.60 ± 5.58 | 84.32 ± 3.76
15 | 67.74 ± 5.76 | 49.23 ± 14.26 | 75.55 ± 4.98 | 75.16 ± 5.30 | 78.80 ± 4.12 | 82.76 ± 3.93
17 | 69.79 ± 4.33 | 48.58 ± 11.48 | 77.20 ± 4.38 | 76.01 ± 4.58 | 79.89 ± 4.22 | 84.77 ± 3.25
19 | 69.72 ± 3.41 | 44.46 ± 7.90 | 74.40 ± 3.44 | 75.24 ± 3.90 | 81.24 ± 3.85 | 82.95 ± 4.01
Avg. | 69.37 | 53.96 | 73.13 | 75.86 | 77.24 | 81.46
Table 3. AC (%) ± standard deviation (%) of different algorithms on COIL20 dataset.
k | K-Means | AHC | NMF | NTD | DNMF | DNTD
2 | 95.65 ± 9.35 | 95.07 ± 14.87 | 91.71 ± 17.10 | 90.72 ± 14.39 | 86.62 ± 16.69 | 95.97 ± 5.27
3 | 75.53 ± 16.95 | 86.90 ± 21.87 | 72.86 ± 15.92 | 82.00 ± 14.51 | 79.03 ± 18.54 | 86.99 ± 13.63
4 | 71.69 ± 14.56 | 84.79 ± 17.01 | 74.70 ± 14.96 | 75.86 ± 12.90 | 74.87 ± 16.12 | 78.62 ± 15.95
5 | 71.44 ± 14.55 | 72.11 ± 23.98 | 69.23 ± 8.07 | 76.58 ± 12.98 | 70.61 ± 15.18 | 79.75 ± 16.22
6 | 65.70 ± 10.54 | 74.84 ± 20.19 | 63.49 ± 14.39 | 68.66 ± 12.78 | 76.55 ± 12.73 | 76.74 ± 16.02
7 | 66.12 ± 10.19 | 67.30 ± 21.15 | 65.48 ± 13.49 | 71.00 ± 9.88 | 62.48 ± 16.55 | 71.43 ± 11.18
8 | 64.60 ± 10.34 | 68.84 ± 16.13 | 66.78 ± 7.00 | 63.84 ± 12.77 | 69.30 ± 9.49 | 69.95 ± 8.33
9 | 58.87 ± 7.76 | 65.57 ± 20.06 | 56.18 ± 16.19 | 67.31 ± 6.69 | 59.81 ± 11.89 | 68.70 ± 8.16
Avg. | 71.20 | 76.93 | 70.05 | 74.50 | 72.41 | 78.52
Table 4. NMI (%) ± standard deviation (%) of different algorithms on COIL20 dataset.
k | K-Means | AHC | NMF | NTD | DNMF | DNTD
2 | 87.22 ± 25.19 | 90.07 ± 29.94 | 81.96 ± 36.52 | 73.55 ± 31.49 | 64.39 ± 39.56 | 84.54 ± 19.65
3 | 61.70 ± 26.15 | 81.82 ± 31.59 | 55.21 ± 24.28 | 70.08 ± 18.71 | 64.08 ± 27.52 | 77.23 ± 21.34
4 | 66.47 ± 15.24 | 84.09 ± 18.76 | 66.05 ± 19.26 | 72.07 ± 11.95 | 69.89 ± 18.95 | 73.27 ± 19.75
5 | 69.42 ± 15.20 | 71.54 ± 26.37 | 64.40 ± 11.41 | 73.30 ± 11.04 | 65.31 ± 16.61 | 76.51 ± 16.13
6 | 65.45 ± 9.44 | 76.57 ± 21.28 | 60.64 ± 16.38 | 69.88 ± 12.35 | 76.08 ± 11.88 | 74.67 ± 16.34
7 | 71.45 ± 8.48 | 71.97 ± 19.85 | 63.53 ± 16.29 | 71.13 ± 10.26 | 61.31 ± 20.84 | 71.19 ± 9.99
8 | 71.30 ± 9.04 | 75.48 ± 12.95 | 69.98 ± 6.48 | 66.41 ± 10.78 | 73.21 ± 7.49 | 73.74 ± 8.36
9 | 66.87 ± 5.75 | 72.47 ± 19.34 | 58.49 ± 19.09 | 72.45 ± 6.05 | 64.64 ± 13.05 | 72.51 ± 6.48
Avg. | 69.99 | 78.00 | 65.03 | 71.11 | 67.36 | 75.46
Table 5. AC (%) ± standard deviation (%) of different algorithms on Yale dataset.
k | K-Means | AHC | NMF | NTD | DNMF | DNTD
5 | 54.07 ± 11.33 | 31.82 ± 5.56 | 55.45 ± 9.80 | 54.98 ± 9.20 | 57.15 ± 9.18 | 56.95 ± 11.15
6 | 47.12 ± 7.53 | 28.64 ± 3.82 | 49.61 ± 7.35 | 47.79 ± 8.98 | 55.39 ± 8.42 | 55.64 ± 8.03
7 | 45.77 ± 7.32 | 22.34 ± 3.29 | 48.61 ± 7.02 | 51.09 ± 7.12 | 53.17 ± 5.94 | 54.25 ± 7.61
8 | 45.89 ± 6.85 | 24.32 ± 2.06 | 46.47 ± 4.37 | 45.43 ± 6.59 | 52.00 ± 9.36 | 50.48 ± 5.35
9 | 42.07 ± 6.21 | 21.92 ± 3.15 | 44.54 ± 5.18 | 42.59 ± 5.66 | 48.90 ± 5.17 | 49.91 ± 5.84
10 | 41.95 ± 6.57 | 21.73 ± 2.69 | 44.80 ± 5.04 | 45.40 ± 4.66 | 46.74 ± 5.45 | 51.62 ± 5.80
11 | 40.67 ± 5.48 | 20.66 ± 1.44 | 43.30 ± 4.36 | 44.92 ± 4.93 | 45.35 ± 5.27 | 47.36 ± 5.17
12 | 41.77 ± 4.76 | 20.76 ± 2.10 | 42.57 ± 4.26 | 41.47 ± 4.30 | 43.53 ± 5.12 | 45.64 ± 5.02
13 | 38.31 ± 3.62 | 20.42 ± 1.72 | 40.37 ± 4.17 | 41.07 ± 4.07 | 43.24 ± 5.91 | 44.97 ± 4.16
14 | 38.36 ± 3.70 | 20.13 ± 1.31 | 40.11 ± 3.48 | 39.99 ± 3.85 | 42.94 ± 4.36 | 44.45 ± 4.36
Avg. | 43.60 | 23.27 | 45.58 | 45.47 | 48.84 | 50.13
Table 6. NMI (%) ± standard deviation (%) of different algorithms on Yale dataset.
k | K-Means | AHC | NMF | NTD | DNMF | DNTD
5 | 41.76 ± 13.03 | 17.45 ± 9.73 | 40.71 ± 12.32 | 41.55 ± 9.82 | 43.60 ± 12.06 | 44.53 ± 13.03
6 | 36.12 ± 8.38 | 21.19 ± 8.87 | 38.04 ± 9.00 | 34.32 ± 9.28 | 45.26 ± 9.01 | 45.62 ± 8.16
7 | 39.05 ± 8.31 | 10.92 ± 4.99 | 40.99 ± 8.14 | 44.28 ± 8.29 | 45.45 ± 6.44 | 47.82 ± 6.90
8 | 41.90 ± 7.08 | 19.39 ± 4.04 | 40.97 ± 4.22 | 40.12 ± 6.70 | 47.67 ± 10.48 | 46.66 ± 5.74
9 | 38.90 ± 6.84 | 17.03 ± 6.18 | 41.76 ± 4.63 | 39.36 ± 5.73 | 47.31 ± 4.32 | 47.40 ± 5.08
10 | 41.20 ± 6.75 | 17.71 ± 4.01 | 44.25 ± 4.67 | 44.52 ± 4.22 | 47.50 ± 4.69 | 50.65 ± 5.64
11 | 42.05 ± 5.50 | 18.12 ± 1.79 | 43.27 ± 3.42 | 44.77 ± 4.99 | 45.84 ± 4.28 | 47.99 ± 4.67
12 | 44.05 ± 4.62 | 18.62 ± 3.01 | 44.44 ± 3.59 | 42.90 ± 4.22 | 45.32 ± 5.01 | 47.50 ± 4.58
13 | 42.31 ± 3.53 | 18.89 ± 2.71 | 43.43 ± 3.27 | 43.59 ± 3.16 | 46.82 ± 6.29 | 47.94 ± 3.24
14 | 43.46 ± 3.38 | 19.05 ± 2.20 | 43.93 ± 2.80 | 43.79 ± 3.49 | 47.35 ± 3.35 | 48.73 ± 3.82
Avg. | 41.08 | 17.84 | 42.18 | 41.92 | 46.21 | 47.48
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
