Article

Discriminative Nonnegative Tucker Decomposition for Tensor Data Representation

1 School of Mathematical Sciences, Guizhou Normal University, Guiyang 550025, China
2 School of Mathematical Sciences, Xiamen University, Xiamen 361005, China
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Mathematics 2022, 10(24), 4723; https://doi.org/10.3390/math10244723
Submission received: 7 November 2022 / Revised: 5 December 2022 / Accepted: 8 December 2022 / Published: 12 December 2022

Abstract

Nonnegative Tucker decomposition (NTD) is an unsupervised method that has been extended to many applied fields. However, NTD does not make use of the label information of sample data, even when such label information is available. To remedy this defect, in this paper we propose a label-constrained NTD method, namely Discriminative NTD (DNTD), which treats a fraction of the label information of the sample data as a discriminative constraint. Differing from other label-based methods, the proposed method enforces that sample data with the same label are aligned on the same axis or line. By combining NTD with the label-discriminative constraint term, DNTD can not only extract the part-based representation of the data tensor but also boost the discriminative ability of NTD. An iterative updating algorithm is provided to solve the objective function of DNTD. Finally, the proposed DNTD method is applied to image clustering. Experimental results on the ORL, COIL20, and Yale datasets show that the clustering accuracy of DNTD is improved by 8.47–32.17% and the normalized mutual information by 10.43–29.64% compared with state-of-the-art approaches.

1. Introduction

In recent years, data analysis has attracted increasing attention in many application fields, such as machine learning, artificial intelligence, and computer vision. For instance, in Ref. [1], cluster learning of data is analysed via tensor low-rank representation. In Ref. [2], routing recommendations for heterogeneous data are implemented with tensor-based frameworks. Long et al. [3] recover the missing entries of visual data by tensor completion. Bernardi et al. [4] provide a hitchhiker's guide to secant varieties and tensor decomposition. Real-world sample data are often high dimensional, while the important information and structures lie in a low-dimensional representation space of the sample data. Thus, many dimensionality reduction methods have been proposed to seek an appropriate low-dimensional representation of the original sample data.
Nonnegative matrix factorization (NMF) [5], as a traditional dimensionality reduction method, has been widely used for different purposes, such as image processing [6], community detection [7], clustering [8], etc. The NMF of a nonnegative data matrix $\mathbf{X} \in \mathbb{R}_+^{m \times n}$ seeks two low-rank nonnegative matrices, namely $\mathbf{U} \in \mathbb{R}_+^{m \times s}$ and $\mathbf{V} \in \mathbb{R}_+^{n \times s}$, such that $\mathbf{X} \approx \mathbf{U}\mathbf{V}^\top$. For example, if $\mathbf{X}$ is an image sample data matrix whose columns contain the pixel data of the image samples, then $m$ represents the number of pixels per image sample, $n$ represents the number of image samples, and $s$ denotes the dimension of the low-dimensional representation of the image sample data matrix; $\mathbf{U}$ is called the basis matrix, and $\mathbf{V}$ is called the encoding matrix and is also regarded as the low-dimensional representation of the image sample data matrix. Generally, $s$ is chosen to be much smaller than $m$ or $n$. However, a weakness of NMF is that it fails to preserve the geometrical information of the sample data. Thus, Cai et al. proposed a graph-regularized NMF (GNMF), which encodes the geometrical information of the sample data in the low-dimensional representation space by constructing a nearest-neighbor graph [9]. Using hypergraph regularization and the correntropy instead of the Euclidean norm in the loss term of NMF, Yu et al. proposed a correntropy-based hypergraph-regularized NMF (CHNMF) [10]. CHNMF considers the high-order geometric relationships inherent in the sample data and reduces the influence of noise and outliers.
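As a concrete illustration of the factorization $\mathbf{X} \approx \mathbf{U}\mathbf{V}^\top$, the following minimal NumPy sketch runs the classical multiplicative updates of Lee and Seung [5] on a small random nonnegative matrix; the sizes, iteration count, and the small constant added to the denominators are illustrative choices, not values taken from this paper.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, s = 100, 60, 5          # pixels, samples, latent dimension (illustrative sizes)
X = rng.random((m, n))        # nonnegative data matrix

U = rng.random((m, s))        # basis matrix
V = rng.random((n, s))        # encoding matrix (low-dimensional representation)

eps = 1e-10                   # guard against division by zero
for _ in range(200):
    # Multiplicative updates for the Frobenius-norm NMF (Lee & Seung, 2000)
    U *= (X @ V) / (U @ V.T @ V + eps)
    V *= (X.T @ U) / (V @ U.T @ U + eps)

print("relative error:", np.linalg.norm(X - U @ V.T) / np.linalg.norm(X))
```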
When the label information of sample data is available, it can naturally also be used to construct a graph. Generally speaking, sample data can be separated into fully labeled, partially labeled, and unlabeled data, and correspondingly, the algorithms using these data are categorized as supervised, semi-supervised, and unsupervised algorithms [11]. On the one hand, in practice, a large amount of labeled sample data is hard to obtain, whereas a small amount of labeled sample data is readily available, so it is natural to develop semi-supervised methods. On the other hand, a semi-supervised method can utilize all sample data and simultaneously use the labels of part of the sample data, so that it retains the capability of an unsupervised method while propagating labels under the guidance of the supervisory information. Restricting the data to satisfy the prior label information is a common way of using label information to develop a semi-supervised method [12]. For example, Liu et al. introduced a constrained NMF (CNMF) by incorporating the label information of some sample data into the objective function of NMF [13]. In CNMF, sample data with the same labels are merged into a single point. Babaee et al. studied a discriminative NMF (DNMF) that utilizes a fraction of the label information of sample data in a regularization term [14]. Differing from CNMF, DNMF enforces that sample data with the same label are aligned on the same axis: it constructs a label matrix and then produces a label-discriminative regularizer by bridging the label matrix and the labeled sample data. However, when these NMF-based methods deal with image sample data, the data are first vectorized, forming a matrix like the $\mathbf{X}$ of the example mentioned above, and are then represented by a low-rank approximation, which often destroys the internal structure of the sample data. The study in Ref. [15] indicates that tensors can partly solve this problem.
In fact, in real-world examples, there are many sample data represented by a tensor (i.e., multiway array), e.g., color images, video clips, multichannel electroencephalography (EEG), etc. For these reasons, tensor factorization techniques were proposed to deal with data tensors. The Tucker decomposition [16], which is one of the tensor factorization methods, has been widely applied to face recognition [17], image processing [18], signal processing [15], etc. In Ref. [19], Kim et al. introduced a nonnegative Tucker decomposition (NTD) by combining Tucker decomposition with nonnegativity constraints on the core tensor and factor matrices. Since then, NTD has been extended in many applications, such as clustering [20], pattern extraction [21], and signal analysis [22], and several variants of NTD from diverse perspectives have been proposed. For example, Qiu et al. studied a graph-regularized nonnegative Tucker Decomposition (GNTD), which incorporates graph regularization into NTD to preserve the geometrical information from a data tensor [23]. Pan et al. introduced an orthogonal nonnegative Tucker decomposition (ONTD) by considering the orthogonality on each factor matrix [24]. However, NTD and its variants mentioned above are unsupervised algorithms, meaning they do not use the available label information of sample data.
Recently, many researchers have considered the case when the label information of sample data is available. To make better use of this label information, many works have incorporated label information constraints into the framework of NMF or NTD, such as CNMF [13], DNMF [14], graph-based discriminative NMF (GDNMF) [25], graph-regularized and sparse NMF with hard constraints (GSNMFC) [26], semi-supervised robust distribution-based NMF (SRDNMF) [27], and semi-supervised NTD (SNTD) [28]. Among these methods, CNMF and DNMF both incorporate the label information into NMF; however, they do not consider the geometric structure of the sample data. Thus, GDNMF was proposed, which incorporates graph regularization and label information into NMF. Following this idea, GSNMFC was proposed by jointly incorporating a graph regularizer, label information, and a sparseness constraint. However, while these methods utilize label information, they do not address robustness. To improve robustness, SRDNMF introduced a Kullback–Leibler divergence and a label-discriminative constraint into NMF. Nevertheless, these semi-supervised methods are NMF-based learning algorithms, which may destroy the inherent structure of the sample data. To alleviate these drawbacks, SNTD, based on NTD, was introduced to jointly propagate the limited label information and learn the nonnegative tensor representation [28]. Although SNTD uses a label constraint, it fails to consider that sample data with the same label could be aligned on the same axis or line.
Motivated by recent progress [14,19], in this paper, we propose a label-constrained NTD, called Discriminative NTD, or DNTD for short. We incorporate a fraction of the label information of the sample data into NTD, and the key idea is that the sample data that belong to the same class should be aligned on the same axis or line in the low-dimensional representation. We also discuss how to efficiently solve the corresponding optimization problem, and we provide the optimization scheme and its convergence proof. The main contributions of the proposed approach are:
  • By constructing the label matrix and coupling the label-discriminative regularizer to the objective function of NTD, the DNTD method can not only extract the part-based representation from the data tensor but also boost the discriminative ability of the NTD. Furthermore, the key idea of the label-discriminative term is that the sample data belonging to the same label are very close or aligned on the same axis or line in the low-dimensional representation;
  • An efficient updating algorithm is developed to solve the optimization problem and the convergence proof is provided;
  • Numerical examples from real-world applications are provided to demonstrate the effectiveness of the proposed method.
The rest of the paper is organized as follows: In Section 2, we briefly review the NTD. In Section 3, the DNTD method is proposed, and the detailed algorithm and proof of convergence of the algorithm are provided. In Section 4, experiments for clustering tasks are presented. Finally, in Section 5, conclusions are drawn.

2. Nonnegative Tucker Decomposition

Given a nonnegative data tensor $\mathcal{X} \in \mathbb{R}_+^{I_1 \times I_2 \times \cdots \times I_{N-1} \times I_N}$, where $\mathcal{X}_j \in \mathbb{R}_+^{I_1 \times I_2 \times \cdots \times I_{N-1}}$ ($j = 1, 2, \ldots, I_N$) is a data sample, nonnegative Tucker decomposition (NTD) aims at decomposing the nonnegative tensor $\mathcal{X}$ into a nonnegative core tensor $\mathcal{G} \in \mathbb{R}_+^{J_1 \times J_2 \times \cdots \times J_N}$ multiplied by $N$ nonnegative factor matrices $\mathbf{A}^{(r)} \in \mathbb{R}_+^{I_r \times J_r}$ ($r = 1, 2, \ldots, N$) along each mode [19]. To achieve this goal, NTD minimizes the sum of squared residues between the data tensor $\mathcal{X}$ and the multilinear product of the core tensor $\mathcal{G}$ and the factor matrices $\mathbf{A}^{(r)}$, which can be formulated as
$$\min_{\mathcal{G}, \mathbf{A}^{(1)}, \ldots, \mathbf{A}^{(N)}} O_1 = \left\| \mathcal{X} - \mathcal{G} \times_1 \mathbf{A}^{(1)} \times_2 \mathbf{A}^{(2)} \cdots \times_N \mathbf{A}^{(N)} \right\|^2 \quad \text{s.t.} \quad \mathcal{G} \geq 0,\ \mathbf{A}^{(r)} \geq 0,\ r = 1, 2, \ldots, N, \qquad (1)$$
where, and in the following, the operator $\times_r$ denotes the $r$-mode product [16]. For example, the $r$-mode product of a tensor $\mathcal{Y} \in \mathbb{R}^{J_1 \times J_2 \times \cdots \times J_N}$ and a matrix $\mathbf{U} \in \mathbb{R}^{I_r \times J_r}$, denoted by $\mathcal{Y} \times_r \mathbf{U}$, is of size $J_1 \times \cdots \times J_{r-1} \times I_r \times J_{r+1} \times \cdots \times J_N$ and $(\mathcal{Y} \times_r \mathbf{U})_{j_1 \cdots j_{r-1} i_r j_{r+1} \cdots j_N} = \sum_{j_r = 1}^{J_r} y_{j_1 \cdots j_{r-1} j_r j_{r+1} \cdots j_N} u_{i_r j_r}$.
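To make the $r$-mode product concrete, here is a small NumPy sketch that implements it through matrix unfolding and verifies it against the element-wise definition above. The column-major unfolding convention and the helper names (`unfold`, `fold`, `mode_product`) are our own illustrative choices, not notation from the paper.

```python
import numpy as np

def unfold(T, mode):
    """Mode unfolding with column-major column ordering (as in Ref. [16])."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1, order='F')

def fold(M, mode, shape):
    """Inverse of `unfold` for a tensor with the given full shape."""
    full = [shape[mode]] + [s for i, s in enumerate(shape) if i != mode]
    return np.moveaxis(M.reshape(full, order='F'), 0, mode)

def mode_product(Y, U, mode):
    """r-mode product Y x_r U: contracts dimension J_r of Y with U in R^{I_r x J_r}."""
    new_shape = list(Y.shape)
    new_shape[mode] = U.shape[0]
    return fold(U @ unfold(Y, mode), mode, new_shape)

rng = np.random.default_rng(0)
Y = rng.random((4, 5, 6))                 # J1 x J2 x J3
U = rng.random((7, 5))                    # I2 x J2 (r = 2, i.e. Python mode=1)
Z = mode_product(Y, U, mode=1)            # resulting shape: (4, 7, 6)
# element-wise definition: (Y x_2 U)[j1, i2, j3] = sum_{j2} Y[j1, j2, j3] * U[i2, j2]
print(Z.shape, np.allclose(Z, np.einsum('abc,ib->aic', Y, U)))
```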
Equation (1) can be represented as a matrix form:
$$\min_{\mathbf{G}_{(N)}, \mathbf{A}^{(1)}, \ldots, \mathbf{A}^{(N)}} O_1 = \left\| \mathbf{X}_{(N)} - \mathbf{A}^{(N)} \mathbf{G}_{(N)} \big( \otimes_{p \neq N} \mathbf{A}^{(p)} \big)^\top \right\|_F^2 \quad \text{s.t.} \quad \mathbf{G}_{(N)} \geq 0,\ \mathbf{A}^{(r)} \geq 0,\ r = 1, 2, \ldots, N,$$
where, and in the following, $\mathbf{X}_{(N)} \in \mathbb{R}_+^{I_N \times I_1 I_2 \cdots I_{N-1}}$ and $\mathbf{G}_{(N)} \in \mathbb{R}_+^{J_N \times J_1 J_2 \cdots J_{N-1}}$ are the mode-$N$ unfolding matrices of the data tensor $\mathcal{X}$ and the core tensor $\mathcal{G}$ [16], respectively, $\mathbf{A}^{(N)} = [\mathbf{a}_1^\top, \mathbf{a}_2^\top, \ldots, \mathbf{a}_{I_N}^\top]^\top \in \mathbb{R}_+^{I_N \times J_N}$ ($\mathbf{a}_j$ denotes the $j$-th row of $\mathbf{A}^{(N)}$), $\otimes_{p \neq N} \mathbf{A}^{(p)} = \mathbf{A}^{(N-1)} \otimes \mathbf{A}^{(N-2)} \otimes \cdots \otimes \mathbf{A}^{(1)}$, and $\otimes$ denotes the Kronecker product. Therefore, NTD can be transformed into NMF with the encoding matrix $\mathbf{A}^{(N)} \in \mathbb{R}_+^{I_N \times J_N}$, where $I_N$ and $J_N$ can be regarded as the number of samples and the dimension of the low-dimensional representation of the data tensor $\mathcal{X}$ with respect to the basis matrix $\mathbf{G}_{(N)} (\otimes_{p \neq N} \mathbf{A}^{(p)})^\top$, respectively.
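The following short NumPy sketch checks the mode-$N$ unfolding identity used above, $\mathbf{X}_{(N)} = \mathbf{A}^{(N)} \mathbf{G}_{(N)} (\otimes_{p \neq N} \mathbf{A}^{(p)})^\top$, on a small random Tucker model; all sizes are illustrative, and the column-major unfolding matches the Kronecker ordering stated in the text.

```python
import numpy as np

rng = np.random.default_rng(1)
I, J = (5, 6, 8), (2, 3, 4)                          # tensor and core sizes (illustrative)
G = rng.random(J)                                     # core tensor
A = [rng.random((I[r], J[r])) for r in range(3)]      # factor matrices A^(1), A^(2), A^(3)

# X = G x_1 A^(1) x_2 A^(2) x_3 A^(3)
X = np.einsum('abc,ia,jb,kc->ijk', G, A[0], A[1], A[2])

def unfold(T, mode):                                  # column-major unfolding, as in [16]
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1, order='F')

# Mode-N unfolding identity: X_(3) = A^(3) G_(3) (A^(2) kron A^(1))^T
lhs = unfold(X, 2)
rhs = A[2] @ unfold(G, 2) @ np.kron(A[1], A[0]).T
print(np.allclose(lhs, rhs))                          # True
```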

3. Discriminative Nonnegative Tucker Decomposition

NTD is an unsupervised method that fails to take the label information of the sample data into account even when such information is available. In practice, however, label information has been widely used to increase performance by combining supervisory signals with unsupervised priors [14,25,26,28].
Given partial labels of the sample data, it is natural to assume that sample data belonging to the same class should be very close or aligned on the same axis or line. In NTD, therefore, we would expect each class of sample data to be placed in a clearly separated cluster in the low-dimensional representation matrix $\mathbf{A}^{(N)}$. To achieve these properties, based on the available label information, we introduce the label matrix $\mathbf{Q} \in \mathbb{R}^{k \times I_N}$ (where $k$ is the number of data classes) as follows [14]:
$$Q_{ij} = \begin{cases} 1 & \text{if sample data } \mathcal{X}_j \text{ is labeled and belongs to the } i\text{-th category}, \\ 0 & \text{otherwise}. \end{cases}$$
For example, consider the case of $I_N = 10$ sample data, out of which $q = 7$ are labeled with the categories $l_1 = 1$, $l_2 = 3$, $l_3 = 1$, $l_4 = 2$, $l_5 = 2$, $l_6 = 1$, and $l_7 = 4$; then, the label matrix $\mathbf{Q}$ would be defined as
$$\mathbf{Q} = \begin{pmatrix} 1 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \end{pmatrix}.$$
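A small sketch of how such a label matrix can be built from partial labels; the helper name `label_matrix` is our own illustrative choice, and the example reproduces the $\mathbf{Q}$ above.

```python
import numpy as np

def label_matrix(labels, n_samples, n_classes):
    """Build Q in R^{k x I_N}: Q[i, j] = 1 if sample j is labeled with class i+1."""
    Q = np.zeros((n_classes, n_samples))
    for j, l in enumerate(labels):        # only the first len(labels) samples carry labels
        Q[l - 1, j] = 1.0
    return Q

# The example from the text: I_N = 10 samples, the first q = 7 of which are labeled
labels = [1, 3, 1, 2, 2, 1, 4]
Q = label_matrix(labels, n_samples=10, n_classes=4)
print(Q)
```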
Based on the introduced label matrix $\mathbf{Q}$, we assume that there are $l$ labeled sample data. Without loss of generality, if the first $l$ sample data are labeled, then the label-discriminative constraint term can be introduced as
$$R = \left\| \mathbf{Q} - \mathbf{B} \mathbf{A}_l^{(N)\top} \right\|_F^2,$$
where $\mathbf{A}_l^{(N)} = [\mathbf{a}_1^\top, \ldots, \mathbf{a}_l^\top, \mathbf{0}, \ldots, \mathbf{0}]^\top \in \mathbb{R}^{I_N \times J_N}$, and the matrix $\mathbf{B} \in \mathbb{R}^{k \times J_N}$ transforms and scales the vectors in the part-based low-dimensional representation to obtain the best fit to the matrix $\mathbf{Q}$. The matrix $\mathbf{B}$ is allowed to take negative values. Combining the label-discriminative constraint term with the objective function (1) of NTD, DNTD is obtained by minimizing the following objective function:
$$\min_{\mathcal{G}, \mathbf{A}^{(1)}, \ldots, \mathbf{A}^{(N)}, \mathbf{B}} O_2 = \left\| \mathcal{X} - \mathcal{G} \times_1 \mathbf{A}^{(1)} \times_2 \mathbf{A}^{(2)} \cdots \times_N \mathbf{A}^{(N)} \right\|^2 + \lambda \left\| \mathbf{Q} - \mathbf{B} \mathbf{A}_l^{(N)\top} \right\|_F^2 \quad \text{s.t.} \quad \mathcal{G} \geq 0,\ \mathbf{A}^{(r)} \geq 0,\ r = 1, 2, \ldots, N, \qquad (2)$$
where $\lambda$ is a nonnegative parameter balancing the regularization term. Equivalently, Equation (2) can be rewritten in matrix form:
$$\min_{\mathbf{G}_{(n)}, \mathbf{A}^{(1)}, \ldots, \mathbf{A}^{(N)}, \mathbf{B}} O_2 = \left\| \mathbf{X}_{(n)} - \mathbf{A}^{(n)} \mathbf{G}_{(n)} \big( \otimes_{p \neq n} \mathbf{A}^{(p)} \big)^\top \right\|_F^2 + \lambda \left\| \mathbf{Q} - \mathbf{B} \mathbf{A}_l^{(N)\top} \right\|_F^2 \quad \text{s.t.} \quad \mathbf{G}_{(n)} \geq 0,\ \mathbf{A}^{(r)} \geq 0,\ r = 1, 2, \ldots, N, \qquad (3)$$
where, and in the following, $\otimes_{p \neq n} \mathbf{A}^{(p)} = \mathbf{A}^{(N)} \otimes \cdots \otimes \mathbf{A}^{(n+1)} \otimes \mathbf{A}^{(n-1)} \otimes \cdots \otimes \mathbf{A}^{(1)}$ and $n$ is any element of the set $\{1, 2, \ldots, N\}$.
By using Lagrange multipliers and considering (3), we turn the objective function in (2) into the following Lagrangian:
$$L = \left\| \mathbf{X}_{(n)} - \mathbf{A}^{(n)} \mathbf{G}_{(n)} \big( \otimes_{p \neq n} \mathbf{A}^{(p)} \big)^\top \right\|_F^2 + \lambda \left\| \mathbf{Q} - \mathbf{B} \mathbf{A}_l^{(N)\top} \right\|_F^2 + \mathrm{Tr}\big( \mathbf{\Phi}_n \mathbf{G}_{(n)}^\top \big) + \sum_{r=1}^{N} \mathrm{Tr}\big( \mathbf{\Psi}_r \mathbf{A}^{(r)\top} \big), \qquad (4)$$
where $\mathbf{\Phi}_n$ and $\mathbf{\Psi}_r$ are the Lagrange multiplier matrices of $\mathbf{G}_{(n)}$ and $\mathbf{A}^{(r)}$, respectively, and $\mathrm{Tr}(\cdot)$ denotes the trace of a matrix. The function (4) can be rewritten as
$$\begin{aligned} L ={} & \mathrm{Tr}\big( \mathbf{X}_{(n)} \mathbf{X}_{(n)}^\top \big) - 2 \mathrm{Tr}\big( \mathbf{X}_{(n)} \big( \otimes_{p \neq n} \mathbf{A}^{(p)} \big) \mathbf{G}_{(n)}^\top \mathbf{A}^{(n)\top} \big) + \mathrm{Tr}\big( \mathbf{A}^{(n)} \mathbf{G}_{(n)} \big( \otimes_{p \neq n} \mathbf{A}^{(p)\top} \mathbf{A}^{(p)} \big) \mathbf{G}_{(n)}^\top \mathbf{A}^{(n)\top} \big) \\ & + \lambda \mathrm{Tr}\big( \mathbf{Q} \mathbf{Q}^\top \big) - 2 \lambda \mathrm{Tr}\big( \mathbf{Q} \mathbf{A}_l^{(N)} \mathbf{B}^\top \big) + \lambda \mathrm{Tr}\big( \mathbf{B} \mathbf{A}_l^{(N)\top} \mathbf{A}_l^{(N)} \mathbf{B}^\top \big) + \mathrm{Tr}\big( \mathbf{\Phi}_n \mathbf{G}_{(n)}^\top \big) + \sum_{r=1}^{N} \mathrm{Tr}\big( \mathbf{\Psi}_r \mathbf{A}^{(r)\top} \big). \end{aligned} \qquad (5)$$
Obviously, the objective function in (2) is not convex in all the variables jointly, so it is very difficult to find the global optimal solution. In the following, we develop an iterative updating algorithm, which updates one of the core tensor, the factor matrices, and $\mathbf{B}$ at a time while fixing the others, to obtain a local minimum.

3.1. Updating Rules

3.1.1. Solutions of Factor Matrices $\mathbf{A}^{(n)}$ ($n = 1, 2, \ldots, N-1$)

The partial derivatives of $L$ in (5) with respect to $\mathbf{A}^{(n)}$ ($n = 1, 2, \ldots, N-1$) are
$$\frac{\partial L}{\partial \mathbf{A}^{(n)}} = -2 \mathbf{X}_{(n)} \big( \otimes_{p \neq n} \mathbf{A}^{(p)} \big) \mathbf{G}_{(n)}^\top + 2 \mathbf{A}^{(n)} \mathbf{G}_{(n)} \big( \otimes_{p \neq n} \mathbf{A}^{(p)\top} \mathbf{A}^{(p)} \big) \mathbf{G}_{(n)}^\top + \mathbf{\Psi}_n.$$
By using the Karush–Kuhn–Tucker (KKT) conditions, i.e., $\partial L / \partial \mathbf{A}^{(n)} = 0$ and $\mathbf{A}^{(n)} \odot \mathbf{\Psi}_n = 0$, where, and in the following, $\odot$ denotes the Hadamard product, we obtain from $\partial L / \partial \mathbf{A}^{(n)} = 0$ that
$$\mathbf{\Psi}_n = 2 \mathbf{X}_{(n)} \big( \otimes_{p \neq n} \mathbf{A}^{(p)} \big) \mathbf{G}_{(n)}^\top - 2 \mathbf{A}^{(n)} \mathbf{G}_{(n)} \big( \otimes_{p \neq n} \mathbf{A}^{(p)\top} \mathbf{A}^{(p)} \big) \mathbf{G}_{(n)}^\top.$$
By calculating
$$\mathbf{A}^{(n)} \odot \mathbf{\Psi}_n = \mathbf{A}^{(n)} \odot \big( 2 \mathbf{X}_{(n)} \big( \otimes_{p \neq n} \mathbf{A}^{(p)} \big) \mathbf{G}_{(n)}^\top \big) - \mathbf{A}^{(n)} \odot \big( 2 \mathbf{A}^{(n)} \mathbf{G}_{(n)} \big( \otimes_{p \neq n} \mathbf{A}^{(p)\top} \mathbf{A}^{(p)} \big) \mathbf{G}_{(n)}^\top \big),$$
which together with $\mathbf{A}^{(n)} \odot \mathbf{\Psi}_n = 0$ yields the following updating rule for $\mathbf{A}^{(n)}$ ($n = 1, 2, \ldots, N-1$):
$$A_{ij}^{(n)} \leftarrow A_{ij}^{(n)} \frac{\big[ \mathbf{X}_{(n)} \big( \otimes_{p \neq n} \mathbf{A}^{(p)} \big) \mathbf{G}_{(n)}^\top \big]_{ij}}{\big[ \mathbf{A}^{(n)} \mathbf{G}_{(n)} \big( \otimes_{p \neq n} \mathbf{A}^{(p)\top} \mathbf{A}^{(p)} \big) \mathbf{G}_{(n)}^\top \big]_{ij}}. \qquad (6)$$
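The updating rule (6) translates directly into a few lines of NumPy. The sketch below is an illustrative, unoptimized implementation for a 3-way tensor; the helper names, random sizes, and the small constant guarding the denominator are our own assumptions.

```python
import numpy as np
from functools import reduce

rng = np.random.default_rng(0)
I, J = (5, 6, 8), (2, 3, 4)                         # illustrative sizes
X = rng.random(I)                                    # nonnegative data tensor
G = rng.random(J)                                    # core tensor
A = [rng.random((I[r], J[r])) for r in range(3)]     # factor matrices

def unfold(T, mode):
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1, order='F')

def kron_others(A, n):
    # A^(N) kron ... kron A^(n+1) kron A^(n-1) kron ... kron A^(1) (paper's ordering)
    return reduce(np.kron, [A[p] for p in reversed(range(len(A))) if p != n])

def update_An(n, X, G, A, eps=1e-10):
    """Multiplicative update (6) for A^(n), n = 1, ..., N-1 (0-indexed here)."""
    K, Gn = kron_others(A, n), unfold(G, n)
    num = unfold(X, n) @ K @ Gn.T
    den = A[n] @ Gn @ (K.T @ K) @ Gn.T + eps
    return A[n] * num / den

A[0] = update_An(0, X, G, A)    # one update of the first factor matrix
print(A[0].shape)               # (5, 2)
```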

3.1.2. Solution of the Factor Matrix $\mathbf{A}^{(N)}$

The partial derivative of $L$ in (5) with respect to $\mathbf{A}^{(N)}$ is
$$\frac{\partial L}{\partial \mathbf{A}^{(N)}} = -2 \mathbf{X}_{(N)} \big( \otimes_{p \neq N} \mathbf{A}^{(p)} \big) \mathbf{G}_{(N)}^\top + 2 \mathbf{A}^{(N)} \mathbf{G}_{(N)} \big( \otimes_{p \neq N} \mathbf{A}^{(p)\top} \mathbf{A}^{(p)} \big) \mathbf{G}_{(N)}^\top - 2 \lambda \mathbf{Q}^\top \mathbf{B} + 2 \lambda \mathbf{A}_l^{(N)} \mathbf{B}^\top \mathbf{B} + \mathbf{\Psi}_N.$$
Similarly, we consider the KKT conditions $\partial L / \partial \mathbf{A}^{(N)} = 0$ and $\mathbf{A}^{(N)} \odot \mathbf{\Psi}_N = 0$. As a result, we obtain the following updating rule for $\mathbf{A}^{(N)}$:
$$A_{ij}^{(N)} \leftarrow A_{ij}^{(N)} \frac{\big[ \mathbf{X}_{(N)} \big( \otimes_{p \neq N} \mathbf{A}^{(p)} \big) \mathbf{G}_{(N)}^\top + \lambda (\mathbf{Q}^\top \mathbf{B})^+ + \lambda (\mathbf{A}_l^{(N)} \mathbf{B}^\top \mathbf{B})^- \big]_{ij}}{\big[ \mathbf{A}^{(N)} \mathbf{G}_{(N)} \big( \otimes_{p \neq N} \mathbf{A}^{(p)\top} \mathbf{A}^{(p)} \big) \mathbf{G}_{(N)}^\top + \lambda (\mathbf{Q}^\top \mathbf{B})^- + \lambda (\mathbf{A}_l^{(N)} \mathbf{B}^\top \mathbf{B})^+ \big]_{ij}}, \qquad (7)$$
where, and in the following, for a matrix $\mathbf{W} = (W_{ij})$, we let $|\mathbf{W}| = (|W_{ij}|)$ and define $\mathbf{W}^+ = (|\mathbf{W}| + \mathbf{W})/2$ and $\mathbf{W}^- = (|\mathbf{W}| - \mathbf{W})/2$, so that $\mathbf{W} = \mathbf{W}^+ - \mathbf{W}^-$.
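A corresponding sketch of the updating rule (7), including the positive/negative split $\mathbf{W} = \mathbf{W}^+ - \mathbf{W}^-$. The random label matrix, the value of $\lambda$, and the helper names are illustrative assumptions.

```python
import numpy as np
from functools import reduce

rng = np.random.default_rng(0)
I, J, k, l, lam = (5, 6, 20), (2, 3, 4), 3, 12, 0.5   # 20 samples, 12 labeled (illustrative)
X = rng.random(I)
G = rng.random(J)
A = [rng.random((I[r], J[r])) for r in range(3)]
Q = np.zeros((k, I[2]))
Q[rng.integers(0, k, l), np.arange(l)] = 1.0          # label matrix for the first l samples
B = rng.standard_normal((k, J[2]))                    # B may take negative values

def unfold(T, mode):
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1, order='F')

def pos(W): return (np.abs(W) + W) / 2
def neg(W): return (np.abs(W) - W) / 2

def update_AN(X, G, A, Q, B, l, lam, eps=1e-10):
    """Multiplicative update (7) for the last factor matrix A^(N)."""
    N = len(A) - 1
    K, GN = reduce(np.kron, [A[p] for p in reversed(range(N))]), unfold(G, N)
    Al = A[N].copy()
    Al[l:, :] = 0.0                                   # rows of unlabeled samples set to zero
    QtB, ABtB = Q.T @ B, Al @ B.T @ B
    num = unfold(X, N) @ K @ GN.T + lam * pos(QtB) + lam * neg(ABtB)
    den = A[N] @ GN @ (K.T @ K) @ GN.T + lam * neg(QtB) + lam * pos(ABtB) + eps
    return A[N] * num / den

A[2] = update_AN(X, G, A, Q, B, l, lam)
print(A[2].shape)    # (20, 4)
```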

3.1.3. Solutions of Core Tensor G

The objective function in (4) can be rewritten as
$$L = \left\| \mathrm{vec}(\mathcal{X}) - \mathbf{F}\, \mathrm{vec}(\mathcal{G}) \right\|_2^2 + \lambda \left\| \mathbf{Q} - \mathbf{B} \mathbf{A}_l^{(N)\top} \right\|_F^2 + \mathrm{vec}(\mathcal{G})^\top \mathrm{vec}(\mathbf{\Phi}) + \sum_{r=1}^{N} \mathrm{Tr}\big( \mathbf{\Psi}_r \mathbf{A}^{(r)\top} \big), \qquad (8)$$
where, and in the following, $\mathbf{F} = \mathbf{A}^{(N)} \otimes \mathbf{A}^{(N-1)} \otimes \cdots \otimes \mathbf{A}^{(1)} \in \mathbb{R}^{I_1 I_2 \cdots I_N \times J_1 J_2 \cdots J_N}$ and $\mathrm{vec}(\mathbf{\Phi})$ represents the Lagrange multiplier of $\mathrm{vec}(\mathcal{G})$. The partial derivative of $L$ in (8) with respect to $\mathrm{vec}(\mathcal{G})$ is
$$\frac{\partial L}{\partial\, \mathrm{vec}(\mathcal{G})} = 2 \mathbf{F}^\top \mathbf{F}\, \mathrm{vec}(\mathcal{G}) - 2 \mathbf{F}^\top \mathrm{vec}(\mathcal{X}) + \mathrm{vec}(\mathbf{\Phi}).$$
Similarly, by applying the KKT conditions $\partial L / \partial\, \mathrm{vec}(\mathcal{G}) = 0$ and $(\mathrm{vec}(\mathcal{G}))_i (\mathrm{vec}(\mathbf{\Phi}))_i = 0$, we obtain the following updating rule:
$$(\mathrm{vec}(\mathcal{G}))_i \leftarrow (\mathrm{vec}(\mathcal{G}))_i \frac{\big( \mathbf{F}^\top \mathrm{vec}(\mathcal{X}) \big)_i}{\big( \mathbf{F}^\top \mathbf{F}\, \mathrm{vec}(\mathcal{G}) \big)_i}. \qquad (9)$$
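The core-tensor update (9) can be sketched as follows. Forming $\mathbf{F}$ explicitly is only practical for small sizes, which suffices for this illustration; the column-major `vec` matches the Kronecker structure of $\mathbf{F}$, and the guard constant is an illustrative choice.

```python
import numpy as np
from functools import reduce

rng = np.random.default_rng(0)
I, J = (5, 6, 8), (2, 3, 4)
X = rng.random(I)
G = rng.random(J)
A = [rng.random((I[r], J[r])) for r in range(3)]

vec = lambda T: T.reshape(-1, order='F')             # column-major vectorization

def update_core(X, G, A, eps=1e-10):
    """Multiplicative update (9) for vec(G); F = A^(N) kron ... kron A^(1)."""
    F = reduce(np.kron, list(reversed(A)))            # (I1*I2*I3) x (J1*J2*J3), small sizes only
    g = vec(G) * (F.T @ vec(X)) / (F.T @ F @ vec(G) + eps)
    return g.reshape(G.shape, order='F')

G = update_core(X, G, A)
print(G.shape)    # (2, 3, 4)
```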

3.1.4. Solutions of Matrix B

The partial derivative of $L$ in (5) with respect to $\mathbf{B}$ is
$$\frac{\partial L}{\partial \mathbf{B}} = -2 \lambda \mathbf{Q} \mathbf{A}_l^{(N)} + 2 \lambda \mathbf{B} \mathbf{A}_l^{(N)\top} \mathbf{A}_l^{(N)}.$$
Since there is no Lagrange multiplier for $\mathbf{B}$, we directly set $\partial L / \partial \mathbf{B} = 0$ and solve for $\mathbf{B}$ to obtain the following updating rule:
$$B_{ij} \leftarrow \big[ \mathbf{Q} \mathbf{A}_l^{(N)} \big( \mathbf{A}_l^{(N)\top} \mathbf{A}_l^{(N)} \big)^{-1} \big]_{ij}. \qquad (10)$$
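A sketch of the closed-form update (10). Using a pseudoinverse instead of a plain inverse is our own illustrative safeguard for the case where $\mathbf{A}_l^{(N)\top}\mathbf{A}_l^{(N)}$ is singular; it is not prescribed by the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
k, IN, JN, l = 3, 20, 4, 12                    # classes, samples, latent dim, labeled count
AN = rng.random((IN, JN))                      # current A^(N)
Q = np.zeros((k, IN))
Q[rng.integers(0, k, l), np.arange(l)] = 1.0   # label matrix for the first l samples

def update_B(Q, AN, l):
    """Closed-form update (10): B = Q A_l^(N) (A_l^(N)T A_l^(N))^{-1}."""
    Al = AN.copy()
    Al[l:, :] = 0.0
    # pinv guards against a singular Gram matrix (illustrative choice)
    return Q @ Al @ np.linalg.pinv(Al.T @ Al)

B = update_B(Q, AN, l)
print(B.shape)    # (3, 4)
```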
Theorem 1. 
The objective function in (2) is nonincreasing under the updating rules (6), (7), (9), and (10). The objective function is invariant under these updates if and only if $\mathbf{A}^{(r)}$ ($r = 1, 2, \ldots, N$), $\mathcal{G}$, and $\mathbf{B}$ are at a stationary point.

3.2. Proof of Convergence

To prove Theorem 1, we first give a definition and several lemmas.
Definition 1 
([5]). $G(u, u')$ is an auxiliary function for $F(u)$ if the conditions $G(u, u') \geq F(u)$ and $G(u, u) = F(u)$ are satisfied.
Lemma 1 
([5]). If $G(u, u')$ is an auxiliary function for $F(u)$, then $F(u)$ is nonincreasing under the updating rule
$$u^{t+1} = \arg\min_{u} G(u, u^t). \qquad (11)$$
Proof. 
$F(u^{t+1}) \leq G(u^{t+1}, u^t) \leq G(u^t, u^t) = F(u^t)$. □
The equality $F(u^{t+1}) = F(u^t)$ holds only if $u^t$ is a local minimum of $G(u, u^t)$. By iterating the update rule (11), the sequence $u^t$ converges to a local minimum.
For any element $A_{ij}^{(n)}$ of $\mathbf{A}^{(n)}$ ($n = 1, 2, \ldots, N-1$), let $F_{ij}(A_{ij}^{(n)})$ denote the part of the objective function $O_2$ in (3) relevant to $A_{ij}^{(n)}$. Since the updating rule is essentially element-wise, it is sufficient to prove that each $F_{ij}(A_{ij}^{(n)})$ is nonincreasing under the update rule. The first derivative of $F_{ij}(A_{ij}^{(n)})$ with respect to $A_{ij}^{(n)}$ is
$$F'_{ij}(A_{ij}^{(n)}) = \Big[ -2 \mathbf{X}_{(n)} \big( \otimes_{p \neq n} \mathbf{A}^{(p)} \big) \mathbf{G}_{(n)}^\top + 2 \mathbf{A}^{(n)} \mathbf{G}_{(n)} \big( \otimes_{p \neq n} \mathbf{A}^{(p)\top} \mathbf{A}^{(p)} \big) \mathbf{G}_{(n)}^\top \Big]_{ij}.$$
Now, we have
Lemma 2. 
The function
$$G(A_{ij}^{(n)}, A_{ij}^{(n)t}) = F_{ij}(A_{ij}^{(n)t}) + F'_{ij}(A_{ij}^{(n)t}) (A_{ij}^{(n)} - A_{ij}^{(n)t}) + \frac{\big[ \mathbf{A}^{(n)t} \mathbf{G}_{(n)} \big( \otimes_{p \neq n} \mathbf{A}^{(p)\top} \mathbf{A}^{(p)} \big) \mathbf{G}_{(n)}^\top \big]_{ij}}{A_{ij}^{(n)t}} (A_{ij}^{(n)} - A_{ij}^{(n)t})^2 \qquad (12)$$
is an auxiliary function for $F_{ij}(A_{ij}^{(n)})$, where the matrix $\mathbf{A}^{(n)t} = (A_{ij}^{(n)t})$.
Proof. 
Obviously, $G(A_{ij}^{(n)t}, A_{ij}^{(n)t}) = F_{ij}(A_{ij}^{(n)t})$. According to Definition 1, we only need to show that $G(A_{ij}^{(n)}, A_{ij}^{(n)t}) \geq F_{ij}(A_{ij}^{(n)})$. The Taylor series expansion of $F_{ij}(A_{ij}^{(n)})$ at $A_{ij}^{(n)t}$ is
$$F_{ij}(A_{ij}^{(n)}) = F_{ij}(A_{ij}^{(n)t}) + F'_{ij}(A_{ij}^{(n)t}) (A_{ij}^{(n)} - A_{ij}^{(n)t}) + \frac{1}{2} F''_{ij}(A_{ij}^{(n)t}) (A_{ij}^{(n)} - A_{ij}^{(n)t})^2, \qquad (13)$$
where $F''_{ij}(A_{ij}^{(n)t}) = \big[ 2 \mathbf{G}_{(n)} \big( \otimes_{p \neq n} \mathbf{A}^{(p)\top} \mathbf{A}^{(p)} \big) \mathbf{G}_{(n)}^\top \big]_{jj}$ is the second-order derivative of $F_{ij}(A_{ij}^{(n)})$ at $A_{ij}^{(n)t}$. Comparing (12) with (13), we can see that $G(A_{ij}^{(n)}, A_{ij}^{(n)t}) \geq F_{ij}(A_{ij}^{(n)})$ is equivalent to
$$\frac{\big[ \mathbf{A}^{(n)t} \mathbf{G}_{(n)} \big( \otimes_{p \neq n} \mathbf{A}^{(p)\top} \mathbf{A}^{(p)} \big) \mathbf{G}_{(n)}^\top \big]_{ij}}{A_{ij}^{(n)t}} \geq \big[ \mathbf{G}_{(n)} \big( \otimes_{p \neq n} \mathbf{A}^{(p)\top} \mathbf{A}^{(p)} \big) \mathbf{G}_{(n)}^\top \big]_{jj}.$$
To prove this inequality, we have
$$\big[ \mathbf{A}^{(n)t} \mathbf{G}_{(n)} \big( \otimes_{p \neq n} \mathbf{A}^{(p)\top} \mathbf{A}^{(p)} \big) \mathbf{G}_{(n)}^\top \big]_{ij} = \sum_{l=1}^{J_n} A_{il}^{(n)t} \big[ \mathbf{G}_{(n)} \big( \otimes_{p \neq n} \mathbf{A}^{(p)\top} \mathbf{A}^{(p)} \big) \mathbf{G}_{(n)}^\top \big]_{lj} \geq A_{ij}^{(n)t} \big[ \mathbf{G}_{(n)} \big( \otimes_{p \neq n} \mathbf{A}^{(p)\top} \mathbf{A}^{(p)} \big) \mathbf{G}_{(n)}^\top \big]_{jj};$$
therefore, the inequality $G(A_{ij}^{(n)}, A_{ij}^{(n)t}) \geq F_{ij}(A_{ij}^{(n)})$ holds. □
Similarly, let $F_{ij}(A_{ij}^{(N)})$ denote the part of the objective function $O_2$ in (3) relevant to $A_{ij}^{(N)}$ in $\mathbf{A}^{(N)}$. Then we have
Lemma 3. 
The function
$$G(A_{ij}^{(N)}, A_{ij}^{(N)t}) = F_{ij}(A_{ij}^{(N)t}) + F'_{ij}(A_{ij}^{(N)t}) (A_{ij}^{(N)} - A_{ij}^{(N)t}) + \frac{\big[ \mathbf{A}^{(N)t} \mathbf{G}_{(N)} \big( \otimes_{p \neq N} \mathbf{A}^{(p)\top} \mathbf{A}^{(p)} \big) \mathbf{G}_{(N)}^\top + \lambda (\mathbf{Q}^\top \mathbf{B})^- + \lambda (\mathbf{A}_l^{(N)t} \mathbf{B}^\top \mathbf{B})^+ \big]_{ij}}{A_{ij}^{(N)t}} (A_{ij}^{(N)} - A_{ij}^{(N)t})^2 \qquad (14)$$
is an auxiliary function for $F_{ij}(A_{ij}^{(N)})$.
Lemma 4. 
Let $g_i$ denote the $i$-th element of $\mathrm{vec}(\mathcal{G})$ and $F_i(g_i)$ denote the part of the objective function $O_2$ in (3) relevant to $g_i$. The function
$$G(g_i, g_i^t) = F_i(g_i^t) + F'_i(g_i^t) (g_i - g_i^t) + \frac{\big( \mathbf{F}^\top \mathbf{F}\, \mathrm{vec}(\mathcal{G}^t) \big)_i}{g_i^t} (g_i - g_i^t)^2 \qquad (15)$$
is an auxiliary function for $F_i(g_i)$.
Since the proofs of Lemmas 3 and 4 are essentially similar to the proof of Lemma 2, they are omitted here.
Proof 
(Proof of Theorem 1). Replacing $G(u, u^t)$ in (11) by (12), the minimum is obtained by setting
$$\frac{\partial G(A_{ij}^{(n)}, A_{ij}^{(n)t})}{\partial A_{ij}^{(n)}} = F'_{ij}(A_{ij}^{(n)t}) + 2 \frac{\big[ \mathbf{A}^{(n)t} \mathbf{G}_{(n)} \big( \otimes_{p \neq n} \mathbf{A}^{(p)\top} \mathbf{A}^{(p)} \big) \mathbf{G}_{(n)}^\top \big]_{ij}}{A_{ij}^{(n)t}} (A_{ij}^{(n)} - A_{ij}^{(n)t}) = 0,$$
which yields
$$A_{ij}^{(n)t+1} = A_{ij}^{(n)t} \frac{\big[ \mathbf{X}_{(n)} \big( \otimes_{p \neq n} \mathbf{A}^{(p)} \big) \mathbf{G}_{(n)}^\top \big]_{ij}}{\big[ \mathbf{A}^{(n)t} \mathbf{G}_{(n)} \big( \otimes_{p \neq n} \mathbf{A}^{(p)\top} \mathbf{A}^{(p)} \big) \mathbf{G}_{(n)}^\top \big]_{ij}}.$$
According to Lemma 2, $F_{ij}(A_{ij}^{(n)})$ is nonincreasing under the updating rule (6) for $\mathbf{A}^{(n)}$.
Similarly, substituting $G(A_{ij}^{(N)}, A_{ij}^{(N)t})$ of (14) into (11), we obtain
$$A_{ij}^{(N)t+1} = A_{ij}^{(N)t} \frac{\big[ \mathbf{X}_{(N)} \big( \otimes_{p \neq N} \mathbf{A}^{(p)} \big) \mathbf{G}_{(N)}^\top + \lambda (\mathbf{Q}^\top \mathbf{B})^+ + \lambda (\mathbf{A}_l^{(N)t} \mathbf{B}^\top \mathbf{B})^- \big]_{ij}}{\big[ \mathbf{A}^{(N)t} \mathbf{G}_{(N)} \big( \otimes_{p \neq N} \mathbf{A}^{(p)\top} \mathbf{A}^{(p)} \big) \mathbf{G}_{(N)}^\top + \lambda (\mathbf{Q}^\top \mathbf{B})^- + \lambda (\mathbf{A}_l^{(N)t} \mathbf{B}^\top \mathbf{B})^+ \big]_{ij}}.$$
Note that the above equation is the same as (7). By Lemma 3, (14) is an auxiliary function of $F_{ij}(A_{ij}^{(N)})$, which means that $F_{ij}(A_{ij}^{(N)})$ is nonincreasing under the updating rule (7) for $\mathbf{A}^{(N)}$.
Furthermore, substituting $G(g_i, g_i^t)$ of (15) into (11), we obtain
$$g_i^{t+1} = g_i^t \frac{\big( \mathbf{F}^\top \mathrm{vec}(\mathcal{X}) \big)_i}{\big( \mathbf{F}^\top \mathbf{F}\, \mathrm{vec}(\mathcal{G}^t) \big)_i}.$$
Using the same argument as for (7), we conclude that $F_i(g_i)$ is nonincreasing under the updating rule (9).
Finally, we prove the convergence of the updating rule (10) for $\mathbf{B}$. Since $\mathbf{B}$ is unconstrained and the objective function in (2) is convex with respect to $\mathbf{B}$, the updating rule (10), obtained by setting the derivative of the Lagrangian with respect to $\mathbf{B}$ to zero, exactly minimizes the objective function with respect to $\mathbf{B}$ in each iteration. Therefore, Theorem 1 holds. □

4. Experiments Section

In this section, we apply the proposed DNTD scheme to clustering on three datasets and use two metrics, namely accuracy (AC) and normalized mutual information (NMI) [29], to evaluate its performance.
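For reference, a common way to compute these two metrics is sketched below: AC matches predicted clusters to true classes with the Hungarian algorithm, and NMI is taken from scikit-learn. The helper name `clustering_accuracy` and the toy labels are illustrative; the paper itself follows Ref. [29].

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score

def clustering_accuracy(y_true, y_pred):
    """AC: best one-to-one mapping between predicted clusters and true classes."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    classes, clusters = np.unique(y_true), np.unique(y_pred)
    cost = np.zeros((clusters.size, classes.size))
    for i, c in enumerate(clusters):
        for j, t in enumerate(classes):
            cost[i, j] = np.sum((y_pred == c) & (y_true == t))
    row, col = linear_sum_assignment(-cost)          # maximize the matched counts
    return cost[row, col].sum() / y_true.size

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [1, 1, 0, 0, 2, 0]
print(clustering_accuracy(y_true, y_pred))           # 5/6
print(normalized_mutual_info_score(y_true, y_pred))
```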

4.1. Datasets

To evaluate the effectiveness of the proposed DNTD method, three datasets are adopted for the experiments in the following subsections. The descriptions of these datasets are as follows:

4.1.1. ORL Dataset

The ORL (https://github.com/saeid436/Face-Recognition-MLP/tree/main/ORL, accessed on 11 December 2022) dataset collects 400 grayscale 112 × 92 face images of 40 different subjects, with 10 distinct images per subject. For some subjects, the images were taken at different times, varying the lighting, facial expressions, and facial details. All images were taken against a dark homogeneous background, with the subjects in an upright, frontal position. In our experiment, each image was resized to 32 × 32 pixels, and the images were stacked into a tensor $\mathcal{X} \in \mathbb{R}^{32 \times 32 \times 400}$.

4.1.2. COIL20 Dataset

The COIL20 (http://www.cad.zju.edu.cn/home/dengcai/Data/MLData.html, accessed on 5 April 2022) dataset contains a total of 1440 grayscale images of 20 subjects, each of which has 72 images taken from diverse poses. Specifically, the subjects were placed on a motorized turntable rotating through 360 degrees to change the subject pose with respect to a fixed camera, and the images of each subject were taken 5 degrees apart. In this experiment, each image was resized to 32 × 32 pixels and all images were stacked into a tensor $\mathcal{X} \in \mathbb{R}^{32 \times 32 \times 1440}$.

4.1.3. Yale Dataset

The Yale (http://cvc.cs.yale.edu/cvc/projects/yalefaces/yalefaces.html, accessed on 5 April 2022) dataset consists of 165 images of 15 individuals. Each individual has 11 images, taken with different facial expressions or configurations, such as sad, sleepy, surprised, left-light, right-light, with glasses, without glasses, and so on. Each image was resized to 32 × 32 pixels, and all images formed a tensor $\mathcal{X} \in \mathbb{R}^{32 \times 32 \times 165}$.

4.2. Compared Algorithms and Experimental Setting

To verify that the proposed algorithm is efficient and can enhance the clustering performance on these datasets, we compared it with the following algorithms:
  • K-means [30]: The K-means algorithm aims to divide n sample data with m dimensions into K clusters so that the within-cluster sum of squares is minimized. It is a traditional clustering method. In our experiments, we used the command "kmeans" in Matlab 2016b to execute the K-means algorithm;
  • Agglomerative hierarchical clustering (AHC) [31]: AHC is a bottom-up clustering method. It initially treats each sample as its own class and then successively merges classes so that their number gradually decreases until the required number of classes is reached;
  • NMF [5]: NMF algorithm is one of the typical clustering algorithms. In our experiment, we adopted the Frobenius-norm formulation;
  • NTD [19]: NTD algorithm is considered a generalization of NMF;
  • DNMF [14]: The DNMF algorithm incorporates the label information of sample data into the objective function of NMF and enforces the samples with the same label to be aligned on the same axis.
In each experiment, we randomly selected k categories as the evaluated data. Each experiment was repeated 10 times, and in each run we applied K-means 10 times to the low-dimensional representation. For the semi-supervised methods, namely DNMF and DNTD, we randomly selected 30% of the sample data of each category as the labeled data. For all compared methods, we set the dimension of the encoding matrix in the low-dimensional representation space to the number of selected categories in each run. For NTD and the proposed method, we tested the clustering performance when the size of the core tensor is $\frac{1}{4}I_1 \times \frac{1}{4}I_2 \times k$, $\frac{1}{2}I_1 \times \frac{1}{2}I_2 \times k$, and $\frac{3}{4}I_1 \times \frac{3}{4}I_2 \times k$ on subdatasets of the three datasets. The clustering performance is better in most cases when the core tensor size is $\frac{1}{2}I_1 \times \frac{1}{2}I_2 \times k$ on the COIL20 dataset and $\frac{3}{4}I_1 \times \frac{3}{4}I_2 \times k$ on the ORL and Yale datasets; therefore, the sizes of the core tensors of the NTD method and the proposed DNTD method are set to $\frac{1}{2}I_1 \times \frac{1}{2}I_2 \times k$ on the COIL20 dataset and $\frac{3}{4}I_1 \times \frac{3}{4}I_2 \times k$ on the others, respectively. Regarding the regularization parameter of DNMF and the proposed method, empirically, we set $\lambda = 10^{-1}$ on the COIL20 dataset and $\lambda = 10^{6}$ on the others.
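The evaluation protocol described above can be summarized by the following illustrative loop. The functions `dntd` and `clustering_accuracy` are hypothetical placeholders for the method of Section 3 and the AC metric sketched earlier, and the uniform 30% labeling mask is a simplification of the per-category selection used in the paper.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score

def evaluate(X, y, k, dntd, clustering_accuracy, n_trials=10, n_kmeans=10,
             label_ratio=0.3, lam=1e-1, seed=0):
    """Average AC/NMI over random k-category subsets (dntd and clustering_accuracy
    are hypothetical stand-ins for the method of Section 3 and the AC metric)."""
    rng = np.random.default_rng(seed)
    acs, nmis = [], []
    for _ in range(n_trials):
        cats = rng.choice(np.unique(y), size=k, replace=False)   # pick k categories at random
        idx = np.flatnonzero(np.isin(y, cats))
        Xs, ys = X[..., idx], y[idx]                             # samples lie on the last mode
        labeled = rng.random(ys.size) < label_ratio              # ~30% labeled (simplified)
        AN = dntd(Xs, ys, labeled, n_clusters=k, lam=lam)        # rows of A^(N) as representation
        for _ in range(n_kmeans):
            pred = KMeans(n_clusters=k, n_init=10).fit_predict(AN)
            acs.append(clustering_accuracy(ys, pred))
            nmis.append(normalized_mutual_info_score(ys, pred))
    return float(np.mean(acs)), float(np.mean(nmis))
```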

4.3. Clustering Results

Table 1, Table 2, Table 3, Table 4, Table 5 and Table 6 present the average clustering performance and standard deviation on the ORL, COIL20, and Yale datasets, respectively. They show that the proposed algorithm achieves better clustering performance on all three datasets. On the ORL dataset, the proposed algorithm achieves improvements of 14.41%, 32.17%, 8.86%, 6.88%, and 4.81% in AC and 12.09%, 27.50%, 8.33%, 5.60%, and 4.22% in NMI compared with K-means, AHC, NMF, NTD, and DNMF, respectively. For the COIL20 dataset, the proposed algorithm attains improvements of 7.32%, 8.47%, 4.02%, and 6.11% in AC and 5.47%, 10.43%, 4.35%, and 8.10% in NMI in comparison with K-means, NMF, NTD, and DNMF, respectively. Compared with the AHC algorithm, the AC of the proposed algorithm is improved by 1.59%; however, the NMI is reduced by 2.54%. On the Yale dataset, the proposed algorithm gains 6.53%, 26.86%, 4.55%, 4.66%, and 1.29% in AC and 6.40%, 29.64%, 5.30%, 5.56%, and 1.27% in NMI in contrast to K-means, AHC, NMF, NTD, and DNMF, respectively. Furthermore, we also applied the AHC algorithm to the new representation generated by DNTD while keeping the other parameter settings unaltered; the average clustering performance in terms of AC and NMI is 62.58% and 69.93% on the ORL dataset, 74.32% and 73.33% on the COIL20 dataset, and 26.41% and 21.00% on the Yale dataset, respectively. Overall, the clustering performance of DNTD with the AHC algorithm is lower than that with the K-means algorithm in most cases; therefore, we used the K-means algorithm together with the proposed DNTD method to test the clustering performance.
As can be seen, the semi-supervised DNTD and DNMF methods are superior to the unsupervised methods, such as NTD, NMF, K-means, and AHC, on the ORL and Yale datasets, which means that taking the label information of sample data into account is useful for improving the performance of NTD and NMF. In particular, the improvement in clustering performance of the DNTD method is obvious and attractive on the ORL dataset. Moreover, the gains from NTD to DNTD in AC and NMI exceed those from NMF to DNMF on all datasets, which indicates that incorporating the label-discriminative constraint term into NTD is more effective than combining the label-discriminative term with NMF. Furthermore, DNTD outperforms all the compared methods. DNTD is more powerful than DNMF at boosting the discriminative ability via the available label information of sample data and is capable of effectively extracting the low-dimensional representation of the data tensor. All in all, the proposed DNTD method attains the best average clustering performance among the compared methods, thanks to the label-discriminative information of the sample data in conjunction with the NTD of the data tensor representation.
Since our proposed method is a semi-supervised algorithm, there is a close relationship between the clustering performance and the number of labeled sample data. Figure 1 shows the clustering performance with a varying number of labeled sample data on the ORL and Yale datasets. From Figure 1, we can see that the clustering performance of the DNTD method is enhanced as the number of labeled sample data increases; furthermore, the DNTD method surpasses the other compared methods as well.
In addition, we also observed that the clustering performance of the proposed method is better on the ORL and COIL20 datasets than on the Yale dataset. In our view, two main factors lower the performance of the proposed method on the Yale dataset. Firstly, some of the images in the Yale dataset were taken in poor light, which leads to dark and blurred backgrounds in these images. Secondly, some individuals are wearing glasses, so that their faces are partly occluded. These two factors result in poor image quality, which causes the obtained data to lose many true values while simultaneously generating many incorrect values. Consequently, the proposed algorithm is strongly interfered with, and its performance is lowered.

4.4. Parameter Selection

In our experiments, there is one parameter λ to be decided. The parameter λ measures the degree of discrimination contributed by the label information of the sample data. In this subsection, we study the importance of the parameter λ in the DNTD method through the clustering performance on the above three datasets. The parameter is selected from the set $\{10^{-2}, 10^{-1}, 10^{0}, 10^{1}, 10^{2}, 10^{3}, 10^{4}, 10^{5}, 10^{6}, 10^{7}\}$. On each dataset, we fix the number of categories at 10 for simplicity. First, we randomly select 10 categories to perform the experiment and apply K-means 10 times to the low-dimensional representation; we then repeat the above operations 10 times and calculate the average. The effect of the parameter is displayed in Figure 2. From Figure 2, we find that the DNTD scheme displays a stable performance when λ ranges from $10^{-2}$ to $10^{5}$ on all datasets; that is, it is not sensitive to the parameter λ, which helps to promote the stability of DNTD. Furthermore, we observe that the DNTD scheme attains the best performance when $\lambda = 10^{6}$ on the ORL and Yale datasets; therefore, λ is empirically selected to be $10^{6}$ on the ORL and Yale datasets and $10^{-1}$ on the COIL20 dataset.

5. Conclusions and Future Work

In this paper, we have introduced a label-constrained nonnegative Tucker decomposition method for tensor data representation, called discriminative nonnegative Tucker decomposition (DNTD for short), which takes the label information of the sample data into account by constructing a label matrix and, further, a label-discriminative constraint term. Moreover, updating rules and a proof of their convergence have been provided. Numerical experiments on three datasets have demonstrated the effectiveness of the proposed method in terms of clustering performance.
Although the proposed DNTD method has shown good clustering performance, it still has limitations from a geometric perspective. In future work, we will investigate how to preserve the geometric structure of sample data on a manifold. Furthermore, as an important clustering algorithm, NTD does not always show excellent and robust performance when dealing with outliers; developing a robust and efficient learning method will therefore also be worthwhile future work.

Author Contributions

Conceptualization, W.J. and L.L.; methodology, L.L. and Q.L.; software, W.J. and Q.L.; writing—original draft preparation, W.J., L.L. and Q.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was partially funded by the National Natural Science Foundation of China under Grant 12161020, 12061025, and partially funded by the Natural Science Foundation of Educational Commission of Guizhou Province under Grant Qian-Jiao-He KY Zi [2021]298.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
NTD  Nonnegative Tucker decomposition
DNTD  Discriminative NTD
NMF  Nonnegative matrix factorization
GNMF  Graph-regularized NMF
CNMF  Constrained NMF
EEG  Multichannel electroencephalography
GNTD  Graph-regularized NTD
ONTD  Orthogonal NTD
GDNMF  Graph-based discriminative NMF
GSNMFC  Graph-regularized and sparse NMF with hard constraints
SRDNMF  Semi-supervised robust distribution-based NMF
SNTD  Semi-supervised NTD
KKT  Karush–Kuhn–Tucker
AC  Accuracy
NMI  Normalized mutual information

References

  1. Cai, B.; Lu, G.F. Tensor subspace clustering using consensus tensor low-rank representation. Inform. Sci. 2022, 609, 46–59. [Google Scholar] [CrossRef]
  2. Wang, X.; Yang, L.T.; Kuang, L.; Liu, X.; Zhang, Q.; Deen, M.J. A tensor-based big-data-driven routing recommendation approach for heterogeneous networks. IEEE Netw. 2019, 33, 64–69. [Google Scholar] [CrossRef]
  3. Long, Z.; Liu, Y.; Chen, L.; Zhu, C. Low rank tensor completion for multiway visual data. Signal Process. 2019, 155, 301–316. [Google Scholar] [CrossRef] [Green Version]
  4. Bernardi, A.; Carlini, E.; Catalisano, M.V.; Gimigliano, A.; Oneto, A. The hitchhiker guide to: Secant varieties and tensor decomposition. Mathematics 2018, 6, 314. [Google Scholar] [CrossRef] [Green Version]
  5. Lee, D.; Seung, H.S. Algorithms for non-negative matrix factorization. Adv. Neural Inf. Process. Syst. 2000, 13, 556–562. [Google Scholar]
  6. Li, X.; Wang, L.; Cheng, Q.; Wu, P.; Gan, W.; Fang, L. Cloud removal in remote sensing images using nonnegative matrix factorization and error correction. ISPRS J. Photogramm. Remote Sens. 2019, 148, 103–113. [Google Scholar] [CrossRef]
  7. Ye, F.; Chen, C.; Zheng, Z. Deep autoencoder-like nonnegative matrix factorization for community detection. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, Turin, Italy, 22–26 October 2018; pp. 1393–1402. [Google Scholar]
  8. Yang, Z.; Liang, N.; Yan, W.; Li, Z.; Xie, S. Uniform distribution non-negative matrix factorization for multiview clustering. IEEE Trans. Cybern. 2020, 51, 3249–3262. [Google Scholar] [CrossRef]
  9. Cai, D.; He, X.; Han, J.; Huang, T.S. Graph regularized nonnegative matrix factorization for data representation. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 33, 1548–1560. [Google Scholar]
  10. Yu, N.; Wu, M.J.; Liu, J.X.; Zheng, C.H.; Xu, Y. Correntropy-based hypergraph regularized NMF for clustering and feature selection on multi-cancer integrated data. IEEE Trans. Cybern. 2020, 51, 3952–3963. [Google Scholar] [CrossRef]
  11. Chong, Y.; Ding, Y.; Yan, Q.; Pan, S. Graph-based semi-supervised learning: A review. Neurocomputing 2020, 408, 216–230. [Google Scholar] [CrossRef]
  12. Song, Z.; Yang, X.; Xu, Z.; King, I. Graph-based semi-supervised learning: A comprehensive review. IEEE Trans. Neural Netw. Learn. Syst. 2022, 1–21. [Google Scholar] [CrossRef] [PubMed]
  13. Liu, H.; Wu, Z.; Li, X.; Cai, D.; Huang, T.S. Constrained nonnegative matrix factorization for image representation. IEEE Trans. Pattern Anal. Mach. Intell. 2011, 34, 1299–1311. [Google Scholar] [CrossRef] [PubMed]
  14. Babaee, M.; Tsoukalas, S.; Babaee, M.; Rigoll, G.; Datcu, M. Discriminative nonnegative matrix factorization for dimensionality reduction. Neurocomputing 2016, 173, 212–223. [Google Scholar] [CrossRef]
  15. Cichocki, A.; Mandic, D.; De Lathauwer, L.; Zhou, G.; Zhao, Q.; Caiafa, C.; Phan, H.A. Tensor decompositions for signal processing applications: From two-way to multiway component analysis. IEEE Signal Process. Mag. 2015, 32, 145–163. [Google Scholar] [CrossRef] [Green Version]
  16. Kolda, T.G.; Bader, B.W. Tensor decompositions and applications. SIAM Rev. 2009, 51, 455–500. [Google Scholar] [CrossRef]
  17. Zaorálek, L.; Prílepok, M.; Snášel, V. Recognition of face images with noise based on Tucker decomposition. In Proceedings of the 2015 IEEE International Conference on Systems, Man, and Cybernetics, Hong Kong, China, 9–12 October 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 2649–2653. [Google Scholar]
  18. Yang, L.; Zhou, J.; Jing, J.; Wei, L.; Li, Y.; He, X.; Feng, L.; Nie, B. Compression of hyperspectral images based on Tucker decomposition and CP decomposition. J. Opt. Soc. Am. A 2022, 39, 1815–1822. [Google Scholar] [CrossRef] [PubMed]
  19. Kim, Y.D.; Choi, S. Nonnegative Tucker decomposition. In Proceedings of the 2007 IEEE Conference on Computer Vision and Pattern Recognition, Minneapolis, MN, USA, 17–22 June 2007; IEEE: Piscataway, NJ, USA, 2007; pp. 1–8. [Google Scholar]
  20. Qiu, Y.; Zhou, G.; Wang, Y.; Zhang, Y.; Xie, S. A generalized graph regularized non-negative Tucker decomposition framework for tensor data representation. IEEE Trans. Cybern. 2020, 594–607. [Google Scholar] [CrossRef]
  21. Marmoret, A.; Cohen, J.E.; Bertin, N.; Bimbot, F. Uncovering audio patterns in music with nonnegative Tucker decomposition for structural segmentation. arXiv 2021, arXiv:2104.08580. [Google Scholar]
  22. Cohen, J.E.; Comon, P.; Gillis, N. Some theory on non-negative Tucker decomposition. In Proceedings of the International Conference on Latent Variable Analysis and Signal Separation, Grenoble, France, 21–23 February 2017; Springer: Berlin/Heidelberg, Germany, 2017; pp. 152–161. [Google Scholar]
  23. Qiu, Y.; Zhou, G.; Zhang, Y.; Xie, S. Graph regularized nonnegative Tucker decomposition for tensor data representation. In Proceedings of the ICASSP 2019—2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 12–17 May 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 8613–8617. [Google Scholar]
  24. Pan, J.; Ng, M.K.; Liu, Y.; Zhang, X.; Yan, H. Orthogonal nonnegative Tucker decomposition. SIAM J. Sci. Comput. 2021, 43, B55–B81. [Google Scholar] [CrossRef]
  25. Li, H.; Zhang, J.; Shi, G.; Liu, J. Graph-based discriminative nonnegative matrix factorization with label information. Neurocomputing 2017, 266, 91–100. [Google Scholar] [CrossRef]
  26. Sun, F.; Xu, M.; Hu, X.; Jiang, X. Graph regularized and sparse nonnegative matrix factorization with hard constraints for data representation. Neurocomputing 2016, 173, 233–244. [Google Scholar] [CrossRef]
  27. Peng, X.; Xu, D.; Chen, D. Robust distribution-based nonnegative matrix factorizations for dimensionality reduction. Inform. Sci. 2021, 552, 244–260. [Google Scholar] [CrossRef]
  28. Qiu, Y.; Zhou, G.; Chen, X.; Zhang, D.; Zhao, X.; Zhao, Q. Semi-supervised non-negative Tucker decomposition for tensor data representation. Sci. China Tech. Sci. 2021, 64, 1881–1892. [Google Scholar] [CrossRef]
  29. Xu, W.; Liu, X.; Gong, Y. Document clustering based on non-negative matrix factorization. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Informaion Retrieval, Toronto, ON, Canada, 28 July–1 August 2003; pp. 267–273. [Google Scholar]
  30. MacQueen, J. Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability; University of California Press: Oakland, CA, USA, 1967; Volume 1, pp. 281–297. [Google Scholar]
  31. Sasirekha, K.; Baby, P. Agglomerative hierarchical clustering algorithm-a review. Int. J. Sci. Res. Publ. 2013, 83, 83. [Google Scholar]
Figure 1. The clustering performance with varied numbers of labeled sample data on the ORL and Yale datasets.
Figure 2. The clustering performance of the DNTD method in terms of AC and NMI with varied parameter λ on the ORL, COIL20, and Yale datasets.
Table 1. AC (%) ± standard deviation (%) of different algorithms on ORL dataset.
k | K-Means | AHC | NMF | NTD | DNMF | DNTD
3 | 80.43 ± 14.65 | 74.67 ± 16.63 | 77.93 ± 16.06 | 82.70 ± 14.32 | 80.97 ± 18.22 | 85.43 ± 14.20
5 | 70.80 ± 11.83 | 58.60 ± 17.20 | 71.56 ± 12.13 | 77.22 ± 12.99 | 72.94 ± 13.83 | 84.28 ± 12.25
7 | 65.67 ± 9.84 | 49.43 ± 8.99 | 73.40 ± 9.89 | 72.23 ± 9.92 | 74.29 ± 10.01 | 76.61 ± 8.78
9 | 62.96 ± 9.01 | 46.11 ± 12.15 | 69.66 ± 8.83 | 71.23 ± 6.54 | 74.57 ± 8.48 | 75.23 ± 8.07
11 | 60.40 ± 7.48 | 36.27 ± 6.89 | 66.83 ± 5.95 | 70.90 ± 6.34 | 71.78 ± 7.49 | 74.89 ± 7.58
13 | 59.22 ± 7.62 | 36.62 ± 8.36 | 65.50 ± 7.18 | 68.16 ± 7.31 | 69.82 ± 7.34 | 77.46 ± 6.03
15 | 56.54 ± 7.20 | 37.27 ± 11.11 | 65.81 ± 7.66 | 64.81 ± 7.03 | 69.29 ± 7.08 | 74.47 ± 6.12
17 | 56.03 ± 5.18 | 36.06 ± 8.47 | 65.38 ± 6.33 | 65.05 ± 5.91 | 70.18 ± 6.12 | 75.02 ± 6.07
19 | 55.30 ± 4.91 | 32.47 ± 6.51 | 61.25 ± 4.90 | 62.86 ± 5.76 | 69.89 ± 5.25 | 73.67 ± 6.19
Avg. | 63.04 | 45.28 | 68.59 | 70.57 | 72.64 | 77.45
Table 2. NMI (%) ± standard deviation (%) of different algorithms on ORL dataset.
k | K-Means | AHC | NMF | NTD | DNMF | DNTD
3 | 69.99 ± 18.07 | 69.55 ± 15.74 | 64.55 ± 20.21 | 71.89 ± 19.65 | 70.26 ± 25.55 | 77.17 ± 18.14
5 | 70.01 ± 11.60 | 65.40 ± 19.06 | 68.39 ± 12.40 | 77.80 ± 10.80 | 70.72 ± 12.80 | 82.51 ± 10.12
7 | 69.07 ± 9.54 | 58.85 ± 14.88 | 75.07 ± 7.48 | 75.77 ± 7.72 | 77.23 ± 8.39 | 78.44 ± 7.57
9 | 70.16 ± 8.18 | 54.92 ± 12.95 | 74.89 ± 7.16 | 75.83 ± 4.21 | 79.54 ± 5.37 | 78.97 ± 6.34
11 | 68.99 ± 6.85 | 47.84 ± 10.79 | 74.13 ± 4.15 | 78.09 ± 5.24 | 78.89 ± 5.63 | 81.25 ± 5.71
13 | 68.89 ± 6.65 | 46.77 ± 11.38 | 74.02 ± 5.68 | 76.91 ± 6.11 | 78.60 ± 5.58 | 84.32 ± 3.76
15 | 67.74 ± 5.76 | 49.23 ± 14.26 | 75.55 ± 4.98 | 75.16 ± 5.30 | 78.80 ± 4.12 | 82.76 ± 3.93
17 | 69.79 ± 4.33 | 48.58 ± 11.48 | 77.20 ± 4.38 | 76.01 ± 4.58 | 79.89 ± 4.22 | 84.77 ± 3.25
19 | 69.72 ± 3.41 | 44.46 ± 7.90 | 74.40 ± 3.44 | 75.24 ± 3.90 | 81.24 ± 3.85 | 82.95 ± 4.01
Avg. | 69.37 | 53.96 | 73.13 | 75.86 | 77.24 | 81.46
Table 3. AC (%) ± standard deviation (%) of different algorithms on COIL20 dataset.
k | K-Means | AHC | NMF | NTD | DNMF | DNTD
2 | 95.65 ± 9.35 | 95.07 ± 14.87 | 91.71 ± 17.10 | 90.72 ± 14.39 | 86.62 ± 16.69 | 95.97 ± 5.27
3 | 75.53 ± 16.95 | 86.90 ± 21.87 | 72.86 ± 15.92 | 82.00 ± 14.51 | 79.03 ± 18.54 | 86.99 ± 13.63
4 | 71.69 ± 14.56 | 84.79 ± 17.01 | 74.70 ± 14.96 | 75.86 ± 12.90 | 74.87 ± 16.12 | 78.62 ± 15.95
5 | 71.44 ± 14.55 | 72.11 ± 23.98 | 69.23 ± 8.07 | 76.58 ± 12.98 | 70.61 ± 15.18 | 79.75 ± 16.22
6 | 65.70 ± 10.54 | 74.84 ± 20.19 | 63.49 ± 14.39 | 68.66 ± 12.78 | 76.55 ± 12.73 | 76.74 ± 16.02
7 | 66.12 ± 10.19 | 67.30 ± 21.15 | 65.48 ± 13.49 | 71.00 ± 9.88 | 62.48 ± 16.55 | 71.43 ± 11.18
8 | 64.60 ± 10.34 | 68.84 ± 16.13 | 66.78 ± 7.00 | 63.84 ± 12.77 | 69.30 ± 9.49 | 69.95 ± 8.33
9 | 58.87 ± 7.76 | 65.57 ± 20.06 | 56.18 ± 16.19 | 67.31 ± 6.69 | 59.81 ± 11.89 | 68.70 ± 8.16
Avg. | 71.20 | 76.93 | 70.05 | 74.50 | 72.41 | 78.52
Table 4. NMI (%) ± standard deviation (%) of different algorithms on COIL20 dataset.
k | K-Means | AHC | NMF | NTD | DNMF | DNTD
2 | 87.22 ± 25.19 | 90.07 ± 29.94 | 81.96 ± 36.52 | 73.55 ± 31.49 | 64.39 ± 39.56 | 84.54 ± 19.65
3 | 61.70 ± 26.15 | 81.82 ± 31.59 | 55.21 ± 24.28 | 70.08 ± 18.71 | 64.08 ± 27.52 | 77.23 ± 21.34
4 | 66.47 ± 15.24 | 84.09 ± 18.76 | 66.05 ± 19.26 | 72.07 ± 11.95 | 69.89 ± 18.95 | 73.27 ± 19.75
5 | 69.42 ± 15.20 | 71.54 ± 26.37 | 64.40 ± 11.41 | 73.30 ± 11.04 | 65.31 ± 16.61 | 76.51 ± 16.13
6 | 65.45 ± 9.44 | 76.57 ± 21.28 | 60.64 ± 16.38 | 69.88 ± 12.35 | 76.08 ± 11.88 | 74.67 ± 16.34
7 | 71.45 ± 8.48 | 71.97 ± 19.85 | 63.53 ± 16.29 | 71.13 ± 10.26 | 61.31 ± 20.84 | 71.19 ± 9.99
8 | 71.30 ± 9.04 | 75.48 ± 12.95 | 69.98 ± 6.48 | 66.41 ± 10.78 | 73.21 ± 7.49 | 73.74 ± 8.36
9 | 66.87 ± 5.75 | 72.47 ± 19.34 | 58.49 ± 19.09 | 72.45 ± 6.05 | 64.64 ± 13.05 | 72.51 ± 6.48
Avg. | 69.99 | 78.00 | 65.03 | 71.11 | 67.36 | 75.46
Table 5. AC (%) ± standard deviation (%) of different algorithms on Yale dataset.
k | K-Means | AHC | NMF | NTD | DNMF | DNTD
5 | 54.07 ± 11.33 | 31.82 ± 5.56 | 55.45 ± 9.80 | 54.98 ± 9.20 | 57.15 ± 9.18 | 56.95 ± 11.15
6 | 47.12 ± 7.53 | 28.64 ± 3.82 | 49.61 ± 7.35 | 47.79 ± 8.98 | 55.39 ± 8.42 | 55.64 ± 8.03
7 | 45.77 ± 7.32 | 22.34 ± 3.29 | 48.61 ± 7.02 | 51.09 ± 7.12 | 53.17 ± 5.94 | 54.25 ± 7.61
8 | 45.89 ± 6.85 | 24.32 ± 2.06 | 46.47 ± 4.37 | 45.43 ± 6.59 | 52.00 ± 9.36 | 50.48 ± 5.35
9 | 42.07 ± 6.21 | 21.92 ± 3.15 | 44.54 ± 5.18 | 42.59 ± 5.66 | 48.90 ± 5.17 | 49.91 ± 5.84
10 | 41.95 ± 6.57 | 21.73 ± 2.69 | 44.80 ± 5.04 | 45.40 ± 4.66 | 46.74 ± 5.45 | 51.62 ± 5.80
11 | 40.67 ± 5.48 | 20.66 ± 1.44 | 43.30 ± 4.36 | 44.92 ± 4.93 | 45.35 ± 5.27 | 47.36 ± 5.17
12 | 41.77 ± 4.76 | 20.76 ± 2.10 | 42.57 ± 4.26 | 41.47 ± 4.30 | 43.53 ± 5.12 | 45.64 ± 5.02
13 | 38.31 ± 3.62 | 20.42 ± 1.72 | 40.37 ± 4.17 | 41.07 ± 4.07 | 43.24 ± 5.91 | 44.97 ± 4.16
14 | 38.36 ± 3.70 | 20.13 ± 1.31 | 40.11 ± 3.48 | 39.99 ± 3.85 | 42.94 ± 4.36 | 44.45 ± 4.36
Avg. | 43.60 | 23.27 | 45.58 | 45.47 | 48.84 | 50.13
Table 6. NMI (%) ± standard deviation (%) of different algorithms on Yale dataset.
k | K-Means | AHC | NMF | NTD | DNMF | DNTD
5 | 41.76 ± 13.03 | 17.45 ± 9.73 | 40.71 ± 12.32 | 41.55 ± 9.82 | 43.60 ± 12.06 | 44.53 ± 13.03
6 | 36.12 ± 8.38 | 21.19 ± 8.87 | 38.04 ± 9.00 | 34.32 ± 9.28 | 45.26 ± 9.01 | 45.62 ± 8.16
7 | 39.05 ± 8.31 | 10.92 ± 4.99 | 40.99 ± 8.14 | 44.28 ± 8.29 | 45.45 ± 6.44 | 47.82 ± 6.90
8 | 41.90 ± 7.08 | 19.39 ± 4.04 | 40.97 ± 4.22 | 40.12 ± 6.70 | 47.67 ± 10.48 | 46.66 ± 5.74
9 | 38.90 ± 6.84 | 17.03 ± 6.18 | 41.76 ± 4.63 | 39.36 ± 5.73 | 47.31 ± 4.32 | 47.40 ± 5.08
10 | 41.20 ± 6.75 | 17.71 ± 4.01 | 44.25 ± 4.67 | 44.52 ± 4.22 | 47.50 ± 4.69 | 50.65 ± 5.64
11 | 42.05 ± 5.50 | 18.12 ± 1.79 | 43.27 ± 3.42 | 44.77 ± 4.99 | 45.84 ± 4.28 | 47.99 ± 4.67
12 | 44.05 ± 4.62 | 18.62 ± 3.01 | 44.44 ± 3.59 | 42.90 ± 4.22 | 45.32 ± 5.01 | 47.50 ± 4.58
13 | 42.31 ± 3.53 | 18.89 ± 2.71 | 43.43 ± 3.27 | 43.59 ± 3.16 | 46.82 ± 6.29 | 47.94 ± 3.24
14 | 43.46 ± 3.38 | 19.05 ± 2.20 | 43.93 ± 2.80 | 43.79 ± 3.49 | 47.35 ± 3.35 | 48.73 ± 3.82
Avg. | 41.08 | 17.84 | 42.18 | 41.92 | 46.21 | 47.48
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
