Article

ℓ2,1 Norm and Hessian Regularized Non-Negative Matrix Factorization with Discriminability for Data Representation

Peng Luo, Jinye Peng and Jianping Fan
1 College of Information and Technology, Northwest University of China, Xi’an 710127, China
2 Department of Computer Science, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
* Author to whom correspondence should be addressed.
Appl. Sci. 2017, 7(10), 1013; https://doi.org/10.3390/app7101013
Submission received: 4 September 2017 / Revised: 26 September 2017 / Accepted: 26 September 2017 / Published: 30 September 2017

Abstract

Matrix factorization based methods have been widely used in data representation. Among them, Non-negative Matrix Factorization (NMF) is a promising technique owing to its psychological and physiological interpretation of spontaneously occurring data. On one hand, although traditional Laplacian regularization can enhance the performance of NMF, it still suffers from weak extrapolating ability. On the other hand, standard NMF disregards the discriminative information hidden in the data and cannot guarantee the sparsity of the factor matrices. In this paper, a novel algorithm, called ℓ2,1 norm and Hessian Regularized Non-negative Matrix Factorization with Discriminability (ℓ2,1HNMFD), is developed to overcome the aforementioned problems. In ℓ2,1HNMFD, Hessian regularization is introduced into the NMF framework to capture the intrinsic manifold structure of the data, while ℓ2,1 norm constraints and approximate orthogonality constraints are added to ensure the group sparsity of the encoding matrix and to characterize the discriminative information of the data simultaneously. An efficient optimization scheme is developed to solve the resulting objective function. Experimental results on five benchmark data sets demonstrate that ℓ2,1HNMFD learns better data representations and provides better clustering results.

1. Introduction

In many real-world applications, the input data are usually high-dimensional. On one hand, this poses a serious challenge for storage and computation. On the other hand, it makes many machine learning algorithms unworkable due to the curse of dimensionality [1]. Obtaining a concise and informative representation of high-dimensional data has therefore become a highly significant research focus. Matrix factorization is a popular and effective model for data representation: it finds two or more low-rank matrix factors whose product approximates the data matrix well. Various matrix factorization methods have been proposed, adopting different constraints on the matrix factors. Classical matrix factorization models include Principal Component Analysis (PCA) [2], Singular Value Decomposition (SVD), QR decomposition, and vector quantization.
Among the various matrix factorization approaches, Non-negative Matrix Factorization (NMF) [3] is a promising one. In NMF, the data matrix X is decomposed into a non-negative basis matrix U, which reveals the latent semantic structure, and a non-negative encoding matrix V, which provides a new representation with respect to the basis. Because of the non-negativity constraints, NMF allows only purely additive combinations, which leads to a parts-based representation. Due to its psychological and physiological interpretation, NMF and its variants have been widely used in computer vision [3], pattern recognition [4], image processing [5], and document analysis [6].
Standard NMF performs factorization in Euclidean space and is unable to discover the geometrical structure of the data space, which is critical in real-world applications. Therefore, much recent work has focused on preserving the intrinsic geometry of the data space by adding different constraints to the objective function of NMF. Cai et al. [7] proposed graph regularized NMF (GNMF), which constructs a nearest-neighbor graph to preserve the local geometrical information of the data space. Lu et al. [8] proposed Manifold Regularized Sparse NMF for hyperspectral unmixing, in which manifold regularization is introduced into sparsity-constrained NMF. Gu et al. [9] proposed Neighborhood-Preserving Non-negative Matrix Factorization, which imposes the additional constraint that each item can be represented as a linear combination of its neighbors. All of these graph regularized NMF methods construct a graph to encode the geometrical information and use the graph Laplacian as a smoothing operator. Despite the successful application of the graph Laplacian in semi-supervised and unsupervised learning, it still suffers from the problems that its solution is biased towards a constant and that it lacks extrapolating power [10].
Sparsity regularization methods, which focus on selecting the input variables that best describe the output, have been widely investigated. Hoyer [11] proposed sparsity-constrained NMF, which adds ℓ1 norm constraints on the basis and encoding matrices and is able to discover sparser representations than standard NMF. Cai et al. [12] proposed Unified Sparse Subspace Learning (USSL), which learns sparse projections using an ℓ1 norm regularizer. The limitation of the ℓ1 norm penalty is that it cannot guarantee successful models in the case of categorical predictors, because each dummy variable is selected independently [13]; the ℓ1 norm is therefore not well suited to feature selection. To address this issue, Nie et al. [14] proposed a robust feature selection approach by imposing the ℓ2,1 norm on the loss function. Yang et al. [15] proposed ℓ2,1 norm regularized discriminative feature selection for unsupervised learning. Gu et al. [16] combined feature selection and subspace learning in a joint framework that applies the ℓ2,1 norm to the projection matrix to achieve feature selection. The ℓ2,1 norm penalty encourages row sparsity while exploiting the correlations among all features. Recently, some researchers proposed ℓ1/2 norm [17] regularized NMF [18,19] and low-rank regularized NMF [20,21], with improved performance for specific purposes. The ℓ1/2 norm usually induces sparser solutions than its ℓ1 counterpart, but it is often unstable, and the low-rank constraint is generally not suited to feature selection.
Furthermore, discriminative information is very important for learning a better representation. For example, by exploiting partial label information as hard constraints in NMF, Liu and Wu [22] developed a semi-supervised Constrained NMF (CNMF) with better discriminating power. Li et al. [23] proposed robust structured NMF, a semi-supervised NMF learning algorithm that learns a robust discriminative data representation by pursuing a block-diagonal structure and an ℓ2,p norm loss function. In unsupervised scenarios, however, label information is unavailable. In fact, approximate orthogonality constraints can be added to obtain some discriminative information under unsupervised conditions. Unfortunately, standard NMF ignores this important information.
To address these flaws, a novel NMF algorithm, called ℓ2,1 norm and Hessian Regularized Non-negative Matrix Factorization with Discriminability (ℓ2,1HNMFD), is developed in this paper; it is designed to preserve the local geometrical structure, enforce row sparsity, and exploit discriminative information at the same time. Firstly, Hessian regularization is introduced into the NMF framework to preserve the intrinsic manifold of the data. Then, ℓ2,1 norm constraints are added on the coefficient matrix to ensure that the representation vectors are row-sparse. Furthermore, approximate orthogonality constraints are added to capture some discriminative information in the data. An optimization scheme is developed to solve the objective function.
The rest of the paper is organized as follows: In Section 2, we give a brief review of related work. In Section 3, we introduce our ℓ2,1HNMFD algorithm and the optimization scheme. Experimental results are presented in Section 4. Finally, we draw conclusions and point out future work in Section 5.

2. Related Works

This section presents a brief review of related works. First, we describe the notation used throughout the paper.

2.1. Common Notations

In this paper, we use lowercase boldface letters and uppercase boldface letters to denote vectors and matrices, respectively. For a matrix M, its (i, j)-th element is denoted by $M_{ij}$, and the i-th element of a vector b is denoted by $b_i$. Given a set of N items, the non-negative original data matrix is $X \in \mathbb{R}_+^{M \times N}$, whose i-th column is the feature vector of the i-th item. Throughout this paper, $\|M\|_F$ denotes the Frobenius norm of matrix M.

2.2. NMF

NMF is an effective decomposition method for multivariate non-negative data. Given a non-negative matrix $X = [x_1, \dots, x_N] \in \mathbb{R}^{M \times N}$, each column of which is a data vector, the goal of NMF is to find two low-rank non-negative matrices U and V whose product approximates X, by minimizing the following objective function [3]:
$$ J_{NMF} = \|X - UV\|_F^2, \quad \text{s.t.}\; U_{ik} \ge 0,\; V_{kj} \ge 0,\; \forall i, j, k. $$
When both U and V are taken as variables simultaneously, the objective function $J_{NMF}$ is not convex. However, when V is fixed, $J_{NMF}$ is convex in U, and vice versa. Lee and Seung [24] therefore developed the following iterative multiplicative update rules:
$$ U_{ik}^{t+1} = U_{ik}^{t} \frac{\big(X (V^t)^T\big)_{ik}}{\big(U^t V^t (V^t)^T\big)_{ik}}, \qquad V_{kj}^{t+1} = V_{kj}^{t} \frac{\big((U^{t+1})^T X\big)_{kj}}{\big((U^{t+1})^T U^{t+1} V^t\big)_{kj}}. $$
By constructing auxiliary functions, $J_{NMF}$ is proved to be non-increasing under the above update rules [24].
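As a concrete illustration of these update rules, the following is a minimal NumPy sketch; the function name, the random initialization, and the small constant added to the denominators to avoid division by zero are our own illustrative choices rather than part of [3,24].

```python
import numpy as np

def nmf(X, K, n_iter=200, eps=1e-9):
    """Minimal sketch of the multiplicative NMF updates.

    X: non-negative data matrix (M x N) with samples as columns.
    K: number of latent factors.
    Returns a basis U (M x K) and an encoding V (K x N) with X ~ U @ V.
    """
    M, N = X.shape
    rng = np.random.default_rng(0)
    U = rng.random((M, K))
    V = rng.random((K, N))
    for _ in range(n_iter):
        U *= (X @ V.T) / (U @ V @ V.T + eps)   # update rule for U
        V *= (U.T @ X) / (U.T @ U @ V + eps)   # update rule for V
    return U, V
```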

2.3. GNMF

In [7], Cai et al. developed a graph regularized non-negative matrix factorization (GNMF) method to obtain a compact data representation that discovers the hidden concepts and respects the intrinsic geometric structure simultaneously. GNMF minimizes the following objective function:
$$ J_{GNMF} = \|X - UV\|_F^2 + \lambda\, \mathrm{Tr}(V L V^T), \quad \text{s.t.}\; U_{ik} \ge 0,\; V_{kj} \ge 0,\; \forall i, j, k, $$
where $L = D - W$ is the graph Laplacian, W denotes the weight matrix constructed by finding the k nearest neighbors of each data point, and D is a diagonal matrix whose entries are the column sums of W, i.e., $D_{ii} = \sum_j W_{ij}$.
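For illustration, the weight matrix, degree matrix, and Laplacian can be constructed as in the sketch below, which assumes a symmetrized 0/1 k-nearest-neighbor weighting; [7] also discusses heat-kernel and cosine weightings, and the helper name is our own.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

def knn_laplacian(X, k=5):
    """Sketch of the graph Laplacian L = D - W used by GNMF.

    X: data matrix (M x N) with samples as columns.
    Returns (L, W, D) for a symmetrized 0/1 k-nearest-neighbor graph.
    """
    W = kneighbors_graph(X.T, n_neighbors=k, mode='connectivity').toarray()
    W = np.maximum(W, W.T)           # symmetrize the k-NN graph
    D = np.diag(W.sum(axis=1))       # degree matrix, D_ii = sum_j W_ij
    return D - W, W, D
```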
The objective function $J_{GNMF}$ is also not convex when both U and V are taken as variables simultaneously, so the global minimum is unlikely to be found. Using the following update rules [7], a local minimum of $J_{GNMF}$ can be obtained:
$$ U_{ik}^{t+1} = U_{ik}^{t} \frac{\big(X (V^t)^T\big)_{ik}}{\big(U^t V^t (V^t)^T\big)_{ik}}, \qquad V_{kj}^{t+1} = V_{kj}^{t} \frac{\big((U^{t+1})^T X + \lambda V^t W\big)_{kj}}{\big((U^{t+1})^T U^{t+1} V^t + \lambda V^t D\big)_{kj}}. $$
Cai et al. [7] have proved that the objective function $J_{GNMF}$ is non-increasing under the above update rules.

3. ℓ2,1 Norm and Hessian Regularized Non-Negative Matrix Factorization with Discriminability

In this section, a novel ℓ2,1 norm and Hessian Regularized Non-negative Matrix Factorization with Discriminability (ℓ2,1HNMFD) model is developed, which performs Hessian regularized Non-negative Matrix Factorization (HNMF) while preserving discriminative information and maintaining row sparsity of the encoding matrix simultaneously. An alternating optimization scheme is then developed to solve its objective function.

3.1. Hessian Regularized Non-Negative Matrix Factorization

Hessian energy is motivated by the Eells energy for mappings between manifolds [25]. Given a smooth manifold $M \subset \mathbb{R}^n$ and a mapping $f: M \to \mathbb{R}^r$, the Eells energy of f can be written as [10]:
$$ S_{Eells}(f) = \int_{M} \big\| \nabla_a \nabla_b f \big\|^2_{T_x^*M \otimes T_x^*M}\, dV(x), $$
where $\nabla_a \nabla_b f$ is the second covariant derivative of f, $T_x M$ is the tangent space at the point $x \in M$, and $dV(x)$ is the natural volume element. In normal coordinates, the integrand can be written as
$$ \big\| \nabla_a \nabla_b f \big\|^2_{T_x^*M \otimes T_x^*M} = \sum_{r,s=1}^{d} \left( \frac{\partial^2 f}{\partial c_r \partial c_s} \right)^2, $$
where $c_r$ and $c_s$ are normal coordinates. Hence, at a point $x_i$, the norm of the second covariant derivative is simply the Frobenius norm of the Hessian of f in normal coordinates. The resulting functional is called the Hessian regularizer $S_{Hess}(f)$:
$$ S_{Hess}(f) = \sum_{i=1}^{n} \sum_{r,s=1}^{d} \left( \frac{\partial^2 f}{\partial c_r \partial c_s}\bigg|_{x_i} \right)^2. $$
Let $N_k(X_i)$ denote the set of k nearest neighbors of $X_i$. The Hessian of f at $X_i$ can be approximated on $N_k(X_i)$ as
$$ \frac{\partial^2 f}{\partial c_r \partial c_s}\bigg|_{X_i} \approx \sum_{j=1}^{k} H^{(i)}_{rs,j}\, f(X_j). $$
The operator H can be computed by fitting a second-order polynomial $p(X)$ in normal coordinates to the values $\{f(X_j)\}_{j=1}^{k}$. Identifying the function values with the entries of the encoding matrix, i.e., $V_{ki} = f_k(X_i)$, the estimate of the squared Frobenius norm of the Hessian of f at $x_i$ is given by
$$ \big\| \nabla_a \nabla_b f \big\|^2 \approx \sum_{r,s=1}^{m} \left( \sum_{\alpha=1}^{k} H^{(i)}_{rs,\alpha}\, f(\alpha) \right)^2 = \sum_{\alpha,\beta=1}^{k} f(\alpha)\, f(\beta)\, B^{(i)}_{\alpha\beta}, $$
where $B^{(i)}_{\alpha\beta} = \sum_{r,s=1}^{m} H^{(i)}_{rs,\alpha} H^{(i)}_{rs,\beta}$. The total estimated Hessian energy $\hat{S}_{Hess}(f)$ is the sum over all data points:
$$ \hat{S}_{Hess}(f) = \sum_{i=1}^{n} \sum_{r,s=1}^{m} \left( \frac{\partial^2 f}{\partial c_r \partial c_s}\bigg|_{x_i} \right)^2 = \sum_{i=1}^{n} \sum_{\alpha, \beta \in N_k(X_i)} f_\alpha f_\beta B^{(i)}_{\alpha\beta} = \langle f, B f \rangle, $$
where B, the Hessian regularization matrix, accumulates all the local matrices $B^{(i)}$.
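The construction of B can be sketched as follows. This is a simplified illustration of the procedure in [10]: for each point, normal coordinates are estimated by a local PCA, a full second-order polynomial is fitted by least squares, and the quadratic-coefficient rows of the pseudo-inverse are accumulated into B. The exact weighting of the mixed second-order terms used in [10] is glossed over, and the function name and default parameters are our own.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def hessian_matrix(X, n_neighbors=10, n_dims=2):
    """Simplified estimate of the Hessian regularization matrix B (N x N).

    X: data matrix (M x N) with samples as columns.
    n_neighbors: neighborhood size k.
    n_dims: assumed intrinsic dimension m of the manifold.
    """
    pts = X.T                                   # samples as rows for the neighbor search
    N, m = pts.shape[0], n_dims
    idx = NearestNeighbors(n_neighbors=n_neighbors).fit(pts).kneighbors(pts)[1]
    B = np.zeros((N, N))
    for i in range(N):
        nb = idx[i]                             # indices of the k nearest neighbors (including i)
        local = pts[nb] - pts[nb].mean(axis=0)  # center the neighborhood
        _, _, Vt = np.linalg.svd(local, full_matrices=False)
        coords = local @ Vt[:m].T               # k x m normal (local PCA) coordinates
        quad = [coords[:, r] * coords[:, s] for r in range(m) for s in range(r, m)]
        Phi = np.column_stack([np.ones(len(nb)), coords] + quad)
        # rows of the pseudo-inverse belonging to the quadratic terms map the
        # neighborhood function values to second-derivative estimates (operator H)
        H = np.linalg.pinv(Phi)[1 + m:, :]
        B[np.ix_(nb, nb)] += H.T @ H            # accumulate the local matrix B(i)
    return B
```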
Applying the Hessian energy as a regularization term in NMF to capture the local manifold structure of the data, Hessian regularized NMF (HNMF) can be formulated as:
$$ \min_{U,V}\; \frac{1}{2}\|X - UV\|_F^2 + \lambda\, \mathrm{tr}(V B V^T), \quad \text{s.t.}\; U_{ik} \ge 0,\; V_{kj} \ge 0,\; \forall i, j, k, $$
where λ is the regularization parameter.

3.2. Sparseness Constraints

To distinguish the importance of different features, we encourage the significant features to take non-zero values and the insignificant features to shrink to zero during the iterative updates. Since each row of the encoding matrix V corresponds to one feature, we add ℓ2,1 norm regularization on V, which drives some rows of V towards zero. In this way, the important features are preserved and the irrelevant features are removed. The ℓ2,1 norm of matrix V is defined as:
$$ \|V\|_{2,1} = \sum_{j=1}^{K} \|v_{j\cdot}\|_2, $$
where $v_{j\cdot}$ denotes the j-th row of V.
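In NumPy terms, the ℓ2,1 norm and the diagonal matrix R with $R_{ii} = 1/(2\|v_{i\cdot}\|_2)$ that appears later in the V update (Section 3.5.2) can be computed as in this small sketch; the epsilon guard against all-zero rows is our own addition.

```python
import numpy as np

def l21_norm(V):
    """l2,1 norm of V: the sum of the Euclidean norms of its rows."""
    return np.linalg.norm(V, axis=1).sum()

def row_weights(V, eps=1e-9):
    """Diagonal matrix R with R_ii = 1 / (2 * ||v_i.||_2), used in the V update."""
    return np.diag(1.0 / (2.0 * np.linalg.norm(V, axis=1) + eps))
```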

3.3. Discriminative Constraints

To characterize discriminative information in the learned representation matrix V, we follow the works in [26,27], in which a scaled cluster indicator matrix was developed. Given an indicator matrix $Y \in \{0,1\}^{N \times K}$, where $Y_{ij} = 1$ if the i-th data point belongs to the j-th category, the scaled indicator matrix is defined as $F = Y (Y^T Y)^{-\frac{1}{2}}$, whose j-th column is
$$ F_{\cdot j} = [\,0, \dots, 0, \underbrace{1, \dots, 1}_{n_j}, 0, \dots, 0\,]^T / \sqrt{n_j}, $$
where $n_j$ is the number of samples in the j-th group. We encourage the new representation V to capture the discriminative information contained in F. Intuitively, we only need V to approximate $F^T$, i.e., $\|V - F^T\|_F^2 \le \varepsilon$, where ε is an arbitrarily small constant. Unfortunately, in unsupervised scenarios no label information is available in advance. However, the scaled indicator matrix is strictly orthogonal:
$$ F^T F = (Y^T Y)^{-\frac{1}{2}}\, Y^T Y\, (Y^T Y)^{-\frac{1}{2}} = I_k, $$
where $I_k$ is the $k \times k$ identity matrix. Since F is orthogonal, V should be orthogonal too. However, this constraint is too strict, so we relax it and only require V to be approximately orthogonal, i.e.,
$$ \|V^T V - I_k\|_F^2 \le \varepsilon. $$

3.4. Objective Function

By integrating Equations (12) and (14) into Hessian regularized NMF, the objective function of ℓ2,1HNMFD is defined as:
$$ \min_{U,V}\; \frac{1}{2}\|X - UV\|_F^2 + \lambda\, \mathrm{tr}(V B V^T) + \mu \|V^T V - I_k\|_F^2 + \gamma \|V\|_{2,1}, \quad \text{s.t.}\; U_{ik} \ge 0,\; V_{kj} \ge 0,\; \forall i, j, k, $$
where λ, μ and γ are regularization parameters.

3.5. Optimization

In this section, we introduce an iterative algorithm that solves Equation (15). Since the objective function of ℓ2,1HNMFD is not convex in U and V jointly, a closed-form solution cannot be obtained. In the following, we present an alternating scheme that reaches a local minimum. Firstly, the optimization problem in Equation (15) can be rewritten as:
$$ O = \frac{1}{2} \mathrm{Tr}\!\left( X X^T - 2 X V^T U^T + U V V^T U^T \right) + \lambda\, \mathrm{tr}(V B V^T) + \mu \|V^T V - I_k\|_F^2 + \gamma \|V\|_{2,1}. $$
Let $\psi_{ik}$ and $\Phi_{kj}$ be the Lagrange multipliers for the constraints $U_{ik} \ge 0$ and $V_{kj} \ge 0$, respectively. The Lagrange function L can then be written as:
$$ L = \frac{1}{2} \mathrm{Tr}\!\left( X X^T - 2 X V^T U^T + U V V^T U^T \right) + \lambda\, \mathrm{tr}(V B V^T) + \mu\, \mathrm{Tr}\!\left( V^T V V^T V - 2 V^T V + I_k \right) + \gamma \|V\|_{2,1} + \mathrm{Tr}(\Psi U^T) + \mathrm{Tr}(\Phi V^T). $$

3.5.1. Updating U

The partial derivative of L with respect to U is:
$$ \frac{\partial L}{\partial U} = U V V^T - X V^T + \Psi. $$
Using the Karush–Kuhn–Tucker (KKT) condition $\psi_{ik} U_{ik} = 0$, we get
$$ \left( U V V^T - X V^T \right)_{ik} U_{ik} = 0. $$
This leads to the following update formula:
$$ U_{ik} = U_{ik} \frac{\left( X V^T \right)_{ik}}{\left( U V V^T \right)_{ik}}. $$

3.5.2. Updating V

The partial derivative of L with respect to V is:
$$ \frac{\partial L}{\partial V} = U^T U V - U^T X + 2 \lambda V B + 4 \mu V V^T V - 4 \mu V + \gamma R V + \Phi, $$
where R is a diagonal matrix whose i-th diagonal element is $R_{ii} = \frac{1}{2 \|v_{i\cdot}\|_2}$.
Using the KKT condition $\Phi_{kj} V_{kj} = 0$, we get
$$ \left( U^T U V - U^T X + 2 \lambda V B^{+} - 2 \lambda V B^{-} + 4 \mu V V^T V - 4 \mu V + \gamma R V \right)_{kj} V_{kj} = 0, $$
where $B = B^{+} - B^{-}$, $B^{+} = \frac{|B| + B}{2}$ and $B^{-} = \frac{|B| - B}{2}$. Equation (22) leads to the following update formula:
$$ V_{kj} = V_{kj} \frac{\left( U^T X + 2 \lambda V B^{-} + 4 \mu V \right)_{kj}}{\left( U^T U V + 2 \lambda V B^{+} + 4 \mu V V^T V + \gamma R V \right)_{kj}}. $$
The algorithm is shown in Algorithm 1.
Algorithm 1: Optimization of ℓ2,1HNMFD
Input: X , λ , μ , γ
Output: U , V
1 Randomly initialize U 0 , V 0 ;
2 Repeat
3  Fixing V , updating U by Equation (20);
4  Fixing U , updating V by Equation (23);
5 Until Equation (15) converged or max no. iterations reached.
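A compact NumPy sketch of Algorithm 1 is given below. It assumes the Hessian regularization matrix B has been precomputed (for instance with the `hessian_matrix` sketch above), uses random initialization, and adds a small constant to the denominators for numerical safety; these choices are illustrative and not prescribed by the paper.

```python
import numpy as np

def l21_hnmfd(X, K, B, lam=0.01, mu=0.001, gamma=1.0, n_iter=300, eps=1e-9):
    """Sketch of Algorithm 1: alternating multiplicative updates (20) and (23).

    X: non-negative data matrix (M x N); K: latent dimension;
    B: Hessian regularization matrix (N x N); lam, mu, gamma: the
    regularization parameters lambda, mu and gamma of Equation (15).
    """
    M, N = X.shape
    rng = np.random.default_rng(0)
    U = rng.random((M, K))
    V = rng.random((K, N))
    Bp = (np.abs(B) + B) / 2.0                  # B+, positive part of B
    Bn = (np.abs(B) - B) / 2.0                  # B-, so that B = B+ - B-
    for _ in range(n_iter):
        # Equation (20): update U with V fixed
        U *= (X @ V.T) / (U @ V @ V.T + eps)
        # Equation (23): update V with U fixed
        R = np.diag(1.0 / (2.0 * np.linalg.norm(V, axis=1) + eps))
        num = U.T @ X + 2 * lam * V @ Bn + 4 * mu * V
        den = U.T @ U @ V + 2 * lam * V @ Bp + 4 * mu * V @ V.T @ V + gamma * R @ V + eps
        V *= num / den
    return U, V
```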

3.6. Computational Complexity Analysis

In this section, we discuss the extra computational cost of the proposed algorithm. ℓ2,1HNMFD needs O(N²M) operations to construct the nearest neighbor graph. Suppose the multiplicative updates stop after t iterations; the complexity of the updates of ℓ2,1HNMFD is then O(tNMK). Thus, the overall complexity of ℓ2,1HNMFD is O(tNMK + N²M), which is similar to that of GNMF.

3.7. Proof of Convergence

Theorem 1.
The function value in Equation (15) is non-increasing under the rules in Equations (20) and (23).
The update rule for U is the same as in classical NMF, so O in Equation (15) is non-increasing under Equation (20). Next, we prove that O is non-increasing under Equation (23). The proof uses the auxiliary function [18] defined as follows.
Definition 1.
$G(v, v')$ is an auxiliary function of $F(v)$ if the conditions
$$ G(v, v') \ge F(v), \qquad G(v, v) = F(v) $$
are satisfied.
Lemma 1.
If G is an auxiliary function of F, then F is non-increasing under the update rule
$$ v^{(t+1)} = \arg\min_{v} G\big(v, v^{(t)}\big). $$
Proof for Lemma 1.
$$ F\big(v^{(t+1)}\big) \le G\big(v^{(t+1)}, v^{(t)}\big) \le G\big(v^{(t)}, v^{(t)}\big) = F\big(v^{(t)}\big). $$
In the following, we show that the update rule for V in Equation (23) is exactly the rule in Equation (24) with a proper auxiliary function. We use $F_{ab}$ to denote the part of O that is relevant only to $v_{ab}$.
Lemma 2.
Function
$$ G\big(v, v_{ab}^{(t)}\big) = F_{ab}\big(v_{ab}^{(t)}\big) + F'_{ab}\big(v_{ab}^{(t)}\big)\big(v - v_{ab}^{(t)}\big) + \frac{\left( U^T U V + 2 \lambda V B^{+} + 4 \mu V V^T V + \gamma R V \right)_{ab}}{v_{ab}^{(t)}} \big(v - v_{ab}^{(t)}\big)^2 $$
is an auxiliary function for $F_{ab}$.
Proof for Lemma 2.
Since $G(v, v) = F_{ab}(v)$ is evident, we only need to show that $G\big(v, v_{ab}^{(t)}\big) \ge F_{ab}(v)$. By comparing $G\big(v, v_{ab}^{(t)}\big)$ with the Taylor series expansion of $F_{ab}(v)$, we obtain $G\big(v, v_{ab}^{(t)}\big) \ge F_{ab}(v)$. A similar proof can be found in [7]. ☐
Proof for Theorem 1.
By substituting $G\big(v, v_{ab}^{(t)}\big)$ in Equation (24) with Equation (25), we obtain the update rule
$$ v_{ab}^{(t+1)} = v_{ab}^{(t)} - v_{ab}^{(t)} \frac{F'_{ab}\big(v_{ab}^{(t)}\big)}{2 \left( U^T U V + 2 \lambda V B^{+} + 4 \mu V V^T V + \gamma R V \right)_{ab}} = v_{ab}^{(t)} \frac{\left( U^T X + 2 \lambda V B^{-} + 4 \mu V \right)_{ab}}{\left( U^T U V + 2 \lambda V B^{+} + 4 \mu V V^T V + \gamma R V \right)_{ab}}, $$
which is identical to Equation (23). Since $G\big(v, v_{ab}^{(t)}\big)$ is an auxiliary function of $F_{ab}(v)$, $F_{ab}(v)$ is non-increasing under this update rule. Consequently, O in Equation (15) is non-increasing under Equation (23). ☐

4. Experiment

In this section, we evaluate the performance of ℓ2,1HNMFD. To demonstrate the advantages of the proposed method, we compare its results with those of related state-of-the-art methods. Student’s t-tests with a significance level of 0.05 were used for all statistical significance tests.
To perform data clustering with the NMF-based methods, the original data were first transformed by the different NMF algorithms to generate new representations. These new representations were then fed to the Kmeans clustering algorithm to obtain the final clustering results.
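This pipeline can be summarized by the following sketch, in which `factorize` stands for any of the NMF variants discussed above and is expected to return (U, V) with the columns of V as the new sample representations; the function names and the use of scikit-learn's KMeans are our own illustrative choices.

```python
from sklearn.cluster import KMeans

def cluster_with_representation(X, n_clusters, factorize):
    """Evaluation pipeline: factorize X, then run Kmeans on the new representation."""
    U, V = factorize(X, n_clusters)
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(V.T)
```

For example, `factorize` could be the plain `nmf` sketch from Section 2.2, or `lambda X, K: l21_hnmfd(X, K, hessian_matrix(X))` for the proposed method.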

4.1. Data Sets

We use five real-world data sets to evaluate the proposed method. These datasets are described below:
The Yale face dataset consists of 165 gray-scale face images of 15 persons. There are 11 images per subject, each with a different facial expression or configuration: center-light, with/without glasses, normal, right-light, sad, sleepy, surprised and wink.
The ORL face dataset contains 10 different face images for 40 different persons; each of the 400 images has been collected against a dark, homogeneous background, with the subjects in an upright, frontal position, with some tolerance for side movement.
The UMIST face dataset contains 575 images of 20 people, each covering a range of poses from profile to frontal views. Subjects cover a range in terms of race, sex and appearance.
The COIL20 data set contains 32 × 32 gray scale images of 20 objects, viewed from varying angles.
The CMU PIE face dataset contains 32 × 32 gray scale face images of 68 people. Each person has 42 facial images under various light and illumination conditions.
The important statistics of these datasets are summarized in Table 1.

4.2. Evaluation Metrics

In our experiments, we set the number of clusters equal to the number of classes for all algorithms. To evaluate the performance of clustering, we use Accuracy and Normalized Mutual Information (NMI) to measure the clustering results.
Accuracy is defined as follows:
$$ Accuracy = \frac{\sum_{i=1}^{n} \delta\big(s_i, map(r_i)\big)}{n}, $$
where $r_i$ and $s_i$ are the cluster label and the ground-truth label of item i, respectively; $\delta(x, y)$ equals 1 if $x = y$ and 0 otherwise; and $map(r_i)$ is the permutation mapping function that maps $r_i$ to the equivalent cluster label in the ground truth.
The NMI is defined as follows:
$$ NMI(C, C') = \frac{MI(C, C')}{\max\big(H(C), H(C')\big)}, $$
where $MI(C, C')$ is the mutual information between the cluster sets C and C', and $H(\cdot)$ denotes entropy. If C is identical to C', then $NMI(C, C') = 1$; if the two cluster sets are completely independent, then $NMI(C, C') = 0$.
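Both metrics can be computed as in the sketch below: the best permutation mapping map(·) is obtained with the Hungarian algorithm, and NMI uses the max-entropy normalization of the definition above. The function names are ours, and SciPy and scikit-learn are assumed to be available.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score

def clustering_accuracy(truth, pred):
    """Accuracy under the best one-to-one label mapping (Hungarian algorithm)."""
    truth, pred = np.asarray(truth), np.asarray(pred)
    labels = np.unique(np.concatenate([truth, pred]))
    # contingency table: matches[i, j] = #items with true label i and predicted label j
    matches = np.array([[np.sum((truth == t) & (pred == p)) for p in labels] for t in labels])
    row, col = linear_sum_assignment(-matches)     # maximize the matched counts
    return matches[row, col].sum() / truth.size

def nmi(truth, pred):
    """Normalized Mutual Information with max-entropy normalization."""
    return normalized_mutual_info_score(truth, pred, average_method='max')
```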

4.3. Baseline

To demonstrate how the clustering performance can be enhanced by ℓ2,1HNMFD, we compare it with the following state-of-the-art clustering algorithms:
(1)
Traditional Kmeans clustering algorithm (Kmeans).
(2)
Non-negative Matrix Factorization (NMF) [3].
(3)
Normalized Cut, one of the popular spectral clustering algorithms (NCut) [28].
(4)
Graph-regularized Non-negative Matrix Factorization (GNMF) [7].

4.4. Clustering Results

Table 2 presents the clustering accuracy of all the algorithms on each of the five data sets, while Table 3 presents the normalized mutual information. The observations are as follows.
Firstly, the NMF-based methods, including NMF, GNMF and ℓ2,1HNMFD, outperform the Kmeans method. This suggests the superiority of parts-based data representation for uncovering the hidden matrix factors.
Secondly, NCut and GNMF exploit geometrical information and achieve better performance than the Kmeans and NMF methods. This suggests that geometrical information is very important for learning the hidden factors.
Finally, on all the data sets, ℓ2,1HNMFD consistently outperforms the other clustering methods. This demonstrates that, by exploiting the power of Hessian regularization, group sparse regularization and discriminative information, the new method can learn a more meaningful representation.

4.5. Parameter Sensitivity

ℓ2,1HNMFD has three parameters: λ, μ and γ. We investigated their influence on ℓ2,1HNMFD’s performance by varying one parameter at a time while fixing the other two. For each specific setting, we ran ℓ2,1HNMFD 10 times and recorded the average performance.
We plot the performance of ℓ2,1HNMFD with respect to λ in Figure 1a. Parameter λ measures the importance of the graph embedding regularization term of ℓ2,1HNMFD. If λ is too small, the graph regularization is so weak that the local geometrical information of the data cannot be effectively characterized, while a λ that is too large may lead to a trivial solution. ℓ2,1HNMFD shows superior performance when λ equals 0.01, 0.001 and 0.1 for YALE, ORL and UMIST, respectively.
We plot the performance of ℓ2,1HNMFD with respect to μ in Figure 1b. Parameter μ controls the orthogonality of the learned representation. When μ is too small, the orthogonality constraint is too weak and ℓ2,1HNMFD may be ill-defined. When μ is too large, this constraint dominates the objective function and the learned representation becomes too sparse, which is also unfaithful to the real-world situation. We observe that ℓ2,1HNMFD achieves encouraging performance when μ equals 0.001, 0.001 and 0.1 for YALE, ORL and UMIST, respectively.
We plot the performance of ℓ2,1HNMFD with respect to γ in Figure 1c. Parameter γ controls the degree of sparsity of the encoding matrix. Sparsity constraints that are too weak or too heavy both harm the learned representation. We find that ℓ2,1HNMFD consistently outperforms the best baseline methods on the three datasets when γ = 1.

4.6. Convergence Analysis

The update rules for minimizing the objective function of ℓ2,1HNMFD are essentially iterative. We have provided the convergence proof above; next, we analyze how fast the rules converge.
We investigate the empirical convergence properties of both GNMF and ℓ2,1HNMFD on three datasets. In each figure, the x-axis denotes the iteration number and the y-axis is the objective function value on a log scale. Figure 2a–c show the objective function value against the number of iterations for the YALE, ORL and UMIST data sets, respectively. We observe that the objective function values of both GNMF and ℓ2,1HNMFD drop drastically at the beginning and converge very fast, usually within 100 iterations.
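To reproduce curves in the style of Figure 2, the value of Equation (15) can be recorded once per iteration inside the update loop and plotted on a log scale; a sketch of the monitoring function is shown below. With the column-wise encoding used in the earlier sketches (V of size K × N), the approximate-orthogonality term is evaluated as $\|V V^T - I_K\|_F^2$; this orientation choice is ours.

```python
import numpy as np

def objective(X, U, V, B, lam, mu, gamma):
    """Value of the objective in Equation (15), for convergence monitoring."""
    K = V.shape[0]
    fit = 0.5 * np.linalg.norm(X - U @ V, 'fro') ** 2
    hess = lam * np.trace(V @ B @ V.T)
    orth = mu * np.linalg.norm(V @ V.T - np.eye(K), 'fro') ** 2
    sparse = gamma * np.linalg.norm(V, axis=1).sum()
    return fit + hess + orth + sparse
```

Collecting these values in a list during the iterations of the `l21_hnmfd` sketch and passing it to `matplotlib.pyplot.semilogy` yields plots comparable in style to Figure 2.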

5. Conclusions and Future Work

In this paper, we have presented a novel matrix factorization method, called ℓ2,1 norm and Hessian Regularized Non-negative Matrix Factorization with Discriminability (ℓ2,1HNMFD), for data representation. On one hand, ℓ2,1HNMFD uses Hessian regularization to preserve the local manifold structure of the data. On the other hand, ℓ2,1HNMFD exploits an ℓ2,1 norm constraint to obtain a sparse representation, and uses an approximate orthogonality constraint to characterize the discriminative information of the data. Experimental results on five real-world datasets suggest that ℓ2,1HNMFD is able to learn a better parts-based representation. This paper only considers the single-view case. In the future, we will consider multi-view cases and learn a meaningful representation for multi-view data.

Acknowledgments

This research was supported by the National High-tech R&D Program of China (863 Program) (No. 2014AA015201) and the Program for Changjiang Scholars and Innovative Research Team in University (No. IRT13090). The content of the information does not necessarily reflect the position or the policy of the Government, and no official endorsement should be inferred.

Author Contributions

All authors contributed equally to this paper.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Verleysen, M.; François, D. The Curse of Dimensionality in Data Mining and Time Series Prediction. In Proceedings of the Computational Intelligence and Bioinspired Systems, Barcelona, Spain, 8–10 June 2005; pp. 758–770. [Google Scholar]
  2. Abdi, H.; Williams, L.J. Principal component analysis. Wiley Interdiscip. Rev. Comput. Stat. 2010, 2, 433–459. [Google Scholar] [CrossRef]
  3. Lee, D.D.; Seung, H.S. Learning the parts of objects by non-negative matrix factorization. Nature 1999, 401, 788–791. [Google Scholar] [PubMed]
  4. Guillamet, D.; Vitria, J. Classifying faces with nonnegative matrix factorization. In Proceedings of the 5th Catalan Conference for Artificial Intelligence, Castellón, Spain, 24–25 October 2002; pp. 24–31. [Google Scholar]
  5. Zafeiriou, S.; Petrou, M. Nonlinear non-negative component analysis algorithms. IEEE Trans. Image Process. 2010, 19, 1050–1066. [Google Scholar] [CrossRef] [PubMed]
  6. Xu, W.; Liu, X.; Gong, Y. Document clustering based on non-negative matrix factorization. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Informaion Retrieval, Toronto, ON, Canada, 28 July–1 August 2003; pp. 267–273. [Google Scholar]
  7. Cai, D.; He, X.; Han, J.; Huang, T.S. Graph regularized nonnegative matrix factorization for data representation. IEEE Trans. Pattern Anal. Mach. Intell. 2011, 33, 1548–1560. [Google Scholar] [PubMed]
  8. Lu, X.; Wu, H.; Yuan, Y.; Yan, P.; Li, X. Manifold regularized sparse NMF for hyperspectral unmixing. IEEE Trans. Geosci. Remote Sens. 2013, 51, 2815–2826. [Google Scholar] [CrossRef]
  9. Gu, Q.; Zhou, J. Neighborhood Preserving Nonnegative Matrix Factorization. In Proceedings of the British Machine Vision Conference, London, UK, 7–10 September 2009; pp. 1–10. [Google Scholar]
  10. Kim, K.I.; Steinke, F.; Hein, M. Semi-supervised regression using Hessian energy with an application to semi-supervised dimensionality reduction. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 7–10 December 2009; pp. 979–987. [Google Scholar]
  11. Hoyer, P.O. Non-negative matrix factorization with sparseness constraints. J. Mach. Learn. Res. 2004, 5, 1457–1469. [Google Scholar]
  12. Cai, D.; He, X.; Han, J. Spectral regression: A unified approach for sparse subspace learning. In Proceedings of the Seventh IEEE International Conference on Data Mining (ICDM 2007), Omaha, NE, USA, 28–31 October 2007; pp. 73–82. [Google Scholar]
  13. Zou, H.; Yuan, M. The F∞-norm support vector machine. Stat. Sin. 2008, 18, 379–398. [Google Scholar]
  14. Nie, F.; Huang, H.; Cai, X.; Ding, C.H. Efficient and robust feature selection via joint ℓ2,1-norms minimization. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 6–9 December 2010; pp. 1813–1821. [Google Scholar]
  15. Yang, Y.; Shen, H.T.; Ma, Z.; Huang, Z.; Zhou, X. ℓ2,1-norm regularized discriminative feature selection for unsupervised learning. In Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence (IJCAI11), Barcelona, Spain, 16–22 July 2011; pp. 1589–1594. [Google Scholar]
  16. Gu, Q.; Li, Z.; Han, J. Joint feature selection and subspace learning. In Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence (IJCAI11), Barcelona, Spain, 16–22 July 2011; pp. 1294–1299. [Google Scholar]
  17. Xu, Z.; Chang, X.; Xu, F.; Zhang, H. L1/2 regularization: A thresholding representation theory and a fast solver. IEEE Trans. Neural Netw. Learn. Syst. 2012, 23, 1013–1027. [Google Scholar] [PubMed]
  18. Qian, Y.; Jia, S.; Zhou, J.; Robles-Kelly, A. Hyperspectral unmixing via L1/2 sparsity-constrained nonnegative matrix factorization. IEEE Trans. Geosci. Remote Sens. 2011, 49, 4282–4297. [Google Scholar] [CrossRef]
  19. Wang, W.; Qian, Y. Adaptive L1/2 Sparsity-Constrained NMF With Half-Thresholding Algorithm for Hyperspectral Unmixing. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2015, 8, 2618–2631. [Google Scholar] [CrossRef]
  20. Tsinos, C.G.; Rontogiannis, A.A.; Berberidis, K. Distributed Blind Hyperspectral Unmixing via Joint Sparsity and Low-Rank Constrained Non-Negative Matrix Factorization. IEEE Trans. Comput. Imaging 2017, 3, 160–174. [Google Scholar] [CrossRef]
  21. Li, X.; Cui, G.; Dong, Y. Graph regularized non-negative low-rank matrix factorization for image clustering. IEEE Trans. Cybern. 2016, 1–14. [Google Scholar] [CrossRef] [PubMed]
  22. Liu, H.; Wu, Z. Non-negative matrix factorization with constraints. In Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence, Atlanta, Georgia, 11–15 July 2010; pp. 506–511. [Google Scholar]
  23. Li, Z.; Tang, J.; He, X. Robust Structured Nonnegative Matrix Factorization for Image Representation. IEEE Trans. Neural Netw. Learn. Syst. 2017, 1–14. [Google Scholar] [CrossRef] [PubMed]
  24. Lee, D.D.; Seung, H.S. Algorithms for non-negative matrix factorization. In Proceedings of the Advances in Neural Information Processing Systems 13 (NIPS 2000), Denver, CO, USA, 27 November–2 December 2000; pp. 556–562. [Google Scholar]
  25. Steinke, F.; Hein, M. Non-parametric regression between manifolds. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 8–10 December 2008; pp. 1561–1568. [Google Scholar]
  26. Ye, J.; Zhao, Z.; Wu, M. Discriminative k-means for clustering. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 8–10 December 2008; pp. 1649–1656. [Google Scholar]
  27. Yang, Y.; Xu, D.; Nie, F.; Yan, S.; Zhuang, Y. Image clustering using local discriminant models and global integration. IEEE Trans. Image Process. 2010, 19, 2761–2773. [Google Scholar] [CrossRef] [PubMed]
  28. Shi, J.; Malik, J. Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2000, 22, 888–905. [Google Scholar]
Figure 1. Influence of different parameter settings on the performance of ℓ2,1HNMFD on the three datasets: (a) varying λ while fixing μ and γ; (b) varying μ while fixing λ and γ; and (c) varying γ while fixing λ and μ.
Figure 2. Convergence curves of GNMF and ℓ2,1HNMFD on (a) YALE, (b) ORL and (c) UMIST.
Table 1. Statistics of the datasets.

Dataset   Size   Categories   Dimensionality
YALE       165           15             1024
ORL        400           40             1024
UMIST      575           20              644
COIL20    1440           20             1024
PIE       2856           68             1024
Table 2. Clustering Accuracy on the 5 datasets (%).

Dataset   Kmeans         NMF            NCut           GNMF           Ours
YALE      37.85 ± 2.36   40.15 ± 2.89   40.73 ± 2.39   41.42 ± 3.10   42.94 ± 2.65
ORL       52.15 ± 2.86   54.17 ± 2.00   57.60 ± 3.00   57.95 ± 3.41   59.22 ± 1.54
UMIST     40.71 ± 1.92   41.12 ± 2.71   41.37 ± 1.74   44.50 ± 2.59   50.16 ± 1.16
COIL20    63.19 ± 4.85   63.25 ± 3.17   70.19 ± 2.80   75.92 ± 2.79   78.03 ± 1.70
PIE       24.22 ± 0.85   51.08 ± 2.27   66.60 ± 2.14   75.61 ± 3.32   77.81 ± 2.33
Table 3. Normalized Mutual Information on the 5 datasets (%).

Dataset   Kmeans         NMF            NCut           GNMF           Ours
YALE      43.58 ± 2.42   45.00 ± 2.71   45.91 ± 2.15   46.08 ± 2.12   46.88 ± 2.11
ORL       70.93 ± 1.69   73.36 ± 1.46   75.13 ± 1.50   75.52 ± 1.93   76.09 ± 0.95
UMIST     60.08 ± 1.65   60.32 ± 0.85   62.11 ± 1.76   63.53 ± 1.27   66.13 ± 1.26
COIL20    74.32 ± 2.00   72.65 ± 1.21   78.40 ± 1.57   86.92 ± 2.79   89.90 ± 1.79
PIE       53.55 ± 1.02   78.68 ± 1.09   81.87 ± 1.63   89.07 ± 0.82   91.27 ± 2.57
