Article

Locality-Constraint Discriminative Nonnegative Representation for Pattern Classification

1 School of Automation, Wuxi University, Wuxi 214105, China
2 School of Automation, Nanjing University of Information Science and Technology, Nanjing 210044, China
3 School of Science, Wuxi University, Wuxi 214105, China
* Author to whom correspondence should be addressed.
Mathematics 2024, 12(1), 52; https://doi.org/10.3390/math12010052
Submission received: 28 November 2023 / Revised: 18 December 2023 / Accepted: 21 December 2023 / Published: 23 December 2023

Abstract

Representation-based classification methods (RBCM) have recently garnered notable attention in the field of pattern classification. Diverging from conventional methods reliant on $\ell_1$- or $\ell_2$-norms, the nonnegative representation-based classifier (NRC) enforces a nonnegative constraint on the representation vector, thus enhancing the representation capabilities of positively correlated samples. While NRC has achieved substantial success, it falls short in fully harnessing the discriminative information associated with the training samples and neglects the locality constraint inherent in the sample relationships, thereby limiting its classification power. In response to these limitations, we introduce the locality-constraint discriminative nonnegative representation (LDNR) method. LDNR extends the NRC framework through the incorporation of a competitive representation term. Recognizing the pivotal role played by the estimated samples in the classification process, we include estimated samples that involve discriminative information in this term, establishing a robust connection between representation and classification. Additionally, we assign distinct local weights to different estimated samples, augmenting the representation capacity of homogeneous samples and, ultimately, elevating the performance of the classification model. To validate the effectiveness of LDNR, extensive comparative experiments are conducted on various pattern classification datasets. The findings demonstrate the competitiveness of our proposed method.

1. Introduction

In the field of pattern classification, various classifiers have been developed to predict labels for query samples [1,2,3,4]. Among these, RBCM has recently garnered substantial attention. It represents a query sample through a linear combination of training samples and then uses the estimated sample generated by the representation coefficients to predict the label of the query sample. One of the classic methods within the RBCM framework is the sparse representation-based classifier (SRC) [5]. To represent the query sample using a reduced number of training samples, SRC employs the sparsity-inducing $\ell_1$-norm, as a surrogate for the $\ell_0$-norm, to constrain the representation coefficients, thereby obtaining a sparse representation vector. It is noteworthy that SRC involves the minimization of the $\ell_1$-norm, which lacks an analytical solution. To overcome this limitation, Zhang et al. [6] argued that collaborative representation of all training samples, rather than focusing solely on sparsity, can enhance the model’s classification performance. They introduced the collaborative representation-based classifier (CRC) model, which incorporates the $\ell_2$-norm to regularize the representation vector. The collaborative representation (CR) problem is a typical ridge regression problem and possesses an analytical solution. Consequently, CRC achieves nearly the same level of accuracy as SRC while significantly reducing the required computational resources.
To further enhance the classification performance of CRC, Timofte et al. [7] proposed the weighted collaborative representation classifier (WCRC) model, which incorporates sample weights based on classification confidence and feature channel weights based on variance. This extension not only improves classification performance but also maintains simplicity and efficiency. Acknowledging the characteristics of intra-class residual in the nearest subspace classifier (NSC) [8] and inter-class residual in CRC, Chi et al. [9] introduced the collaborative representation optimized classifier (CROC) model, aiming to strike a balance between these two methods. Cai et al. [10] proposed a probabilistic collaborative representation-based classifier (ProCRC) model, which analyzes the principle of CRC from a probabilistic perspective. In the RBCM framework, query samples are represented by intricate addition and subtraction combinations of all the training samples, which can be well-explained mathematically but may be challenging to grasp at a physical level. Xu et al. [11] argued for the existence of only positive correlations among samples and, drawing inspiration from nonnegative matrix factorization (NMF) [12], introduced the NRC model. This model imposes a nonnegative constraint on the representation vector, thereby enhancing the representation capacity of positively correlated samples. Gou et al. [13] developed the weighted discriminative collaborative competitive representation (WDCCR) model, which incorporates class information to enhance the pattern discrimination between different classes. Wang et al. [14] argued that SRC and CRC primarily employ unsupervised learning in computing the representation vector. 
They introduced the discriminative representation-based classification (DRC) model, which utilizes label information from training samples to guide the representation and classification stages in a supervised manner, thereby enhancing the discriminative power of the representation vector. Unlike conventional representation methods, Zhang et al. [15] did not directly constrain the representation vector but instead imposed constraints on the representation components. They introduced the locality-constrained discriminative matrix regression (LDMR) model, which incorporates both local structure and label information, compelling homogeneous samples to contribute more to the representation and establishing a closer connection between the representation and classification processes. Xu et al. [16] extended nonnegative representation by introducing a scaled affine constraint, proposing the Simplex Representation model. Rather than using the standard affine constraint, this model constrains the sum of entries in the representation vector to equal an adjustable scale $s$ ($0 < s < 1$), enhancing its discriminative capability. In cases where the original data are situated in a linearly inseparable space, the kernel nonnegative representation-based classifier (KNRC) [17] employs kernel methods to map samples into specific kernel feature spaces, enhancing the performance of NRC.
Although NRC and its variants have demonstrated impressive classification results, they do overlook certain aspects. NRC emphasizes the significance of acquiring the representation vector but falls short in fully exploiting the discriminative information of training samples. This essentially renders a disconnect between representation and classification, thereby hampering its classification capability. Additionally, NRC neglects the use of locality constraints within samples, which can result in an insufficient representation weight for homogeneous samples, potentially leading to a lack of discriminative power. To address these limitations, we introduce a novel method.
In this paper, we introduce the LDNR model. Specifically, we extend the NRC framework by introducing a competitive representation term that quantifies both the discriminative information and locality constraint of the samples. Recognizing the crucial role of estimated samples in classification, the competitive representation term includes all-class-estimated samples that involve sample discriminative information, emphasizing the significance of the connection between representation and classification. Simultaneously, we assign distinct weights to different estimated samples, quantified by the distance between the query sample and estimated samples, thereby strengthening the representation capabilities of homogeneous samples. Additionally, we employ the alternating direction method of multipliers (ADMM) [18] to address the LDNR problem and present a detailed iterative solution process. Finally, we conduct comprehensive comparative experiments with state-of-the-art RBCM and deep learning methods on various pattern classification datasets, and the results validate the competitiveness of our proposed method. As a result, the primary contributions of this paper are briefly outlined as follows:
  • We present the LDNR model, which extends the NRC framework by introducing a competitive representation term that quantifies both the discriminative information and locality constraint of samples, ultimately improving the performance of the classification model.
  • We employ the ADMM to address the LDNR problem and detail an iterative solution procedure.
  • Extensive comparative experiments are conducted on various pattern classification datasets, and the findings demonstrate the competitiveness of the LDNR.
The rest of this paper is organized as follows. Section 2 provides an overview of the related work. Section 3 details the proposed LDNR method. Section 4 presents extensive experiments to validate this method. Section 5 concludes the paper and summarizes its findings. For a clearer description, Table 1 shows some general mathematical notations.

2. Related Works

2.1. The NRC

In SRC [5] and CRC [6], along with their respective variants, it is widely acknowledged that they improve the classification performance of the model by considering the sparsity of the representation vector and the collaborative representation of all training samples. Nevertheless, negative coefficients are inevitable in these methods. NRC [11], on the other hand, considers a significant correlation between homogeneous samples, while asserting that there is no correlation between heterogeneous samples, rather than a negative one. It enforces a nonnegative constraint on the representation vector, and in this framework, only the training samples that exhibit a positive correlation with the query sample play a pivotal role in the representation process. NRC offers two key advantages. Firstly, it enhances the representation capability of homogeneous samples while weakening the ability of heterogeneous samples. Secondly, the nonnegative constrained representation vector tends to exhibit a certain sparsity, which can have important implications for improved classification accuracy. In summary, nonnegative representation (NR) can be formulated as the following optimization problem:
$$\min_{\alpha} \|y - X\alpha\|_2^2, \quad \text{s.t.} \ \alpha \geq 0 \tag{1}$$
This problem can be effectively addressed using ADMM. Subsequently, the class-specific reconstruction residual can be computed based on the obtained representation vector:
$$r_i = \|y - X_i\alpha_i\|_2. \tag{2}$$
Finally, the label of the query sample is determined by the class with the lowest reconstruction residual:
$$\mathrm{Label}(y) = \arg\min_{i} r_i \tag{3}$$
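The three steps above (nonnegative representation, class-specific residuals, minimum-residual labeling) can be sketched as follows. This is only an illustration, not the NRC authors' implementation: it swaps ADMM for a plain projected-gradient solver, and the function names `nnls_projected_gradient` and `nrc_classify` as well as the toy data are hypothetical.

```python
import numpy as np

def nnls_projected_gradient(X, y, iters=5000):
    """Approximately solve min ||y - X a||_2^2 s.t. a >= 0 (stand-in for ADMM)."""
    # Step size 1/L, where L = 2 * sigma_max(X)^2 is the gradient's Lipschitz constant.
    step = 1.0 / (2.0 * np.linalg.norm(X, 2) ** 2)
    a = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = 2.0 * X.T @ (X @ a - y)
        a = np.maximum(0.0, a - step * grad)  # project onto the nonnegative orthant
    return a

def nrc_classify(X, labels, y):
    """Return the predicted label, the per-class residuals r_i, and the vector a."""
    a = nnls_projected_gradient(X, y)
    classes = np.unique(labels)
    residuals = np.array([
        np.linalg.norm(y - X[:, labels == c] @ a[labels == c]) for c in classes
    ])
    return classes[np.argmin(residuals)], residuals, a

# Toy data (hypothetical): two classes with two training samples each, as columns.
X = np.array([[1.0, 0.9, 0.0, 0.1],
              [0.0, 0.1, 1.0, 0.9],
              [0.1, 0.0, 0.1, 0.0]])
X = X / np.linalg.norm(X, axis=0)
labels = np.array([0, 0, 1, 1])
y = np.array([0.95, 0.05, 0.05])
y = y / np.linalg.norm(y)

label, residuals, a = nrc_classify(X, labels, y)
print(label, residuals)
```

In this toy example the query lies in the cone spanned by the class-0 columns, so the class-0 residual is far smaller than the class-1 residual.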

2.2. The LDMR

In contrast to the 1-D linear representation method that primarily focuses on optimizing the $\ell_2$-norm of the error vector, LDMR [15] adopts a different method by optimizing the nuclear norm to obtain low-rank information from the error matrix. Given a collection of $n$ training sample matrices $D = \{D_1, D_2, \ldots, D_n\}$, the query sample matrix $Y$ is represented by all the training samples:
$$Y = D(\alpha) + E \tag{4}$$
where $E$ signifies the error matrix and $D(\alpha) = \alpha_1 D_1 + \alpha_2 D_2 + \cdots + \alpha_n D_n$. In order to minimize the rank of the error matrix as much as possible, the following optimization problem is employed to obtain the representation vector $\alpha$:
$$\min_{\alpha} \operatorname{rank}(E), \quad \text{s.t.} \ E = Y - D(\alpha) \tag{5}$$
Given the NP-hard nature of this problem, it is challenging to obtain an exact solution. To facilitate a solution, we transform the original problem into the following form:
$$\min_{\alpha} \|E\|_*, \quad \text{s.t.} \ E = Y - D(\alpha) \tag{6}$$
where $\|\cdot\|_*$ is the nuclear norm operator. Furthermore, LDMR recognizes that the representation components $X_i\alpha_i$ play a significant role in the classification process. In particular, the loss between the representation components associated with incorrect classes and the query sample, $\{\mathrm{Loss}(y - X_i\alpha_i)\}_{i \neq k}$, should be as large as possible, while $\mathrm{Loss}(y - X_k\alpha_k)$ should be as small as possible, where $k$ is the ground-truth label of the query sample. To obtain a more discriminative representation vector, LDMR directly optimizes the $\ell_2$-norm of all-class representation components within the representation process. Simultaneously, it assigns distinct weights to each term, leading to the following optimization problem:
$$\min_{\alpha} \sum_{i=1}^{C} w_i \|X_i\alpha_i\|_2^2 \tag{7}$$
where $w_i$ signifies the weight coefficients and possesses the following form:
$$w_i = \exp\left(\frac{\operatorname{dist}(y, S_i)}{\sigma}\right), \quad \operatorname{dist}(y, S_i) = \frac{r_i - r_{\min}}{r_{\max} - r_{\min}}, \quad r_i = \|y - X_i\alpha_i\|_2 \tag{8}$$
where $S_i$ represents the subspace spanned by samples of class $i$ and $r_i$ denotes the distance between the query sample and subspace $S_i$. Combining the two problems (6) and (7), the overall optimization problem for LDMR is formulated as follows:
$$\min_{\alpha} \|\operatorname{Mat}(y - X\alpha)\|_* + \frac{\lambda}{2} \sum_{i=1}^{C} w_i \|X_i\alpha_i\|_2^2 \tag{9}$$
where the $\operatorname{Mat}(\cdot)$ operation signifies the transformation of a vector into a matrix. The remaining details of LDMR are beyond the scope of this paper; readers who wish to explore them further may refer to the original paper [15].
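As a small numerical illustration of the two ingredients of LDMR, the sketch below evaluates a nuclear norm and the locality weights. It assumes the weight takes the form $\exp(\operatorname{dist}(y, S_i)/\sigma)$ with a positive exponent, so that classes farther from the query are penalized more; the helper names `nuclear_norm` and `ldmr_weights` and all numbers are hypothetical.

```python
import numpy as np

def nuclear_norm(E):
    """Nuclear norm ||E||_*: the sum of the singular values of E."""
    return np.linalg.svd(E, compute_uv=False).sum()

def ldmr_weights(residuals, sigma):
    """Locality weights from class residuals r_i (assumed form, see lead-in).

    dist_i = (r_i - r_min) / (r_max - r_min) lies in [0, 1];
    this assumes the residuals are not all equal.
    """
    r = np.asarray(residuals, dtype=float)
    dist = (r - r.min()) / (r.max() - r.min())
    return np.exp(dist / sigma)

# Mat(.) reshapes an error vector into an error matrix (here 2 x 2).
e = np.array([3.0, 0.0, 0.0, 4.0])
E = e.reshape(2, 2)
nuc = nuclear_norm(E)            # singular values are 4 and 3

# A rank-1 error matrix has a single nonzero singular value.
E_low_rank = np.outer([1.0, 2.0], [3.0, 4.0])
rank_low = np.linalg.matrix_rank(E_low_rank)

# The closest class gets weight 1; farther classes get larger weights.
w = ldmr_weights([0.2, 0.6, 1.0], sigma=1.0)
print(nuc, rank_low, w)
```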

3. Locality-Constraint Discriminative Nonnegative Representation Method

3.1. Motivation

In the realm of RBCM, the key objective is to obtain the representation vector of the query sample effectively. Various methods have been developed to impose constraints on the representation vector, facilitating accurate classification of query samples. In fact, estimated samples play a pivotal role in the classification process. Specifically, let $X_i\alpha_i$ denote the estimated sample for the $i$-th class. From the perspective of classification, the term $\|y - X_i\alpha_i\|_2$ signifies the reconstruction residual for the $i$-th class. To obtain more discriminative classification results, $\|y - X_k\alpha_k\|_2$ should be made as small as possible, while $\sum_{i \neq k}\|y - X_i\alpha_i\|_2$ should be made as large as possible, where $k$ represents the label of the query sample. In other words, $\{X_i\alpha_i\}_{i \neq k}$ and the query sample should be treated as irrelevant. Therefore, it is reasonable to penalize misclassified estimated samples during the representation process. This offers a novel way to leverage the discriminative information of the query sample rather than impose constraints on the representation vector, thus establishing a robust connection between representation and classification. However, a challenge arises because the discriminative information of the query sample is not available in advance. Consequently, it is viable to penalize all-class-estimated samples simultaneously, namely $\sum_{i=1}^{C}\|X_i\alpha_i\|_2^2$. Notably, estimated samples of different classes should not be treated equally and should bear distinct weights. Samples from the same class reside in a specific subspace, while samples from different classes occupy separate subspaces. The Euclidean distance between samples in the same subspace is significantly smaller than that between samples in different subspaces. Hence, it is logical to view $\|y - X_i\alpha_i\|_2$ as a measure of the locality constraint for quantifying the weights assigned to the various estimated samples. 
In light of the considerations mentioned above, we propose locality-constraint discriminative nonnegative representation. A schematic illustration of this model is depicted in Figure 1. The flowchart is mainly divided into three steps: feature extraction, representation and classification. The representation process combines the locality constraint and the discriminative information between samples to calculate a more discriminative representation vector. The classification process determines the label of the query sample by residuals between the estimated samples and the query sample.

3.2. Representation

This work introduces the LDNR model that builds upon the NRC framework. Specifically, the LDNR model integrates both discriminative information and locality constraint among samples into the original objective function, resulting in the creation of a more discriminative model. The mathematical expression for LDNR is presented as follows:
$$\min_{\alpha} \|y - X\alpha\|_2^2 + \lambda \sum_{i=1}^{C} w_i \|X_i\alpha_i\|_2^2, \quad \text{s.t.} \ \alpha \geq 0 \tag{10}$$
where $\lambda$ represents the regularization parameter that balances the two terms. Furthermore, $w_i$ plays the role of a weight coefficient, which is responsible for determining the significance of each estimated sample. More specifically, $w_i$ characterizes the correlation between the query sample and the estimated sample for the $i$-th class. We quantify this correlation by adopting the class-specific reconstruction residual, and $w_i$ is expressed as follows:
$$w_i = \exp\left(\left(r_i - \max_{j} r_j\right) / \sigma\right) \tag{11}$$
where $\sigma$ signifies the weight decay parameter and $r_i = \|y - X_i\alpha_i\|_2$ corresponds to the class-specific reconstruction residual. In problem (10), the first term represents the representation error term, and the second term is seen as the competitive representation term. Notably, when $\lambda = 0$, LDNR degenerates to the NRC model. Consequently, NRC can be regarded as a special case within the broader framework of LDNR.
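To make the two terms of the LDNR objective concrete, the following sketch evaluates the residuals, the weights, and the objective for a fixed representation vector. It assumes the weight is $w_i = \exp((r_i - \max_j r_j)/\sigma)$, which is one reading of the weight formula above; the function names and the toy data are hypothetical.

```python
import numpy as np

def class_residuals(X, labels, y, alpha):
    """r_i = ||y - X_i alpha_i||_2 for every class i."""
    return np.array([
        np.linalg.norm(y - X[:, labels == c] @ alpha[labels == c])
        for c in np.unique(labels)
    ])

def ldnr_weights(residuals, sigma):
    """Assumed weight form: exp((r_i - max_j r_j) / sigma), so weights lie in (0, 1]."""
    r = np.asarray(residuals, dtype=float)
    return np.exp((r - r.max()) / sigma)

def ldnr_objective(X, labels, y, alpha, lam, w):
    """Representation error plus lam times the weighted competitive term."""
    fit = np.linalg.norm(y - X @ alpha) ** 2
    competitive = sum(
        w_i * np.linalg.norm(X[:, labels == c] @ alpha[labels == c]) ** 2
        for c, w_i in zip(np.unique(labels), w)
    )
    return fit + lam * competitive

# Toy data (hypothetical): three unit training samples, two classes.
X = np.eye(3)
labels = np.array([0, 0, 1])
y = np.array([0.6, 0.8, 0.0])
alpha = np.array([0.6, 0.8, 0.1])

r = class_residuals(X, labels, y, alpha)
w = ldnr_weights(r, sigma=1.0)
obj = ldnr_objective(X, labels, y, alpha, lam=0.1, w=w)
obj_nrc = ldnr_objective(X, labels, y, alpha, lam=0.0, w=w)  # reduces to the NR objective
print(r, w, obj, obj_nrc)
```

Under this weight form, the class with the largest residual receives weight 1, and classes closer to the query receive smaller penalties.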

3.3. Optimization

The LDNR problem can be effectively addressed by employing an alternating optimization strategy. Initially, this work introduces an auxiliary variable $z$, reformulating the original problem (10) in the following form:
$$\min_{\alpha, z} \|y - X\alpha\|_2^2 + \lambda \sum_{i=1}^{C} w_i \|X_i\alpha_i\|_2^2, \quad \text{s.t.} \ z = \alpha, \ z \geq 0 \tag{12}$$
Problem (12) seems to add a layer of complexity in obtaining the optimal representation vector. Fortunately, ADMM [18] can tackle this issue conveniently. The associated augmented Lagrangian function can be defined as follows:
$$L(\alpha, z, \delta, \mu) = \|y - X\alpha\|_2^2 + \lambda \sum_{i=1}^{C} w_i \|X_i\alpha_i\|_2^2 + \langle \delta, z - \alpha \rangle + \frac{\mu}{2}\|z - \alpha\|_2^2 \tag{13}$$
where $\delta$ and $\mu$ denote the Lagrange multiplier vector and penalty parameter, respectively. It is crucial to ensure that $\mu$ is assigned an appropriate and positive value. Additionally, the expression $\langle \delta, z - \alpha \rangle$ represents the inner product of two vectors. To tackle the optimization problem described in (12), we employ the method of alternating optimization to optimize the variables one by one, that is, updating one variable while keeping the others fixed. Therefore, this optimization problem can be transformed into solving the following distinct subproblems:
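  • Update variable $\alpha$ with fixed $z$ and $\delta$:
$$\min_{\alpha} \|y - X\alpha\|_2^2 + \lambda \sum_{i=1}^{C} w_i \|X_i\alpha_i\|_2^2 + \delta_t^T (z_t - \alpha) + \frac{\mu}{2}\|z_t - \alpha\|_2^2 \tag{14}$$
After completing the square in the last two terms and dropping the constant that does not depend on $\alpha$, we obtain the following form:
$$\min_{\alpha} \|y - X\alpha\|_2^2 + \lambda \sum_{i=1}^{C} w_i \|X_i\alpha_i\|_2^2 + \frac{\mu}{2}\left\|z_t - \alpha + \frac{\delta_t}{\mu}\right\|_2^2 \tag{15}$$
In order to facilitate optimization, a diagonal matrix $H_i = \operatorname{diag}(0, \ldots, 0, 1, \ldots, 1, 0, \ldots, 0)$ is created, in which all diagonal elements corresponding to the $i$-th class are 1 and the rest are 0. This is in light of the fact that $\|X_i\alpha_i\|_2^2 = \|XH_i\alpha\|_2^2$. Consequently, (15) can be transformed as follows:
$$\min_{\alpha} \|y - X\alpha\|_2^2 + \lambda \sum_{i=1}^{C} w_i \|XH_i\alpha\|_2^2 + \frac{\mu}{2}\left\|z_t - \alpha + \frac{\delta_t}{\mu}\right\|_2^2 \tag{16}$$
By calculating the partial derivative of (16) with respect to $\alpha$ and subsequently setting it to zero, we can derive the analytical solution:
$$\frac{\partial}{\partial \alpha}\left(\|y - X\alpha\|_2^2 + \lambda \sum_{i=1}^{C} w_i \|XH_i\alpha\|_2^2 + \frac{\mu}{2}\left\|z_t - \alpha + \frac{\delta_t}{\mu}\right\|_2^2\right) = 0, \tag{17}$$
$$-2X^T(y - X\alpha) + 2\lambda \sum_{i=1}^{C} w_i (XH_i)^T XH_i\alpha - \mu\left(z_t - \alpha + \frac{\delta_t}{\mu}\right) = 0, \tag{18}$$
$$-X^Ty + X^TX\alpha + \lambda \sum_{i=1}^{C} w_i H_i^T X^T X H_i\alpha - \frac{\mu}{2}z_t + \frac{\mu}{2}\alpha - \frac{\delta_t}{2} = 0, \tag{19}$$
$$\left(X^TX + \lambda \sum_{i=1}^{C} w_i H_i^T X^T X H_i + \frac{\mu}{2}I\right)\alpha = X^Ty + \frac{\mu z_t + \delta_t}{2} \tag{20}$$
Consequently, we can obtain the solution to $\alpha$:
$$\alpha_{t+1} = P\left(X^Ty + \frac{\mu z_t + \delta_t}{2}\right) \tag{21}$$
where $P = \left(X^TX + \lambda \sum_{i=1}^{C} w_i H_i^T X^T X H_i + \frac{\mu}{2}I\right)^{-1}$, in which $I$ denotes the identity matrix.
  • Update variable $z$ with fixed $\alpha$ and $\delta$:
$$\min_{z} \left\|z - \left(\alpha_{t+1} - \frac{\delta_t}{\mu}\right)\right\|_2^2, \quad \text{s.t.} \ z \geq 0 \tag{22}$$
The solution to (22) is the projection onto the nonnegative orthant:
$$z_{t+1} = \max\left(0, \alpha_{t+1} - \frac{\delta_t}{\mu}\right) \tag{23}$$
  • Update variable $\delta$:
$$\delta_{t+1} = \delta_t + \mu\left(z_{t+1} - \alpha_{t+1}\right) \tag{24}$$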
The variables $\alpha_t$, $z_t$, and $\delta_t$ mentioned above represent the values of $\alpha$, $z$, and $\delta$ at each iteration step $t \in \{1, 2, \ldots, T\}$, where $T$ refers to the maximum number of iterations. Initially, at $t = 0$, the variables $\alpha_0$, $z_0$, and $\delta_0$ are typically initialized to zero. The iteration process continues until one of two conditions is met: either it reaches the maximum iteration limit, or it satisfies the convergence conditions. These convergence conditions are defined as follows: $\|\alpha_t - z_t\|_2 \leq \eta$, $\|\alpha_{t+1} - \alpha_t\|_2 \leq \eta$, and $\|z_{t+1} - z_t\|_2 \leq \eta$. Here, $\eta$ represents a predetermined threshold, typically set to $10^{-5}$ empirically. Importantly, all three of these convergence conditions must be satisfied simultaneously for the iteration to terminate. In summary, the procedure for solving the LDNR problem is outlined in Algorithm 1.
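The closed-form $\alpha$-update can be checked numerically. The sketch below, on random data with hypothetical sizes and parameter values, verifies the selector identity $\|X_i\alpha_i\|_2^2 = \|XH_i\alpha\|_2^2$ and confirms that the solution of the linear system makes the gradient of the $\alpha$-subproblem vanish. The weights $w_i$ are treated as fixed constants here, which is an assumption made for the check.

```python
import numpy as np

rng = np.random.default_rng(0)
m, per_class, C = 8, 3, 2                 # hypothetical sizes
n = per_class * C
X = rng.standard_normal((m, n))
y = rng.standard_normal(m)
labels = np.repeat(np.arange(C), per_class)
lam, mu = 0.1, 1.0
w = np.array([0.5, 1.0])                  # fixed weights (assumption)
z, delta = rng.random(n), rng.standard_normal(n)
alpha = rng.standard_normal(n)

# H_i is diagonal, with ones at the positions of the class-i samples.
H = [np.diag((labels == c).astype(float)) for c in range(C)]

# Selector identity: ||X_i alpha_i||^2 == ||X H_i alpha||^2.
identity_gap = max(
    abs(np.linalg.norm(X[:, labels == c] @ alpha[labels == c]) ** 2
        - np.linalg.norm(X @ H[c] @ alpha) ** 2)
    for c in range(C)
)

# Closed-form update: solve (X^T X + lam * sum_i w_i H_i^T X^T X H_i + mu/2 I) a = rhs.
G = X.T @ X
A = G + lam * sum(w[c] * H[c].T @ G @ H[c] for c in range(C)) + 0.5 * mu * np.eye(n)
alpha_next = np.linalg.solve(A, X.T @ y + 0.5 * (mu * z + delta))

# Gradient of the alpha-subproblem, evaluated at the closed-form solution.
grad = (-2.0 * X.T @ (y - X @ alpha_next)
        + 2.0 * lam * sum(w[c] * H[c].T @ G @ H[c] @ alpha_next for c in range(C))
        - mu * (z - alpha_next + delta / mu))
grad_norm = np.linalg.norm(grad)
print(identity_gap, grad_norm)
```

Solving the linear system with `np.linalg.solve` instead of forming the inverse $P$ explicitly is a numerical convenience; both give the same update.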
Algorithm 1 Solve the LDNR problem via ADMM.
Input: Query sample $y$, training samples $X$, $\lambda$, $\sigma$, $\mu > 0$, $T$;
1: Initialization: $\alpha_0 = 0$, $z_0 = 0$, $\delta_0 = 0$;
2: for $t = 0 : T - 1$ do
3:     $\alpha_{t+1} = P\left(X^Ty + \frac{\mu z_t + \delta_t}{2}\right)$, $P = \left(X^TX + \lambda \sum_{i=1}^{C} w_i H_i^T X^T X H_i + \frac{\mu}{2}I\right)^{-1}$;
4:     $z_{t+1} = \max\left(0, \alpha_{t+1} - \frac{\delta_t}{\mu}\right)$;
5:     $\delta_{t+1} = \delta_t + \mu\left(z_{t+1} - \alpha_{t+1}\right)$;
6:     if converged then
7:         Stop;
8:     end if
9: end for
Output: Representation vector $\alpha_{t+1}$.
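A minimal NumPy sketch of Algorithm 1 is given below. Two points are assumptions rather than details stated in the text: the weights $w_i$ are refreshed at every iteration from the current nonnegative iterate $z$ using $w_i = \exp((r_i - \max_j r_j)/\sigma)$, and the primal gap in the stopping test is evaluated on the newest iterates. The function name `ldnr_admm`, its default parameter values, and the synthetic data are hypothetical.

```python
import numpy as np

def ldnr_admm(X, labels, y, lam=0.01, sigma=1.0, mu=1.0, T=500, eta=1e-5):
    """ADMM sketch for min ||y - X a||^2 + lam * sum_i w_i ||X_i a_i||^2, a >= 0."""
    n = X.shape[1]
    masks = [labels == c for c in np.unique(labels)]
    G, Xty = X.T @ X, X.T @ y
    alpha, z, delta = np.zeros(n), np.zeros(n), np.zeros(n)
    for t in range(T):
        # Assumption: weights are recomputed from the current nonnegative iterate z.
        r = np.array([np.linalg.norm(y - X[:, mk] @ z[mk]) for mk in masks])
        w = np.exp((r - r.max()) / sigma)
        # A = X^T X + lam * sum_i w_i H_i^T X^T X H_i + (mu / 2) I.
        # H_i^T G H_i keeps only the (i, i) block of G, so add it block by block.
        A = G + 0.5 * mu * np.eye(n)
        for w_i, mk in zip(w, masks):
            A[np.ix_(mk, mk)] += lam * w_i * G[np.ix_(mk, mk)]
        alpha_new = np.linalg.solve(A, Xty + 0.5 * (mu * z + delta))
        z_new = np.maximum(0.0, alpha_new - delta / mu)
        delta = delta + mu * (z_new - alpha_new)
        # All three gaps must fall below eta (primal gap taken on the new iterates).
        done = max(np.linalg.norm(alpha_new - z_new),
                   np.linalg.norm(alpha_new - alpha),
                   np.linalg.norm(z_new - z)) <= eta
        alpha, z = alpha_new, z_new
        if done:
            break
    return alpha, z, t + 1

# Synthetic data (hypothetical): two classes around orthogonal prototypes.
rng = np.random.default_rng(0)
protos = np.eye(20)[:, :2]
labels = np.repeat([0, 1], 3)
X = protos[:, labels] + 0.1 * rng.standard_normal((20, 6))
X /= np.linalg.norm(X, axis=0)
y = protos[:, 0] + 0.1 * rng.standard_normal(20)
y /= np.linalg.norm(y)

alpha, z, n_iter = ldnr_admm(X, labels, y)
residuals = [np.linalg.norm(y - X[:, labels == c] @ z[labels == c]) for c in (0, 1)]
predicted = int(np.argmin(residuals))
print(n_iter, predicted, residuals)
```

The sketch returns both $\alpha$ and its nonnegative copy $z$; the two agree at convergence, and $z$ is feasible at every iteration, which is why the usage example computes the residuals from $z$.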

3.4. Convergence Analysis

Due to the convex nature of the objective function of LDNR, a globally optimal representation vector can be obtained. We conduct a convergence analysis on the AR dataset [19] and illustrate how the convergence conditions of Algorithm 1 vary with the number of iterations in Figure 2. We can observe a rapid decrease in the first 10 iterations, with the convergence condition approaching zero by approximately the 120th iteration.

3.5. Classification

In the classification process, the LDNR-based classifier follows three key steps: (1) solve the LDNR problem to obtain the representation vector; (2) calculate the reconstruction residual for each class; (3) assign the query sample to the class with the lowest reconstruction residual. Algorithm 2 succinctly outlines the entire classification procedure.
Algorithm 2 LDNR-based classifier.
Input: Query sample $y$, training samples $X$;
1: Normalize each column of $X$ and $y$ to have unit $\ell_2$-norm;
2: Calculate the representation vector $\alpha$ by Algorithm 1:
         $\min_{\alpha} \|y - X\alpha\|_2^2 + \lambda \sum_{i=1}^{C} w_i \|X_i\alpha_i\|_2^2$, s.t. $\alpha \geq 0$;
3: Calculate the class-specific reconstruction residual:
         $r_i = \|y - X_i\alpha_i\|_2$;
Output: $\mathrm{Label}(y) = \arg\min_{i} r_i$.

3.6. Complexity Analysis

In our algorithmic solution, we iteratively update each parameter to converge to an optimal value. In Algorithm 1, the complexity of each subproblem is as follows, where $X \in \mathbb{R}^{m \times n}$. When solving for $\alpha$, the primary computational cost lies in computing the matrix $P$. Computing $P$ involves a matrix inversion and some matrix multiplications, resulting in a computational complexity of $O(n^3 + mn^2)$. Subsequently, the complexity of multiplying $P$ with the subsequent vector is $O(n^2)$. Therefore, the overall computational complexity of solving for $\alpha$ is $O(n^3 + mn^2 + n^2)$. Additionally, the complexities for solving $z$ and $\delta$ are both $O(n)$.
In Algorithm 2, since the primary computational cost is matrix multiplication, the computational complexity for calculating the reconstruction residual is $O(mn)$. Finally, the overall algorithmic complexity is $O(n^3 + mn^2 + n^2 + mn + n)$.

4. Experiments

4.1. Experimental Settings

In this section, we validate the effectiveness of our proposed LDNR by conducting a series of comparative experiments on pattern classification datasets. These datasets are primarily divided into two categories: small-scale and large-scale datasets. In the small-scale datasets, we selected the AR face [19] and the USPS handwritten digit [20] datasets for our experiments. For large-scale datasets, we utilize four visual classification datasets, namely CUB-200-2011 [21], Oxford 102 Flowers [22], Aircraft [23], and Cars [24]. In the comparative experiments, we compare LDNR with diverse state-of-the-art methods, including NSC, SVM [25], SRC, CRC, CROC, ProCRC, NRC, DRC, Softmax, VGG16 [26], Symbiotic [27], FV-FGC [28], and B-CNN [29]. Furthermore, for fair comparison, we employ the same image features as NRC. Concretely, for the experiments conducted on the CUB-200-2011 and Oxford 102 Flowers datasets, we adopt the BOW-SIFT feature of samples as our training data, and on the Aircraft and Cars datasets, we use the VGG-verydeep-16 model to extract the features of samples.
The LDNR method relies on four pivotal parameters: the regularization parameter $\lambda$, the weight decay parameter $\sigma$, the penalty parameter $\mu$, and the maximum number of iterations $T$. Among these, we set $T = 5$, while the remaining parameters are determined through five-fold cross-validation. The candidate sets for $\lambda$, $\mu$, and $\sigma$ are $\{0.0001, 0.001, 0.01, 0.1, 1\}$, $\{0.0001, 0.001, 0.01, 0.1, 1\}$, and $\{1, 10, 100, 1000, 10{,}000\}$, respectively. To ensure a fair comparison with the standard baseline, we replicate the NRC method on our platform and settings. Following the source code provided by the original author, we fine-tune the parameters to achieve its best experimental results. Furthermore, given that DRC represents a more advanced method than NRC, we apply the same procedure to attain its best classification results. As for the experimental outcomes of the other methods, we take them from the literature [11]. The results of our experiments are listed in the respective tables. Notably, the highest classification accuracy achieved in each experiment is highlighted in bold font.
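The parameter search described above can be sketched as a grid search with five-fold cross-validation. The candidate values come from the text; everything else is an assumption for illustration: the callback `score_fn(params, train_idx, val_idx)` is a hypothetical hook that would train and evaluate LDNR on one fold, and `toy_score` is a made-up stand-in so that the sketch runs without data.

```python
import numpy as np
from itertools import product

LAMBDAS = [1e-4, 1e-3, 1e-2, 1e-1, 1]
MUS = [1e-4, 1e-3, 1e-2, 1e-1, 1]
SIGMAS = [1, 10, 100, 1000, 10000]
GRID = list(product(LAMBDAS, MUS, SIGMAS))   # 5 * 5 * 5 = 125 combinations

def five_fold_splits(n_samples, n_folds=5, seed=0):
    """Yield (train_idx, val_idx) pairs from one random partition into folds."""
    folds = np.array_split(np.random.default_rng(seed).permutation(n_samples), n_folds)
    for k in range(n_folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != k])
        yield train, folds[k]

def grid_search(score_fn, n_samples):
    """Return the (lambda, mu, sigma) triple with the best mean validation score."""
    best_params, best_score = None, -np.inf
    for params in GRID:
        mean_score = float(np.mean([score_fn(params, tr, va)
                                    for tr, va in five_fold_splits(n_samples)]))
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score

def toy_score(params, train_idx, val_idx):
    """Made-up score peaking at lambda = 0.01, mu = 1, sigma = 10 (not real results)."""
    lam, mu, sigma = params
    return -abs(np.log10(lam) + 2) - abs(np.log10(mu)) - abs(np.log10(sigma) - 1)

best_params, best_score = grid_search(toy_score, n_samples=700)
print(len(GRID), best_params)
```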

4.2. Small-Scale Datasets

4.2.1. Experiments on AR Dataset

The AR dataset comprises a total of 4000 frontal color face images, encompassing 126 distinct individuals, of which 56 are male and 70 are female. These images exhibit diverse characteristics, including variations in lighting conditions, facial expressions, and occlusions. Furthermore, each individual’s images are captured in two separate sessions, with a 14-day interval between them. Figure 3 shows a selection of images from the AR dataset. In our experimental design, we select a subset of 100 individuals (50 males and 50 females). Each of these selected individuals has fourteen images that demonstrate variations in both lighting and facial expressions. Among these fourteen images, the seven images from session 1 are designated for training, while the remaining seven images from session 2 are used for testing.
Before conducting the experiments, we crop the images to 60 × 43 pixels and normalize them to have unit $\ell_2$-norm. Subsequently, we employ PCA to project these images into feature subspaces of dimensions 54, 120, and 300, allowing for comparisons across different dimensions. The experimental results are summarized in Table 2. Across all three dimensions, our LDNR outperforms all the other methods. In particular, in the 300-dimensional feature space, our algorithm achieves an accuracy of 94.3%, surpassing NRC by 1% and DRC by 0.5%. This result underscores the effectiveness of LDNR on this face dataset.
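The preprocessing described above can be sketched with an SVD-based PCA. The sketch uses random arrays as stand-ins for the 700 training and 700 test images (100 individuals, seven images each); the helper names are hypothetical, and fitting the PCA basis on the training set only is an assumption, since the text does not state which samples the projection is learned from.

```python
import numpy as np

def unit_columns(A):
    """Scale every column (one sample per column) to unit l2-norm."""
    return A / np.linalg.norm(A, axis=0, keepdims=True)

def fit_pca(train, dim):
    """Return the training mean and the top-`dim` principal directions."""
    mean = train.mean(axis=1, keepdims=True)
    U, _, _ = np.linalg.svd(train - mean, full_matrices=False)
    return mean, U[:, :dim]

def pca_features(samples, mean, basis):
    """Project samples onto the PCA basis and renormalize each feature vector."""
    return unit_columns(basis.T @ (samples - mean))

rng = np.random.default_rng(0)
n_pixels = 60 * 43                                   # cropped image size from the text
train = unit_columns(rng.random((n_pixels, 700)))    # stand-in for session 1 images
test = unit_columns(rng.random((n_pixels, 700)))     # stand-in for session 2 images

# Fit once at the largest dimension, then truncate the basis for 54 and 120.
mean, basis = fit_pca(train, 300)
features = {
    dim: (pca_features(train, mean, basis[:, :dim]),
          pca_features(test, mean, basis[:, :dim]))
    for dim in (54, 120, 300)
}
print({dim: f[0].shape for dim, f in features.items()})
```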

4.2.2. Experiments on USPS Dataset

The USPS dataset constitutes a collection of grayscale handwritten digit images with a wide range of writing styles. It encompasses a total of 9298 images, depicting digits from 0 to 9. The dataset is divided into two sets, with 7291 images designated for training and 2007 images reserved for testing. Notably, every image is standardized to 16 × 16 pixels and consistently centered. We display some images from this dataset in Figure 4. In the experimental setup, we randomly select 50, 100, and 200 images per digit, respectively, for training, while the remaining samples are allocated for testing.
Similar to our experiments on the AR dataset, we conduct a comparative analysis of the LDNR method against several established methods, including NSC, SVM, SRC, CRC, CROC, ProCRC, NRC, and DRC. We repeat ten independent experiments and report the average results in Table 3. As we can see, when the numbers of training samples per digit are 50 and 100, LDNR achieves the highest accuracy, outperforming NRC by 0.9% and 0.8%, respectively. However, with a training sample size of 200, LDNR exhibits a slightly lower performance than SRC, although it still maintains an advantage over NRC of 0.8%.
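The sampling protocol above (a fixed number of random training images per digit, the rest for testing, averaged over ten runs) can be sketched as follows. The label vector is a synthetic stand-in with 9298 entries, the per-digit counts of the real dataset differ, and `run_once` is a hypothetical callback that would train and evaluate a classifier on one split.

```python
import numpy as np

def per_class_split(labels, n_train_per_class, rng):
    """Randomly pick n_train_per_class indices per class; the rest are test indices."""
    train_idx = np.concatenate([
        rng.choice(np.flatnonzero(labels == c), size=n_train_per_class, replace=False)
        for c in np.unique(labels)
    ])
    test_mask = np.ones(len(labels), dtype=bool)
    test_mask[train_idx] = False
    return train_idx, np.flatnonzero(test_mask)

def mean_accuracy(run_once, labels, n_train_per_class, repeats=10, seed=0):
    """Average the accuracy returned by run_once over independent random splits."""
    rng = np.random.default_rng(seed)
    return float(np.mean([
        run_once(*per_class_split(labels, n_train_per_class, rng))
        for _ in range(repeats)
    ]))

# Synthetic stand-in labels: 9298 samples cycling through the digits 0..9.
labels = np.arange(9298) % 10
rng = np.random.default_rng(0)
train_idx, test_idx = per_class_split(labels, 50, rng)

# Dummy callback (not a classifier): reports the fraction of samples held out.
held_out = mean_accuracy(lambda tr, te: len(te) / len(labels), labels, 50)
print(len(train_idx), len(test_idx), held_out)
```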

4.3. Large-Scale Datasets with BOW-SIFT Feature

4.3.1. BOW-SIFT Feature

In this section, according to the settings in [10], we utilize VLFeat [30] to extract Bag-of-Words features [31] based on SIFT, referred to as BOW-SIFT. The size of the patch is set to 16 × 16, and the stride is set to eight pixels. The codebook is trained using the k-means algorithm, and a two-level spatial pyramid representation [32] is employed. This results in each image’s feature vector having 5120 dimensions. Finally, all features are $\ell_2$-normalized.

4.3.2. Experiments on CUB-200-2011 Dataset

The Caltech-UCSD Birds-200-2011 (CUB-200-2011) is well recognized as a prominent dataset for fine-grained visual classification tasks. The dataset comprises 11,788 images depicting 200 unique bird species, with 5994 images allocated for training and 5794 for testing. Figure 5 exhibits a collection of images from this dataset. We followed the data-splitting method outlined in the original dataset, allocating around half of the data for training and the remaining for testing.
In this experiment, we employ the BOW-SIFT features of the images as training data and conduct a comparative analysis of LDNR against various methods, including Softmax, NSC, SRC, CRC, CROC, ProCRC, NRC, and DRC. The results of this comparison are detailed in Table 4. When combining the BOW-SIFT features, our LDNR achieves accuracy levels comparable to NRC and ProCRC, but outperforms Softmax by 1.7%, NSC by 1.5%, SRC by 2.2%, CRC by 0.5%, CROC by 0.8%, and DRC by 2.1%.

4.3.3. Experiments on Oxford 102 Flowers Dataset

The Oxford 102 Flowers dataset contains a total of 8189 images for 102 flower classes, with each class having between 40 and 258 images. The dataset presents a notable challenge due to the diverse variations in image scale, pose, and illumination. Figure 6 displays some typical images from this dataset. For our experiment, we employ the data partitioning method outlined in the original dataset, in which 2040 images are allocated for training and 6149 images are reserved for testing.
Consistent with settings on the CUB-200-2011 dataset, we apply the same set of comparative methods and summarize comparison results in Table 5. Notably, the LDNR method achieves the highest classification accuracy at 52.5%, which shows a significant improvement over other methods, surpassing Softmax by 6%, NSC by 5.8%, SRC by 5.3%, CRC by 2.6%, CROC by 3.1%, ProCRC by 1.3%, NRC by 0.4%, and DRC by an impressive 14.6%. The outcome further highlights the competitiveness of LDNR in fine-grained image classification tasks.

4.4. Large-Scale Datasets with VGG16 Feature

4.4.1. VGG16 Feature

Based on the results of the previous experiments, we have demonstrated the superiority of LDNR over several strong RBCM. To further solidify these findings, we extend the evaluation by comparing LDNR with various deep-learning methods. Following the same settings as in [10], we utilize the VGG-verydeep-16 model to extract CNN features, referred to as VGG16 features. We employ the activations from the penultimate layer as local features and extract them from five scales $\{2^s, s = -1, -0.5, 0, 0.5, 1\}$. Subsequently, all local features are pooled together without consideration of scales and locations. For all datasets, the final feature dimension for each image is 4096. Finally, all features are $\ell_2$-normalized.

4.4.2. Experiments on Aircraft Dataset

The Aircraft dataset is a widely used benchmark for fine-grained visual classification. It comprises a total of 10,000 aircraft images spanning 100 distinct aircraft models. The aircraft vary in appearance, scale, and design structure, making this dataset highly challenging for image classification. Figure 7 shows some representative images from this dataset. We follow the data partition provided with the dataset, in which 6667 images from the 100 aircraft classes are allocated for training and the remaining 3333 images are reserved for testing.
In this experiment, we use the VGG16 features extracted from the images as training data and compare LDNR against VGG16, Symbiotic, FV-FGC, and B-CNN. The results are summarized in Table 6. The table shows that CNN features tend to yield higher accuracy than BOW-SIFT features. Furthermore, LDNR again achieves the highest classification accuracy, surpassing VGG16 by 2.1, Symbiotic by 15.2, FV-FGC by 7.0, and B-CNN by 3.6 percentage points.

4.4.3. Experiments on Cars Dataset

The Cars dataset is primarily used for fine-grained visual classification tasks. It comprises 16,185 images depicting various cars and encompasses frontal, lateral, and posterior perspectives of the cars, along with various angles and lighting conditions. Figure 8 displays some illustrative images from this dataset. Our experiment adheres to the data-splitting strategy prescribed by the dataset, wherein 8144 images are allocated for training and the remaining 8041 images are used for testing.
Consistent with the settings of the Aircraft experiment, we use the VGG16 features extracted from the images as training data and summarize the results in Table 7. LDNR once again achieves the highest classification accuracy at 91.0%, surpassing VGG16 by 2.3, Symbiotic by 13.0, FV-FGC by 8.3, and B-CNN by 0.4 percentage points.

4.5. Parameters Analysis

In this section, we analyze the impact of three key parameters, namely λ, μ, and σ, on the performance of LDNR. In each experiment, two parameters are fixed at constant values while the third is varied systematically over a predefined set of values: {0.0001, 0.001, 0.01, 0.1, 1} for both λ and μ, and {1, 10, 100, 1000, 10,000} for σ. The experiments are carried out on the AR face dataset with 120-dimensional features. Figure 9 illustrates how each parameter affects classification accuracy.
The highest accuracy is achieved at λ = 0.01. Accuracy rises mildly as λ increases to 0.01 and declines as λ grows beyond it. This pattern suggests that λ balances the representation error term against the competitive representation term: an excessively large λ reduces the weight given to the former and ultimately compromises the effectiveness of the method. For μ, the best result occurs at μ = 0.1. Values that are either too small or too large degrade accuracy, which indicates that the method is sensitive to μ and that this parameter needs careful tuning. Finally, for σ, classification accuracy fluctuates only modestly across the entire tested range, indicating greater stability than for λ and μ. Hence, σ requires less tuning effort.
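The one-at-a-time protocol used in this analysis can be sketched as below. The `evaluate` function is a hypothetical placeholder for training and testing LDNR, and the values in `defaults` are assumptions made for illustration, since the section does not state the values at which the two fixed parameters are held. Only the candidate grids are taken from the text.

```python
import math

# Candidate grids taken from the text.
grids = {
    "lam":   [0.0001, 0.001, 0.01, 0.1, 1],
    "mu":    [0.0001, 0.001, 0.01, 0.1, 1],
    "sigma": [1, 10, 100, 1000, 10000],
}

# Assumed values for the parameters that are held fixed (illustrative only).
defaults = {"lam": 0.01, "mu": 0.1, "sigma": 100}

def evaluate(lam, mu, sigma):
    """Hypothetical stand-in for 'run LDNR and return accuracy'.

    A toy score that peaks at lam = 0.01 and mu = 0.1 and ignores sigma;
    replace it with a real train/test routine.
    """
    return -abs(math.log10(lam) + 2) - abs(math.log10(mu) + 1)

def sweep(grids, defaults, score_fn):
    """Vary one parameter over its grid while the others stay at defaults."""
    results = {}
    for name, values in grids.items():
        results[name] = []
        for value in values:
            params = dict(defaults, **{name: value})
            results[name].append((value, score_fn(**params)))
    return results

results = sweep(grids, defaults, evaluate)
best = {name: max(pairs, key=lambda p: p[1])[0] for name, pairs in results.items()}
print(best)
```

A sweep of this kind needs 15 evaluations instead of the 125 required by a full grid, at the cost of ignoring interactions between the parameters.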

5. Conclusions and Discussion

In this paper, we extend the NRC model and propose the LDNR model by introducing a competitive representation term that captures the discriminative information and the locality constraint of the samples. Specifically, we regularize the estimated samples of all classes in the representation stage, thus strengthening the connection between representation and classification. Simultaneously, we assign different weights to each estimated sample, thereby augmenting the representation power of homogeneous samples and ultimately improving performance. Furthermore, we employ ADMM to solve the LDNR problem efficiently and analyze its iterative steps. Finally, comprehensive comparison experiments with many strong RBCMs and deep-learning methods are conducted on various challenging datasets, and the findings demonstrate the competitiveness of LDNR.
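For readers who want a concrete picture of the baseline that LDNR extends, the sketch below implements a plain NRC-style classifier [11]: a nonnegative representation of the query followed by assignment to the class with the smallest reconstruction residual. It is not the LDNR objective: the competitive representation term and the locality weights are omitted, and projected gradient descent is used in place of ADMM for brevity. The function names and the toy data are hypothetical.

```python
import numpy as np

def nonneg_representation(X, y, n_iter=500):
    """Approximately solve min_a 0.5 * ||y - X a||^2 subject to a >= 0
    by projected gradient descent (a simple substitute for ADMM)."""
    step = 1.0 / (np.linalg.norm(X, 2) ** 2 + 1e-12)  # 1 / Lipschitz constant
    a = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ a - y)
        a = np.maximum(a - step * grad, 0.0)  # project onto a >= 0
    return a

def classify(X, labels, y):
    """Assign y to the class whose samples reconstruct it with least residual."""
    a = nonneg_representation(X, y)
    classes = np.unique(labels)
    residuals = [
        np.linalg.norm(y - X[:, labels == c] @ a[labels == c]) for c in classes
    ]
    return classes[int(np.argmin(residuals))], a

# Toy data: columns are training samples, two per class.
X = np.array([[1.0, 0.9, 0.0, 0.1],
              [0.0, 0.1, 1.0, 0.9],
              [0.1, 0.0, 0.1, 0.0]])
labels = np.array([0, 0, 1, 1])
y = np.array([0.95, 0.05, 0.05])  # query close to the class-0 samples

pred, coef = classify(X, labels, y)
print(pred, np.round(coef, 3))
```

In this toy case the coefficients concentrate on the class-0 columns, so the class-0 residual is the smallest.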
Similar to other representation-based classification methods, the proposed LDNR relies on clean training data free of occlusion or corruption. Occluded or corrupted training samples can degrade the performance of LDNR. To address this challenge, approaches such as low-rank matrix recovery and image deblurring may be explored as potential solutions.

Author Contributions

Methodology, Z.L.; software, Z.L.; validation, Z.L. and H.S.; data curation, H.S.; writing—original draft preparation, Z.L. and H.S.; writing—review and editing, H.Y., Y.Z. and G.Z.; visualization, H.S. and H.Y.; funding acquisition, Z.L., H.Y., Y.Z. and G.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work is part of the research supported by the National Key Research and Development Program of China (2021YFE0116900), the National Natural Science Foundation of China (42175157), the General Project of Natural Science Research of Jiangsu Higher Education Institutions (22KJB520037, 23KJB520036), the “Taihu Light” Science and Technology Project of Wuxi (K20231003, K20231010), and the Wuxi University Research Start-up Fund for Introduced Talents (2021r032, 2023r046).

Data Availability Statement

The source codes that support the findings of this study will be openly available at https://github.com/li-zi-qi/LDNR (accessed on 20 November 2023).

Acknowledgments

We express our sincere gratitude to the Mathematics Research Group at the College of Science, Wuxi University, for their valuable assistance in the mathematical derivations presented in this paper.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Lin, J.; Wen, W.; Liao, J. A Novel Concept-Cognitive Learning Method for Bird Song Classification. Mathematics 2023, 11, 4298.
  2. Cheng, C.S.; Chen, P.W.; Hsieh, Y.C.; Wu, Y.T. Multivariate Process Control Chart Pattern Classification Using Multi-Channel Deep Convolutional Neural Networks. Mathematics 2023, 11, 3291.
  3. Erjiang, E.; Yu, M.; Tian, X.; Tao, Y. Dynamic Model Selection Based on Demand Pattern Classification in Retail Sales Forecasting. Mathematics 2022, 10, 3179.
  4. Ma, Z.; Li, Z.; Zhan, Y. Deep Large-Margin Rank Loss for Multi-Label Image Classification. Mathematics 2022, 10, 4584.
  5. Wright, J.; Yang, A.Y.; Ganesh, A.; Sastry, S.S.; Ma, Y. Robust Face Recognition via Sparse Representation. IEEE Trans. Pattern Anal. Mach. Intell. 2009, 31, 210–227.
  6. Zhang, L.; Yang, M.; Feng, X. Sparse representation or collaborative representation: Which helps face recognition? In Proceedings of the IEEE International Conference on Computer Vision, Barcelona, Spain, 6–12 November 2011; pp. 471–478.
  7. Timofte, R.; Van Gool, L. Weighted collaborative representation and classification of images. In Proceedings of the 21st International Conference on Pattern Recognition, Tsukuba, Japan, 11–15 November 2012; pp. 1606–1610.
  8. Lee, K.C.; Ho, J.; Kriegman, D. Acquiring linear subspaces for face recognition under variable lighting. IEEE Trans. Pattern Anal. Mach. Intell. 2005, 27, 684–698.
  9. Chi, Y.; Porikli, F. Classification and Boosting with Multiple Collaborative Representations. IEEE Trans. Pattern Anal. Mach. Intell. 2014, 36, 1519–1531.
  10. Cai, S.; Zhang, L.; Zuo, W.; Feng, X. A Probabilistic Collaborative Representation Based Approach for Pattern Classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2950–2959.
  11. Xu, J.; An, W.; Zhang, L.; Zhang, D. Sparse, collaborative, or nonnegative representation: Which helps pattern classification? Pattern Recognit. 2019, 88, 679–688.
  12. Lee, D.D.; Seung, H.S. Learning the parts of objects by non-negative matrix factorization. Nature 1999, 401, 788–791.
  13. Gou, J.; Wang, L.; Yi, Z.; Yuan, Y.; Ou, W.; Mao, Q. Weighted discriminative collaborative competitive representation for robust image classification. Neural Netw. 2020, 125, 104–120.
  14. Wang, Y.; Tan, Y.P.; Tang, Y.Y.; Chen, H.; Zou, C.; Li, L. Generalized and Discriminative Collaborative Representation for Multiclass Classification. IEEE Trans. Cybern. 2022, 52, 2675–2686.
  15. Zhang, C.; Li, H.; Qian, Y.; Chen, C.; Zhou, X. Locality-Constrained Discriminative Matrix Regression for Robust Face Identification. IEEE Trans. Neural Netw. Learn. Syst. 2022, 33, 1254–1268.
  16. Xu, J.; Yu, M.; Shao, L.; Zuo, W.; Meng, D.; Zhang, L.; Zhang, D. Scaled Simplex Representation for Subspace Clustering. IEEE Trans. Cybern. 2021, 51, 1493–1505.
  17. Zhou, J.; Zeng, S.; Zhang, B. Kernel nonnegative representation-based classifier. Appl. Intell. 2022, 52, 2269–2289.
  18. Boyd, S.; Parikh, N.; Chu, E.; Peleato, B.; Eckstein, J. Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers. Found. Trends Mach. Learn. 2011, 3, 1–122.
  19. Martinez, A.; Benavente, R. The AR Face Database; CVC Technical Report 24; UAB: Barcelona, Spain, 1998.
  20. Hull, J. A database for handwritten text recognition research. IEEE Trans. Pattern Anal. Mach. Intell. 1994, 16, 550–554.
  21. Wah, C.; Branson, S.; Welinder, P.; Perona, P.; Belongie, S. The Caltech-UCSD Birds-200-2011 Dataset; Technical Report; California Institute of Technology: Pasadena, CA, USA, 2011.
  22. Nilsback, M.E.; Zisserman, A. Automated Flower Classification over a Large Number of Classes. In Proceedings of the Sixth Indian Conference on Computer Vision, Graphics & Image Processing, Bhubaneswar, India, 16–19 December 2008; pp. 722–729.
  23. Maji, S.; Rahtu, E.; Kannala, J.; Blaschko, M.B.; Vedaldi, A. Fine-Grained Visual Classification of Aircraft. arXiv 2013, arXiv:1306.5151.
  24. Krause, J.; Stark, M.; Deng, J.; Fei-Fei, L. 3D Object Representations for Fine-Grained Categorization. In Proceedings of the 2013 IEEE International Conference on Computer Vision Workshops, Sydney, Australia, 1–8 December 2013; pp. 554–561.
  25. Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297.
  26. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. In Proceedings of the International Conference on Learning Representations, San Diego, CA, USA, 7–9 May 2015.
  27. Chai, Y.; Lempitsky, V.; Zisserman, A. Symbiotic Segmentation and Part Localization for Fine-Grained Categorization. In Proceedings of the IEEE International Conference on Computer Vision, Sydney, Australia, 1–8 December 2013; pp. 321–328.
  28. Gosselin, P.H.; Murray, N.; Jégou, H.; Perronnin, F. Revisiting the Fisher vector for fine-grained classification. Pattern Recognit. Lett. 2014, 49, 92–98.
  29. Lin, T.Y.; RoyChowdhury, A.; Maji, S. Bilinear CNN Models for Fine-Grained Visual Recognition. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1449–1457.
  30. Vedaldi, A.; Fulkerson, B. Vlfeat: An Open and Portable Library of Computer Vision Algorithms. In Proceedings of the 18th ACM International Conference on Multimedia, Firenze, Italy, 25–29 October 2010; pp. 1469–1472.
  31. Lowe, D.G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 2004, 60, 91–110.
  32. Lazebnik, S.; Schmid, C.; Ponce, J. Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, New York, NY, USA, 17–22 June 2006; pp. 2169–2178.
Figure 1. Schematic diagram of the LDNR model for classification. (In Residuals, the red bar marks the smallest reconstruction residual, which indicates the category to which the query sample belongs.)
Figure 2. Convergence curve of LDNR on AR dataset.
Figure 3. Some images from AR dataset.
Figure 4. Some images from USPS dataset.
Figure 5. Some images from CUB-200-2011 dataset.
Figure 6. Some images from Oxford 102 Flowers dataset.
Figure 7. Some images from Aircraft dataset.
Figure 8. Some images from Cars dataset.
Figure 9. The effects of three parameters of LDNR on classification accuracy.
Table 1. Summary of the mathematical notation used throughout this paper.
| Notation | Definition |
| --- | --- |
| $C$ | number of classes |
| $n$ | number of training samples |
| $m$ | dimension of feature |
| $n_i$ | number of samples for the $i$-th class |
| $\alpha \in \mathbb{R}^{n \times 1}$ | representation vector |
| $\alpha_i \in \mathbb{R}^{n_i \times 1}$ | representation vector for the $i$-th class |
| $y \in \mathbb{R}^{m \times 1}$ | given query sample |
| $X = [X_1, X_2, \ldots, X_C] \in \mathbb{R}^{m \times n}$ | training sample matrix |
| $X_i = [x_{i,1}, x_{i,2}, \ldots, x_{i,n_i}] \in \mathbb{R}^{m \times n_i}$ | training samples for the $i$-th class |
Table 2. Classification accuracy (%) of various methods on AR dataset.
| Dimensions | NSC [8] | SVM [25] | SRC [5] | CRC [6] | CROC [9] | ProCRC [10] | NRC [11] | DRC [14] | LDNR |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 54 | 70.7 | 81.6 | 82.1 | 80.3 | 82.0 | 81.4 | 85.2 | 80.6 | 85.8 |
| 120 | 75.5 | 89.3 | 88.3 | 90.0 | 90.8 | 90.7 | 91.3 | 90.1 | 91.4 |
| 300 | 76.1 | 91.6 | 90.3 | 93.7 | 93.7 | 93.7 | 93.3 | 93.8 | 94.3 |
Table 3. Classification accuracy (%) of various methods on USPS dataset.
| Images | NSC [8] | SVM [25] | SRC [5] | CRC [6] | CROC [9] | ProCRC [10] | NRC [11] | DRC [14] | LDNR |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 50 | 91.2 | 91.6 | 91.4 | 89.2 | 91.9 | 90.9 | 91.2 | 87.8 | 92.1 |
| 100 | 92.2 | 92.5 | 93.1 | 90.6 | 91.3 | 91.9 | 92.4 | 89.3 | 93.2 |
| 200 | 92.8 | 93.1 | 94.2 | 91.4 | 91.7 | 92.2 | 93.2 | 90.4 | 94.0 |
Table 4. Classification accuracy (%) of various methods on CUB-200-2011 dataset.
| Dataset | Softmax | NSC [8] | SRC [5] | CRC [6] | CROC [9] | ProCRC [10] | NRC [11] | DRC [14] | LDNR |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CUB-200-2011 | 8.2 | 8.4 | 7.7 | 9.4 | 9.1 | 9.9 | 9.9 | 7.8 | 9.9 |
Table 5. Classification accuracy (%) of various methods on Oxford 102 Flowers dataset.
| Dataset | Softmax | NSC [8] | SRC [5] | CRC [6] | CROC [9] | ProCRC [10] | NRC [11] | DRC [14] | LDNR |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Oxford 102 Flowers | 46.5 | 46.7 | 47.2 | 49.9 | 49.4 | 51.2 | 52.1 | 37.9 | 52.5 |
Table 6. Classification accuracy (%) of various methods on Aircraft dataset.
| Dataset | VGG16 [26] | Symbiotic [27] | FV-FGC [28] | B-CNN [29] | LDNR |
| --- | --- | --- | --- | --- | --- |
| Aircraft | 85.6 | 72.5 | 80.7 | 84.1 | 87.7 |
Table 7. Classification accuracy (%) of various methods on Cars dataset.
| Dataset | VGG16 [26] | Symbiotic [27] | FV-FGC [28] | B-CNN [29] | LDNR |
| --- | --- | --- | --- | --- | --- |
| Cars | 88.7 | 78.0 | 82.7 | 90.6 | 91.0 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Li, Z.; Song, H.; Yin, H.; Zhang, Y.; Zhang, G. Locality-Constraint Discriminative Nonnegative Representation for Pattern Classification. Mathematics 2024, 12, 52. https://doi.org/10.3390/math12010052

AMA Style

Li Z, Song H, Yin H, Zhang Y, Zhang G. Locality-Constraint Discriminative Nonnegative Representation for Pattern Classification. Mathematics. 2024; 12(1):52. https://doi.org/10.3390/math12010052

Chicago/Turabian Style

Li, Ziqi, Hongcheng Song, Hefeng Yin, Yonghong Zhang, and Guangyong Zhang. 2024. "Locality-Constraint Discriminative Nonnegative Representation for Pattern Classification" Mathematics 12, no. 1: 52. https://doi.org/10.3390/math12010052

APA Style

Li, Z., Song, H., Yin, H., Zhang, Y., & Zhang, G. (2024). Locality-Constraint Discriminative Nonnegative Representation for Pattern Classification. Mathematics, 12(1), 52. https://doi.org/10.3390/math12010052
