1. Introduction
Linear discriminant analysis (LDA) [1] is a classical supervised linear learning method stemming from the Fisher criterion [2]; it can be used not only for supervised classification but also for feature dimensionality reduction. Its idea is straightforward: for a set of given training samples, LDA seeks the optimal projection matrix that simultaneously maximizes the between-class scatter and minimizes the within-class scatter. As a supervised feature extraction method, LDA has demonstrated its feasibility and efficiency in pattern classification [3], and it has been widely applied in hyperspectral image classification [4], EEG signal analysis [5], re-identification [6], etc. LDA is also an important approach in visual classification, such as handwriting recognition, face recognition and object categorization [7].
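To make the Fisher criterion concrete, the following is a minimal NumPy sketch of classical LDA (not the exact formulation used later in this paper): it builds the within- and between-class scatter matrices and keeps the leading eigenvectors of the resulting eigen-problem as the projection; the ridge term added to the within-class scatter is our own safeguard.

```python
import numpy as np

def lda_projection(X, y, h):
    """Classical LDA sketch: X is (n_samples, n_features), y holds class labels.
    Returns an (n_features, h) projection maximizing the Fisher criterion."""
    classes = np.unique(y)
    mean_all = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean_all)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    # solve S_w^{-1} S_b w = lambda w; a small ridge keeps S_w invertible
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw + 1e-6 * np.eye(d), Sb))
    order = np.argsort(-evals.real)
    return evecs[:, order[:h]].real
```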
LDA assumes that the samples of all training data follow a multivariate Gaussian distribution with the same covariance but different mean values for different classes [8]; once this assumption is violated, for example, when the samples of one class form several distinct clusters, LDA gives undesirable results [9]. To solve this problem, a two-step subclass division method was proposed [10]; the first step applies k-means for clustering and the second step uses an EM-like framework to optimize the subclasses. Recently, the reverse nearest neighbor approach has been combined with LDA to form neighborhood linear discriminant analysis (nLDA) [7]. As an unsupervised outlier detection tool, the reverse nearest neighbor (RNN) can eliminate the “isolated points” in the training set. nLDA expects a sample and its RNNs to be as close as possible, while the RNNs of two samples belonging to different classes should be as far apart as possible in the projected space. Because the scatter matrices are defined on RNNs rather than directly on whole classes, nLDA can cope with datasets containing multimodal classes. In other words, nLDA is a localized discriminator whose scatter matrices are defined directly on neighborhoods; the smallest subclass can be regarded as a neighborhood, so discriminative learning can be performed without obeying the independently and identically distributed (i.i.d.) assumption on the sample data.
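As an illustration of the reverse-nearest-neighbor idea (a sketch under our own simplifying assumptions, not the authors' implementation), the reverse k-nearest neighbors of a sample are simply the samples that count it among their own k nearest neighbors; a point that appears in nobody's neighbor list behaves like an isolated point.

```python
import numpy as np

def reverse_knn(X, k):
    """Return, for each sample i, the indices j such that i is among the
    k nearest neighbors of j (reverse k-nearest neighbors, RkNN)."""
    n = X.shape[0]
    # pairwise squared Euclidean distances
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)          # exclude self-matches
    knn = np.argsort(d2, axis=1)[:, :k]   # k nearest neighbors of each sample
    rknn = [[] for _ in range(n)]
    for j in range(n):
        for i in knn[j]:
            rknn[i].append(j)             # j "votes" for its neighbor i
    return rknn                           # rknn[i] empty -> isolated point
```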
However, since nLDA is extended from LDA, a linear parametric method, it may fail to handle nonlinear data. It is well known that the complex appearance variations caused by deformation, illumination and viewing angle often introduce non-linearity [11,12]. If the non-linearity of the data is ignored, linear discriminators often give lower performance. One practical solution is to map the original input data to a higher-dimensional feature space and then learn a linear classifier in that space. However, handling the non-linearity in the higher-dimensional feature space is non-trivial, especially when the dimension of the mapped space is infinite. Many methods have been proposed that use the kernel trick to handle this problem [11,12,13], and it is also exploited in our proposed approach.
To handle nonlinear data, circumvent the i.i.d. assumption on the sample data and solve the multimodal classes problem simultaneously, in this paper we extend nLDA to its kernel version, called kernel reverse neighborhood discriminant analysis (KRNDA). The Gaussian kernel is employed to map the input data into a higher-dimensional feature space where the non-linearity can be alleviated. The derivation of KRNDA with the kernel trick is presented in detail. We give extensive evaluations on UCI datasets and on the visual classification tasks of handwriting recognition, face recognition and object categorization to verify the effectiveness of the proposed approach.
The rest of this paper is organized as follows.
Section 2 reviews the related work.
Section 3 briefly introduces the basic principles of LDA and nLDA methods. Subsequently, we propose our KRNDA method; the deduction with the kernel trick is presented in
Section 4. Experiments are conducted in
Section 5.
Section 6 concludes this paper.
2. Related Work
LDA has been studied for decades [14,15,16] and plentiful approaches have been proposed to address its different problems. For pattern recognition, the small sample size (SSS) problem is an essential issue for LDA. The SSS problem generally occurs when the feature dimension of each sample is very high while there are insufficient training samples for each class. This often leads to a singular or ill-conditioned within-class scatter matrix, so LDA cannot be solved by eigen-decomposition. To address this problem, Fisherfaces [1] were proposed, which first perform principal component analysis (PCA) to reduce the feature dimension of each sample and then apply the LDA algorithm. The singularity or ill-conditioning of the within-class scatter matrix caused by the SSS problem can also be alleviated by adding a small value to the diagonal of the within-class scatter matrix, which is known as regularized LDA [17]. Pang et al. [18] combined clustering and regularization terms to alleviate the deviation of the scatter matrix caused by the SSS problem. Many other approaches have been proposed to circumvent the singularity of the within-class scatter matrix and the instability of its inverse caused by the SSS problem. For example, direct LDA (DLDA) [19] and null space LDA (NLDA) [20] select different subspaces of the scatter matrix to avoid the singularity. Eigenfeature regularization and extraction (ERE) [21] retains and reweights the entire eigenspace of the within-class scatter matrix to keep it non-singular; this approach achieved excellent performance on face recognition.
The multimodal classes problem of LDA has also attracted considerable attention. Marginal Fisher analysis (MFA) [22] employs graph embedding to characterize the nearest neighbors and thus avoids the i.i.d. assumption on the sample data. Exploiting manifolds to represent local structures is another solution. For instance, local Fisher discriminant analysis (LFDA) [23] effectively combines the ideas of locality preserving projections (LPP) [24] and LDA; it attains inter-class separation and intra-class local structure preservation by defining a local intra-class and a local inter-class scatter matrix to handle the multimodal classes problem. Locality sensitive discriminant analysis (LSDA) [25] projects the dataset onto a lower-dimensional subspace that preserves the local manifold structure and the discriminant information. Both LFDA and LSDA aim to keep neighbors with the same label as close as possible. Nonparametric discriminant analysis (NDA) [9] solves the multimodal problem of LDA by using k-nearest neighbors to construct a nonparametric inter-class divergence in a local area. Nonparametric discriminant analysis for face recognition [26] extends NDA to multi-class situations. Recently, nLDA [7] directly defined the scatter matrices on the neighborhood to solve the multimodal classes problem and achieved remarkable results.
To handle the non-linearity problem, many LDA-based approaches have been extended to their kernel versions, learning nonlinear structures with the powerful tool of the kernel trick [27]; for example, generalized discriminant analysis (GDA) [11] detailed the derivation of kernel LDA. Kernel Fisherface (KLDA) [28], kernel direct-LDA (KDDA) [13], null space kernel LDA (NKDA) [12] and complete discriminant evaluation and feature extraction (CDEFE) [29] are the kernel versions of PCA+LDA, DLDA, NLDA and ERE, respectively. In previous studies, researchers worked out an alternative formulation of kernel LPP (KLPP) to develop a framework of KPCA+LPP algorithms [30], which achieved good results in recognition tasks such as face recognition and radar target recognition. For the nonlinear problems encountered in image classification, the KAHISD/KCHISD methods of Cevikalp and Triggs [31] extend affine/convex hull-based image set classification to its kernel version. Zhu et al. [32] proposed KCH-ISCRC with the kernel trick, which addresses collaborative image set-based representation and classification (ISCRC) well. These works show that Gaussian kernels can successfully solve nonlinear problems in various applications [28,29,31,32].
LDA has also seen new developments with recent technologies. Dorfer et al. [33] proposed deep linear discriminant analysis built on deep neural networks. Alarcón and Destercke [34] proposed a Gaussian discriminator that fuses near-ignorance priors and robust Bayesian analysis. Belous et al. [35] proposed a framework called dual spatial discriminative projection learning and successfully applied it to image classification tasks. Hu et al. [36] proposed a cross-modal discriminative network for cross-modal learning.
4. Kernel Reverse Neighborhood Discriminant Analysis
In this section, we detail the proposed kernel reverse neighborhood discriminant analysis (KRNDA). We use the kernel trick to deal with the non-linearity underlying the input data of nLDA. First, the kernel method based on nLDA requires the definition of a mapping that sends each observed sample $x$ from the original feature space to a higher-dimensional feature space $\mathcal{F}$:
$$\phi:\; x \;\mapsto\; \phi(x) \in \mathcal{F}.$$
Subsequently, the mapped training dataset of $X = \{x_1, \ldots, x_N\}$ can be represented as $\Phi = [\phi(x_1), \ldots, \phi(x_N)]$. Usually, the mapping function is not explicitly specified [38], especially when the mapped feature space is infinite-dimensional. Hence, kernel functions are defined to implicitly calculate the inner product of two image vectors in the mapped space $\mathcal{F}$. For two samples $x_i$ and $x_j$, the inner product of their images in the mapped space can be expressed as
$$k(x_i, x_j) = \langle \phi(x_i), \phi(x_j) \rangle, \qquad (17)$$
where $\langle \cdot, \cdot \rangle$ is the inner product. How to select a suitable kernel function to implicitly calculate the inner product of two vectors in the higher-dimensional feature space is a critical factor for kernel methods [39]. The typical and effective Gaussian kernel function is used in this paper:
$$k(x_i, x_j) = \exp\!\left(-\frac{\lVert x_i - x_j \rVert^2}{2\sigma^2}\right). \qquad (18)$$
One of the advantages of the Gaussian kernel is that it has only one free parameter, $\sigma$, which is easily tuned.
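As a minimal illustration (the helper name and usage are ours, not the paper's), the Gaussian kernel matrix over a set of samples can be computed as follows; $\sigma$ is the single free parameter mentioned above.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """Gaussian (RBF) kernel matrix K[i, j] = exp(-||a_i - b_j||^2 / (2 sigma^2))."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

# usage sketch: N x N Gram matrix of the training data
# X_train = np.random.randn(100, 16); K = gaussian_kernel(X_train, X_train, sigma=4.0)
```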
The proposed KRNDA aims to find an optimal projection matrix $W = [w_1, \ldots, w_h]$, where $h$ is the reserved dimension of the final extracted feature, such that the mapped data in space $\mathcal{F}$ can be projected onto a more discriminative and lower-dimensional feature space by $y = W^{\top}\phi(x)$. The optimal $W$ can be obtained by maximizing the following criterion:
$$J(W) = \frac{\bigl|W^{\top} S_b^{\phi} W\bigr|}{\bigl|W^{\top} S_w^{\phi} W\bigr|}, \qquad (19)$$
where $S_b^{\phi}$ is the between-neighborhood scatter matrix and $S_w^{\phi}$ is the within-neighborhood scatter matrix in space $\mathcal{F}$.

According to [11], the projection matrix $W$ can be linearly represented by all the training data in the mapped space $\mathcal{F}$. Therefore, we can obtain
$$W = \Phi A, \qquad (20)$$
where $A = [\alpha_1, \ldots, \alpha_h]$ denotes the coefficients of the linear combination. For a sample $x$ mapped into the higher-dimensional space $\mathcal{F}$, the projected vector can be expressed as
$$y = W^{\top}\phi(x) = A^{\top}\Phi^{\top}\phi(x). \qquad (21)$$
Obviously, the final extracted feature $y$ cannot be computed directly from Equation (21), since both $w_j$ and $\phi(x)$ contain the implicit mapping $\phi$. However, the kernel trick can circumvent this problem. According to Equations (17) and (20), each element $w_j^{\top}\phi(x)$ in Equation (21) can be calculated as
$$w_j^{\top}\phi(x) = \sum_{m=1}^{N} \alpha_{jm}\,\langle \phi(x_m), \phi(x) \rangle = \sum_{m=1}^{N} \alpha_{jm}\, k(x_m, x), \qquad (22)$$
where $\alpha_{jm}$ denotes the $m$-th element of the coefficient vector $\alpha_j$. According to Equation (17), we have $\Phi^{\top}\phi(x_m) = \kappa_m = [k(x_1, x_m), \ldots, k(x_N, x_m)]^{\top}$; $\kappa_m$ denotes the vector of inner products between all training samples and the $m$-th training sample in the mapped space $\mathcal{F}$. Then the extracted feature of the $m$-th training sample can be explicitly expressed as
$$y_m = A^{\top}\kappa_m, \qquad (23)$$
where $\kappa_m$ is the $m$-th column of the kernel (Gram) matrix $K = \Phi^{\top}\Phi$.
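To make the kernel-trick projection concrete, the following is a minimal NumPy sketch of Equations (21)–(23); the coefficient matrix A is a placeholder here (in KRNDA it is obtained from the eigen-problem derived later), and the helper names are ours rather than the paper's.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

def project(X_train, A, x, sigma):
    """Extracted feature y = A^T kappa(x), where kappa(x) holds the kernel
    values between every training sample and x (Equation (23))."""
    kappa = gaussian_kernel(X_train, x[None, :], sigma).ravel()  # shape (N,)
    return A.T @ kappa                                           # shape (h,)
```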
We define the between-neighborhood scatter matrix of our KRNDA in Equation (19) and formulate $S_b^{\phi}$ as given in Equation (24), with its constituent terms defined in Equation (25). In Equation (24), $\bar{\phi}_i$ denotes the mean vector of the local neighborhood computed by the reverse k-nearest neighbors ($R_kNN$) of $x_i$ among the training samples with the same label $c_i$. Certainly, the number of $R_kNN$ members should be equal to or larger than the threshold $t$. Likewise, $\bar{\phi}_j$ denotes the mean vector of the local neighborhood computed by the $R_kNN$ of $x_j$ among the training samples with the same label $c_j$. It should be noted that the labels of these two $R_kNN$ neighborhoods are different ($c_i \neq c_j$), which captures the between-class information. The terms of Equation (25) can be computed through the corresponding kernel vectors. Using the relationship of Equation (23), the projection of a neighborhood mean can be deduced as, for instance,
$$W^{\top}\bar{\phi}_i = A^{\top}\Phi^{\top}\bar{\phi}_i = A^{\top}\bar{\kappa}_i, \qquad \bar{\kappa}_i = \frac{1}{|R_i|}\sum_{x_m \in R_i} \kappa_m,$$
where $R_i$ denotes the $R_kNN$ neighborhood of $x_i$. Hence, the projected neighborhood mean reduces to $A^{\top}$ applied to the mean of the kernel columns over the neighborhood; the term for $x_j$ is obtained similarly.
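Since this step only requires averaging kernel columns, it can be sketched in a few lines of NumPy (the function name and arguments are illustrative, not the authors' code):

```python
import numpy as np

def projected_neighborhood_mean(K, A, neighborhood_idx):
    """K is the N x N Gram matrix of the training data, A the N x h coefficient
    matrix, neighborhood_idx the indices of an RkNN neighborhood.
    Returns A^T applied to the mean of the corresponding kernel columns."""
    kappa_bar = K[:, neighborhood_idx].mean(axis=1)  # mean kernel vector, shape (N,)
    return A.T @ kappa_bar                           # shape (h,)
```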
We employ the definition of the approximate between-neighborhood scatter in [7] to construct $S_b^{\phi}$; hence, for each sample $x_i$, the number of between-class reverse nearest neighborhoods is restricted to $k$. This reduces the amount of computation when the dataset is very large [7]. The selection of these neighborhoods follows the definition of the corresponding set in Equation (8), where the distances between nearest neighbors are calculated in the mapped higher-dimensional feature space with the kernel trick, i.e., $\lVert \phi(x_i) - \phi(x_j) \rVert^2 = k(x_i, x_i) + k(x_j, x_j) - 2k(x_i, x_j)$.
Likewise, the within-neighborhood scatter can also be expressed explicitly by the kernel trick; the matrix $S_w^{\phi}$ in Equation (19) can be formulated as given in Equation (27), with its constituent terms defined in Equation (28), where the involved mean vector is the mean of the higher-dimensional features in the $R_kNN$ neighborhood of $x_i$ with the same label $c_i$. According to Equations (24), (25), (27) and (28), the objective function of Equation (19) can be rewritten in terms of the coefficient matrix $A$ and the kernel matrix as
$$J(A) = \frac{\bigl|A^{\top} \tilde{S}_b A\bigr|}{\bigl|A^{\top} \tilde{S}_w A\bigr|}, \qquad (29)$$
where $\tilde{S}_b$ and $\tilde{S}_w$ denote the between- and within-neighborhood scatter matrices expressed through kernel vectors.
Hence, maximizing Equation (29) reduces to the generalized eigen-decomposition problem
$$\tilde{S}_b\,\alpha = \lambda\,\tilde{S}_w\,\alpha. \qquad (30)$$
The optimal projection can be obtained by applying the eigen-decomposition to $\tilde{S}_w^{-1}\tilde{S}_b$. We reserve the $h$ eigenvectors associated with the largest eigenvalues to form the final coefficient matrix $A$.
At the testing stage, for a test sample $x_t$, we first map it to the higher-dimensional space $\mathcal{F}$ by $\phi$; the final extracted feature can then be expressed as $y_t = A^{\top}\Phi^{\top}\phi(x_t)$, which, according to the kernel trick, can be rewritten as $y_t = A^{\top}\kappa(x_t)$. Here, $\kappa(x_t) = [k(x_1, x_t), \ldots, k(x_N, x_t)]^{\top}$ denotes the kernel vector of inner products between all gallery (or training) samples and the test sample $x_t$ in the mapped space $\mathcal{F}$. We compare the Euclidean distance between $y_t$ and each projected mean vector $W^{\top}\bar{\phi}_i$, which can be explicitly rewritten as $A^{\top}\bar{\kappa}_i$. Here, $\bar{\phi}_i$ is the mean vector of the $i$-th gallery RNN neighborhood with the same label, and $\bar{\kappa}_i$ denotes the kernel vector of inner products between all gallery samples and the mean vector $\bar{\phi}_i$ in the mapped space $\mathcal{F}$. Therefore, the label of the test sample $x_t$ can be determined by the nearest Euclidean distance:
$$c(x_t) = \arg\min_{i}\,\bigl\lVert A^{\top}\kappa(x_t) - A^{\top}\bar{\kappa}_i \bigr\rVert_2 .$$
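Putting the test stage together, the following sketch (helper names and data layout are ours) classifies a test sample by its nearest projected neighborhood mean, in the spirit of the distance rule above:

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

def classify(x_test, X_train, A, sigma, mean_kernel_vectors, mean_labels):
    """mean_kernel_vectors: list of kappa_bar vectors (one per gallery RkNN
    neighborhood); mean_labels: the class label attached to each neighborhood."""
    kappa_t = gaussian_kernel(X_train, x_test[None, :], sigma).ravel()
    y_t = A.T @ kappa_t                               # projected test feature
    dists = [np.linalg.norm(y_t - A.T @ kb) for kb in mean_kernel_vectors]
    return mean_labels[int(np.argmin(dists))]         # nearest projected mean
```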
5. Experiments
Experiments are conducted on two handwritten digit datasets and the University of California at Irvine (UCI) benchmark database. In addition, the COIL-20 (Columbia Object Image Library) object recognition dataset [40] and the ORL face recognition dataset [41] are adopted for evaluation.
To verify the effectiveness of our method, we compare the performance of our KRNDA with the original nLDA. Several LDA-based discriminators are added for comparison, including LDA, LFDA, ccLDA and a norm-based LDA variant [43]. LFDA is a traditional local discriminator which has been successfully applied to pedestrian re-identification [42]. We implemented the code of ccLDA [18], which uses clustering to solve the SSS problem. The norm-based LDA variant [43] is a discriminator with an excellent discriminant effect, whose adopted norm performs better than the conventional norm. The codes of LFDA, the norm-based variant and nLDA are provided by their authors.
The parameters of the above discriminators are set as follows. For LDA, the final extracted features are preserved according to 95% of the energy of the eigenvalues. For LFDA, we set the parameter K = 2 to achieve the optimal effect in our experiments (K is a parameter used in the local scaling method). To avoid the singularity caused by the SSS problem, regularization is applied to the within-class scatter matrix, $S_w \leftarrow S_w + \epsilon I$, where $I$ is the identity matrix and $\epsilon$ is a small value. For ccLDA, we follow the original experimental settings: K < C (here, K is the cluster number and C is the class number) and 98% of the energy of the eigenvalues is reserved for the experiments. For the norm-based LDA variant, we use the original parameter settings provided by the authors.
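As a minimal illustration of the scatter-matrix regularization mentioned above (the default value of epsilon here is ours for illustration; the exact magnitude used in the experiments is not restated):

```python
import numpy as np

def regularize(Sw, eps=1e-4):
    """Add a small ridge to the within-class scatter to avoid singularity (SSS problem)."""
    return Sw + eps * np.eye(Sw.shape[0])
```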
There are four main parameters related to the proposed KRNDA. The first one is the parameter $t$, which controls the isolated points in Equations (24) and (27); we set it according to the article [7], which showed that it does not affect the performance much, and we apply this setting to our KRNDA in all experiments. The second parameter is the value of $k$ in $R_kNN$; [7] showed that $k$ performs better within a certain range, so we adopted a similar setting to nLDA in our KRNDA. The third parameter is the $\sigma$ of the Gaussian kernel, see Equation (18); we evaluate this parameter in Section 5.4 and apply the best value in our experiments. The last parameter is the dimension of the final extracted feature; similar to nLDA, we preserve 95% of the energy of the eigenvalues to construct the final features. All of the experiments were implemented on an Intel(R) Core(TM) i7-11700K (3.60 GHz) PC, using Matlab R2020b.
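The 95% energy rule simply keeps the smallest number of leading eigenvectors whose eigenvalues account for at least 95% of the total eigenvalue sum; a minimal sketch (our own helper, not the authors' Matlab code) is given below.

```python
import numpy as np

def energy_dim(eigenvalues, energy=0.95):
    """Return how many leading eigenvalues are needed to retain the given
    fraction of the total eigenvalue energy."""
    vals = np.sort(np.abs(eigenvalues))[::-1]
    cumulative = np.cumsum(vals) / vals.sum()
    return int(np.searchsorted(cumulative, energy) + 1)

# e.g. energy_dim(np.array([5.0, 3.0, 1.0, 0.5, 0.5])) -> 4
```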
5.1. Experiments on Handwritten Digit Recognition
Two handwritten digit datasets, Mnist and USPS, are used in our evaluation. The purpose of KRNDA is to solve the nonlinear problem; at the same time, it is an extension of nLDA, which performs well on the multimodal classes problem. Therefore, we construct handwritten digit recognition as a binary classification task to verify the performance of KRNDA in solving the multimodal classes problem. For binary digit recognition, the task is to determine whether a digit is odd or even. The odd class contains the digits 1, 3, 5, 7 and 9, and the even class contains the digits 0, 2, 4, 6 and 8. Some digit images of the two classes are shown in Figure 1. As the five digits within each of the odd and even classes are quite different, each class can be considered to contain five subclasses (clusters).
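A trivial sketch of how such a two-class, multimodal task can be derived from the original digit labels (labels assumed to be 0–9):

```python
import numpy as np

def to_parity_labels(digit_labels):
    """Map digit labels 0-9 to a binary odd/even task; each binary class then
    contains five natural subclasses (the five underlying digits)."""
    digit_labels = np.asarray(digit_labels)
    return digit_labels % 2   # 1 = odd class, 0 = even class
```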
The Mnist dataset has a total of 60,000 samples in the training set; the details of the training and testing samples are shown in
Table 1. Adopting the whole set directly for training is time-consuming and may lead to running out of memory. Hence, we constructed subsets for our experiments. We randomly selected different numbers of samples and created four subsets: Mnist600, Mnist1000, Mnist6000 and Mnist10000, meaning that 600, 1000, 6000 and 10,000 samples are selected from the total training set for training, respectively. For the corresponding testing sets, we randomly selected 100, 166, 1000 and 1666 samples from the original 10,000 testing samples to form four testing subsets for evaluation. The USPS dataset has a total of 9298 handwritten digit images. We conducted our experiments according to the protocol in [
7], where the training set contains 7291 images and the testing set contains 2007 images, and the detailed training and testing numbers of each class are shown in
Table 2.
Firstly, we compare the proposed method with the Gaussian kernel (KRNDA) against its variants with the sigmoid kernel (SRNDA) and the polynomial kernel (PRNDA) on the USPS and Mnist datasets. We chose the parameters corresponding to the highest recognition rates in our experiments. For the Mnist datasets, we performed 10-fold cross-validation to obtain the average recognition rates and standard deviations.
From
Table 3, we can see that the Gaussian kernel function outperforms the polynomial and sigmoid kernel functions. The poor performance of the sigmoid kernel function may be caused by the multimodal classes in the data. The Gaussian kernel demonstrates a better ability to handle the non-linearity, and it has only one adjustable parameter, which makes it easy to use in evaluations. Consequently, in the following experiments of this paper, we employ the Gaussian kernel function with KRNDA for all evaluations.
Here, we present the recognition rates on the USPS handwritten digit dataset, and the average recognition rates and standard deviations of 10-fold cross-validation on the Mnist handwritten digit subsets. As shown in Table 4, compared with the original nLDA, our KRNDA consistently achieves better recognition rates, especially on the Mnist600 and Mnist1000 subsets, since a small training subset may introduce more non-linearity. These results demonstrate the effectiveness of the proposed KRNDA in handling nonlinear data. The proposed KRNDA also outperforms the other LDA-based methods, namely LDA, LFDA, ccLDA and the norm-based LDA variant. This result illustrates that our KRNDA inherits nLDA's advantage in solving the multimodal classes problem.
5.2. Experiments on COIL-20 and ORL Datasets
In this subsection, we use the COIL-20 object dataset and the ORL face dataset to evaluate more complicated image classification tasks. COIL-20 consists of 20 different objects, such as a car, a cat, a cup and an eraser; for each object, there are 72 images taken from different viewpoints [44]. The images are normalized and resized to a fixed resolution. We randomly selected 50% of the images per class for training and the remaining images for testing. Some sample images are shown in
Figure 2.
The ORL face dataset contains 400 images of 40 individuals. The images were captured at different times and involve variations in expression (open or closed eyes, smiling or non-smiling) and facial details (glasses or no glasses) [45]. The face images of the ORL database are normalized and resized to a fixed resolution. We randomly selected 50% of the images per class for training and the remaining images for testing. Some sample images are shown in
Figure 3.
For the COIL-20 object dataset and the ORL face dataset, we also performed 10-fold cross-validations to obtain the average recognition rates and standard deviations. As shown in
Table 5, the proposed KRNDA outperforms the original nLDA, and the recognition rates reach the highest values of 98.62% and 99.90% on ORL and COIL-20, respectively. As can be seen in Figure 2 and Figure 3, the COIL-20 dataset, with its multi-view samples, exhibits much more distortion than the ORL dataset. Hence, the accuracy improvement of KRNDA over nLDA on the COIL-20 dataset is larger than that on the ORL dataset. Our KRNDA also performs well on face recognition, which involves non-linearity caused by lighting and pose variations. These results show the effectiveness of the proposed kernel-based solver. Owing to the benefits of the RNN and the non-linearity handling, the proposed KRNDA can cope with different pattern recognition applications.
5.3. Experiments on UCI Benchmark Datasets
We selected 40 benchmark datasets from the University of California at Irvine (UCI) machine learning repository [
46] for our experiments. The details of the class number, sample number and feature dimension of each dataset are listed in
Table 6. The number of classes ranges from 2 to 10, and the number of features ranges from 2 to 256, which is lower than in the visual classification tasks. For each dataset, we randomly selected approximately 50% of the samples for training and the remaining samples for testing. We also employed the previously introduced LDA-based discriminators for comparison.
The classification results of the UCI datasets are reported in
Table 7. In each row, the best classification accuracy is highlighted. As can be seen, the proposed KRNDA outperforms the other discriminators in most cases, and its average classification accuracy over all datasets reaches 81.63%, which is better than that of all the other discriminators. The degradation of KRNDA on some datasets, such as Bupa, Sonar and breast-tissue, may be caused by the lower non-linearity (due to the lower-dimensional features of the UCI datasets) or by a non-optimal value of the parameter $\sigma$.
5.4. Parameter Evaluation
The parameter $\sigma$ of the Gaussian kernel plays an important role in the performance of the proposed KRNDA; it can be seen as the controller of how the non-linearity is handled. Therefore, we conducted experiments to tune the value of $\sigma$. The experiments were conducted on a subset of the Mnist database; we randomly selected 600 samples for training and 100 samples for testing, and 10 cross-validation runs were conducted to obtain the mean accuracy.
Experimental results are shown in
Figure 4. The Y-axis is the recognition accuracy and the X-axis is the value of $\sigma$. As can be seen, the best value of $\sigma$ reaches the highest accuracy of 95.1%; hence, in most of our experiments, $\sigma$ is set to this best value. Certainly, this value is not optimal for every application and dataset; however, it achieves considerable performance in most cases of our experiments.
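A schematic of such a σ sweep (the `krnda_accuracy` callable is a stand-in for training and evaluating KRNDA on one split; the fold construction and candidate grid are illustrative, not the exact protocol above):

```python
import numpy as np

def tune_sigma(X, y, candidate_sigmas, krnda_accuracy, n_folds=10, seed=0):
    """Pick the Gaussian-kernel width giving the best mean accuracy over folds.
    krnda_accuracy(X_tr, y_tr, X_te, y_te, sigma) is assumed to return a float."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, n_folds)
    scores = {}
    for sigma in candidate_sigmas:
        accs = []
        for f in range(n_folds):
            test_idx = folds[f]
            train_idx = np.concatenate([folds[g] for g in range(n_folds) if g != f])
            accs.append(krnda_accuracy(X[train_idx], y[train_idx],
                                       X[test_idx], y[test_idx], sigma))
        scores[sigma] = float(np.mean(accs))
    return max(scores, key=scores.get), scores
```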