Article

Few-Shot Learning for Multi-POSE Face Recognition via Hypergraph De-Deflection and Multi-Task Collaborative Optimization

1
School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
2
School of Mechatronic Engineering and Automation, Shanghai University, Shanghai 200444, China
3
School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China
*
Authors to whom correspondence should be addressed.
Electronics 2023, 12(10), 2248; https://doi.org/10.3390/electronics12102248
Submission received: 28 March 2023 / Revised: 6 May 2023 / Accepted: 10 May 2023 / Published: 15 May 2023
(This article belongs to the Special Issue AI-Driven Network Security and Privacy)

Abstract

Few-shot, multi-pose face recognition has always been an interesting yet difficult subject in the field of pattern recognition. Researchers have come up with a variety of workarounds; however, these methods either struggle to extract effective features that are robust to pose or struggle to obtain globally optimal solutions. In this paper, we propose a few-shot, multi-pose face recognition method based on hypergraph de-deflection and multi-task collaborative optimization (HDMCO). In HDMCO, a hypergraph is embedded in a non-negative image decomposition to obtain images without pose deflection. Furthermore, a feature encoding method is proposed that considers the importance of samples and combines support vector data description, triangle coding, and other techniques; this method is used to extract features from the pose-free images. Finally, multiple tasks such as feature extraction and feature recognition are jointly optimized to obtain a solution closer to the global optimum. Comprehensive experimental results show that the proposed HDMCO achieves better recognition performance.

1. Introduction

Face recognition is a very important technology with a wide range of applications, such as video surveillance, forensics, and security [1,2,3]. Pose change is one of the main difficulties in face recognition. Pose variations can make images of one person resemble images of other people; that is, pose change increases intra-class differences and decreases inter-class differences, which hinders the classifier from correctly recognizing face images. One study shows that the performance of most algorithms drops by more than 10% from frontal-frontal to frontal-profile face verification, whereas human recognition performance drops only slightly [4]. Therefore, it is of great significance to study face recognition involving pose change.
Many methods have been proposed to solve the multi-pose face recognition problem [5,6,7,8,9,10,11]. These methods can be divided into the following categories: face normalization, feature representation, spatial mapping, and pose estimation.
Methods based on face normalization normalize pose-deflected images to frontal (or near-frontal) images before recognition. For example, Ding et al. [12] transform images with pose deflection into frontal images by pose normalization. Luan et al. [13] take geometry preservation into account in GAN networks and exploit perceptual loss constraints together with a norm loss to obtain frontal images that preserve global and local information. Liu et al. [14] use pixel-level loss, feature-space perception loss, and identity-preserving loss to generate realistic class-invariant frontal images. Yin et al. [15] embed contextual dependency and local consistency into GAN networks to extract frontal images. Lin et al. [16] use a deep representation alignment network to extract pose-invariant face features. Yang et al. [17] use a multi-bit binary descriptor to extract pose-invariant features. Tu et al. [18] jointly optimize image inpainting and image frontalization to handle the recognition of low-resolution face images involving pose variations.
Learning effective feature representations of images is also beneficial for classification. For example, Zhou et al. [19] use a divide-and-rule strategy for the representation and classification of samples, which reduces the challenge posed by pose variation. Zhang et al. [20] use locality constraints and label information to enhance the representational power of regression-based methods. Gao et al. [21] use multi-modal hashing and discriminative correlation maximization analysis for feature representation learning, which yields easily distinguishable feature representations for images in each pose. Yang et al. [22] learn more discriminative feature representations by imposing penalties on weight vectors. Huang et al. [23] use the samples and feature centers to enhance the similarity of features between samples of the same class.
Methods based on spatial mapping reduce intra-class differences and increase inter-class differences by mapping samples into a new space, which benefits classification. For example, He et al. [24] use an identity consistency loss and a pose-triplet loss to minimize intra-class differences and maximize inter-class differences. Wang et al. [25] use a divergence loss to increase the diversity among multiple attention maps, and an attention sparsity loss to highlight regions with strong discriminative power. He et al. [26] reduce the difference between images of different modalities by applying adversarial learning at both the image level and the feature level. Liu et al. [27] use source-domain data to improve performance on target-domain data even when the poses of two images of the same category from the source and target domains differ markedly. Sun et al. [28] use an equalized margin loss to reduce the impact of unevenly distributed data (an uneven distribution of pose deflections).
Estimating the pose deflection angle of an image and using this information during recognition is another effective strategy. For example, Zhang et al. [29] use a pose-guided margin loss to estimate head poses so that the recognition process can be completed under the same pose. Badave et al. [30] use multiple cameras for pose estimation and then use the estimated pose to recognize face images. Wang et al. [31] combine learned sub-classifiers into a classifier with strong performance by learning the dictionaries and the sub-classifiers at the same time.
The above methods work well for face recognition involving small pose deflections or for face recognition with a large number of samples. However, many practical face recognition tasks involve either few samples per class or large pose deflections. In such settings, it is difficult for these methods to learn the intrinsic relationships among multiple samples of the same category during model learning, and the essential attributes of a category summarized by the learned model are incomplete or inaccurate.
Hypergraphs can represent complex relationships between objects. Unlike ordinary graphs, in which each edge connects only two nodes, each edge of a hypergraph can connect multiple nodes; that is, a hypergraph can reveal complex relationships among multiple nodes. Furthermore, non-negative matrix factorization is widely used in computer vision, for example for feature extraction: a matrix can be decomposed into two matrices with different properties. Inspired by non-negative matrix factorization, we decompose each image involving pose changes so that one factor obtained by the decomposition is treated as the image without pose deflection and the other as the pose change matrix; the pose-free image is obtained through multiple iterative decompositions. Inspired by hypergraphs, we treat each image as a node and embed the hypergraph formed by multiple images into the non-negative matrix decomposition to extract higher-quality images without pose deflection. Based on these ideas, a few-shot, multi-pose face recognition method based on hypergraph de-deflection and multi-task collaborative optimization (HDMCO) is proposed in this paper. First, HDMCO uses the hypergraph and non-negative matrix factorization to obtain approximately frontal images. Then, a novel feature encoding method based on an improved support vector data description is proposed, and it is jointly optimized with a dictionary learning-based classifier for feature extraction and feature classification. Figure 1 shows the flowchart of the proposed HDMCO.
In the de-deflection phase of Figure 1, the image without pose deflection is separated from the image with pose deflection by non-negative matrix decomposition; in this process, the hypergraph is embedded in the decomposition to protect the structural information of the images. In the feature extraction phase, the improved support vector data description is used to obtain the center and radius of each cluster, and triangle coding is used to encode the features of each patch, yielding the image code. In the feature classification phase, the dictionary learning-based classifier recognizes the features of the image and thereby determines its category.
The main innovations of this paper are as follows.
(1)
A novel multi-pose face recognition framework based on hypergraph de-deflection is proposed. The framework first separates out images free of pose deflection, then uses the proposed feature coding method based on the improved support vector data description to extract features from these images, and finally recognizes the extracted features.
(2)
A new feature encoding method based on improved support vector data description is proposed. The feature encoding method utilizes the improved support vector data description and triangle encoding to make the extracted features more discriminative.
(3)
An effective feature extraction and feature classification optimization model is constructed, which makes it easy to obtain a solution closer to the global optimum and helps to improve the recognition performance of the algorithm.
The subsequent sections of this article are arranged as follows: Section 2 introduces related studies. Section 3 describes the proposed method. Section 4 outlines the details of the experiments, and Section 5 presents the conclusion.

2. Related Studies

This section will introduce some theories related to the proposed method. Specifically, few-shot face recognition, non-negative matrix factorization, and hypergraph theory will be introduced in turn.

2.1. Few-Shot Face Recognition

Few-shot face recognition has always been an interesting yet difficult research topic. Few-shot learning provides an effective solution to the very relevant and unavoidable problem of data scarcity in many applications. Prior knowledge is applied to small datasets so that few-shot learning can be generalized to new tasks and samples [32].
Researchers have proposed many methods to solve the problem of few-shot face recognition using few-shot learning [33,34,35]. Masi et al. [36] propose the pose-aware model (PAM). PAM uses multiple networks to synthesize images in various poses and uses the synthesized images to train the model and improve its recognition ability. However, this method needs a large amount of memory to store the many training images generated by the various networks, so it is difficult to apply in practice. Elharrouss et al. [37] propose multitask cascade networks (abbreviated as MCN) to enhance the recognition ability of the recognition network for images involving pose variations. However, the diversity of pose changes considered during model training is limited, so the learned model is ineffective when processing images involving other pose changes. Liu et al. [38] use multiple profile images to generate frontal images and use a Siamese network to learn the depth representation of the generated frontal images; this representation is more easily recognized by the classifier, which helps to improve the recognition rate. Tao et al. [39] use the identity information of the images and the latent relationship between frontal and profile images to model the distribution of profile images and reduce the difference between profile and frontal images. However, it is difficult to judge whether the latent relationship used is correct and comprehensive. Gao et al. [40] propose a multilayer locality-constrained structural orthogonal Procrustes regression (MLCSOPR) and use it to extract pose-robust features. This method only considers horizontal pose changes, but in practice images involve both horizontal and vertical pose changes, so its scope of application is narrow.

2.2. Non-Negative Matrix Factorization

Given any non-negative matrix $X_0$, it can be decomposed into two non-negative matrices $Y$ and $P^T$:

$$\min_{Y, P^T} \; \| X_0 - Y P^T \|_F^2 \quad \text{s.t.} \quad Y \ge 0, \; P \ge 0 \qquad (1)$$

where $X_0 \in \mathbb{R}^{m \times n}$ is the non-negative matrix, $Y \in \mathbb{R}^{m \times r}$ is the basis matrix, and $P^T \in \mathbb{R}^{r \times n}$ is the coefficient matrix.
Then, $Y$ and $P^T$ can be updated by

$$Y_{ij} \leftarrow Y_{ij} \frac{(X_0 P)_{ij}}{(Y P^T P)_{ij}}, \qquad (P^T)_{jk} \leftarrow (P^T)_{jk} \frac{(Y^T X_0)_{jk}}{(Y^T Y P^T)_{jk}} \qquad (2)$$
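To make the update rules concrete, the following minimal NumPy sketch implements the multiplicative updates of Equations (1) and (2); the function name, the random initialization, and the small constant added to the denominators are our own choices, not part of the original formulation.

```python
import numpy as np

def nmf(X0, r, n_iter=500, eps=1e-10):
    """Plain multiplicative-update NMF (Eqs. (1)-(2)): factor the non-negative
    matrix X0 (m x n) into a basis Y (m x r) and a coefficient matrix P^T (r x n)."""
    m, n = X0.shape
    Y = np.abs(np.random.randn(m, r))
    Pt = np.abs(np.random.randn(r, n))            # P^T
    for _ in range(n_iter):
        Y *= (X0 @ Pt.T) / (Y @ Pt @ Pt.T + eps)  # Eq. (2), update for Y
        Pt *= (Y.T @ X0) / (Y.T @ Y @ Pt + eps)   # Eq. (2), update for P^T
    return Y, Pt
```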

2.3. Hypergraph Theory

A hypergraph is very helpful for maintaining the internal structure of the data. Next, we will briefly introduce the hypergraph theory.
A hypergraph is defined as follows: a hypergraph $G$ is an ordered pair $G = (V, e)$, where $V$ is a non-empty set whose elements are nodes or vertices, called the vertex set, and $e$ is a family of non-empty subsets of $V$ whose elements are called hyperedges. Unlike ordinary graphs, each edge of a hypergraph can connect not only two vertices but also more. Here, the hypergraph is undirected.
Given a hypergraph $G = (V, e)$, $V = \{ v_1, v_2, \ldots, v_k \}$ is a finite set of data points, where $v_i$ $(i = 1, 2, \ldots, k)$ is a vertex, and $e = \{ e_1, e_2, \ldots, e_t \}$ is the set of hyperedges, where $e_j$ is a hyperedge.
The hyperedge set e satisfies the following two conditions:
(a)
$e_j \ne \varnothing$, $j = 1, 2, \ldots, t$;
(b)
$e_1 \cup e_2 \cup e_3 \cup \cdots \cup e_t = V$.
Each hyperedge $e_j$ has a corresponding weight $w_j$. The vertices and hyperedges form an association matrix $H \in \mathbb{R}^{|V| \times |e|}$, and any element of $H$ can be calculated by Equation (3):

$$H_{ij} = \begin{cases} 1, & v_i \in e_j \\ 0, & v_i \notin e_j \end{cases} \qquad (3)$$
To better understand hypergraph theory, we take the hypergraph in Figure 2 as an example. In Figure 2, the set of all vertices is denoted as $V = \{ v_1, v_2, \ldots, v_8 \}$; $e_1 = \{ v_1, v_2, v_3 \}$, $e_2 = \{ v_4, v_5, v_6 \}$, and $e_3 = \{ v_7, v_8 \}$ denote the three hyperedges of $G$. The set of all hyperedges is denoted as $e = \{ e_1, e_2, e_3 \}$. The value of each element in $H$ can be obtained from Equation (3) and is shown in Figure 2. Each image serves as a data point and becomes a vertex in the hypergraph. Hyperedges are composed of several similar data points, i.e., images whose contents appear relatively close, such as two images of the same person with small pose differences.
The degree $d_i$ of each vertex in the hypergraph is defined as the sum of the weights of the hyperedges to which it belongs, and the degree $\rho_j$ of a hyperedge is defined as the number of vertices it contains. $d_i$ and $\rho_j$ are calculated as follows:

$$d_i = \sum_{j=1}^{t} w_j H_{ij}, \qquad \rho_j = \sum_{i=1}^{k} H_{ij} \qquad (4)$$

Let $D_v$ denote the diagonal matrix whose main diagonal elements are $D_{v,ii} = d_i$, $i = 1, 2, \ldots, k$. Similarly, let $D_e$ and $W$ be the diagonal matrices generated by $\rho_j$ and $w_j$, respectively, $j = 1, 2, \ldots, t$. Then, the non-regularized hypergraph Laplacian matrix can be calculated by Equation (5):

$$L_H = D_v - H W D_e^{-1} H^T \qquad (5)$$
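As a concrete illustration, the following NumPy sketch builds the association matrix, the degree matrices, and the non-regularized Laplacian of Equations (3)-(5) for the example hypergraph of Figure 2; the unit hyperedge weights are an assumption made purely for illustration.

```python
import numpy as np

# Figure 2 example: 8 vertices, hyperedges e1 = {v1, v2, v3}, e2 = {v4, v5, v6}, e3 = {v7, v8}.
H = np.zeros((8, 3))
H[[0, 1, 2], 0] = 1           # e1
H[[3, 4, 5], 1] = 1           # e2
H[[6, 7], 2] = 1              # e3
w = np.ones(3)                # hyperedge weights w_j (unit weights assumed here)

W = np.diag(w)
Dv = np.diag(H @ w)           # vertex degrees d_i = sum_j w_j H_ij      (Eq. (4))
De = np.diag(H.sum(axis=0))   # hyperedge degrees rho_j = sum_i H_ij     (Eq. (4))
L_H = Dv - H @ W @ np.linalg.inv(De) @ H.T   # non-regularized Laplacian (Eq. (5))
```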

3. Proposed Method

In this section, we introduce the proposed method (the few-shot, multi-pose face recognition method based on hypergraph de-deflection and multi-task collaborative optimization). The main idea is as follows. First, we propose a feature discrimination enhancement method based on non-negative matrix factorization and hypergraph embedding and use it to extract near-frontal images from pose-deflected images. Then, we propose a feature encoding method based on an improved support vector data description and use it to extract discriminative features. Meanwhile, these features are classified by a dictionary learning-based classifier, and feature extraction and feature classification are jointly optimized. Accordingly, we mainly introduce the feature discrimination enhancement method based on non-negative matrix factorization and hypergraph embedding, the feature encoding method, the dictionary learning-based classifier, and the joint optimization of feature extraction and feature classification.

3.1. Feature Discrimination Enhancement Method Based on Non-Negative Matrix Factorization and Hypergraph Embedding

Suppose a given dataset is denoted as $Y \in \mathbb{R}^{m \times n}$, where each column of $Y$ represents an image sample. First, we apply a Gaussian filter to each image in $Y$ to remove noise. Next, we check whether any pixel value of each image is negative, set negative values to 0, keep positive values unchanged, and thus obtain $Y^W$. After that, we construct the non-regularized hypergraph Laplacian matrix $L_H$ of $Y^W$. Assume that the number of hyperedges is $t$, the number of data points in the hypergraph is $N$, and $t$ is equal to $N$. The number of vertices contained in each hyperedge is $s$; the vertices of each hyperedge are generated by $Y_n^W$ itself and its nearest $s-1$ neighbors, where $Y_n^W$ is the $n$-th column of $Y^W$. $L_H$ is calculated according to Equation (5), where $w_j$ can be calculated by Equation (6).
$$w_j = \sum_{Y_{n_1}^W, Y_{n_2}^W \in e_j} \exp\!\left( -\frac{\| Y_{n_1}^W - Y_{n_2}^W \|}{\delta^2} \right) \qquad (6)$$

where $\delta = \frac{1}{s \times t} \sum_{j=1}^{t} \sum_{Y_{n_1}^W, Y_{n_2}^W \in e_j} \| Y_{n_1}^W - Y_{n_2}^W \|$.
After $Y^W$ and $L_H$ are obtained, the objective function is as follows:

$$\min_{Y, P} \; \| Y^W - Y P^T \|_F^2 + \lambda \, \mathrm{Tr}(P^T L_H P) \quad \text{s.t.} \quad Y \ge 0, \; P \ge 0 \qquad (7)$$

where $Y^W \in \mathbb{R}^{m \times n}$, $Y \in \mathbb{R}^{m \times n}$, $P \in \mathbb{R}^{n \times n}$, $L_H \in \mathbb{R}^{n \times n}$, and $\| Y^W - Y P^T \|_F^2$ represents the error of the non-negative decomposition of $Y^W$. $\mathrm{Tr}(P^T L_H P)$ is the hypergraph regularization term, which protects the local geometric structure of the data and improves the performance of the algorithm. The value of $\lambda$ is set to 0.3.
It is difficult to solve Equation (7) directly, so an iterative solution method is adopted. The Lagrangian function corresponding to Equation (7) is:
$$\Delta = \| Y^W - Y P^T \|_F^2 + \lambda \, \mathrm{Tr}(P^T L_H P) + \mathrm{Tr}(\Psi Y^T) + \mathrm{Tr}(\Phi P^T) \qquad (8)$$
where $\Psi$ is the matrix formed by the Lagrange multipliers $\psi_{mk}$ for the constraints $Y_{mk} \ge 0$, and $\Phi$ is the matrix formed by the Lagrange multipliers $\phi_{nk}$ for the constraints $P_{nk} \ge 0$.
$\Delta$ in Equation (8) can be rewritten as

$$\begin{aligned} \Delta &= \mathrm{Tr}(Y^{W\,T} Y^W) - \mathrm{Tr}(Y^{W\,T} Y P^T) - \mathrm{Tr}(P Y^T Y^W) + \mathrm{Tr}(P Y^T Y P^T) + \lambda \, \mathrm{Tr}(P^T L_H P) + \mathrm{Tr}(\Psi Y^T) + \mathrm{Tr}(\Phi P^T) \\ &= \mathrm{Tr}(Y^{W\,T} Y^W) - 2\, \mathrm{Tr}(P Y^T Y^W) + \mathrm{Tr}(P Y^T Y P^T) + \lambda \, \mathrm{Tr}(P^T L_H P) + \mathrm{Tr}(\Psi Y^T) + \mathrm{Tr}(\Phi P^T) \end{aligned} \qquad (9)$$
By taking the partial derivatives of $\Delta$ with respect to $Y$ and $P$, respectively, we obtain

$$\frac{\partial \Delta}{\partial Y} = -2 Y^W P + 2 Y P^T P + \Psi, \qquad \frac{\partial \Delta}{\partial P} = -2 Y^{W\,T} Y + 2 P Y^T Y + 2 \lambda L_H P + \Phi \qquad (10)$$
According to the KKT conditions $\psi_{mk} Y_{mk} = 0$ and $\phi_{nk} P_{nk} = 0$, we obtain

$$-(Y^W P)_{mk} Y_{mk} + (Y P^T P)_{mk} Y_{mk} = 0 \qquad (11)$$

$$-(Y^{W\,T} Y)_{nk} P_{nk} + (P Y^T Y)_{nk} P_{nk} + \lambda (L_H P)_{nk} P_{nk} = 0 \qquad (12)$$
In Equations (11) and (12), the subscripts $mk$ and $nk$ index the elements of the corresponding matrices.
Then, $Y_{mk}$ and $P_{nk}$ can be updated by the following two equations:

$$Y_{mk} \leftarrow Y_{mk} \frac{(Y^W P)_{mk}}{(Y P^T P)_{mk}} \qquad (13)$$

$$P_{nk} \leftarrow P_{nk} \frac{(Y^{W\,T} Y)_{nk} + \lambda (H W D_e^{-1} H^T P)_{nk}}{(P Y^T Y)_{nk} + \lambda (D_v P)_{nk}} \qquad (14)$$
The variables in Equations (13) and (14) have appeared before; please see the previous section for their definitions. The subscripts again index matrix elements; equivalently, the updates can be written in matrix form using $\otimes$, the element-wise product of two matrices. The output $Y$ is the image set with almost no pose deflection. The features of each such image can then be obtained with the proposed feature coding method, and these features have high class discrimination.
Figure 3 shows the process of extracting near-frontal images from images involving pose variations. $Y$ denotes the original image set involving pose deflection, $Y^W$ the image set after preprocessing $Y$, $Y$ (the output factor) the set of approximately frontal images obtained by decomposition and iteration, and $P$ the pose change matrix. In Figure 3, we first preprocess each image in the original image set to obtain a noise-free, non-negative image set. Then, the hypergraph is embedded into the non-negative matrix factorization to preserve the structure of the decomposed images. Finally, the image set with almost no deflection is obtained through matrix factorization and multiple iterative updates.
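The de-deflection step can be summarized in a few lines of NumPy. The sketch below implements the multiplicative updates of Equations (13) and (14) under the assumption that the association matrix H and the hyperedge weights w have already been built as described above (one hyperedge per image and its s-1 nearest neighbors); the function name, the initialization, and the small constant in the denominators are our own choices, not the authors' code.

```python
import numpy as np

def hypergraph_nmf(YW, H, w, lam=0.3, n_iter=300, eps=1e-10):
    """De-deflection sketch (Eqs. (13)-(14)): factor the preprocessed image set
    YW (m x n, one image per column) into a near-frontal set Y and a pose-change
    matrix P^T, regularized by the hypergraph built from YW."""
    m, n = YW.shape
    W = np.diag(w)
    De_inv = np.diag(1.0 / H.sum(axis=0))       # inverse hyperedge degrees
    Dv = np.diag(H @ w)                         # vertex degrees
    S = H @ W @ De_inv @ H.T                    # the H W D_e^{-1} H^T part of L_H
    Y = np.abs(np.random.randn(m, n))
    P = np.abs(np.random.randn(n, n))
    for _ in range(n_iter):
        Y *= (YW @ P) / (Y @ P.T @ P + eps)                                  # Eq. (13)
        P *= (YW.T @ Y + lam * S @ P) / (P @ Y.T @ Y + lam * Dv @ P + eps)   # Eq. (14)
    return Y, P
```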

3.2. Feature Coding Method Based on Improved Support Vector Data Description

The main idea of the proposed feature coding method based on the improved support vector data description is as follows. First, we propose an improved support vector data description and use it to obtain the sphere center and radius of each cluster. After that, the radius and center of the sphere corresponding to each cluster are used for feature encoding. The existing support vector data description assumes that every data point plays the same role when calculating the radius of each cluster, which does not match reality. Hence, we assign a learned weight to each data point during model learning and propose an improved support vector data description; its model is as follows.
$$\begin{aligned} \min_{r, \chi} \; & r^2 + \varsigma \sum_{i=1}^{num} \rho(y_i)\, \chi_i \\ \text{s.t.} \; & \| y_i - b \|^2 \le r^2 + \chi_i, \quad \chi_i \ge 0, \quad b = \frac{1}{num} \sum_{i=1}^{num} y_i \end{aligned} \qquad (15)$$

where $r$ is the radius of the sphere, $y_i$ is the $i$-th sample, $\rho(y_i)$ is the weight of $y_i$, $b$ is the center of the sphere, $num$ is the number of samples, and $\chi_i$ is a slack variable. $\varsigma$ is a parameter whose value is set to 0.4.
The weight of any sample is calculated as follows.
First, we divide the whole dataset into $C$ clusters, and denote the sample set of the $k$-th cluster as $\{ y_1^k, y_2^k, \ldots, y_{P_k}^k \}$, where $y_i^k$ is the $i$-th data point in this set, $i = 1, 2, \ldots, P_k$, $P_k$ is the number of data points in the set, and $y_i^k = [ v_1^{ki}, v_2^{ki}, \ldots, v_d^{ki} ]^T \in \mathbb{R}^{d \times 1}$.
Denote the average distance between two data points in $\{ y_1^k, y_2^k, \ldots, y_{P_k}^k \}$ as $m_k$.
If the number of data points contained in $\{ y_1^k, y_2^k, \ldots, y_{P_k}^k \}$ is greater than one, then

$$m_k = \frac{2}{P_k (P_k - 1)} \sum_{i=1}^{P_k} \sum_{j=i+1}^{P_k} d( y_i^k, y_j^k ) \qquad (16)$$

$$d( y_i^k, y_j^k ) = \sqrt{ ( v_1^{ki} - v_1^{kj} )^2 + ( v_2^{ki} - v_2^{kj} )^2 + \cdots + ( v_d^{ki} - v_d^{kj} )^2 } \qquad (17)$$

If the number of data points contained in $\{ y_1^k, y_2^k, \ldots, y_{P_k}^k \}$ is equal to one, then

$$m_k = \frac{1}{ \sum_{i=1, i \ne k}^{C} P_i } \sum_{t=1, t \ne k}^{C} \sum_{i=1}^{P_t} d( y_1^k, y_i^t ) \qquad (18)$$
Generally speaking, the distances between data points in the same cluster are far less than the distances between data points in different clusters. Thus, we assume that data points in the same cluster have the same weight.
$$\rho_k = 1 - \frac{m_k}{ \sum_{i=1}^{C} m_i } \qquad (19)$$
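A minimal NumPy sketch of the weight computation of Equations (16)-(19) (with Eq. (19) as reconstructed above) follows; the function name and the list-of-arrays input format are our own illustrative choices.

```python
import numpy as np

def cluster_weights(clusters):
    """clusters: list of C arrays, the k-th of shape (P_k, d).
    Returns rho_k for every cluster; every point in cluster k gets weight rho_k."""
    C = len(clusters)
    m = np.zeros(C)
    for k, Yk in enumerate(clusters):
        if len(Yk) > 1:
            # Eq. (16): mean pairwise distance inside the cluster
            d = np.linalg.norm(Yk[:, None, :] - Yk[None, :, :], axis=2)
            m[k] = d[np.triu_indices(len(Yk), 1)].mean()
        else:
            # Eq. (18): mean distance from the lone point to all other clusters
            others = np.vstack([clusters[t] for t in range(C) if t != k])
            m[k] = np.linalg.norm(others - Yk[0], axis=1).mean()
    return 1.0 - m / m.sum()      # Eq. (19)
```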
The Lagrange function of Equation (15) can be written as

$$\tilde{L}(r, \chi, \alpha, \beta) = r^2 + \varsigma \sum_{i=1}^{num} \rho(y_i) \chi_i + \sum_{i=1}^{num} \alpha_i \left\{ \left\| y_i - \frac{1}{num} \sum_{j=1}^{num} y_j \right\|^2 - r^2 - \chi_i \right\} - \sum_{i=1}^{num} \beta_i \chi_i \qquad (20)$$
Letting $\frac{\partial \tilde{L}}{\partial r} = 0$ and $\frac{\partial \tilde{L}}{\partial \chi_i} = 0$, we obtain

$$\min_{\alpha} \; \frac{2}{num} \alpha^T Q e - \alpha^T \Omega \quad \text{s.t.} \quad \alpha^T e = 1 \qquad (21)$$

where $Q = ( \langle y_i, y_j \rangle )_{num \times num}$, $\Omega = ( \langle y_i, y_i \rangle )_{num \times 1}$, $e = (1, 1, 1, \ldots, 1)^T$, $y_i$ and $y_j$ are the $i$-th and $j$-th samples in the dataset with pose deflection removed, respectively, and $\alpha = [ \alpha_1, \alpha_2, \ldots, \alpha_{num} ]$. $\alpha$ can be obtained by using a linear programming algorithm.
r can be obtained by using Equation (22).
$$r^2 = \| y_i - y_j \|^2 - \sum_{i, j = 1}^{\Upsilon} \alpha_i ( y_i \cdot y_j ) + \sum_{i, j = 1}^{\Upsilon} \alpha_i \alpha_j ( y_i \cdot y_j ) \qquad (22)$$

where $\Upsilon$ is the set of support vectors, and the sample points used in Equation (22) are support vectors. A data point $y_i$ is a support vector if and only if its corresponding $\alpha_i$ is non-zero. $r = [ r_1, r_2, \ldots, r_C ]$, where $C$ is the number of clusters in the dataset.
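Since Equation (21) is linear in $\alpha$, it can be handed to any off-the-shelf LP solver. The following sketch uses SciPy's linprog; the non-negativity bound on $\alpha$ is an additional assumption in the spirit of SVDD that is not spelled out in Equation (21).

```python
import numpy as np
from scipy.optimize import linprog

def solve_alpha(Y):
    """Solve the LP of Eq. (21). Y: num x d matrix of de-deflected samples."""
    num = Y.shape[0]
    Q = Y @ Y.T                              # Q_ij = <y_i, y_j>
    Omega = np.einsum('ij,ij->i', Y, Y)      # Omega_i = <y_i, y_i>
    e = np.ones(num)
    c = (2.0 / num) * (Q @ e) - Omega        # objective: (2/num) a^T Q e - a^T Omega
    res = linprog(c, A_eq=e[None, :], b_eq=[1.0],
                  bounds=[(0, None)] * num, method="highs")
    return res.x
```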
Then, each image with pose deflection removed is decomposed into $\tilde{N}$ patches (each patch has the same size), and each patch is encoded. A schematic diagram of the image being divided into small patches is shown in Figure 4. For example, an image $q$ with pose deflection removed is divided into $\tilde{N}$ small patches, where $\tilde{N}$ is determined empirically. Any small patch $q_j$, $j = 1, 2, \ldots, \tilde{N}$, can be encoded as $U(q_j)$:

$$U(q_j) = [\, U_1(q_j) \;\; U_2(q_j) \;\; \cdots \;\; U_C(q_j) \,]^T \qquad (23)$$

where $U_i(q_j) = [\, U_{i,1}(q_j) \;\; U_{i,2}(q_j) \,]$, $i = 1, 2, \ldots, C$, $j = 1, 2, \ldots, \tilde{N}$, and $U_{i,1}(q_j)$ and $U_{i,2}(q_j)$ are obtained by triangle coding: $U_{i,1}(q_j) = \max\{ 0, d(s) - s_i(q_j) \}$, where $s_i(q_j) = \| q_j - o_i \|_2$ is the distance from $q_j$ to $o_i$ and $d(s)$ is the mean of all $s_i(q_j)$ values; $U_{i,2}(q_j) = \max\{ 0, A(m) - m_i(q_j) \}$, where $m_i(q_j) = \frac{ r_i }{ \sum_{k=1}^{C} r_k }$ and $A(m)$ is the mean of all $m_i$ values.
Figure 5 shows a schematic diagram of the encoding. $q_j$ represents the $j$-th patch of the image $q$ (the image $q$ is divided into $\tilde{N}$ patches), $o_i$ denotes the center of the SVDD sphere formed by the $i$-th cluster (multiple sample points are clustered into a cluster), and $r_i$ is the radius of that sphere.
Hence, the image $q$ can be encoded as $F_q$, where

$$F_q = [\, ( U(q_1) )^T \;\; ( U(q_2) )^T \;\; \cdots \;\; ( U(q_{\tilde{N}}) )^T \,]^T \qquad (24)$$
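The patch and image encodings of Equations (23) and (24) reduce to a few vectorized operations. A sketch follows, with illustrative names: `centers` holds the SVDD sphere centers $o_i$ and `radii` the radii $r_i$ obtained from the improved SVDD.

```python
import numpy as np

def encode_patch(q_j, centers, radii):
    """Triangle coding of one flattened patch q_j (Eq. (23)).
    centers: C x d array of sphere centres o_i; radii: length-C array r_i."""
    s = np.linalg.norm(centers - q_j, axis=1)   # s_i(q_j) = ||q_j - o_i||_2
    m = radii / radii.sum()                     # m_i = r_i / sum_k r_k
    u1 = np.maximum(0.0, s.mean() - s)          # U_{i,1} = max{0, d(s) - s_i}
    u2 = np.maximum(0.0, m.mean() - m)          # U_{i,2} = max{0, A(m) - m_i}
    return np.stack([u1, u2], axis=1).ravel()   # [U_1(q_j), ..., U_C(q_j)]

def encode_image(patches, centers, radii):
    """Concatenate the codes of all N~ patches into the image code F_q (Eq. (24))."""
    return np.concatenate([encode_patch(p, centers, radii) for p in patches])
```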

3.3. Dictionary Learning-Based Classifier

The de-deflection and feature encoding operations described above greatly reduce the influence of pose changes on face recognition. To further improve the recognition rate of the whole algorithm, we decided to learn a classifier that not only performs classification but also learns characteristics of the classified samples during training. Recent studies have shown that sparse representations have been successfully applied in many fields, such as image restoration and image classification. Dictionaries play an important role in sparse representation, and the quality of the dictionary greatly affects its performance. The latest research on dictionary learning shows that learning a suitable dictionary from the training data itself usually yields good results for image or video tasks [41]. Inspired by this, we learn a dictionary, use it to represent the test samples, and then determine the category of each test sample according to the representation residual.
The basic model of the dictionary learning-based classifier is as follows.
$$\min_{D, Z} \; \| X - D Z \|_F^2 + \eta \| Z \|_1 \quad \text{s.t.} \quad \| d_i \|_2^2 \le 1 \qquad (25)$$

where $X$ is the matrix of training samples, $D$ represents the dictionary to be learned, $Z$ is the representation coefficient matrix, and $d_i$ represents the $i$-th atom of $D$. $\eta$ is set to 0.3.
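To make the residual-based decision rule concrete, the sketch below sparsely codes a test feature over a learned dictionary and assigns the class whose atoms give the smallest reconstruction residual. The ISTA solver and the per-class partition of the dictionary atoms (an SRC-style rule) are our own assumptions; the paper only states that the category is determined from the representation residual.

```python
import numpy as np

def sparse_code(x, D, eta=0.3, n_iter=200):
    """ISTA-style sparse coding of one test feature x over dictionary D."""
    z = np.zeros(D.shape[1])
    step = 1.0 / np.linalg.norm(D, 2) ** 2        # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        g = z - step * D.T @ (D @ z - x)
        z = np.sign(g) * np.maximum(np.abs(g) - step * eta, 0.0)   # soft threshold
    return z

def classify_by_residual(x, D, atom_labels):
    """Assign x to the class whose atoms reconstruct it with the smallest residual."""
    z = sparse_code(x, D)
    classes = np.unique(atom_labels)
    resid = [np.linalg.norm(x - D[:, atom_labels == c] @ z[atom_labels == c])
             for c in classes]
    return classes[int(np.argmin(resid))]
```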

3.4. Joint Optimization of the Feature Extraction and Feature Classification

To obtain a globally optimal solution of HDMCO, we jointly optimize feature extraction and feature classification.
The model for jointly optimizing the feature extraction and feature classification is as follows.
$$\min_{\alpha, D, Z} \; \| X - D Z \|_F^2 + \eta \left( \frac{2}{num} \alpha^T Q e - \alpha^T \Omega \right) \| Z \|_1 \quad \text{s.t.} \quad \alpha^T e = 1, \; \| d_i \|_2^2 \le 1 \qquad (26)$$
According to Equation (26), we can obtain $\alpha$, $D$, and $Z$.
$\alpha$ can be obtained by using Equation (27).
$$\min_{\alpha} \; \eta \left( \frac{2}{num} \alpha^T Q e - \alpha^T \Omega \right) \| Z \|_1 \quad \text{s.t.} \quad \alpha^T e = 1 \qquad (27)$$
Then, the value of α can be obtained by using the linear programming algorithm.
D can be obtained by solving Equation (28).
$$\min_{D} \; \| X - D Z \|_F^2 \quad \text{s.t.} \quad \| d_i \|_2^2 \le 1 \qquad (28)$$
Solving Equation (28) can be converted to solving Equation (29).
$$\begin{cases} D = \arg\min_{D} \| X - D Z \|_F^2 + \vartheta \| D - V + J \|_F^2 \\ V = \arg\min_{V} \vartheta \| D - V + J \|_F^2, \quad \text{s.t.} \; \| v_i \|_2^2 \le 1 \\ J = J + D - V \end{cases} \qquad (29)$$

where $\vartheta$ is set to 0.2.
Then, D can be obtained by iteratively solving the variables in Equation (29).
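The alternating scheme of Equation (29) admits a simple implementation: the D-step has a closed-form solution, the V-step projects each atom onto the unit ball, and J acts as the multiplier. A sketch follows; the closed-form D-step and the number of inner iterations are our own choices.

```python
import numpy as np

def update_dictionary(X, Z, D, theta=0.2, n_iter=30):
    """Iterative dictionary update of Eq. (29)."""
    V = D.copy()
    J = np.zeros_like(D)
    I = np.eye(Z.shape[0])
    for _ in range(n_iter):
        # D-step: argmin ||X - D Z||_F^2 + theta ||D - V + J||_F^2 (closed form)
        D = (X @ Z.T + theta * (V - J)) @ np.linalg.inv(Z @ Z.T + theta * I)
        # V-step: project each column of D + J onto the unit l2-ball
        V = D + J
        V = V / np.maximum(np.linalg.norm(V, axis=0), 1.0)
        # multiplier update
        J = J + D - V
    return D
```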
Z can be obtained by solving Equation (30).
$$\min_{Z} \; \| X - D Z \|_F^2 + \eta \left( \frac{2}{num} \alpha^T Q e - \alpha^T \Omega \right) \| Z \|_1 \qquad (30)$$
The solution to Equation (30) is as follows.
$$Z = \mathrm{shrink}\!\left( D^{-1} X, \; \frac{ \eta \left( \frac{2}{num} \alpha^T Q e - \alpha^T \Omega \right) }{2} \right) \qquad (31)$$

where $\mathrm{shrink}(x, a) = \mathrm{sign}(x) \cdot \max( | x | - a, 0 )$.
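For completeness, the soft-thresholding step of Equation (31) looks as follows in NumPy; the pseudo-inverse is our own substitute for $D^{-1}$ in case $D$ is not square or invertible.

```python
import numpy as np

def shrink(x, a):
    """Soft-thresholding operator of Eq. (31): sign(x) * max(|x| - a, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - a, 0.0)

def update_Z(X, D, alpha, Q, Omega, eta=0.3):
    """Z update of Eqs. (30)-(31)."""
    num = len(alpha)
    e = np.ones(num)
    a = eta * ((2.0 / num) * alpha @ Q @ e - alpha @ Omega) / 2.0
    return shrink(np.linalg.pinv(D) @ X, a)
```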
Figure 6 shows the schematic diagram of seeking a globally optimal solution.

4. Experiments

4.1. Dataset

Here, Multi-PIE [42], MegaFace [43], CAS-PEAL [44], YTF [45], CPLFW [46], and CVL [47] are used in experiments to verify the performance of HDMCO.
Multi-PIE mainly involves pose variations and illumination variations, and includes a total of more than 750,000 images of 337 different people. Figure 7a shows some samples of multi-PIE.
MegaFace is a challenging, large-scale face dataset. It contains the gallery set and the probe set. The gallery set contains more than 1 million face images, while the probe set contains 106,863 face images of 530 celebrities. Figure 7b shows some samples of MegaFace.
CAS-PEAL includes 99,450 images of 1040 different people, which mainly involve pose variations, expression variations, and lighting variations. Figure 7c shows some samples of CAS-PEAL.
YTF contains 3425 videos of 1595 subjects with diverse ethnicities. Figure 7d shows some samples of YTF.
CPLFW includes 11,652 images of 5749 different people, which mainly involves pose variations. Figure 7e shows some samples of CPLFW.
CVL contains 798 images of 114 different people, which mainly involves pose variations. Figure 7f shows some samples of CVL.

4.2. Experimental Results and Analysis

4.2.1. Comparison with State-of-the-Art Methods

Experimental Setup

Resnet [48], Duan’s method [49], PGM-Face [29], PCCycleGAN [14], LDMR [20], MH-DNCCM [21], DRA-Net [16], TGLBP [35], MCN [37], 3D-PIM [50], WFH [17], mCNN [51], HADL [31], RVFace [52], DTDD [53], ArcFace [54], VGG [55], and DeepID [56] are used as the comparison methods.
For Multi-PIE, we chose images with pose deflection angles of $-45°$, $-30°$, $-15°$, $0°$, $15°$, $30°$, and $45°$ for the experiments; in other words, a total of 2359 images of 337 subjects were used. For each subject, we randomly selected three images for training and used the remaining images for testing, meaning that the training images accounted for 42.85% of the total number of images and the testing images for 57.15%.
For MegaFace, we selected the samples of categories with the number of images greater than or equal to two for experiments. For each class of samples used for experiments, we randomly selected one image for training and one image for testing. Namely, the number of training images accounted for 50% of the total number of images, and the number of testing images accounted for 50% of the total number of images.
For CAS-PEAL, we chose images of 800 subjects in three different poses ($0°$, $-45°$, and $45°$); that is, each subject has three images deflected at different angles. The image with a deflection angle of 0 degrees for each subject is used for training and the rest are used for testing. Specifically, the training images accounted for 33.33% of the total number of images and the testing images for 66.67%.
For YTF, we selected 226 subjects with four or more videos available. Then, we selected 225 of these subjects for the experiments and divided them into five groups, each involving 45 subjects. For each group, the first three videos of each subject were used as the gallery set and the remaining videos for testing. The results obtained from the five groups were averaged as the final experimental result. The training samples accounted for 43.29% of the total number of images, and the testing samples for 56.71%.
For CPLFW, we selected the samples of 2000 classes to form a subset. For samples belonging to a certain class (each class) in this subset, we randomly selected one image as the training sample and one image as the test sample. Precisely, the number of training images accounted for 50% of the total number of images, and the number of testing images accounted for 50% of the total number of images.
For CVL, we choose three images in each class for training and the rest for testing. That is to say, the number of training images accounted for 42.85% of the total number of images, and the number of testing images accounted for 57.15% of the total number of images.
The images used in the experiments are cropped to $60 \times 80$ pixels.

Experimental Results

The accuracies of different methods on different datasets are shown in Table 1. The experimental results on Multi-PIE show that ArcFace has the highest recognition rate, reaching 95.89%. This may be because the proportion of images involving large pose changes is relatively small, so the difference between most training images and test images is not too large, and ArcFace can achieve good recognition results. Furthermore, almost all methods achieved good results, for the following reason: the Multi-PIE dataset contains relatively few images with large pose deflection. For example, images with a pose deflection of 45 degrees account for only two-sevenths of the dataset, which means most of the images used for training differ little from the test images; the trained model can therefore better identify the test samples.
For the experimental results on MegaFace, almost all methods based on deep learning achieved good results. The possible reasons are as follows: although the number of samples used for training in each category is not large, the difference between the large number of samples used for pre-training and the test samples is not too large. Thus, the final learned model has better classification ability for the test samples. Among all the methods, Duan’s method has the worst performance, which may be because the performance of the method depends on finding the parts related to the pose. However, it cannot completely and correctly determine which parts of the image are related to the pose. Furthermore, this method mainly solves face recognition involving pose changes, while the MegaFace dataset involves not only pose changes but also other changes, so the recognition rate of this method on MegaFace is not very high.
The experimental results on YTF show that the recognition rate of all algorithms does not exceed 85%. This is because YTF datasets involve large changes (e.g., large pose changes, large expression changes), so their performance is not very good. Specifically, for methods based on deep learning, the pre-trained model is not suitable for the classification of test images. This is because a large number of samples used for pre-training are quite different from the images in the used dataset. For HADL, because the samples used for training may be quite different from the samples used for testing, the learned dictionary cannot accurately represent the test samples, which means that the algorithm cannot obtain a higher recognition rate. For Duan’s method, because the samples used for training may be quite different from the samples used for testing, the learned characteristics of a certain category are quite different from those of the same category of images in the test set. Then, the recognition rate of the algorithm on the YTF dataset is not very high.
The experimental results on CPLFW show that the recognition rate of our method is higher than that of the other algorithms. This may be because the proposed non-negative matrix factorization based on hypergraph embedding extracts frontal images with better quality. In other words, we use the hypergraph and non-negative matrix factorization to separate the frontal image from the profile image. The extracted pose-free features are then used to learn dictionaries with strong performance, and the learned dictionaries accurately represent the test samples, thereby greatly improving the recognition rate of the algorithm. The reasons why the recognition rate of the deep learning-based methods is not as high as that of our method are as follows. A large number of samples used for pre-training differ greatly from the samples in the test set. For example, many pre-training samples are images with small pose deflection, while many test samples are images with large pose deflection. The rules summarized for each category through training are therefore not suitable for the same category of images in the test set.
The experimental results on CVL show that the recognition rate of our method is 92%, which is higher than that of the other methods. The reasons are as follows: the hypergraph is embedded in the non-negative matrix factorization so that the resulting images retain the intrinsic properties of the original images; triangle encoding is used to encode the obtained pose-free images, which makes the extracted features highly distinctive; and the encoded features are used to train the dictionary, so that the learned dictionary has stronger representation ability. The test samples can then be accurately represented by the dictionary, thereby improving the recognition rate. The performance of the deep learning-based methods is not the best among all methods, for the following reason: the rules summarized for each category through pre-training differ considerably from those of the same category of images in the test set, so the trained model is not suitable for classifying the test images, or cannot correctly classify many of them. For HADL and LDMR, it is difficult to extract pose-invariant features from images with large pose changes, which makes it difficult for the subsequent classifiers to correctly identify samples.
Table 2 and Table 3 show the recall and precision of different methods on different datasets. The experimental results obtained are generally consistent with those in Table 1; HDMCO has the best effect.

4.2.2. Cross-Validation Experiment

To further verify the performance of HDMCO, cross-validation experiments were carried out. For each dataset, we selected the face images with pose deflection angles greater than 45°, and 5-fold cross-validation was performed. Specifically, the dataset was divided into five parts; four of them were taken as training data and one as test data in turn, and the experiment was carried out. Each trial yielded a recognition rate, and the average of the five results was used as the estimate of the algorithm's accuracy.
As can be seen from Table 4, the average recognition rate of many algorithms on the multi-PIE data set and CAS-PEAL data set is more than 80%. At the same time, it can also be seen that the recognition rate of the proposed HDMCO is higher than that of other algorithms.

4.2.3. The Effect of Feature Dimension on the Recognition Performance of the Algorithms

To illustrate the effect of feature dimension on the recognition rate of our method, we conducted experiments. DTDD, HADL, and PCCycleGAN are used as comparison methods. The experimental conditions are the same as those in Section 4.2.1; the only difference is that the dimension of the features ranges from 100 to 600. Figure 8 shows the effect of feature dimension on the recognition rate of the different methods. It can be seen from Figure 8 that the recognition rates of all methods first increase gradually with the feature dimension and then remain unchanged. Furthermore, the recognition performance of our method is better than that of the other methods.

4.2.4. The Display of the Extracted Frontal Images

To illustrate that our method can effectively separate pose-free images from pose-deflected images, we show the obtained separated images. In Figure 9, the left half of each subfigure shows the original image, and the right half shows the pose-free deflection image separated from the original image. As can be seen from Figure 9, the separated images are close to the frontal image. This shows that the proposed feature discrimination enhancement method based on non-negative matrix factorization and hypergraph embedding can indeed achieve the de-pose function.

4.2.5. Ablation Experiment

To verify the role of each component in the proposed method, we performed ablation experiments. The main components of the method proposed in this paper are the “feature discrimination enhancement method based on non-negative matrix factorization and hypergraph embedding”, the “feature coding method based on improved support vector data description”, “dictionary learning-based classifier”, and “joint optimization of the feature extraction and feature classification”, which are abbreviated as de-deflection, feature coding, dictionary learning, and joint optimization.

Experimental Setup

The experimental conditions are the same as in Section 4.2.1.

Experimental Results

Figure 10 shows the results of the ablation experiments. It can be seen from Figure 10a that using the de-deflection component improves the recognition rate of the algorithm by about 2% on some datasets, and by more on others, such as 5% and 7%. As can be seen from Figure 10b, using the feature encoding component improves the recognition rate by about 2% on almost all datasets. Figure 10c shows that the dictionary learning component improves the recognition rate by about 1% on some datasets and by about 2% on others. Figure 10d shows that the joint optimization component improves the recognition rate by about 3% on almost all datasets.

4.2.6. The Effect of Parameters on the Recognition Performance of HDMCO

In HDMCO, η and λ are the main parameters. To explore their impact on the recognition rate of HDMCO, we conducted experiments. The experimental conditions are the same as the experimental conditions in Section 4.2.1. The only difference is that η ranges from 0.1 to 0.6, and λ ranges from 0 to 1. Figure 11 shows the effect of the main parameters on the recognition rate of HDMCO. It can be seen from Figure 11 that the recognition rate of HDMCO is the highest when the value of η is about 0.3, and the recognition rate of HDMCO is the highest when the value of λ is about 0.5.

4.2.7. Comparison of Computational Complexity

In this section, we analyze the computational complexity of the proposed algorithm and compare it with that of several existing methods. The computational complexity of HDMCO is mainly derived from solving $\alpha$ using linear programming, and the complexity of calculating $\alpha$ is $O(n_0^2)$, where $n_0$ is the number of training samples. Thus, the computational complexity of HDMCO is $O(n_0^2)$. HADL [31] and LDMR [20] are used as comparative methods. The computational complexity of HADL is $O( M \tau ( K n_0^3 + K \max(L, K) ) )$, where $\tau$ is the iteration number, $L$ is the dimension of each sample, $K$ is the number of atoms in the dictionary, and $M$ is the maximum number of iterations. The computational complexity of LDMR is $O( u_0 v_0 n_0^2 + n_0^3 + \tau ( u_0 v_0^2 + u_0 v_0 n_0 ) )$, where $u_0$ and $v_0$ are the width and height of the image, respectively. It is easy to see from these expressions that the computational complexity of HDMCO is of order $n_0^2$, while that of the other two algorithms is of order $n_0^3$. Hence, HDMCO has lower computational complexity. For example, the running time of HDMCO on the Multi-PIE database is 713.45 s, while the running times of HADL and LDMR are 4397.45 s and 5813.24 s, respectively. The configuration of our computer is as follows: Intel Core i7-9700K, 3.6 GHz, Nvidia GeForce RTX 2080 Ti.

5. Conclusions

In this paper, we propose a novel few-shot, multi-pose face recognition method based on hypergraph de-deflection and multi-task collaborative optimization (HDMCO). HDMCO uses hypergraph theory and non-negative matrix decomposition to separate frontal images from pose-deflected images, and then uses the improved support vector data description and triangle coding to extract features from the separated, deflection-free images. A dictionary learning-based classifier is then used to classify those features, and the feature extraction and feature classification processes are jointly optimized. A large number of experimental results show that the proposed HDMCO achieves good results. Although we jointly optimize feature extraction and feature classification and thereby achieve better results, the separation of frontal images remains independent of the subsequent feature extraction, so the obtained recognition result is still not the ultimate optimum of HDMCO. In future work, we will explore the joint optimization of frontal image separation and feature extraction to obtain the ultimate optimal recognition performance of HDMCO.

Author Contributions

Conceptualization, X.F.; Methodology, X.F.; Validation, M.L.; Formal analysis, M.L.; Data curation, L.C.; Writing—review & editing, X.F.; Supervision, L.C.; Funding acquisition, J.H. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded in part by the Post-doctoral Innovative Talent Support Program (Grant no. BX20200048), and in part by the General Program of China Postdoctoral Science Foundation (Grant no. 2021M700405).

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Jeevan, G.; Zacharias, G.C.; Nair, M.S.; Rajan, J. An empirical study of the impact of masks on face recognition. Pattern Recognit. 2022, 122, 108308. [Google Scholar] [CrossRef]
  2. Solovyev, R.; Wang, W.; Gabruseva, T. Weighted boxes fusion: Ensembling boxes from different object detection models. Image Vis. Comput. 2021, 107, 104117. [Google Scholar] [CrossRef]
  3. Wu, C.; Ju, B.; Wu, Y.; Xiong, N.N.; Zhang, S. WGAN-E: A generative adversarial networks for facial feature security. Electronics 2020, 9, 486. [Google Scholar] [CrossRef]
  4. Sengupta, S.; Chen, J.C.; Castillo, C.; Patel, V.M.; Chellappa, R.; Jacobs, D.W. Frontal to profile face verification in the wild. In Proceedings of the 2016 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Placid, NY, USA, 7–10 March 2016; pp. 1–9. [Google Scholar]
  5. Khrissi, L.; El Akkad, N.; Satori, H.; Satori, K. Clustering method and sine cosine algorithm for image segmentation. Evol. Intell. 2022, 15, 669–682. [Google Scholar] [CrossRef]
  6. Zhao, J.; Xiong, L.; Cheng, Y.; Cheng, Y.; Li, J.; Zhou, L.; Xu, Y.; Karlekar, J.; Pranata, S.; Shen, S.; et al. 3D-aided deep pose-invariant face recognition. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI-18), Stockholm, Sweden, 13–19 July 2018; Volume 2, p. 11. [Google Scholar]
  7. Zhao, J.; Xiong, L.; Li, J.; Xing, J.; Yan, S.; Feng, J. 3D-aided dual-agent gans for unconstrained face recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 41, 2380–2394. [Google Scholar] [CrossRef]
  8. Zhao, J.; Cheng, Y.; Xu, Y.; Xiong, L.; Li, J.; Zhao, F.; Jayashree, K.; Pranata, S.; Shen, S.; Xing, J.; et al. Towards pose invariant face recognition in the wild. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 2207–2216. [Google Scholar]
  9. Zhao, J.; Xiong, L.; Karlekar Jayashree, P.; Li, J.; Zhao, F.; Wang, Z.; Sugiri Pranata, P.; Shengmei Shen, P.; Yan, S.; Feng, J. Dual-agent gans for photorealistic and identity preserving profile face synthesis. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
  10. Zhao, J. Deep Learning for Human-Centric Image Analysis. Ph.D. Thesis, National University of Singapore, Singapore, 2018. [Google Scholar]
  11. Khrissi, L.; EL Akkad, N.; Satori, H.; Satori, K. An Efficient Image Clustering Technique based on Fuzzy C-means and Cuckoo Search Algorithm. Int. J. Adv. Comput. Sci. Appl. 2021, 12, 423–432. [Google Scholar] [CrossRef]
  12. Ding, C.; Tao, D. Pose-invariant face recognition with homography-based normalization. Pattern Recognit. 2017, 66, 144–152. [Google Scholar] [CrossRef]
  13. Luan, X.; Geng, H.; Liu, L.; Li, W.; Zhao, Y.; Ren, M. Geometry structure preserving based gan for multi-pose face frontalization and recognition. IEEE Access 2020, 8, 104676–104687. [Google Scholar] [CrossRef]
  14. Liu, Y.; Chen, J. Unsupervised face frontalization for pose-invariant face recognition. Image Vis. Comput. 2021, 106, 104093. [Google Scholar] [CrossRef]
  15. Yin, Y.; Jiang, S.; Robinson, J.P.; Fu, Y. Dual-attention gan for large-pose face frontalization. In Proceedings of the 2020 15th IEEE international conference on automatic face and gesture recognition (FG 2020), Buenos Aires, Argentina, 16–20 November 2020; pp. 249–256. [Google Scholar]
  16. Lin, C.-H.; Huang, W.-J.; Wu, B.-F. Deep representation alignment network for pose-invariant face recognition. Neurocomputing 2021, 464, 485–496. [Google Scholar] [CrossRef]
  17. Yang, H.; Gong, C.; Huang, K.; Song, K.; Yin, Z. Weighted feature histogram of multi-scale local patch using multi-bit binary descriptor for face recognition. IEEE Trans. Image Process. 2021, 30, 3858–3871. [Google Scholar] [CrossRef]
  18. Tu, X.; Zhao, J.; Liu, Q.; Ai, W.; Guo, G.; Li, Z.; Liu, W.; Feng, J. Joint face image restoration and frontalization for recognition. IEEE Trans. Circuits Syst. Video Technol. 2021, 32, 1285–1298. [Google Scholar] [CrossRef]
  19. Zhou, L.-F.; Du, Y.-W.; Li, W.-S.; Mi, J.-X.; Luan, X. Pose-robust face recognition with huffman-lbp enhanced by divide-and-rule strategy. Pattern Recognit. 2018, 78, 43–55. [Google Scholar] [CrossRef]
  20. Zhang, C.; Li, H.; Qian, Y.; Chen, C.; Zhou, X. Locality-constrained discriminative matrix regression for robust face identification. IEEE Trans. Neural Netw. Learn. Syst. 2020, 33, 1254–1268. [Google Scholar] [CrossRef]
  21. Gao, L.; Guan, L. A discriminative vectorial framework for multi-modal feature representation. IEEE Trans. Multimed. 2021, 24, 1503–1514. [Google Scholar]
  22. Yang, S.; Deng, W.; Wang, M.; Du, J.; Hu, J. Orthogonality loss: Learning discriminative representations for face recognition. IEEE Trans. Circuits Syst. Video Technol. 2020, 31, 2301–2314. [Google Scholar] [CrossRef]
  23. Huang, F.; Yang, M.; Lv, X.; Wu, F. Cosmos-loss: A face representation approach with independent supervision. IEEE Access 2021, 9, 36819–36826. [Google Scholar] [CrossRef]
  24. He, M.; Zhang, J.; Shan, S.; Kan, M.; Chen, X. Deformable face net for pose invariant face recognition. Pattern Recognit. 2020, 100, 107113. [Google Scholar] [CrossRef]
  25. Wang, Q.; Guo, G. Dsa-face: Diverse and sparse attentions for face recognition robust to pose variation and occlusion. IEEE Trans. Inf. Forensics Secur. 2021, 16, 4534–4543. [Google Scholar] [CrossRef]
  26. He, R.; Li, Y.; Wu, X.; Song, L.; Chai, Z.; Wei, X. Coupled adversarial learning for semi-supervised heterogeneous face recognition. Pattern Recognit. 2021, 110, 107618. [Google Scholar] [CrossRef]
  27. Liu, H.; Zhu, X.; Lei, Z.; Cao, D.; Li, S.Z. Fast adapting without forgetting for face recognition. IEEE Trans. Circuits Syst. Video Technol. 2020, 31, 3093–3104. [Google Scholar] [CrossRef]
  28. Sun, J.; Yang, W.; Xue, J.H.; Liao, Q. An equalized margin loss for face recognition. IEEE Trans. Multimed. 2020, 22, 2833–2843. [Google Scholar] [CrossRef]
  29. Zhang, Y.; Fu, K.; Han, C.; Cheng, P.; Yang, S.; Yang, X. PGM-face: Pose-guided margin loss for cross-pose face recognition. Neurocomputing 2021, 460, 154–165. [Google Scholar] [CrossRef]
  30. Badave, H.; Kuber, M. Head pose estimation based robust multicamera face recognition. In Proceedings of the 2021 International Conference on Artificial Intelligence and Smart Systems (ICAIS), Coimbatore, India, 25–27 March 2021; pp. 492–495. [Google Scholar]
  31. Wang, L.; Li, S.; Wang, S.; Kong, D.; Yin, B. Hardness-aware dictionary learning: Boosting dictionary for recognition. IEEE Trans. Multimed. 2020, 23, 2857–2867. [Google Scholar] [CrossRef]
  32. Holkar, A.; Walambe, R.; Kotecha, K. Few-shot learning for face recognition in the presence of image discrepancies for limited multi-class datasets. Image Vis. Comput. 2022, 120, 104420. [Google Scholar] [CrossRef]
  33. Guan, Y.; Fang, J.; Wu, X. Multi-pose face recognition using cascade alignment network and incremental clustering. Signal, Image Video Process. 2021, 15, 63–71. [Google Scholar] [CrossRef]
  34. Zhang, Y.; Fu, K.; Han, C.; Cheng, P. Identity-and-pose-guided generative adversarial network for face rotation. Neurocomputing 2021, 450, 33–47. [Google Scholar] [CrossRef]
  35. Qu, H.; Wang, Y. Application of optimized local binary pattern algorithm in small pose face recognition under machine vision. Multimed. Tools Appl. 2022, 81, 29367–29381. [Google Scholar] [CrossRef]
  36. Masi, I.; Chang, F.J.; Choi, J.; Harel, S.; Kim, J.; Kim, K.; Leksut, J.; Rawls, S.; Wu, Y.; Hassner, T.; et al. Learning pose-aware models for pose-invariant face recognition in the wild. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 41, 379–393. [Google Scholar] [CrossRef]
  37. Elharrouss, O.; Almaadeed, N.; Al-Maadeed, S.; Khelifi, F. Pose-invariant face recognition with multitask cascade networks. Neural Comput. Appl. 2022, 34, 6039–6052. [Google Scholar] [CrossRef]
  38. Liu, J.; Li, Q.; Liu, M.; Wei, T. CP-GAN: A cross-pose profile face frontalization boosting pose-invariant face recognition. IEEE Access 2020, 8, 198659–198667. [Google Scholar] [CrossRef]
  39. Tao, Y.; Zheng, W.; Yang, W.; Wang, G.; Liao, Q. Frontal-centers guided face: Boosting face recognition by learning pose-invariant features. IEEE Trans. Inf. Forensics Secur. 2022, 17, 2272–2283. [Google Scholar] [CrossRef]
  40. Gao, G.; Yu, Y.; Yang, M.; Chang, H.; Huang, P.; Yue, D. Cross-resolution face recognition with pose variations via multilayer locality-constrained structural orthogonal procrustes regression. Inf. Sci. 2020, 506, 19–36. [Google Scholar] [CrossRef]
  41. Wang, H.; Kawahara, Y.; Weng, C.; Yuan, J. Representative selection with structured sparsity. Pattern Recognit. 2017, 63, 268–278. [Google Scholar] [CrossRef]
  42. Gross, R.; Matthews, I.; Cohn, J.; Kanade, T.; Baker, S. Multi-pie. Image Vis. Comput. 2010, 28, 807–813. [Google Scholar] [CrossRef]
  43. Kemelmacher-Shlizerman, I.; Seitz, S.M.; Miller, D.; Brossard, E. The megaface benchmark: 1 million faces for recognition at scale. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 4873–4882. [Google Scholar]
  44. Gao, W.; Cao, B.; Shan, S.; Chen, X.; Zhou, D.; Zhang, X.; Zhao, D. The CAS-PEAL large-scale chinese face database and baseline evaluations. IEEE Trans. Syst. Man Cybern.-Part A Syst. Hum. 2007, 38, 149–161. [Google Scholar]
  45. Wolf, L.; Hassner, T.; Maoz, I. Face recognition in unconstrained videos with matched background similarity. In Proceedings of the CVPR 2011, Colorado Springs, CO, USA, 20–25 June 2011; pp. 529–534. [Google Scholar]
  46. Zheng, T.; Deng, W. Cross-Pose LFW: A Database for Studying Cross-Pose Face Recognition in Unconstrained Environments; Technical Report; Beijing University of Posts and Telecommunications: Beijing, China, 2018; Volume 5. [Google Scholar]
  47. Peer, P. CVL Face Database, Computer Vision Lab., Faculty of Computer and Information Science, University of Ljubljana, Slovenia. 2005. Available online: http://www.lrv.fri.uni-lj.si/facedb.html (accessed on 27 March 2023).
  48. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  49. Duan, X.; Tan, Z.-H. A spatial self-similarity based feature learning method for face recognition under varying poses. Pattern Recognit. Lett. 2018, 111, 109–116. [Google Scholar] [CrossRef]
  50. Wu, H.; Gu, J.; Fan, X.; Li, H.; Xie, L.; Zhao, J. 3D-guided frontal face generation for pose-invariant recognition. ACM Trans. Intell. Syst. Technol. 2023, 14, 1–21. [Google Scholar] [CrossRef]
  51. Zhao, J.; Li, J.; Zhao, F.; Nie, X.; Chen, Y.; Yan, S.; Feng, J. Marginalized CNN: Learning deep invariant representations. In Proceedings of the British Machine Vision Conference (BMVC), London, UK, 4–7 September 2017. [Google Scholar] [CrossRef]
  52. Wang, X.; Wang, S.; Liang, Y.; Gu, L.; Lei, Z. RVFace: Reliable vector guided softmax loss for face recognition. IEEE Trans. Image Process. 2022, 31, 2337–2351. [Google Scholar] [CrossRef]
  53. Zhong, Y.; Deng, W.; Fang, H.; Hu, J.; Zhao, D.; Li, X.; Wen, D. Dynamic training data dropout for robust deep face recognition. IEEE Trans. Multimed. 2021, 24, 1186–1197. [Google Scholar] [CrossRef]
  54. Deng, J.; Guo, J.; Xue, N.; Zafeiriou, S. Arcface: Additive angular margin loss for deep face recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 4690–4699. [Google Scholar]
  55. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
  56. Sun, Y.; Wang, X.; Tang, X. Deep learning face representation from predicting 10,000 classes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 1891–1898. [Google Scholar]
Figure 1. The flowchart of the proposed HDMCO. s_i represents the distance between the patch q_j and O_i, where O_i is the center of the i-th SVDD sphere, i = 1, 2, …, C. X is the training sample set, D is the dictionary, and Z is the representation coefficient matrix.
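The flowchart couples a dictionary D and a representation coefficient matrix Z, learned from the training sample set X, with the final recognition step. As a rough illustration of how such quantities can drive recognition, the sketch below classifies a test feature by class-wise reconstruction residual; this is a generic representation-based baseline, not HDMCO's joint optimization, and the function and argument names are illustrative.

```python
import numpy as np

def classify_by_residual(x, dictionary, code, class_labels):
    """Assign x to the class whose dictionary atoms reconstruct it best.

    x            : (d,)   test feature vector
    dictionary   : (d, k) dictionary D, one atom per column
    code         : (k,)   representation coefficients z for x
    class_labels : (k,)   class label of each atom
    """
    residuals = {}
    for c in np.unique(class_labels):
        mask = (class_labels == c)
        # Reconstruct x using only the atoms (and coefficients) of class c.
        recon = dictionary[:, mask] @ code[mask]
        residuals[c] = np.linalg.norm(x - recon)
    # Smallest residual wins.
    return min(residuals, key=residuals.get)
```

A coefficient vector for a test sample can be obtained, for instance, by least squares, `code = np.linalg.lstsq(dictionary, x, rcond=None)[0]`, whereas HDMCO obtains D and Z through its collaborative optimization.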
Figure 2. Examples of the hypergraph G. v_i is a vertex, i = 1, 2, …, 8; e_j is a hyperedge, j = 1, 2, 3. H is the association matrix, and H_ij is the element in row i and column j of H.
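The association (incidence) matrix H can be assembled directly from the vertex and hyperedge sets: H_ij = 1 when vertex v_i belongs to hyperedge e_j, and 0 otherwise. The short sketch below builds such a matrix; the vertex-to-hyperedge memberships are invented for illustration and do not reproduce the exact hyperedges drawn in the figure.

```python
import numpy as np

# Eight vertices v1..v8 and three hyperedges e1..e3 (memberships are illustrative).
num_vertices = 8
hyperedges = [
    {0, 1, 2},      # e1
    {2, 3, 4, 5},   # e2
    {5, 6, 7},      # e3
]

# Association (incidence) matrix: H[i, j] = 1 iff vertex i lies on hyperedge j.
H = np.zeros((num_vertices, len(hyperedges)))
for j, edge in enumerate(hyperedges):
    for i in edge:
        H[i, j] = 1.0

# Vertex and hyperedge degrees, as typically used when forming a hypergraph Laplacian.
vertex_degree = H.sum(axis=1)
edge_degree = H.sum(axis=0)
```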
Figure 3. The process of extracting near-frontal images from images involving pose variations. Y is the original image set involving pose deflection, Y_W is the image set obtained by preprocessing Y, Y is the set of approximate frontal images obtained by decomposition and iteration, P is the pose change matrix, P^T is the transpose of P, Y_i is the value of Y at the i-th iteration, and P_i^T is the value of P^T at the i-th iteration.
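The decomposition alternately refines the near-frontal image factor and the pose change factor until their product approximates the preprocessed set Y_W. The sketch below illustrates this alternating pattern with standard Lee–Seung multiplicative updates for a plain non-negative factorization; it deliberately omits the hypergraph regularization that HDMCO embeds in the objective, so the update rules here are generic NMF updates rather than the paper's.

```python
import numpy as np

def nonnegative_decompose(Yw, rank, num_iters=200, eps=1e-9):
    """Alternating multiplicative updates for Yw ≈ F @ Pt, all factors non-negative.

    Yw   : (d, n) preprocessed, non-negative image matrix (one image per column)
    rank : inner dimension of the factorization
    Returns F (d, rank) and Pt (rank, n).
    """
    rng = np.random.default_rng(0)
    F = rng.random((Yw.shape[0], rank)) + eps
    Pt = rng.random((rank, Yw.shape[1])) + eps
    for _ in range(num_iters):
        # Update the coefficient factor (analogous to the pose-related factor P^T).
        Pt *= (F.T @ Yw) / (F.T @ F @ Pt + eps)
        # Update the basis factor (analogous to the near-frontal image factor).
        F *= (Yw @ Pt.T) / (F @ Pt @ Pt.T + eps)
    return F, Pt
```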
Figure 4. Each small grid cell in the picture represents a patch. The number of patches is chosen empirically.
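The grid amounts to partitioning each near-frontal image into equally sized, non-overlapping patches before encoding. A minimal sketch follows, assuming the image dimensions are divisible by the grid size; the 4 × 4 grid is an arbitrary placeholder rather than the setting used in the experiments.

```python
import numpy as np

def split_into_patches(image, rows=4, cols=4):
    """Split a 2-D image into a rows x cols grid of equally sized patches."""
    h, w = image.shape
    ph, pw = h // rows, w // cols
    return [image[r * ph:(r + 1) * ph, c * pw:(c + 1) * pw]
            for r in range(rows) for c in range(cols)]
```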
Figure 5. The schematic diagram of the encoding. o_i and o_j are the centers of the SVDD spheres formed by the i-th and j-th clusters, respectively, and r_i and r_j are their radii. s_i(q_j) and s_j(q_j) denote the distances from the patch q_j to o_i and o_j, respectively. For the specific encoding of each patch, please refer to Equation (19).
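Each patch q_j is encoded through its distances s_i(q_j) to the SVDD sphere centers. Since Equation (19) is not reproduced in this excerpt, the sketch below substitutes a generic triangle-coding rule, max(0, mean distance − s_i), to illustrate distance-to-center encoding; the paper's actual formula also involves the radii r_i, which this stand-in ignores, and `centers` is assumed to collect one fitted SVDD center per cluster.

```python
import numpy as np

def encode_patch(q, centers):
    """Triangle-style code for a flattened patch q given SVDD sphere centers.

    q       : (d,)   flattened patch
    centers : (C, d) one SVDD center o_i per cluster
    Returns a (C,) non-negative code vector.
    """
    s = np.linalg.norm(centers - q, axis=1)   # distances s_i(q) to each center
    return np.maximum(0.0, s.mean() - s)      # centers farther than average are zeroed
```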
Figure 6. The schematic diagram of seeking a globally optimal solution.
Figure 7. Example images from different datasets. (a) Multi-PIE; (b) MegaFace; (c) CAS-PEAL; (d) YTF; (e) CPLFW; (f) CVL.
Figure 8. The effect of feature dimension on the recognition rate of different methods. (a) Multi-PIE; (b) MegaFace; (c) CAS-PEAL; (d) YTF; (e) CPLFW; (f) CVL.
Figure 9. Images separated by hypergraph-based non-negative matrix factorization. (a) Multi-PIE; (b) MegaFace; (c) CAS-PEAL; (d) YTF; (e) CPLFW; (f) CVL.
Figure 10. Results of ablation experiments. MP represents Multi-PIE, MF represents MegaFace, and CP represents CAS-PEAL. (a) De-deflection; (b) feature learning; (c) dictionary learning; (d) joint optimization.
Figure 11. The effect of main parameters on the recognition rate of HDMCO. (a) η; (b) λ.
Table 1. Accuracies (%) of Different Methods on Different Datasets.

| Methods \ Datasets | Multi-PIE [42] | MegaFace [43] | CAS-PEAL [44] | YTF [45] | CPLFW [46] | CVL [47] |
|---|---|---|---|---|---|---|
| ResNet [48] | 91.06 | 87.77 | 90.77 | 76.05 | 82.36 | 89.56 |
| Duan [49] | 87.68 | 82.55 | 89.37 | 73.88 | 81.06 | 85.17 |
| PGM-Face [29] | 90.23 | 85.33 | 90.01 | 73.20 | 78.58 | 88.06 |
| PCCycleGAN [14] | 88.99 | 85.01 | 88.85 | 75.16 | 80.66 | 87.38 |
| LDMR [20] | 91.33 | 86.23 | 92.29 | 77.22 | 85.06 | 90.23 |
| MH-DNCCM [21] | 91.78 | 85.89 | 90.46 | 76.11 | 82.50 | 87.98 |
| DRA-Net [16] | 93.06 | 83.99 | 93.16 | 76.47 | 82.97 | 90.01 |
| TGLBP [35] | 89.06 | 86.13 | 90.67 | 75.60 | 83.71 | 86.97 |
| MCN [37] | 92.01 | 86.24 | 91.37 | 77.05 | 85.20 | 88.31 |
| 3D-PIM [50] | 93.02 | 86.31 | 91.79 | 78.05 | 85.07 | 89.22 |
| WFH [17] | 91.55 | 86.01 | 92.70 | 74.88 | 84.01 | 87.34 |
| mCNN [51] | 88.68 | 85.17 | 87.58 | 72.89 | 80.59 | 81.38 |
| HADL [31] | 90.82 | 85.35 | 90.47 | 75.95 | 84.35 | 86.40 |
| RVFace [52] | 92.10 | 88.03 | 93.17 | 78.05 | 85.97 | 90.03 |
| DTDD [53] | 90.39 | 88.37 | 93.55 | 77.35 | 86.23 | 88.80 |
| ArcFace [54] | 95.89 | 91.37 | 92.13 | 83.40 | 84.88 | 87.23 |
| VGG [55] | 95.14 | 89.29 | 90.92 | 81.05 | 83.06 | 85.71 |
| DeepID [56] | 93.88 | 87.58 | 88.15 | 78.45 | 83.46 | 85.12 |
| HDMCO (ours) | 95.19 | 90.67 | 95.88 | 80.34 | 88.41 | 92.19 |
Table 2. Recall (%) of Different Methods on Different Datasets.

| Methods \ Datasets | Multi-PIE [42] | MegaFace [43] | CAS-PEAL [44] | YTF [45] | CPLFW [46] | CVL [47] |
|---|---|---|---|---|---|---|
| ResNet [48] | 78.35 | 70.68 | 76.18 | 82.67 | 76.83 | 81.33 |
| Duan [49] | 76.02 | 65.43 | 72.67 | 77.61 | 73.25 | 78.69 |
| PGM-Face [29] | 79.66 | 73.14 | 75.68 | 75.60 | 77.25 | 80.39 |
| PCCycleGAN [14] | 81.64 | 72.95 | 77.62 | 73.08 | 76.89 | 76.28 |
| LDMR [20] | 83.58 | 76.89 | 72.99 | 75.03 | 75.88 | 79.01 |
| MH-DNCCM [21] | 82.05 | 73.91 | 70.03 | 73.26 | 74.19 | 80.06 |
| DRA-Net [16] | 85.11 | 80.34 | 77.68 | 79.32 | 80.64 | 81.39 |
| TGLBP [35] | 80.24 | 78.92 | 78.33 | 75.17 | 78.38 | 79.68 |
| MCN [37] | 83.67 | 80.20 | 81.08 | 77.68 | 80.34 | 78.18 |
| 3D-PIM [50] | 85.02 | 81.35 | 81.69 | 79.67 | 77.58 | 76.64 |
| WFH [17] | 82.67 | 78.54 | 76.44 | 77.39 | 79.14 | 75.89 |
| mCNN [51] | 80.33 | 78.67 | 79.58 | 78.99 | 80.01 | 78.66 |
| HADL [31] | 78.89 | 80.59 | 79.44 | 79.88 | 78.46 | 80.62 |
| RVFace [52] | 82.24 | 80.30 | 77.89 | 79.01 | 81.33 | 80.23 |
| DTDD [53] | 80.95 | 80.68 | 79.25 | 79.31 | 80.64 | 82.07 |
| ArcFace [54] | 82.53 | 85.01 | 82.34 | 80.09 | 81.69 | 80.60 |
| VGG [55] | 79.28 | 81.02 | 78.08 | 75.89 | 79.88 | 79.47 |
| DeepID [56] | 81.08 | 82.16 | 79.66 | 81.06 | 81.06 | 78.30 |
| HDMCO (ours) | 88.37 | 85.67 | 86.17 | 85.60 | 88.32 | 88.97 |
Table 3. Precision (%) of Different Methods on Different Datasets.

| Methods \ Datasets | Multi-PIE [42] | MegaFace [43] | CAS-PEAL [44] | YTF [45] | CPLFW [46] | CVL [47] |
|---|---|---|---|---|---|---|
| ResNet [48] | 89.26 | 86.08 | 86.92 | 78.34 | 80.16 | 86.57 |
| Duan [49] | 85.06 | 80.38 | 88.15 | 75.06 | 79.68 | 86.23 |
| PGM-Face [29] | 89.32 | 86.42 | 88.95 | 75.01 | 77.19 | 86.27 |
| PCCycleGAN [14] | 86.27 | 85.39 | 86.19 | 77.12 | 79.18 | 85.61 |
| LDMR [20] | 88.97 | 85.09 | 90.87 | 75.80 | 83.97 | 87.18 |
| MH-DNCCM [21] | 88.39 | 82.17 | 88.69 | 78.02 | 80.05 | 85.10 |
| DRA-Net [16] | 90.86 | 82.34 | 92.05 | 75.24 | 80.34 | 87.19 |
| TGLBP [35] | 86.95 | 85.06 | 91.21 | 75.32 | 81.99 | 85.43 |
| MCN [37] | 90.67 | 83.97 | 89.68 | 76.38 | 83.97 | 87.32 |
| 3D-PIM [50] | 90.98 | 85.11 | 90.08 | 75.86 | 83.46 | 87.68 |
| WFH [17] | 89.30 | 83.67 | 90.79 | 74.02 | 83.97 | 86.22 |
| mCNN [51] | 86.91 | 85.06 | 86.40 | 70.66 | 79.30 | 79.66 |
| HADL [31] | 88.69 | 84.39 | 88.67 | 76.08 | 83.97 | 85.88 |
| RVFace [52] | 90.68 | 86.92 | 91.86 | 78.68 | 85.02 | 88.60 |
| DTDD [53] | 88.67 | 87.08 | 92.43 | 76.18 | 85.15 | 87.67 |
| ArcFace [54] | 93.91 | 90.28 | 90.88 | 82.91 | 82.69 | 86.41 |
| VGG [55] | 95.86 | 88.06 | 88.67 | 79.38 | 81.67 | 85.02 |
| DeepID [56] | 93.05 | 86.14 | 85.97 | 77.68 | 81.97 | 84.67 |
| HDMCO (ours) | 96.08 | 91.35 | 94.86 | 81.57 | 88.05 | 92.30 |
Table 4. The Results (%) of Cross-Validation.

| Methods \ Datasets | Multi-PIE [42] | MegaFace [43] | CAS-PEAL [44] | YTF [45] | CPLFW [46] | CVL [47] |
|---|---|---|---|---|---|---|
| ResNet [48] | 81.32 | 78.92 | 81.24 | 68.05 | 73.68 | 80.92 |
| Duan [49] | 80.68 | 75.60 | 80.38 | 63.58 | 71.99 | 78.96 |
| PGM-Face [29] | 77.68 | 79.31 | 77.59 | 72.38 | 68.56 | 77.90 |
| PCCycleGAN [14] | 76.82 | 75.66 | 74.97 | 68.33 | 69.98 | 78.62 |
| LDMR [20] | 80.38 | 83.29 | 80.64 | 73.20 | 76.82 | 80.93 |
| MH-DNCCM [21] | 81.60 | 81.32 | 83.67 | 64.51 | 71.93 | 79.71 |
| DRA-Net [16] | 83.64 | 83.16 | 81.46 | 66.49 | 74.69 | 80.97 |
| TGLBP [35] | 80.32 | 80.97 | 76.91 | 64.98 | 72.64 | 77.62 |
| MCN [37] | 80.06 | 79.86 | 80.46 | 71.61 | 71.62 | 78.59 |
| 3D-PIM [50] | 81.30 | 80.61 | 78.67 | 70.38 | 73.92 | 78.61 |
| WFH [17] | 81.69 | 80.67 | 81.33 | 63.89 | 73.68 | 79.68 |
| mCNN [51] | 79.37 | 75.31 | 76.82 | 63.99 | 71.68 | 70.29 |
| HADL [31] | 80.34 | 73.97 | 80.34 | 73.61 | 73.61 | 77.85 |
| RVFace [52] | 77.31 | 76.89 | 83.89 | 75.06 | 75.38 | 79.33 |
| DTDD [53] | 78.39 | 80.67 | 82.58 | 73.68 | 78.99 | 78.99 |
| ArcFace [54] | 82.67 | 78.59 | 81.37 | 77.31 | 74.63 | 77.97 |
| VGG [55] | 80.69 | 77.98 | 80.59 | 73.68 | 73.91 | 76.89 |
| DeepID [56] | 83.99 | 77.86 | 78.61 | 71.68 | 77.35 | 77.95 |
| HDMCO (ours) | 89.30 | 85.07 | 85.99 | 78.95 | 83.93 | 83.97 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
