Peer-Review Record

Improved Single Sample Per Person Face Recognition via Enriching Intra-Variation and Invariant Features

Appl. Sci. 2020, 10(2), 601; https://doi.org/10.3390/app10020601
by Huan Tu 1, Gesang Duoji 2,*, Qijun Zhao 1,2,* and Shuang Wu 1
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 8 December 2019 / Revised: 1 January 2020 / Accepted: 3 January 2020 / Published: 14 January 2020
(This article belongs to the Special Issue Advanced Biometrics with Deep Learning)

Round 1

Reviewer 1 Report

The article concerns the problem of face recognition using a single sample per person (SSPP FR), which is a challenging task in computer vision. The Authors propose an algorithm that improves recognition performance by generating additional samples to enrich the intra-variation and by extracting invariant features. The method consists of three modules: 3D face modeling, 2D image generation, and improved SSPP FR. The obtained results are clearly better than those of traditional methods such as HOG + SVM and G-FST, and of deep learning methods. So, I recommend the paper for publication.

However, before the publication I would ask the Authors to clarify and supplement the following issues:

1. There is no definition of the parameter λf in Equation (2).
2. How were the values of the parameters λrecon, λl, λa, and λn (equal to 1) and λf (equal to 0.5) selected? (A sketch of the presumed form of the objective follows this list.)
3. What about the recognition results (face verification rates) for FERET-b (Table 1)?
4. In my opinion, it should be mentioned in Section 4.4.3 that especially good results were obtained after fusing the match scores of the original images (denoted by O) with the match scores of features extracted from the enriched images (denoted by E).
5. In the conclusion section, there are no quantitative results regarding the effectiveness of the proposed algorithm and its advantage over other available methods.
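
For reference, the composite objective being discussed presumably takes the standard weighted-sum form below; this is a sketch inferred from the parameter names quoted in the report, not the manuscript's exact Equation (2):

    \mathcal{L} = \lambda_{recon}\,\mathcal{L}_{recon} + \lambda_{l}\,\mathcal{L}_{l} + \lambda_{a}\,\mathcal{L}_{a} + \lambda_{n}\,\mathcal{L}_{n} + \lambda_{f}\,\mathcal{L}_{f}, \qquad \lambda_{recon} = \lambda_{l} = \lambda_{a} = \lambda_{n} = 1, \; \lambda_{f} = 0.5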

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

This manuscript proposed a hybrid method for face recognition with single sample per person by combining the sampled intra-variations and eliminating external factors by extracting invariant features. Improvement of verification rate was achieved compared to both traditional methods and deep learning based methods. Please consider my comments below for revision.

1. Line 266: the scores from the original face matcher, the albedo matcher and the enriched face matcher are fused by a weighted sum rule. What is the weight for each score, and how are the weights determined? For example, it looks like the original face matcher should contain larger error than the other two matchers, since the image undergoes no processing before being fed into the face matcher. (A minimal sketch of weighted-sum fusion follows this list.)
2. Three external factors are considered, namely pose, illumination and expression. In Section 4.3, the combinations Ori+AugP, Ori+AugPI and Ori+AugPIE are tested using the CelebA database. Which factor among PIE affects the verification rate the most? It would be interesting to see the verification rates with Ori+AugP, Ori+AugI and Ori+AugE only, to isolate the effect of each factor.
3. In line 359, the authors claim that the limited capability of the 3DMM in describing expression can cause the decreased verification rate of Ori+AugPIE compared to Ori+AugPI. This is not very convincing to me: looking at the Aug data only, AugPIE has improved accuracy compared to the data using pose and illumination only (AugPI).
4. It was claimed that the enriched intra-variation face matcher would be important for reducing the false rejection rate. Along with the face verification rates presented in this manuscript, how does the false rejection rate compare to other methods?
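
For illustration, weighted-sum score fusion of the three matchers mentioned above could look like the minimal Python sketch below. The weight values, function name and normalization scheme are illustrative assumptions, not taken from the manuscript; the reviewer's question is precisely how the manuscript chooses the weights.

    import numpy as np

    def fuse_scores(s_original, s_albedo, s_enriched, weights=(0.2, 0.4, 0.4)):
        """Fuse three matchers' similarity scores by a weighted sum.

        Scores are assumed to be comparable, e.g. min-max normalized
        cosine similarities in [0, 1]. The default weights are placeholders.
        """
        w = np.asarray(weights, dtype=float)
        w /= w.sum()  # keep the fused score on the same scale as the inputs
        scores = np.stack([s_original, s_albedo, s_enriched], axis=-1)
        return scores @ w

    # One probe scored against two gallery identities by each matcher:
    fused = fuse_scores(np.array([0.61, 0.30]),
                        np.array([0.72, 0.25]),
                        np.array([0.68, 0.33]))
    print(fused)  # higher fused score -> more likely a genuine match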

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

The manuscript proposes a novel single sample per person face recognition framework using 3D morphable face models. To be more specific, a 3D face model is applied to extract robust facial features by reducing the intra-class variations caused by pose, expression and illumination. Additionally, the 3D face model is used to generate additional training data to improve the performance of the face matching network. Overall, the proposed method is interesting and the manuscript is well presented. The experimental results obtained on several well-known benchmark datasets demonstrate the merits of the proposed method. Here are some minor issues that the authors should consider for the final published version of the manuscript.

1. Eq. (1) requires a more detailed explanation. Not all the symbols and operations are well introduced.
2. Lines 34-41: more advanced feature extraction approaches should be introduced, e.g., LBP, HOG and CNN features.
3. Lines 52-64: some existing approaches perform data augmentation using synthetic faces generated from a 3DMM, e.g., [1][2][3]. These methods are related to the proposed method, but the references are missing.
4. Eq. (3): it is well known that the L2 norm is sensitive to outliers. Why did the authors choose the L2 loss for the illumination coefficients?
5. Please report the hyper-parameter settings of the network training in the experimental results section.
6. Section 3.2.3: did the authors use blend shapes for the expression variations of the 3D face model? The authors refer to [43] for this; however, the manuscript should be self-contained, so a little more explanation of the 3D face model used in the manuscript is required. (A sketch of the standard formulation follows the references below.)
7. The above question links to Table 1: Ori+AugPIE performs worse than Ori+AugPI in most cases. The authors explain that this might be because the 3DMM cannot describe the expressions of real faces. Is this issue related to the blend-shape PCA basis used in the 3D face model?

[1] Zhu, X.; Lei, Z.; Liu, X.; Shi, H.; Li, S.Z. Face alignment across large poses: A 3D solution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016; pp. 146-155.

[2] Feng, Z.; Hu, G.; Kittler, J.; Christmas, W.; Wu, X. Cascaded collaborative regression for robust facial landmark detection trained using a mixture of synthetic and real images with dynamic weighting. IEEE Transactions on Image Processing 2015, 24(11), 3425-3440.

[3] Song, X.; Feng, Z.; Hu, G.; Kittler, J.; Wu, X. Dictionary integration using 3D morphable face models for pose-invariant collaborative-representation-based classification. IEEE Transactions on Information Forensics and Security 2018, 13(11), 2734-2745.
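
For context on the blend-shape question above, a 3DMM with a separate expression basis is conventionally written as below; this is the standard formulation from the literature, not quoted from the manuscript (\bar{\mathbf{s}} is the mean shape, \mathbf{B}_{id} and \mathbf{B}_{exp} the identity and expression bases, \boldsymbol{\alpha} and \boldsymbol{\beta} their coefficients):

    \mathbf{s}(\boldsymbol{\alpha}, \boldsymbol{\beta}) = \bar{\mathbf{s}} + \mathbf{B}_{id}\,\boldsymbol{\alpha} + \mathbf{B}_{exp}\,\boldsymbol{\beta}

If \mathbf{B}_{exp} is a PCA or blend-shape basis built from a limited set of scanned expressions, real-world expressions outside its span cannot be reconstructed accurately, which appears to be the concern behind the reviewer's final question.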

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

All the comments are addressed properly in the revised manuscript. I would like to recommend it for publication in Applied Sciences.
