
Peer-Review Record

Adaptive Facial Imagery Clustering via Spectral Clustering and Reinforcement Learning

Appl. Sci. 2021, 11(17), 8051; https://doi.org/10.3390/app11178051

by Chengxiao Shen, Liping Qian * and Ningning Yu

Reviewer 1: Anonymous

Reviewer 2: Anonymous

Submission received: 11 August 2021 / Revised: 27 August 2021 / Accepted: 27 August 2021 / Published: 30 August 2021

(This article belongs to the Section Computing and Artificial Intelligence)

Round 1

Reviewer 1 Report

The authors present an interesting problem, and I enjoyed reading the manuscript.

However, some problems still need to be addressed:
The figures must be clearer and better explained.
There are some typos and grammar errors. Please carefully proofread the manuscript to correct them.
More datasets, such as the YouTube Faces database and the LFW dataset, could be processed to evaluate robustness and scalability.
The writing style of some sentences could be improved.

The references should be updated with recent papers. I suggest:

Abdallah, Mohamed S., et al. "Zero-shot deep learning for media mining: Person spotting and face clustering in video big data." Electronics 8.12 (2019): 1394.

Qi, Chao, et al. "Deep face clustering using residual graph convolutional network." Knowledge-Based Systems 211 (2021): 106561.

Wang, Mei, and Weihong Deng. "Deep face recognition with clustering based domain adaptation." Neurocomputing 393 (2020): 1-14.

Author Response

Response to Reviewer 1 Comments

Point 1: The figures must be clearer and better explained.

Response 1: Thank you very much for your suggestions. We have updated all the figures so that they are clearer.

Point 2: There are some typos and grammar errors. Please carefully proofread the manuscript to correct them.

Response 2: We have carefully proofread the manuscript and corrected the grammar errors and typos.

Point 3: More datasets, such as the YouTube Faces database and the LFW dataset, could be processed to evaluate robustness and scalability.

Response 3: We conducted experiments on the YouTube Faces databases and the LFW dataset, and the results are as follows:

We also tested on the LFW face dataset [28], which contains 13,233 images of 5749 people. First, we removed the images that contained multiple faces. The 12,323 images of the remaining 5,484 people were then clustered; the final cluster number K is 5622, and the F-score is 0.6492. Because the data samples provided in the LFW dataset are not balanced (the number of images per person varies), the clustering accuracy is not high.

We also conducted a test on the YouTube Faces database [29]. First, frames were extracted from the videos at 20-frame intervals, and images containing multiple faces or no faces were removed through face detection. While obtaining enough face images, we also ensured that the images of the same person vary to some extent in deflection angle, illumination, and other conditions. In the end, we constructed a dataset of 27,650 images of 1,595 people. Using the proposed approach to cluster this dataset, the final cluster number K is 1684, and the F-score is 0.7724. Although the number of images increases, the accuracy is better than that achieved on the CFP dataset, because the variation between images of the same person captured in video is small.
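The response does not state how the F-score is defined. As an illustration only, the sketch below computes the pairwise F-score that is commonly used to evaluate face clustering: precision is the fraction of same-cluster image pairs that share an identity, and recall is the fraction of same-identity pairs that land in the same cluster. The function name `pairwise_f_score` and the toy labels are hypothetical, and this may differ from the measure the authors actually used.

```python
from collections import Counter
from math import comb


def pairwise_f_score(true_labels, pred_labels):
    """Pairwise F-score of a clustering against ground-truth identities.

    Assumption: the pairwise definition is used; the source does not say
    which F-score variant was reported.
    """
    if len(true_labels) != len(pred_labels):
        raise ValueError("label sequences must have the same length")

    # Number of image pairs placed in the same predicted cluster.
    pred_pairs = sum(comb(n, 2) for n in Counter(pred_labels).values())
    # Number of image pairs that truly belong to the same person.
    true_pairs = sum(comb(n, 2) for n in Counter(true_labels).values())
    # Pairs that are together in both the prediction and the ground truth.
    joint = Counter(zip(true_labels, pred_labels))
    both_pairs = sum(comb(n, 2) for n in joint.values())

    # With no correctly grouped pair, the score is defined here as 0.
    if both_pairs == 0:
        return 0.0

    precision = both_pairs / pred_pairs
    recall = both_pairs / true_pairs
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    identities = [0, 0, 0, 1, 1]   # toy ground truth: two people
    clusters = [0, 0, 1, 1, 1]     # toy clustering with one misplaced image
    print(pairwise_f_score(identities, clusters))
```

Counting pairs through cluster sizes avoids enumerating all image pairs explicitly, which matters for datasets with tens of thousands of images.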

Point 4: The writing style of some sentences could be improved.

Response 4: We have carefully proofread the manuscript and corrected the grammar errors and typos.

Point 5: The references should be updated with recent papers.

Response 5: We have added the specified references to the article.

Qi et al. [8] proposed a deep face clustering method based on the Residual Graph Convolutional Network (RGCN). Abdallah et al. [9] proposed a DCNN-based TV media mining system for rapidly identifying a specific individual in video data processed in real time.
Wang et al. [17] learned discriminative target features by aligning the feature domain globally while distinguishing the target clusters locally. Their method reduced the impact of illuminance, pose, and image quality.

Author Response File: Author Response.docx

Reviewer 2 Report

1) It would be interesting to add a sentence to the abstract about the future development of the ongoing research.

2) Please make the captions self-explanatory.

3) Please provide some information about the computational complexity of the proposed approach.

4) The images under study could be affected by uncertainty and would therefore need fuzzy preprocessing to handle this problem. I ask the authors to add a sentence to the text summarizing this possibility and to include the following relevant papers in the references:

M. Versaci, S. Calcagno and F. C. Morabito, "Fuzzy geometrical approach based on unit hyper-cubes for image contrast enhancement," 2015 IEEE International Conference on Signal and Image Processing Applications (ICSIPA), 2015, pp. 488-493, doi: 10.1109/ICSIPA.2015.7412240.
doi: 10.1016/j.asoc.2020.106077
doi: 10.1007/s10916-020-01568-9

Author Response

Response to Reviewer 2 Comments

Point 1: It would be interesting to add a sentence to the abstract about the future development of the ongoing research.

Response 1: Thank you very much for your suggestions. We have added the following sentence on the future development of the ongoing research to the abstract:

Subsequent research will focus on reducing the computational complexity of dealing with more face images.

Point 2: Please make the captions self-explanatory.

Response 2: We have made the captions self-explanatory. The revised captions are as follows:

Figure 1. The diagram of our proposed adaptive face clustering. First, the face feature vector is obtained. Second, the images are clustered according to the set parameters. Third, the parameters are adjusted by evaluating the clustering results. Finally, we repeat the second and third steps, and output the clustering results and parameters when the results meet our set requirements.

Figure 2. The diagram of the deep face representation. First, we determine the 68 landmark points of the face and crop the face according to the landmark points. Then, we input the image and landmark points into the ResNet model. Finally, we use L2-normalization to obtain the normalized feature vector.

Figure 3. The sketch of reinforcement learning, in which the agent chooses actions according to the state feedback given by the environment, and the chosen actions in turn change the environment, thereby updating the parameters.

Figure 7. The observation of the reduction dimension dim. Here, dim is treated as the varying parameter while K is held fixed. When , the DBI value first decreases and then increases as dim increases. When , the DBI value decreases as dim increases.

Point 3: Please provide some information about the computational complexity of the proposed approach.

Response 3: The computational complexity of the clustering mainly depends on the spectral clustering and parameter adjustment. Let and respectively denote the computational complexity of parameter searching K and parameter searching dim, satisfying

Here, is the starting point clustering number, is the starting point reduction dimension, and are step values, is the number of face images. is the computational complexity of Spectral Clustering, in the computational complexity of Davies-Bouldin Index.
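For readers unfamiliar with the Davies-Bouldin Index (DBI) referred to in this response, the following is a minimal sketch of its standard definition: for each cluster, take the worst-case ratio of summed within-cluster scatter to the distance between centroids, then average over clusters. Lower values indicate more compact, better separated clusters. The function name `davies_bouldin_index` and the toy data are hypothetical, and this is not the authors' implementation.

```python
from math import dist


def davies_bouldin_index(points, labels):
    """Davies-Bouldin Index for points (tuples of floats) with cluster labels.

    Assumptions: Euclidean distance, at least two clusters, and no two
    clusters sharing the same centroid (which would divide by zero).
    """
    # Group the points by cluster label.
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("DBI needs at least two clusters")

    # Centroid of each cluster (coordinate-wise mean).
    centroids = {
        label: tuple(sum(coord) / len(members) for coord in zip(*members))
        for label, members in clusters.items()
    }
    # Scatter: mean distance from the members to their centroid.
    scatter = {
        label: sum(dist(p, centroids[label]) for p in members) / len(members)
        for label, members in clusters.items()
    }

    # For each cluster, keep the worst (largest) similarity to any other one.
    total = 0.0
    for i in clusters:
        total += max(
            (scatter[i] + scatter[j]) / dist(centroids[i], centroids[j])
            for j in clusters
            if j != i
        )
    return total / len(clusters)


if __name__ == "__main__":
    toy_points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
    toy_labels = [0, 0, 1, 1]
    print(davies_bouldin_index(toy_points, toy_labels))
```

In this sketch, each evaluation visits every point once to compute the scatter and then compares every pair of clusters, so the cost grows with both the number of images and the square of the number of clusters.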

Point 4: The images under study could be affected by uncertainty and would therefore need fuzzy preprocessing to handle this problem. I ask the authors to add a sentence to the text summarizing this possibility and to include the following relevant papers in the references:

M. Versaci, S. Calcagno and F. C. Morabito, "Fuzzy geometrical approach based on unit hyper-cubes for image contrast enhancement," 2015 IEEE International Conference on Signal and Image Processing Applications (ICSIPA), 2015, pp. 488-493, doi: 10.1109/ICSIPA.2015.7412240.

Response 4: We have added the specified references to the article.

Versaci et al. [16] presented a fuzzy geometrical approach based on unit hyper-cubes for image contrast enhancement to solve this problem.

Author Response File: Author Response.docx
