Article
Peer-Review Record

Gaze Estimation Method Combining Facial Feature Extractor with Pyramid Squeeze Attention Mechanism

by Jingfang Wei 1, Haibin Wu 1,*, Qing Wu 1, Yuji Iwahori 2, Xiaoyu Yu 3 and Aili Wang 1,*
Reviewer 1:
Reviewer 2:
Reviewer 3:
Electronics 2023, 12(14), 3104; https://doi.org/10.3390/electronics12143104
Submission received: 13 June 2023 / Revised: 4 July 2023 / Accepted: 11 July 2023 / Published: 17 July 2023
(This article belongs to the Special Issue Deep Learning in Image Processing and Pattern Recognition)

Round 1

Reviewer 1 Report

In this paper, the authors propose a novel algorithm for gaze estimation. One problem in the article is that the Introduction needs to be split into a classical introduction and a separate Related Work section. Also, the images are out of format, so their width needs to be changed. In Table 2, which summarizes the datasets, the training set is much smaller than the test set; this is unusual, as the training set is normally the larger one. A further problem is that the first block, the so-called facial feature extractor, takes RGB input images of 128x128 pixels; this introduces significant distortion for the Gaze360 and ETH-XGaze datasets, whose original resolutions are 3392x4096 and 6000x4000 pixels, respectively.
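
To illustrate the distortion concern, here is a minimal sketch assuming OpenCV-style resizing (hypothetical, not the paper's pipeline), contrasting a direct 128x128 resize with an aspect-preserving letterbox:

```python
# Minimal sketch (assumes OpenCV; hypothetical, not the paper's pipeline).
# A direct resize to the 128x128 input applies unequal per-axis scale
# factors to non-square frames -- the distortion noted above.
import cv2

def direct_scale_factors(h, w, target=128):
    """Per-axis scale factors of a naive resize to target x target."""
    return target / h, target / w

for name, (h, w) in {"Gaze360": (3392, 4096), "ETH-XGaze": (6000, 4000)}.items():
    sy, sx = direct_scale_factors(h, w)
    print(f"{name}: vertical x{sy:.4f}, horizontal x{sx:.4f}, "
          f"aspect ratio changes by x{sy / sx:.2f}")

def letterbox(img, target=128):
    """Aspect-preserving alternative: scale the long side, pad the rest."""
    h, w = img.shape[:2]
    s = target / max(h, w)
    r = cv2.resize(img, (round(w * s), round(h * s)))
    top, left = (target - r.shape[0]) // 2, (target - r.shape[1]) // 2
    return cv2.copyMakeBorder(r, top, target - r.shape[0] - top,
                              left, target - r.shape[1] - left,
                              cv2.BORDER_CONSTANT, value=0)
```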

Similarly, only MPIIGaze and GazeCapture, with an original resolution of 224x224 pixels, are suitable for comparison. The Experimental Results and Analysis section states that the proposed network runs on an i5-8300H CPU and an NVIDIA GeForce RTX 2080 Ti GPU; this belongs in the Materials and Methods section. More importantly, it should be clarified whether CUDA is used and, if so, how many RTX 2080 Ti GPUs are available. Also, what is the RAM capacity of the machine used? The proposed algorithm has many parameters and works with relatively large datasets, so there is no room for guessing the required resources. Regarding hyperparameters, a short description of how their values were determined would be welcome. Table 4 is titled an error comparison, but accuracy is what it depicts. Overall, once these few recommended improvements are addressed, the paper will be valuable for publication.
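
For reference, a minimal sketch of how the requested environment details (CUDA availability, GPU count, VRAM, system RAM) could be reported, assuming PyTorch and the third-party psutil package (illustrative only, not the authors' code):

```python
# Minimal sketch (assumes PyTorch and psutil; illustrative only,
# not the authors' code) of reporting the requested environment details.
import torch
import psutil

print("CUDA available:", torch.cuda.is_available())
print("GPU count:", torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    p = torch.cuda.get_device_properties(i)
    print(f"GPU {i}: {p.name}, {p.total_memory / 2**30:.1f} GiB VRAM")
print(f"System RAM: {psutil.virtual_memory().total / 2**30:.1f} GiB")
```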

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

The authors propose a novel gaze estimation algorithm based on attention mechanisms. By integrating L2CSNet with FFE and PSA, the model not only improves the robustness of gaze estimation but also provides more accurate estimation results than the original model. The work is very interesting and well written, with a few minor errors listed below. Please note that the following comments are intended to improve the paper's quality and readers' understanding.


The authors compare the proposed work with some directly related gaze estimation algorithms, but no comparison is performed with FAZE, which is currently one of the best gaze estimators available according to the ranking on paperswithcode.com.


It is important to compare the proposed approach with the results from FAZE (https://paperswithcode.com/paper/few-shot-adaptive-gaze-estimation). According to their paper, FAZE reaches an angular error as low as 3.14 degrees on the MPIIGaze dataset, while the proposed solution achieves 3.41 degrees on the same dataset.
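
For context, a minimal sketch of the mean-angular-error metric behind both numbers, assuming the pitch/yaw-to-vector convention commonly used with MPIIGaze (illustrative; not code from either paper):

```python
# Minimal sketch of the angular-error metric (illustrative; assumes the
# pitch/yaw-to-vector convention commonly used with MPIIGaze).
import numpy as np

def gaze_to_vector(pitch, yaw):
    """Map (pitch, yaw) in radians to a 3D unit gaze vector."""
    return np.array([-np.cos(pitch) * np.sin(yaw),
                     -np.sin(pitch),
                     -np.cos(pitch) * np.cos(yaw)])

def angular_error_deg(pred, true):
    """Angle in degrees between predicted and ground-truth gaze vectors."""
    cos = np.dot(pred, true) / (np.linalg.norm(pred) * np.linalg.norm(true))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

# Example: two nearby gaze directions differ by a small angle.
print(f"{angular_error_deg(gaze_to_vector(0.10, 0.20), gaze_to_vector(0.12, 0.18)):.2f} deg")
```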


Please compare the proposed solution with the state of the art in gaze estimation, highlighting how the proposed solution advances the state of the art and the limitations of the proposed technique. It is important to make this clear in order to make the paper acceptable for publication.


More general comments and minor errors are listed as follows.


"image scanning" -> "image scanning."

"Amudha J et al." -> "Amudha J. et al."

"J. Ma  et" -> "J. Ma et"

"which fill a" -> "which fills a"

"primary challenges. " -> "primary challenges: "

"The algorithm flowchart of FPSA_L2CSNet, illustrated" -> "The algorithm flowchart of FPSA_L2CSNet is illustrated"

"the FFE(facial feature extractor)" -> the term was already defined in the text

"is employed to enhance" -> "are employed to enhance"

"Table gives" -> "Table 2 gives"

"213695" -> "213,695"

"2445504" -> "2,445,504"

"1450" -> "1,450"

"Table 2 shows the comparison" -> "Table 3 shows the comparison"

"In Figure (a)" -> "In Figure 8 (a)"

"In Figure (b)," -> "In Figure 8 (b),"

" In Figure(c)," -> " In Figure 8 (c),"

"In Figure (d)," -> "In Figure 8 (d),"

"datasets,." -> "datasets."

"testing on public dataset" -> "testing on public datasets"


Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

This is a well-written paper integrating many deep learning models and building several combinations of neural networks, as well as validation and intercomparison, though it is difficult for non-experts. The choice of sizes 3, 5, and 7 for the convolutional kernel is a very good choice for the databases.
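
For readers unfamiliar with the mechanism, a minimal sketch of a multi-scale convolution split in the spirit of Pyramid Squeeze Attention, with the 3/5/7 kernels noted above (simplified illustration, assuming PyTorch; not the paper's exact implementation):

```python
# Minimal sketch of a multi-scale convolution split in the spirit of Pyramid
# Squeeze Attention (simplified; assumes PyTorch, not the paper's exact code).
import torch
import torch.nn as nn

class MultiScaleConv(nn.Module):
    def __init__(self, channels, kernel_sizes=(3, 5, 7)):
        super().__init__()
        assert channels % len(kernel_sizes) == 0
        self.split = channels // len(kernel_sizes)
        # One branch per kernel size; padding keeps the spatial size unchanged,
        # so each branch covers a different receptive field.
        self.branches = nn.ModuleList(
            nn.Conv2d(self.split, self.split, k, padding=k // 2)
            for k in kernel_sizes
        )

    def forward(self, x):
        chunks = torch.split(x, self.split, dim=1)
        return torch.cat([b(c) for b, c in zip(self.branches, chunks)], dim=1)

x = torch.randn(1, 48, 32, 32)
print(MultiScaleConv(48)(x).shape)  # torch.Size([1, 48, 32, 32])
```

Using several kernel sizes in parallel lets the block capture features at multiple spatial scales from the same input, which is the intuition behind the 3/5/7 choice.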

I suggest enlarging the small text in Figures 4 and 8, but I understand that the pattern detection discards non-mathematical information.

Otherwise, I recommend acceptance.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

Dear authors, I'm satisfied with the modifications performed and I believe the current version of the paper can be accepted now. Congratulations!

Please, when preparing the final version, fix this minor issue:

"5 Conclusion" -> "5. Conclusion"
