Gaze Estimation Method Combining Facial Feature Extractor with Pyramid Squeeze Attention Mechanism
Round 1
Reviewer 1 Report
In this paper, the authors propose a novel algorithm for gaze estimation. One problem with the article is that the introduction needs to be split into a classical introduction and a related work section. Also, the images are out of format, so their width needs to be changed. In Table 2, which summarizes the datasets, the training set is much smaller than the test set; this is unusual, since the training set is normally the larger one. Another problem is that the first block, the so-called facial feature extractor, takes RGB input images of 128x128 pixels, which implies significant distortion for the Gaze360 and ETH-XGaze datasets, whose original resolutions are 3392x4096 and 6000x4000 pixels, respectively.
Similarly, only MPIIGaze and GazeCapture, with an original resolution of 224x224 pixels, are suitable for comparison. The Experimental Results and Analysis section states that the proposed network runs on an i5-8300H CPU and an NVIDIA GeForce RTX2080Ti GPU; this needs to be in the Materials and Methods section! More importantly, it needs to be clarified whether CUDA is used and, if so, how many RTX2080Ti GPUs are available. Also, what is the RAM capacity of the machine used? The proposed algorithm has many parameters and works with relatively large datasets, so the required resources should not be left to guesswork. Regarding hyperparameters, a short description of how their values were determined would be welcome. Table 4 is labeled as an error comparison, but it shows accuracy. Overall, once these few recommendations are addressed, the paper will be valuable for publication.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 2 Report
The authors propose a novel gaze estimation algorithm based on attention mechanisms. By integrating L2CSNet with FFE and PSA, the model not only improves the robustness of gaze estimation but also provides more accurate estimation results than the original model. The work is very interesting and well written, with a few minor errors listed below. Please note that the following comments are intended to improve the paper's quality and readers' understanding.
The authors compare the proposed work with some directly related gaze estimation algorithms, but there is no comparison with FAZE, which is currently one of the best gaze estimators available according to the ranking on paperswithcode.com.
It is important to compare the proposed approach with the results from FAZE (https://paperswithcode.com/paper/few-shot-adaptive-gaze-estimation). In their paper, they report an angular error as low as 3.14 on the MPIIGaze dataset, while the proposed solution achieves 3.41 on the same dataset.
Please compare the proposed solution with the state of the art in gaze estimation, highlighting how the proposed solution advances the state of the art and the limitations of the proposed technique. It is important to make this clear in order to make the paper acceptable for publication.
More general comments and minor errors are listed as follows.
"image scanning" -> "image scanning."
"Amudha J et al." -> "Amudha J. et al."
"J. Ma et" -> "J. Ma et"
"which fill a" -> "which fills a"
"primary challenges. " -> "primary challenges: "
"The algorithm flowchart of FPSA_L2CSNet, illustrated" -> "The algorithm flowchart of FPSA_L2CSNet is illustrated"
"the FFE(facial feature extractor)" -> the term was already defined in the text
"is employed to enhance" -> "are employed to enhance"
"Table gives" -> "Table 2 gives"
"213695" -> "213,695"
"2445504" -> "2,445,504"
"1450" -> "1,450"
"Table 2 shows the comparison" -> "Table 3 shows the comparison"
"In Figure (a)" -> "In Figure 8 (a)"
"In Figure (b)," -> "In Figure 8 (b),"
" In Figure(c)," -> " In Figure 8 (c),"
"In Figure (d)," -> "In Figure 8 (d),"
"datasets,." -> "datasets."
"testing on public dataset" -> "testing on public datasets"
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 3 Report
This is a well-written paper integrating many deep learning models and building several combinations of neural networks, as well as validation and intercomparison. It is difficult for non-experts. The choice of sizes 3, 5, and 7 for the convolutional kernel is a very good choice for the databases.
I suggest enlarging the small text in Figures 4 and 8, but I understand that the pattern detection discards non-mathematical information.
Otherwise, I recommend acceptance.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Round 2
Reviewer 2 Report
Dear authors, I'm satisfied with the modifications performed and I believe the current version of the paper can be accepted now. Congratulations!
Please, when preparing the final version, fix this minor issue:
"5 Conclusion" -> "5. Conclusion"