Article
Peer-Review Record

A New Multi-Person Pose Estimation Method Using the Partitioned CenterPose Network

Appl. Sci. 2021, 11(9), 4241; https://doi.org/10.3390/app11094241
by Jiahua Wu and Hyo-Jong Lee *
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 8 March 2021 / Revised: 16 April 2021 / Accepted: 5 May 2021 / Published: 7 May 2021
(This article belongs to the Special Issue Deep Learning-Based Action Recognition)

Round 1

Reviewer 1 Report

The paper presents a bottom-up method for multi-person pose estimation and proposes an intermediate encoding step in which five body regions are used instead of directly encoding the offset from the body center. The encoding idea seems interesting, but there are significant issues that are not made clear in the manuscript. First of all, in the experiments section the proposed method is significantly faster than competing methods, yet no intuition is given for the reason behind this speed-up. It should be related to the different encoding, as this is the main novelty of the method, but there is no explanation of this. More generally, I believe the proposed method should be presented more clearly, with a better explanation of the connection between offsets and keypoints and of the details that produce the reported speed-up. Equations such as the focal loss and object keypoint similarity can be omitted, since they are already defined in the referenced work. The competing methods should also include more recent ones, with HigherHRNet being the most obvious omission; if I am not mistaken, it achieves better AP than the proposed method. The CrowdPose dataset should also be used for evaluation, as it provides more cluttered scenes that can be challenging for bottom-up methods. As a final comment, there are several typos and passages in the manuscript that require rephrasing or revision; please proofread carefully (e.g. "...since this kind of methods is profited from advances..." -> "...since this kind of methods profit from advances...", in line 124 - page 4 the subscript "n" should be "N"). Overall, I do not believe the manuscript is suitable for publication in Applied Sciences.

Author Response

We appreciate the reviewers’ constructive suggestions for our paper. A point-by-point response is attached. Please see the attachment.

Author Response File: Author Response.docx

Reviewer 2 Report

Addressing the problem of human pose estimation, this paper presents a method that provides relatively high inference speed while maintaining good performance for multi-person pose estimation.
The main contributions of the authors seem to be two measures undertaken to achieve those results. The first is the structure of the pose representation (PPR), based on the presupposition that the human body is divided into five parts determined a priori in order to maintain the correlation between adjacent joints. The second is a modification of the loss function that treats the offset vectors in the head and the limbs separately.
Specific comments follow:
- The literature review is very general and does not provide sufficient background to show the novelty of the proposed model.
- It is not clear how reliable the method is when not all parts of the body are visible (e.g., occluded).
- The comparison against other state-of-the-art models (Table 1) uses different backbones for the compared models (except AssocEmbedding). The question is how this influences the comparison of results.
- Although the paper is generally clearly written, some English formulations should be improved because some sentences are not clear (e.g., the sentence starting with "Compared with the top-down methods(...)", page 1, lines 40-42).

Author Response

We appreciate the reviewers’ constructive suggestions for our paper. The point-by-point response is attached. Please see the attachment.

Author Response File: Author Response.docx

Reviewer 3 Report

This paper describes a partitioned CenterPose network for bottom-up multi-person pose estimation. Its major contributions are a novel partitioned pose representation and a new bottom-up model with an improved L1 loss. As a result, the proposed partitioned CenterPose network is comparable to state-of-the-art methods while keeping the inference speed high.

Overall, this paper is well written and technically sound. The proposed method for multi-person pose estimation is quite convincing. My evaluation is that this paper is publishable in its present form.

Author Response

We appreciate the reviewers’ constructive suggestions for our paper. The point-by-point response is attached. Please see the attachment.

Author Response File: Author Response.docx
