Peer-Review Record

Non-Maximum Suppression Performs Later in Multi-Object Tracking

Appl. Sci. 2022, 12(7), 3334; https://doi.org/10.3390/app12073334
by Hong Liang, Ting Wu *, Qian Zhang and Hui Zhou
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 3 March 2022 / Revised: 16 March 2022 / Accepted: 22 March 2022 / Published: 25 March 2022
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)

Round 1

Reviewer 1 Report

This paper proposes postponing non-maximum suppression (NMS) from the detection stage to the tracking stage, so that more detected boxes remain available for tracking. Based on the experimental results, I think this is an interesting finding. Detailed comments are below.

  1. When a person of interest appears in a frame for the first time, NMS in the detection stage helps filter out many false detections before tracking. If NMS is postponed to the tracking stage, how is a newly appearing person handled?
  2. According to Table 2, TransMOT outperforms the proposed method on all criteria. The authors might want to discuss this.
  3. Some notations are misleading. For example, line 141 states that "the ground-truth boxes ... will be filtered out". The term "ground-truth" normally refers to the annotated true boxes used during training, not to the boxes produced in the detection stage.
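For reference, the step whose placement is debated above can be illustrated with a minimal sketch of standard greedy, IoU-based NMS. This is not the authors' implementation: the function names, the (x1, y1, x2, y2) box format, and the default IoU threshold of 0.5 are assumptions chosen for illustration, and the deferred (tracking-stage) variant proposed in the paper is not shown.

```python
from typing import List, Tuple

# Assumed box format: (x1, y1, x2, y2) with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: List[Box], scores: List[float], iou_thresh: float = 0.5) -> List[int]:
    """Return indices of the boxes kept by greedy NMS, highest score first."""
    # Visit candidates in descending score order.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Suppress every remaining box that overlaps the kept box above the threshold.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thresh]
    return keep


if __name__ == "__main__":
    # Two heavily overlapping boxes (e.g. two partially occluded people) and one separate box.
    dets = [(0, 0, 10, 10), (1, 0, 11, 10), (50, 50, 60, 60)]
    confs = [0.9, 0.8, 0.7]
    print(greedy_nms(dets, confs))                  # [0, 2]
    print(greedy_nms(dets, confs, iou_thresh=0.9))  # [0, 1, 2]
```

The first two boxes have an IoU of about 0.82, so applying NMS at the detection stage with a 0.5 threshold discards the lower-scoring one even if it is a genuinely distinct, occluded target. Keeping such boxes until association is the trade-off the paper exploits, and it is also why the reviewer's first comment asks how false detections of newly appearing people are then controlled.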

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 2 Report

  1. The contributions listed in L64-78 should be carefully revised. The use of bicubic interpolation is trivial and cannot be regarded as a contribution. The same applies to the use of an existing Re-ID model.
  2. L68: How is the occlusion problem solved? Experimental results are needed to support this claim.
  3. The related work is not thorough. For example, "MATNet: Motion-Attentive Transition Network for Zero-Shot Video Object Segmentation" (IEEE TIP) and "Target-Aware Object Discovery and Association for Unsupervised Video Multi-Object Segmentation" (CVPR) propose two-stage methods for pixel-level tracking. Please discuss how the proposed method differs from these works.
  4. L83: "one-shot" should be "one-stage".

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Round 2

Reviewer 2 Report

The revision has addressed all my concerns. 
