Article
Peer-Review Record

Real-Time Surveillance System for Analyzing Abnormal Behavior of Pedestrians

Appl. Sci. 2021, 11(13), 6153; https://doi.org/10.3390/app11136153
by Dohun Kim, Heegwang Kim, Yeongheon Mok and Joonki Paik *
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3:
Reviewer 4: Anonymous
Submission received: 17 May 2021 / Revised: 24 June 2021 / Accepted: 27 June 2021 / Published: 2 July 2021
(This article belongs to the Special Issue Advanced Intelligent Imaging Technology Ⅲ)

Round 1

Reviewer 1 Report

  1. When you first introduce an abbreviation, write the full term followed by the abbreviation in parentheses; after that, the abbreviation alone is sufficient. I have marked the places in the text where this should be corrected.
  2. Please write Equations 1 and 2 in the format "A = B * C" and explain every term appearing in the equations.
  3. Figure 4 - Please phrase the YES-NO decision box as an explicit question, so that it becomes clear what "Index?" means in the flowchart.
  4. Figure 7 - Please explain in the text what the green area means.
  5. Figure 8 - Please explain in the text whether you trained/experimented on frames with persons of different heights and widths (e.g., babies, young people and adults; slim and obese). If such an experiment was done, what was its result?
  6. Table 4 - Please state what "F1" means (see the note after this list).
  7. "Data Availability Statement: The KISA datasets presented in this study are openly available in [23]." - Reference [23] does not exist. Do you mean [2,3]? Please correct it.

Comments for author File: Comments.pdf

Author Response

The revised manuscript includes answers to all of the reviewer's comments. We hope you will consider this paper suitable for publication in your journal, and we look forward to hearing from you at your earliest convenience.

Author Response File: Author Response.pdf

Reviewer 2 Report

This is an interesting research paper. There are some suggestions for revision.

1. The motivation is not clear. Please specify the importance of the proposed solution.

2. Please highlight your contributions in the introduction.

3. Please compare the pros and cons of existing solutions. 

4. Most of the existing solutions are somewhat out of date. Please discuss more recently published solutions.

5. As shown in Eq. 1, what are P_r and IOU? (See the note after this list.)

6. As shown in Eqs. 1 and 2, please explain how to figure out that there is no object in the image.

7. As shown in Figs. 3 and 4, please explain how the time affects the behavior analysis.

8. As shown in Fig. 5, more technical details of ObjectCropper are necessary.

9. More technical details should be shown in Sections 3.3 and 3.4.

10. What is the experimental environment?

11. The experimental results are not convincing. Please compare the proposed solution with recently published solutions.
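
As background for points 5 and 6: assuming Eq. 1 follows the standard YOLO formulation (an assumption, since the manuscript's exact definition is not reproduced here), the confidence of a predicted box is

```latex
C = P_r(\mathrm{object}) \cdot \mathrm{IoU}^{\mathrm{truth}}_{\mathrm{pred}},
```

where P_r(object) is the probability that the cell contains an object and IoU is the intersection over union between the predicted and ground-truth boxes. When no object is present, P_r(object) = 0 and therefore C = 0, which is how the absence of an object is reflected in the score.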

Author Response

The revised manuscript includes answers to all of the reviewer's comments. We hope you will consider this paper suitable for publication in your journal, and we look forward to hearing from you at your earliest convenience.

Author Response File: Author Response.pdf

Reviewer 3 Report

 

The authors present their implementation of detecting and tracking persons in a selected region as a classifier of abnormal behavior.

Critical notes: 

• There are multiple false claims, such as: "are not suitable for real-time surveillance to detect abnormal behavior because of very high computational complexity." Most surveillance applications using deep learning can run in real time on multiple feeds. 
• The literature review and the selection of object detection models are poor. There are multiple better models, such as CenterNet2, YOLOv4, EfficientDet, etc., which are open source and better than YOLOv3. 
• Figures 3 and 4 present a table and a scheme that have no indexes - is only one sample used? 
• There is no explanation of how the proposed scheme deals with objects crossing each other. 
• There is no comparison with other algorithms [3, 5, 6, 1, 2, 4] or commercial tools, nor comparison with other datasets. 

Recommendation: 
The authors are investigating the detection of anomalies in a video feed. However, the experiments contain no comparison on other known surveillance video datasets, no comparison with other methods from the literature, and no comparison with commercial tools. 
The investigation provides no fundamental knowledge to the field. The reviewer is convinced that an investigation of detection and tracking in video is not sufficient for the level of the journal Applied Sciences. 

References:
[1] Jia-Chang Feng, Fa-Ting Hong, and Wei-Shi Zheng. MIST: Multiple instance self-training framework for video anomaly detection. arXiv preprint arXiv:2104.01633, 2021. 
[2] Ammar Mansoor Kamoona, Amirali Khodadadian Gostar, Alireza Bab-Hadiashar, and Reza Hoseinnezhad. Multiple instance-based video anomaly detection using deep temporal encoding-decoding. arXiv preprint arXiv:2007.01548, 2020. 
[3] Waqas Sultani, Chen Chen, and Mubarak Shah. Real-world anomaly detection in surveillance videos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6479–6488, 2018. 
[4] Yu Tian, Guansong Pang, Yuanhong Chen, Rajvinder Singh, Johan W. Verjans, and Gustavo Carneiro. Weakly-supervised video anomaly detection with robust temporal feature magnitude learning. arXiv preprint arXiv:2101.10030, 2021. 
[5] Peng Wu, Jing Liu, Yujia Shi, Yujia Sun, Fangtao Shao, Zhaoyang Wu, and Zhiwei Yang. Not only look, but also listen: Learning multimodal violence detection under weak supervision. In European Conference on Computer Vision, pages 322–339. Springer, 2020. 
[6] Jia-Xing Zhong, Nannan Li, Weijie Kong, Shan Liu, Thomas H. Li, and Ge Li. Graph convolutional label noise cleaner: Train a plug-and-play action classifier for anomaly detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1237–1246, 2019. 

Author Response

The revised manuscript includes answers to all of the reviewer's comments. We hope you will consider this paper suitable for publication in your journal, and we look forward to hearing from you at your earliest convenience.

Author Response File: Author Response.pdf

Reviewer 4 Report

Thank you for sharing your work. My questions/comments are as follows:

  1. Line 85: Please explain why object detection is not performed at every frame.
  2. Line 87: How was the number of skipped frames (12) selected? (See the sketch after this list.)
  3. Line 122: "If the object is determined to be same" - please explain how this is determined.
  4. Line 147: Please explain how "a non-contiguous set" is detected.
  5. Please explain the rationale behind selecting the 10-second threshold for loitering detection (also illustrated in the sketch after this list).
  6. Lines 160–168: It appears that calling the intrusion and loitering detection "deep learning based" is an overstatement, as it can be achieved with the simple approach described in this section. Please comment on that.
  7. Line 173: Please confirm the number of training images; the number seems very low for such complex behavior detection.
  8. Line 191: Bounding box-based detection to identify fall-down behavior may not work when the person is wheelchair-assisted. Please comment on that.
  9. Line 221: Please capitalize the initial letter.
  10. Lines 242–244: Please explain "Algorithms were ….. compensated".
  11. Lines 275–276: What is meant by "… changing the dataset …"?
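
To make points 2 and 5 concrete, the sketch below (hypothetical names and interfaces, not the authors' code) illustrates the kind of scheme these questions refer to: the detector runs only on every 12th frame with tracking in between, and a loitering alert is raised once a tracked person stays inside a region of interest for more than 10 seconds.

```python
# Hypothetical illustration only -- not taken from the manuscript.
DETECT_INTERVAL = 12    # detector runs on every 12th frame (point 2, assumed value)
LOITER_SECONDS = 10.0   # dwell time before a loitering alert (point 5, assumed value)
FPS = 30.0              # assumed camera frame rate

def analyze(frames, detector, tracker, roi):
    """Return (frame_index, track_id, 'loitering') events for a frame sequence."""
    dwell = {}           # track_id -> accumulated seconds spent inside the ROI
    events = []
    for idx, frame in enumerate(frames):
        if idx % DETECT_INTERVAL == 0:
            tracks = tracker.update(detector(frame))   # expensive detection, run sparsely
        else:
            tracks = tracker.predict()                 # cheap motion-only update in between
        for t in tracks:
            if roi.contains(t.center()):
                dwell[t.id] = dwell.get(t.id, 0.0) + 1.0 / FPS
                if dwell[t.id] >= LOITER_SECONDS:
                    events.append((idx, t.id, "loitering"))
            else:
                dwell.pop(t.id, None)                  # reset the timer once the person leaves
    return events
```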

Author Response

The revised manuscript includes answers to all of the reviewer's comments. We hope you will consider this paper suitable for publication in your journal, and we look forward to hearing from you at your earliest convenience.

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

There are four suggestions for revision.

1. Most of the references are somewhat out of date. Please discuss more recently published solutions.

2. There are only three equations. More technical details of the proposed solution should be given. 

3. The proposed algorithms should be formalized.

4. Please compare the proposed solution with the state-of-the-art solutions.

Author Response

The revised manuscript includes answers to all of the reviewer's comments. We hope you will consider this paper suitable for publication in your journal, and we look forward to hearing from you at your earliest convenience.

Author Response File: Author Response.pdf

Reviewer 3 Report

The authors took into account some of the reviewer's comments, but there are still major issues with the manuscript before it can be recommended for publication.

* Eqs. (1), (5), (6): replace '*' with a dot or '×'; '*' denotes the convolution operation in the neural-network literature. This must be fixed (see the note after this list).
* Figures 3 and 4 are not updated; no indexes for the observation rows were added. This must be fixed.
* The authors have not added relevant research papers on other state-of-the-art models to the comparison. Models [1-6] provided similar or better results for similar problems, but there is no mention of how the authors' model compares with other authors' results. The reported results therefore have no baseline model to be compared with, which gives no insight into the quality of the proposed solution.
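
To illustrate the notation point in the first bullet: scalar products should be written with an explicit dot or cross rather than an asterisk, e.g.

```latex
A = B \cdot C \quad \text{or} \quad A = B \times C
% rather than A = B * C, since * conventionally denotes convolution
```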


References:
[1] Jia-Chang Feng, Fa-Ting Hong, and Wei-Shi Zheng. MIST: Multiple instance self-training framework for video anomaly detection. arXiv preprint arXiv:2104.01633, 2021. 
[2] Ammar Mansoor Kamoona, Amirali Khodadadian Gostar, Alireza Bab-Hadiashar, and Reza Hoseinnezhad. Multiple instance-based video anomaly detection using deep temporal encoding-decoding. arXiv preprint arXiv:2007.01548, 2020. 
[3] Waqas Sultani, Chen Chen, and Mubarak Shah. Real-world anomaly detection in surveillance videos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6479–6488, 2018. 
[4] Yu Tian, Guansong Pang, Yuanhong Chen, Rajvinder Singh, Johan W. Verjans, and Gustavo Carneiro. Weakly-supervised video anomaly detection with robust temporal feature magnitude learning. arXiv preprint arXiv:2101.10030, 2021. 
[5] Peng Wu, Jing Liu, Yujia Shi, Yujia Sun, Fangtao Shao, Zhaoyang Wu, and Zhiwei Yang. Not only look, but also listen: Learning multimodal violence detection under weak supervision. In European Conference on Computer Vision, pages 322–339. Springer, 2020. 
[6] Jia-Xing Zhong, Nannan Li, Weijie Kong, Shan Liu, Thomas H. Li, and Ge Li. Graph convolutional label noise cleaner: Train a plug-and-play action classifier for anomaly detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1237–1246, 2019. 

Author Response

The revised manuscript includes answers to all of the reviewer's comments. We hope you will consider this paper suitable for publication in your journal, and we look forward to hearing from you at your earliest convenience.

Author Response File: Author Response.pdf

Round 3

Reviewer 2 Report

All my concerns have been addressed. I recommend this paper for publication.

Reviewer 3 Report

I'm satisfied with the updated manuscript.

Also, please update: "UCF-Crime.." -> "UCF-Crime."
