Lightweight Pedestrian Detection Based on Feature Multiplexed Residual Network
Round 1
Reviewer 1 Report
According to the analysis performed in the work, applying traditional convolution leads to a loss of start-up information and, implicitly, to a loss of information in the transmission process.
Using 1×1 convolution maximizes the flow of information between all layers in the network. The features of pedestrians can thus be extracted, and a feature extraction network can be built using the multiplexed connected residual structure.
Author Response
Please see the attachment.
Author Response File: Author Response.docx
Reviewer 2 Report
The article is about intelligent pedestrian detection. The authors proposed an innovative method. Please answer a few questions:
1. Introduction
a). "The design of lightweight networks can improve" – what are these networks? Please make the sentence more specific.
2. Related Works
a). "Single stage detection model" and "Lightweight attention mechanism" have the same section number, 2.1.
3.1. Overall networks
a). What causes the size of the input image to be 416x416x3? Why was this value adopted?
3.2. Multiplexing Connection Residuals
a). He et al. – I do not understand this …
b). Equation (1) – what does "W1" mean?
3.4. Loss Function
a). Equations (7), (8), (9) – please explain the meaning of the index "l" (e.g., yl, cl).
4.1. Experimental Data Set and Experimental Parameters
Did the authors include foggy scenes in the selected BDD dataset? What would the results look like in the case of low visibility on the road?
4.2. Evaluation indicators
"Table 6. Caltech Pedestrian dataset…." and "Table 6. BDD 100K-Person dataset…" - both tables carry the number 6; please correct the table numbers.
6. Conclusions
a). "Experimental results show that the proposed method is lighter than YOLO v3-tiny, with only 17.6MB of parameters." - In Table 6, the size of the model is 17.2 MB. Are 17.6 MB and 17.2 MB two different parameters? Please explain.
b). Does the computing power of the computer affect the number of parameters (MB) in the network model?
c). I did not find information about pedestrian detection times in the text. Is the detection time comparable for all presented methods?
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 3 Report
Dear authors,
Overall, the work in this paper is presented really well.
There is room for improvement in the citations and references. For example, the VGG network, on page 2, should be referenced.
The theoretical and experimental parts are written and presented pretty well.
However, the greatest problem of this work is the insufficient comparison with state-of-the-art object detection. Networks such as YOLOv5, YOLOX, YOLOv6, or even newer ones should be compared with the proposed algorithm.
Last but not least, the FPS seems to be really advantageous compared with the other models in Table 6. How do you explain this superiority? A more in-depth analysis of the inference speed would certainly improve the quality of your work.
In summary, the work of this paper is presented really well. Apart from some minor improvements, it lacks a thorough comparison with state-of-the-art object detection models.
Author Response
Please see the attachment.
Author Response File: Author Response.docx
Round 2
Reviewer 2 Report
Thank you for your answers.
The publication requires minor changes, but I accept the publication in its current form.