Peer-Review Record

Multi-Scale Residual Aggregation Feature Pyramid Network for Object Detection

Electronics 2023, 12(1), 93; https://doi.org/10.3390/electronics12010093
by Hongyang Wang and Tiejun Wang *
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 15 November 2022 / Revised: 16 December 2022 / Accepted: 21 December 2022 / Published: 26 December 2022

Round 1

Reviewer 1 Report

The introduction provides a lot of information without proper explanation. Figures 1, 2, and 3 are meaningless without a description of what each element in those images represents.

In the experiment description you use the terms "Method" and "Backbone", which are, in my opinion, not explained well.

Please correct the references so that they are more uniform - for some entries there is only a list of authors and a title. Please provide the full link as text for some of the references.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

The paper investigates the design of a feature pyramid module applicable to different object detection architectures.

The main weakness of this paper is the lack of novelty. The proposed feature pyramid module, called UCLRAM, is similar to the previously proposed BiFPN [1], DLA [2], and others. These related approaches are indeed mentioned in the paper; however, I would encourage the authors to discuss the differences more thoroughly. Furthermore, I am struggling to see the difference between the proposed CGM and the squeeze-and-excite module from the literature [3]. Similarly, I suggest discussing the differences. If there are no differences, I suggest referring to this module as the squeeze-and-excite module. Introducing a new name for an existing module does not help the community, but leads to confusion.
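For concreteness, the squeeze-and-excite operation of [3] amounts to the following (a minimal NumPy sketch with hypothetical shapes and random weights, shown only to make the comparison explicit; it is not the authors' CGM implementation):

```python
import numpy as np

def squeeze_and_excite(x, w1, w2):
    """Squeeze-and-excite [3] on a feature map x of shape (C, H, W):
    global-average-pool each channel, pass through two FC layers
    (ReLU, then sigmoid), and rescale each channel by the result."""
    z = x.mean(axis=(1, 2))                # squeeze: (C,)
    s = np.maximum(w1 @ z, 0.0)            # excitation FC1 + ReLU: (C//r,)
    s = 1.0 / (1.0 + np.exp(-(w2 @ s)))    # excitation FC2 + sigmoid: (C,)
    return x * s[:, None, None]            # channel-wise rescale

# Hypothetical sizes: C = 8 channels, reduction ratio r = 2.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 4))
w1 = rng.standard_normal((4, 8))
w2 = rng.standard_normal((8, 4))
y = squeeze_and_excite(x, w1, w2)
```

If CGM reduces to this channel-gating pattern, the existing name should be used.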

The paper presents a strong evaluation study on two datasets: Pascal VOC and the proposed TKFD. To the best of my knowledge, there are no datasets similar to the proposed TKFD, which I find very interesting. However, at present, both datasets are limited in size. I think the evaluation section would be even stronger if results on COCO [4] were presented.

I am a little confused about the spatial resolutions of the features in the feature pyramid. The formula P^in = (P^in_2, ..., P^in_4) suggests that there are only three levels in the feature pyramid and that the most condensed level is at 1/16 of the input image resolution. I believe the default feature pyramid in the literature contains four levels, with the most condensed features at 1/32 of the input resolution. Furthermore, I am not sure this notation is followed in the rest of the paper: for example, in Figure 4 it is unclear whether the outputs of C_4 are at 1/16 or 1/32 of the input resolution. I recommend clarifying this.
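To illustrate the resolution point (a minimal sketch in plain Python, assuming a hypothetical 512-pixel input and the usual stride of 2^l at pyramid level l):

```python
def pyramid_resolutions(input_size, levels):
    """Map each pyramid level P_l to its spatial size,
    assuming a downsampling stride of 2**l at level l."""
    return {f"P{l}": input_size // (2 ** l) for l in levels}

# Three levels P2..P4, as the formula in the paper suggests:
# the coarsest level P4 is at 1/16 of the input resolution.
three = pyramid_resolutions(512, range(2, 5))   # {'P2': 128, 'P3': 64, 'P4': 32}

# Four levels P2..P5, the common default in the literature:
# the coarsest level P5 is at 1/32 of the input resolution.
four = pyramid_resolutions(512, range(2, 6))    # {'P2': 128, ..., 'P5': 16}
```

The notation in the paper should make explicit which of these two conventions is used.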


[1] Tan, M.; Pang, R.; Le, Q.V. EfficientDet: Scalable and Efficient Object Detection. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020, Seattle, WA, USA, June 13-19, 2020. Computer Vision Foundation / IEEE, pp. 10778–10787. https://doi.org/10.1109/CVPR42600.2020.01079.

[2] Yu, F.; Wang, D.; Shelhamer, E.; Darrell, T. Deep Layer Aggregation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018; pp. 2403–2412.

[3] Hu, J.; Shen, L.; Sun, G. Squeeze-and-Excitation Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018; pp. 7132–7141.

[4] Lin, T.-Y.; et al. Microsoft COCO: Common Objects in Context. In Proceedings of the European Conference on Computer Vision (ECCV); Springer: Cham, Switzerland, 2014; pp. 740–755.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf
