Peer-Review Record

Efficient Two-Stage Max-Pooling Engines for an FPGA-Based Convolutional Neural Network

by Eonpyo Hong *, Kang-A Choi and Jhihoon Joo
Reviewer 1:
Reviewer 2: Anonymous
Reviewer 3:
Reviewer 4:
Electronics 2023, 12(19), 4043; https://doi.org/10.3390/electronics12194043
Submission received: 8 August 2023 / Revised: 20 September 2023 / Accepted: 21 September 2023 / Published: 26 September 2023
(This article belongs to the Section Artificial Intelligence Circuits and Systems (AICAS))

Round 1

Reviewer 1 Report

The paper proposes to transform two-dimensional pooling into two one-dimensional pooling operations to reduce resource utilization. The paper provides a detailed description of the specific design, but it focuses on verifying the correctness of the design module and lacks experimental comparative analysis.
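To make the decomposition the reviewer refers to concrete, here is a minimal NumPy sketch (my own illustration under assumed stride-1, "valid" window settings, not the authors' RTL; the function names are hypothetical) showing that a K x K max-pool can be computed as a horizontal 1D max pass followed by a vertical 1D max pass.

```python
import numpy as np

def maxpool_2d_direct(x, k):
    """Reference 2D max-pooling: max over each k x k window (stride 1, 'valid')."""
    h, w = x.shape
    out = np.empty((h - k + 1, w - k + 1), dtype=x.dtype)
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            out[i, j] = x[i:i + k, j:j + k].max()
    return out

def maxpool_2d_two_stage(x, k):
    """Same result via two 1D passes: row-wise max, then column-wise max."""
    h, w = x.shape
    horiz = np.empty((h, w - k + 1), dtype=x.dtype)        # stage 1: 1D max along rows
    for j in range(w - k + 1):
        horiz[:, j] = x[:, j:j + k].max(axis=1)
    out = np.empty((h - k + 1, w - k + 1), dtype=x.dtype)  # stage 2: 1D max along columns
    for i in range(h - k + 1):
        out[i, :] = horiz[i:i + k, :].max(axis=0)
    return out

x = np.random.randint(0, 256, size=(16, 16))
assert np.array_equal(maxpool_2d_direct(x, 3), maxpool_2d_two_stage(x, 3))
```

This separability of the max operation is the property the two-stage approach relies on; the hardware trade-offs the reviewer asks about concern how the two 1D stages compare, in resources and in measured results, against a direct 2D implementation.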

There are some writing errors in the paper. For example, line 255: "The RTB-MAXP engine required 157,515 LUTs", but the number of LUTs is 158,515 in Table 2; line 258: "RTM-MAXP" should be "RTB-MAXP".

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Thank you very much for this interesting paper with very impressive results. 

However, I miss a few things in the results:

- Latency measurements (or simulations) using these engines, compared to other engines presented in the introduction.

- I would appreciate it if you could add more solid results related to the performance. You state that these engines can increase the speed by 16 times, but with respect to which method? Please give some numbers here. A comparison with a well-known GPU is always welcome.

- Power consumption/FPS (I understand there should not be an important difference between the CMB-MAXP and RTB-MAXP engines).

- Is there any difference in performance between the two engines presented here?

- Could you please indicate if there is any precision loss when using these engines (or due to the quantization of weights and biases with respect to the float model)?

- Minor: in some places I read "Vertex" instead of "Virtex".

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

This article proposes two max-pooling engines, named the RTB-MAXP engine and the CMB-MAXP engine, with a scalable window size parameter for FPGA-based convolutional neural network (CNN) implementation. It is a good idea that the two one-dimensional max-pooling operations are performed by tracking the rank of the values within the window in the RTB-MAXP engine and by cascading the maximum operations of the values in the CMB-MAXP engine. I suggest the following revisions and resubmission based on this paper:

1. The equation number of formula (2) needs to be aligned.

2. In Table 2, the resources used by LUTRAM are indicated; if none are used, 0 should be marked. In addition, FF should be written as its full name.

3. What are the advantages and disadvantages of this max-pooling operation compared to other max-pooling operations in CNNs?

4. Could you give a data comparison, such as whether there is an improvement in data processing throughput or accuracy compared to before the max-pooling engine was used?

The English writing can be smoothed.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 4 Report

The paper has described an FPGA design to accelerate the max pooling operation used in CNN architectures.

1- The paper does not consider the subsequent sub-sampling of the feature maps. Thus, the operation is essentially identical to the max filtering (rank-order) operation.

2- The operation in Figure 1 needs further explanation. Since only the maximum value in a K-sized window is desired, why is the sorting operation being implemented? The maximum value can be determined with a tree of comparators in log2(K) levels for a K-sized window, whereas the proposed approach uses K-1 comparators in one dimension. Additionally, the proposed approach requires multiplexers to implement the shifting operation (a small comparator-tree sketch is given after this list).

3- Generally, CNN architectures use pooling filter sizes of 2x2 or 3x3. The results given in Table 2, however, assume K = 13, which is too much subsampling to be used in any CNN architecture.

4- Multiple efforts related to the FPGA design of rank-order filtering have been reported in the literature. These must be cited and compared against the proposed approach, e.g.:

a. https://www.eetimes.com/cost-effective-two-dimensional-rank-order-filters-on-fpgas/

b. Choo, Chang & Verma, Punam (2008). A real-time bit-serial rank filter implementation using Xilinx FPGA. Proc. SPIE 6811. doi:10.1117/12.765789.

c. http://www.diva-portal.org/smash/record.jsf?pid=diva2%3A1447576&dswid=9861
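As a side note on the comparator-count point in item 2, here is a minimal software sketch (my own illustration, not part of the manuscript; the function name and example window are hypothetical) of finding the maximum of a K-sized window with a balanced comparator tree. It uses K - 1 two-input comparisons in total, arranged in ceil(log2(K)) levels, which is presumably the log2(K) figure the reviewer has in mind.

```python
from math import ceil, log2

def tree_max(values):
    """Maximum of a window via a balanced comparator tree.

    Each level pairs up the surviving values, so the tree performs
    len(values) - 1 two-input comparisons in total and has
    ceil(log2(len(values))) levels of latency.
    """
    level = list(values)
    comparisons = 0
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            nxt.append(max(level[i], level[i + 1]))  # one two-input comparator
            comparisons += 1
        if len(level) % 2:
            nxt.append(level[-1])                    # odd value passes through
        level = nxt
    return level[0], comparisons

window = [7, 3, 9, 1, 4, 8, 2, 6]                    # K = 8
maximum, used = tree_max(window)
print(maximum, used, ceil(log2(len(window))))        # 9, 7 comparisons, 3 levels
```

In hardware terms, the same K - 1 comparators can be arranged either as a serial cascade (about K - 1 levels of delay) or as a tree (about log2(K) levels), which is the latency/resource trade-off underlying the reviewer's question.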

It is advisable to get the manuscript proofread by a native English speaker. 

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

The novelty of the paper is insufficient. The supplementary references [17-19] for experimental comparative analysis are too outdated.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Thank you for the revised version, it looks good to me.

Author Response

Thank you for your dedicated effort on this paper.

Reviewer 4 Report

The authors have made significant changes to the manuscript and added a comparison with reference works. In Table 3, however, zero slices are shown to be utilized by reference work [17]. Is this a typo? Please explain.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf
