Article
Peer-Review Record

Throughput/Area-Efficient Accelerator of Elliptic Curve Point Multiplication over GF(2^233) on FPGA

Electronics 2023, 12(17), 3611; https://doi.org/10.3390/electronics12173611
by Muhammad Rashid 1,*, Omar S. Sonbul 1, Muhammad Yousuf Irfan Zia 2,3,*, Muhammad Arif 4, Asher Sajid 5 and Saud S. Alotaibi 6
Submission received: 10 July 2023 / Revised: 19 August 2023 / Accepted: 24 August 2023 / Published: 26 August 2023

Round 1

Reviewer 1 Report

 

This paper proposes a hardware design for GF(2^233) computation, together with approaches for reducing the computation time and hardware resource consumption. The experiments show that the design has better overall performance (as evaluated by the proposed FoM metric) than other related designs. There are some concerns about the contribution and the writing, as follows.

1. In Table 2, when compared with work [6], the proposed design uses fewer hardware resources but has larger latency. Moreover, the percentage reduction in hardware resources is about the same as the percentage increase in latency, so the result looks more like a trade-off than an improvement. The FoM is also similar (with limited improvement). Please discuss the advantages in more detail.

2. Some critical parts of the design, including the Karatsuba multiplier, the Itoh-Tsujii inversion algorithm, etc., are not newly presented by the authors. Although their implementations for the given application are described, the challenges need to be clarified.

3. When describing the FSM, the format “state i” is used. This does not convey the core idea of the design. Please consider removing the description of the state names and describing only the three main states (for each state, you can say “29 clock cycles are required” without saying that this is state 7-36). By the way, “state 7-36” covers 30 states, not 29. Please check this in your paper.

4. In Figure 5, what is the meaning of the number under each operator? Please clarify this in the paper.
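
Points 1 and 3 above lend themselves to a quick numerical check. The sketch below is illustrative only: it assumes the FoM is throughput divided by slice count, with throughput taken as the inverse of latency (the manuscript's exact definition is not reproduced in this record), and the 20% figures are hypothetical rather than values from Table 2. The function names are made up for illustration.

```python
def inclusive_state_count(first, last):
    """Number of states in the range first..last, counting both ends."""
    return last - first + 1


def fom_ratio(area_reduction, latency_overhead):
    """Ratio new_FoM / old_FoM.

    Assumption: FoM = throughput / area and throughput = 1 / latency.
    Both arguments are fractions (0.20 means 20%).
    """
    new_area = 1.0 - area_reduction        # area relative to the baseline
    new_latency = 1.0 + latency_overhead   # latency relative to the baseline
    return 1.0 / (new_area * new_latency)


if __name__ == "__main__":
    print(inclusive_state_count(7, 36))        # 30
    print(round(fom_ratio(0.20, 0.20), 3))     # 1.042
```

Under these assumptions, equal percentage changes in area and latency nearly cancel in the FoM, which matches the reading of the comparison as a trade-off.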

 

Please double check the English by using some grammar tools.

Author Response

Please find the report attached.

Author Response File: Author Response.pdf

Reviewer 2 Report

This manuscript presents an FPGA implementation of an elliptic-curve point multiplication accelerator over GF(2^233). The evaluation fairly shows that the proposed accelerator achieves a slightly higher throughput/area ratio than others.

However, the manuscript has three major shortcomings regarding the reliability of its description, which should be addressed in a future version of the manuscript.

First, the contributions of the manuscript were not properly argued. They should be distinct from other studies and evident from reading the manuscript. In particular, the following points should be reconsidered:

- The efficiency of the FSM controller is neither discussed nor evaluated.

- The definition of the FoM is identical to the general throughput/area metrics.

Second, the discussion of the adequacy of the proposed architecture was not sufficient. Though the authors discussed the use of the Karatsuba method in Section 3.3, why they preferred three levels of split (Fig. 4) was not clear. For those who are familiar with the architectures of Xilinx FPGAs, it is natural to prefer four levels because they have 25x18-bit built-in multipliers. Similarly, a block RAM with twelve 233-bit words (Fig. 1) seems to be very inefficient. As the maximum data width of a single BRAM block is 36 bits, the proposed block RAM requires 6.5 BRAM blocks. When making such counter-intuitive design choices, the reasons should be explained sufficiently.
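
The figures in this comment can be reproduced with simple arithmetic. The sketch below is not taken from the manuscript: it assumes that each Karatsuba level halves the operand width (rounding up), and that a 36-bit-wide BRAM block can be used as a half block of up to 18 bits for the leftover bits. The function names and the half-block accounting are assumptions made for illustration.

```python
from math import ceil


def karatsuba_operand_widths(m, levels):
    """Operand width at each split level, assuming each level halves the width."""
    widths = [m]
    for _ in range(levels):
        widths.append(ceil(widths[-1] / 2))
    return widths


def bram_blocks(word_bits, block_width=36, half_width=18):
    """BRAM blocks needed side by side to hold one word of word_bits bits.

    Assumption: leftover bits that fit within half_width cost half a block.
    """
    full, rest = divmod(word_bits, block_width)
    if rest == 0:
        return float(full)
    if rest <= half_width:
        return full + 0.5
    return full + 1.0


if __name__ == "__main__":
    print(karatsuba_operand_widths(233, 3))  # [233, 117, 59, 30]
    print(karatsuba_operand_widths(233, 4))  # [233, 117, 59, 30, 15]
    print(bram_blocks(233))                  # 6.5
```

Under the halving assumption, three levels leave 30-bit operands and four levels leave 15-bit operands, and the half-block accounting yields the 6.5 blocks quoted in the comment.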

Third, additional discussion and/or evaluation of the hardware resource results is required. As far as the reviewer knows, there are no significant differences in CLB (or slice) architecture among Virtex-5, Virtex-6, and Virtex-7. However, there is a 1.5x difference in the number of slices between the Virtex-5 and Virtex-6 implementations. Why? Showing a breakdown of the amount of hardware by component (multiplier, FSM, etc.) might help the readers' understanding. It might also help prove the efficiency of the proposed components. In addition, the unit of power consumption in the estimate for the authors' accelerator is likely to be W, not mW. A circuit with thousands of slices, running at hundreds of MHz, usually consumes one to a few W.

The manuscript contains many grammatical and typographical errors. The meaning of the sentences is understandable, but they have to be improved.

Some examples are as follows:

- throughput/area ECPM architecture (l. 101 and l. 109) -> throughput/area-efficient ECPM architecture

- throughout/slice (l. 103) -> throughput/slice

- these accelerators lack power consumption of ... (l. 128) -> these accelerators lack discussions on power consumption of ...

- Output: $Q = (x_q, y_q) = k \cdot P$ (Algorithm 1) -> $Q = (x_q, y_q) = d \cdot P$

- for ($i from m-2 down to 0$) do (Algorithm 1) -> for ($i from n-2 down to 0$) do

- aout[2], aout[4], aout[5] (Figure 2 (b)) -> sout[2], sout[4], sout[5]
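
The loop-bound correction for Algorithm 1 can be illustrated with a generic sketch. Algorithm 1 itself is not reproduced in this record, so the code below is an assumption-laden stand-in: it uses the textbook left-to-right double-and-add method, takes n to be the bit length of the scalar d, and uses plain integers under addition in place of elliptic-curve points. The function name and the add/double callbacks are hypothetical.

```python
def scalar_multiply(d, P, add, double):
    """Left-to-right double-and-add: returns d * P in the group given by add/double.

    The most significant bit of d (index n-1) is consumed by the initialisation
    Q = P, so the loop runs over bit indices n-2 down to 0.
    """
    n = d.bit_length()
    if n == 0:
        raise ValueError("scalar d must be positive")
    Q = P
    for i in range(n - 2, -1, -1):
        Q = double(Q)
        if (d >> i) & 1:
            Q = add(Q, P)
    return Q


if __name__ == "__main__":
    # Integers under addition stand in for curve points (illustration only).
    result = scalar_multiply(13, 5, add=lambda a, b: a + b, double=lambda a: 2 * a)
    print(result)  # 65
```

In this sketch the loop bound depends on the bit length of the scalar, not on the field size, which is the distinction the suggested correction from m-2 to n-2 appears to draw.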

Author Response

Please find the comments attached. 

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

The authors addressed my previous concerns

Author Response

The file is attached. 

Author Response File: Author Response.pdf

Reviewer 2 Report

The reviewer confirms that the authors diligently answered all of the questions and comments and revised their manuscript. The manuscript is now acceptable.

Author Response

The file is attached. 
