Article
Peer-Review Record

A One-Cycle Correction Error-Resilient Flip-Flop for Variation-Tolerant Designs on an FPGA

Electronics 2020, 9(4), 633; https://doi.org/10.3390/electronics9040633
by Dam Minh Tung, Nguyen Van Toan and Jeong-Gun Lee *,†
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 13 March 2020 / Revised: 27 March 2020 / Accepted: 6 April 2020 / Published: 10 April 2020
(This article belongs to the Section Computer Science & Engineering)

Round 1

Reviewer 1 Report


The paper requires clarification of a few issues. My comments follow.

1. The term "performance", as used in Section 5, is slightly misleading. Consider, for example, a multi-stage pipelined multiplier built with either your EDACFFs or simple FFs. The better-performing multiplier would be expected to be the one that completes a multiplication in less time, or the one that executes more operations per unit of time. With EDACFFs, the clock is stopped for one cycle whenever an error occurs, which increases the overall calculation time: the more errors occur, the longer the overall time. Moreover, since the number of errors is virtually unpredictable, the overall calculation time (and the resulting performance, understood in the aforementioned way) is hard to determine. Therefore, with EDACFFs, in contrast to plain FFs, the actual performance cannot be directly inferred from the clock frequency. The authors should clearly define what "performance" means in their application, or use other, more obvious terms, e.g., clock speed or clock frequency.

2. The authors only measured the power consumption on the internal power line of the Spartan-6 chip. However, this FPGA also has an auxiliary (AUX) power line (and an IO power line, but this one is less important in this case). What is the power consumption on the AUX line for EDACFFs and simple FFs?

3. The authors claim that their circuit can work at a clock frequency of 88 MHz (in line 218, a frequency of 90 MHz is mentioned). How was this frequency determined? And what is the general procedure for determining the maximum clock frequency when EDACFFs are used?

4. What exactly are the CP and NCP circuits in Fig. 6? What is their structure (schematic, etc.)?

5. How do the "data1", ..., "data4" signal buses in Fig. 8 relate to the diagram in Fig. 6? Where are they on the diagram? Fig. 4 could also show time units on the horizontal axis.

6. Some minor edits. In line 48, the words "latch" and "FF" should probably be swapped. In line 185, a Spartan-6 slice contains, in fact, 8 FFs, not 4 FFs.

Author Response

The answer sheet is attached in a PDF format.

For better reading, please check the attached PDF document.

 

Comments from the reviewer:

  1. The term "performance", as used in Section 5, is slightly misleading. Consider, for example, a multi-stage pipelined multiplier built with either your EDACFFs or simple FFs. The better-performing multiplier would be expected to be the one that completes a multiplication in less time, or the one that executes more operations per unit of time. With EDACFFs, the clock is stopped for one cycle whenever an error occurs, which increases the overall calculation time: the more errors occur, the longer the overall time. Moreover, since the number of errors is virtually unpredictable, the overall calculation time (and the resulting performance, understood in the aforementioned way) is hard to determine. Therefore, with EDACFFs, in contrast to plain FFs, the actual performance cannot be directly inferred from the clock frequency. The authors should clearly define what "performance" means in their application, or use other, more obvious terms, e.g., clock speed or clock frequency.

 

[Author's reply]

Thanks for your constructive comment.

Your assumption could be correct under worst-case conditions. However, under typical- or best-case conditions, the proposed EDACFF offers higher performance because the cycle time is optimized without the worst-case timing margin, which allows a higher clock frequency. To avoid any misleading use of the term, we explain the benefit of the EDACFF further as follows:

(On page 2, the fifth paragraph of the section “1. Introduction”)

In summary, with a timing error resilience technique, we gain higher performance and energy efficiency at a minor area overhead, thanks to the reduced timing margin, as long as no errors occur with this margin. However, as more timing errors occur, the benefits are gradually reduced because of the correction overhead. The term “performance” in our paper is defined as average-case timing performance. The average-case timing performance is higher than the worst-case timing performance because the circuits run at a clock frequency optimized for typical operating conditions. The worst-case Process-Voltage-Temperature operating conditions that can induce errors occur rarely.
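
The average-case argument above can be made concrete with a small numerical sketch. The model is an illustrative assumption, not taken from the paper: every timing error stalls the pipeline for `penalty_cycles` clock cycles (one for the EDACFF), so the rate of useful operations is the clock frequency divided by `1 + error_rate * penalty_cycles`. The 88 MHz figure comes from the authors' response; the 80 MHz worst-case-margined baseline is a hypothetical value chosen only for illustration.

```python
def effective_throughput_mhz(clock_mhz, error_rate, penalty_cycles=1):
    """Useful operations per microsecond under an assumed stall model.

    error_rate is the assumed fraction of useful cycles that trigger a
    timing error; each error costs penalty_cycles extra clock cycles.
    """
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be between 0 and 1")
    return clock_mhz / (1.0 + error_rate * penalty_cycles)


def break_even_error_rate(edac_clock_mhz, baseline_clock_mhz, penalty_cycles=1):
    """Error rate at which the faster EDAC clock stops paying off."""
    return (edac_clock_mhz / baseline_clock_mhz - 1.0) / penalty_cycles


if __name__ == "__main__":
    EDAC_CLOCK_MHZ = 88.0      # from the authors' response
    BASELINE_CLOCK_MHZ = 80.0  # hypothetical worst-case-margined clock
    for rate in (0.0, 0.01, 0.05, 0.10):
        print(rate, effective_throughput_mhz(EDAC_CLOCK_MHZ, rate))
    print(break_even_error_rate(EDAC_CLOCK_MHZ, BASELINE_CLOCK_MHZ))
```

Under this assumed model, the EDACFF design keeps its advantage as long as the error rate stays below the break-even value, which matches the authors' statement that the benefit shrinks gradually as errors become more frequent.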

 

  2. The authors only measured the power consumption on the internal power line of the Spartan-6 chip. However, this FPGA also has an auxiliary (AUX) power line (and an IO power line, but this one is less important in this case). What is the power consumption on the AUX line for EDACFFs and simple FFs?

[Author's reply]

We thank the reviewer for the constructive comment.

In the Spartan-6 FPGA chip, the internal power line (VCCINT) is at 1.2 V and the auxiliary (AUX) power line is at either 2.5 V or 3.3 V. The EDACFF only uses resources supplied by the 1.2 V internal power line. Therefore, the power consumption on the AUX line is almost the same for EDACFFs and simple FFs.

 

  3. The authors claim that their circuit can work at a clock frequency of 88 MHz (in line 218, a frequency of 90 MHz is mentioned). How was this frequency determined? And what is the general procedure for determining the maximum clock frequency when EDACFFs are used?

[Author's reply]

Thanks for the constructive comments.

Our circuit works well at a clock frequency of 88 MHz. When the clock frequency is increased to 90 MHz, our proposed EDACFF operates at the point of first failure (PoFF) with a 1.2 V power supply. In our methodology, to determine the clock frequency, we first find the PoFF frequency (90 MHz in our case) and then set the clock frequency by subtracting a minimal frequency margin (2 MHz in our case) from the PoFF frequency.

In our work, the clock frequency is synthesized using the direct frequency synthesis feature of a digital clock manager (DCM). The DCM is employed in the clock generator circuit to provide a programmable frequency that can be configured with 2 MHz granularity (this is why we chose 2 MHz as the minimal frequency margin).
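
The procedure described in this reply (step the clock upward on the DCM's 2 MHz grid, find the PoFF, then back off by one step) can be sketched as follows. This is a hedged illustration only: `passes_at` is a hypothetical callback standing in for whatever hardware test decides that the circuit ran without a failure at a given frequency, and the sweep bounds in the usage example are made up.

```python
from typing import Callable, Tuple


def find_operating_frequency(
    passes_at: Callable[[int], bool],
    start_mhz: int,
    stop_mhz: int,
    step_mhz: int = 2,
) -> Tuple[int, int]:
    """Return (poff_mhz, operating_mhz) from an upward frequency sweep.

    poff_mhz is the first swept frequency at which passes_at reports a
    failure; operating_mhz is one step (the frequency margin) below it.
    """
    if step_mhz <= 0:
        raise ValueError("step_mhz must be positive")
    freq = start_mhz
    while freq <= stop_mhz:
        if not passes_at(freq):
            if freq == start_mhz:
                raise ValueError("circuit already fails at start_mhz")
            return freq, freq - step_mhz
        freq += step_mhz
    raise ValueError("no failure observed up to stop_mhz")


if __name__ == "__main__":
    # Hypothetical stand-in for a hardware run: failures start at 90 MHz.
    poff, operating = find_operating_frequency(lambda f: f < 90, 60, 120)
    print(poff, operating)
```

With a stand-in test that first fails at 90 MHz, the sweep reports a PoFF of 90 MHz and an operating frequency of 88 MHz, the two values quoted in the reply.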

 

 

  4. What exactly are the CP and NCP circuits in Fig. 6? What is their structure (schematic, etc.)?

[Author's reply]

We thank the reviewer for the constructive comment.

In Fig. 6, CP and NCP mean “critical path” and “non-critical path” circuits, respectively. Since the circuit structure in Fig. 6 is for testing the proposed EDACFF, the CP and NCP circuits are implemented as dummy functional logic built from delay elements. Those delay elements are implemented by cascading multiple look-up table (LUT) resources on an FPGA. The CP circuits use more LUTs as delay elements than the NCP circuits.

(On page 8, the second paragraph of “4. Circuit Structure of Testing EDACFF”)

In Fig. 6, CP and NCP mean “critical path” and “non-critical path” circuits, respectively. Since the circuit structure in Fig. 6 is for testing the proposed EDACFF, the CP and NCP circuits are implemented as dummy functional logic built from delay elements. Those delay elements are implemented by cascading multiple look-up table (LUT) resources on an FPGA. The CP circuits use more LUTs as delay elements than the NCP circuits. Then, EDACFFs are inserted at the endpoints of the critical paths in the first, second and fourth stages.

 

 

  5. How do the "data1", ..., "data4" signal buses in Fig. 8 relate to the diagram in Fig. 6? Where are they on the diagram? Fig. 4 could also show time units on the horizontal axis.

[Author's reply]

We thank the reviewer for the constructive comment.

"data1", "data2", "data3" and "data4" are the data values captured by the EDACFFs and the traditional FF at the first, second, third and fourth stages of the pipeline, respectively. Note that the datapaths of the first, second and fourth stages are made critical paths in the pipeline design. Therefore, EDACFFs are inserted at the endpoints of those critical paths in the first, second and fourth stages. The traditional FF is inserted at the third stage.

In Fig. 4, we only show the behavior of the proposed EDACFF circuit. Hence, in our opinion, it is not necessary to add time units to the horizontal axis.

(On page 8, the second paragraph of the section “5. Experimental Results”)

Fig. 8 shows post-layout simulation results of the design. The “data1”, “data2”, “data3” and “data4” signals in the waveform viewer are the data values captured by the EDACFFs and the traditional FF at the first, second, third and fourth stages of the pipeline, respectively. Note that the datapaths of the first, second and fourth stages are made critical paths in the pipeline design. Therefore, EDACFFs are inserted at the endpoints of those critical paths in the first, second and fourth stages. The traditional FF is inserted at the third stage.

 

  

  6. Some minor edits. In line 48, the words "latch" and "FF" should probably be swapped. In line 185, a Spartan-6 slice contains, in fact, 8 FFs, not 4 FFs.

[Author's reply]

We thank the reviewer for the constructive comment.

We edited lines 48 and 185 as follows:

Line 48: “Therefore, it is not recommended to replace a FF by a latch in FPGA designs”

Line 185: “It consists of four look-up tables (LUTs) and eight FFs/latches which share the same clock line”

Author Response File: Author Response.pdf

Reviewer 2 Report

In this paper, the authors proposed an error detection and correction flip-flop for variation-tolerant and error-resilient circuit designs on an FPGA. The proposed design occupies only one slice and it is capable of correcting an error within a single clock cycle. The paper is interesting but revision is needed due to some drawbacks.

  1. The results of the paper are not clearly presented. The authors should present comparisons with other similar techniques in terms of FPGA resource usage, performance, and power consumption.
  2. Table 1 should be reorganized and better commented, because it contains many similar comparisons and a very strange row, "Design effort".
  3. A Discussion section is needed. The authors should discuss the results and how they can be interpreted from the perspective of previous studies and of the working hypotheses. Future research directions may also be highlighted.

 

Author Response

The answer sheet is attached in a PDF format.

For better reading, please check the attached PDF document.

 

Comments from the reviewer:


 

  1. The results of the paper are not clearly presented. The authors should present comparisons with other similar techniques in terms of FPGA resource usage, performance, and power consumption.

[Author's reply]

We thank the reviewer for the constructive suggestion.

Unfortunately, no similar works have been implemented on an FPGA, so we cannot provide further comparison results. Please note that our work was carried out on an FPGA. We apologize for this.

 

 

  2. Table 1 should be reorganized and better commented, because it contains many similar comparisons and a very strange row, "Design effort".

[Author's reply]

Thanks for the constructive comments.

To make Table 1 easier to understand, we revised our manuscript to include more explanation of Table 1 as follows:

(On page 10, the last paragraph)

Table 1 compares our proposed EDACFF with previous EDAC techniques. The main benefits of our proposed design are that it is fully supported by standard cells and that it is based on the traditional FF. Therefore, it is suitable for a commercial FPGA circuit synthesis tool. The row “Design Effort” in Table 1 shows that our design requires less design effort than other timing error resilience techniques, since those works were designed in a custom design style using a typical ASIC design flow. Moreover, the previous works in [8] and [12] incur a penalty of more than one clock cycle for a detected timing error, whereas our design incurs a penalty of only one cycle. The work in [13] also needs one clock cycle for detection, but it is a latch-based design. A latch is not recommended for use in an FPGA because of the difficulty of meeting the timing constraints that a latch-based design must satisfy. The works in [9] and [11] both incur a penalty of only one clock cycle. However, the work in [9] needs a metastability detector, and the work in [11] incurs a large area overhead due to the large number of half-path points on the critical paths.

 

  3. A Discussion section is needed. The authors should discuss the results and how they can be interpreted from the perspective of previous studies and of the working hypotheses. Future research directions may also be highlighted.

[Author's reply]

We thank you for your valuable comments. We revised our manuscript to include a Discussion section as follows:

(On page 11, new section “6. Discussion”, the first paragraph)

Generally, it is hard to directly compare our proposed EDACFF circuit, which is implemented on an FPGA, with previous EDAC techniques, because most of those works were implemented in ASICs. However, since the previous EDAC circuits [8-9], [11-13] were designed in a custom design style using an ASIC design flow, they need considerable manual circuit optimization. Moreover, porting them to a new technology requires substantial redesign effort, which increases design time and cost. Furthermore, implementing a metastability detector and using a latch instead of a FF in FPGAs are very difficult because of timing closure with commercial timing tools. In contrast, our work focuses on implementing the timing error resilience technique on an FPGA, and we propose an EDACFF architecture suited to FPGAs that can be automatically synthesized, placed and routed with commercial computer-aided design (CAD) tools.

Future work will develop an industrial digital SoC application with the proposed scheme and explore the power-performance design space in depth with real applications. Finally, automating such exploration and optimization is another possible direction for future work.

 

 

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

Thank you for the revision. I think the paper can be accepted in present form.
