Article
Peer-Review Record

Design of Light-Weight Timing Error Detection and Correction Circuits for Energy-Efficient Near-Threshold Voltage Operation

Electronics 2022, 11(18), 2879; https://doi.org/10.3390/electronics11182879
by Xuemei Fan 1,*, Hao Liu 1,*, Hongwei Li 1, Shengli Lu 1 and Jie Han 2
Reviewer 1: Anonymous
Reviewer 3: Anonymous
Submission received: 10 August 2022 / Revised: 2 September 2022 / Accepted: 6 September 2022 / Published: 11 September 2022
(This article belongs to the Special Issue VLSI Circuits & Systems Design)

Round 1

Reviewer 1 Report

A timing-error-tolerant ETFF is proposed in this paper, where its nine-transistor node transition signal detector corrects timing issues. Essential route coverage and Flip-Flop activation rates are used to identify monitored locations. Simulations indicate a CNN accelerator can function at 1.1-0.3 V with a 3.5% area overhead. This design cuts overhead by 54.68 percent and enhances energy efficiency by 53.69 percent at 0.6 V. The proposed design reduces circuit overhead and increases the supply voltage range. Here are my comments:

1. I think the proposed design is novel; however, I am skeptical about the results regarding the performance of the proposed ETFF. Although post-layout simulations were conducted to assess the performance of the ETFF as applied in the CNN processor, the authors did not consider the influence of the pads and loading capacitance on the ETFF.  Since they conducted Monte Carlo and/or Worst-case/PVT simulations to check and see the functionality of the ETFF, they should provide histogram results and interpretations.

2. In Table 2, what is the schematic of the standard DFF that the authors are referring to?

3. Referring again to Table 2, there are many low-power FFs against which the ETFF can be compared. Moreover, to test the performance of the ETFF beyond doubt, it can be used as a shift register with capacitive loading, as done in the following studies:

a. Murugasami, R., & Ragupathy, U. S. (2020). Performance analysis of clock pulse generators and design of low power area efficient shift register using multiplexer based clock pulse generator. Microelectronics Journal, 105, 104891.

b. Wang, C. C., Tolentino, L. K. S., Ekkurthi, U. K. N., Lou, P. Y., & Sampath, S. (2022). A 100-MHz 3.352-mW 8-bit shift register using low-power DETFF using 90-nm CMOS process. International Journal of Electronics Letters, 1-16.

c. Ekkurthi, U. K. N., Dasari, V., Akiri, J., & Wang, C. C. (2021, May). A 100 MHz 9.14-mW 8-Bit Shift Register Using Double-Edge Triggered Flip-Flop. In 2021 IEEE International Symposium on Circuits and Systems (ISCAS) (pp. 1-4). IEEE.

d. Yu, C. C., Chen, K. T., & Wun, J. Y. (2014). A Novel Design of Low-Power Double Edge-Triggered Flip-Flop. In Proceedings of the 2nd International Conference on Intelligent Technologies and Engineering Systems (ICITES2013) (pp. 947-955). Springer, Cham.

e. Wang, C. C., Sung, G. N., Chang, M. K., & Shen, Y. Y. (2010). Energy-efficient double-edge triggered flip-flop. Journal of Signal Processing Systems, 61(3), 347-352.

 

Author Response

Response letter to reviewer 1

We would like to thank the reviewers for their constructive comments on our manuscript. We have addressed all of the comments and revised the re-submitted manuscript accordingly. The following is our response to Reviewer 1 (comments are italicized):

  1. I think the proposed design is novel; however, I am skeptical about the results regarding the performance of the proposed ETFF. Although post-layout simulations were conducted to assess the performance of the ETFF as applied in the CNN processor, the authors did not consider the influence of the pads and loading capacitance on the ETFF.  Since they conducted Monte Carlo and/or Worst-case/PVT simulations to check and see the functionality of the ETFF, they should provide histogram results and interpretations.

Answer: Thank you for your valuable comments. We have added some details about the loading capacitance and hardware implementation in Section 4.1 of our revised manuscript. The layout of the proposed ETFF design was generated using Cadence Virtuoso, following the standard cell design rules defined by the SMIC 40 nm process technology. The design parameters of the pads therefore also follow the default settings of the SMIC 40 nm process technology. Moreover, buffers are added for the input signals and a fanout-of-4 (FO4) inverter load is used at the output to simulate a real environment. The FO4 output load, which amounts to 2.8-3.2 fF in the SMIC 40 nm CMOS process that we used, is also considered in the power and delay evaluation.

We have added the results and discussion of the 10K Monte Carlo simulation in Section 4.2 of our revised manuscript. Figure 10 has also been added to present some representative simulation results directly.

 

  2. In Table 2, what is the schematic of the standard DFF that the authors are referring to?

Answer: The standard DFF in Table 2 refers to a conventional transmission-gate flip-flop (TGFF) [28]. Thank you for pointing this problem out. We have added reference [28] and changed “the standard DFF” to “the conventional transmission-gate flip-flop (TGFF)” to avoid confusing readers.

[28] Markovic D., Nikolic B. and Brodersen R. W., Analysis and design of low-energy flip-flops, In Proceedings of the IEEE International Symposium on Low Power Electronics and Design (ISLPED), Huntington Beach, CA, USA, 06-07 August 2001, pp. 52-55.

  3. Referring again to Table 2, there are many low-power FFs against which the ETFF can be compared. Moreover, to test the performance of the ETFF beyond doubt, it can be used as a shift register with capacitive loading, as done in the following studies:
     a. Murugasami, R., & Ragupathy, U. S. (2020). Performance analysis of clock pulse generators and design of low power area efficient shift register using multiplexer based clock pulse generator. Microelectronics Journal, 105, 104891.
     b. Wang, C. C., Tolentino, L. K. S., Ekkurthi, U. K. N., Lou, P. Y., & Sampath, S. (2022). A 100-MHz 3.352-mW 8-bit shift register using low-power DETFF using 90-nm CMOS process. International Journal of Electronics Letters, 1-16.
     c. Ekkurthi, U. K. N., Dasari, V., Akiri, J., & Wang, C. C. (2021, May). A 100 MHz 9.14-mW 8-Bit Shift Register Using Double-Edge Triggered Flip-Flop. In 2021 IEEE International Symposium on Circuits and Systems (ISCAS) (pp. 1-4). IEEE.
     d. Yu, C. C., Chen, K. T., & Wun, J. Y. (2014). A Novel Design of Low-Power Double Edge-Triggered Flip-Flop. In Proceedings of the 2nd International Conference on Intelligent Technologies and Engineering Systems (ICITES2013) (pp. 947-955). Springer, Cham.
     e. Wang, C. C., Sung, G. N., Chang, M. K., & Shen, Y. Y. (2010). Energy-efficient double-edge triggered flip-flop. Journal of Signal Processing Systems, 61(3), 347-352.

 

Answer: Thank you for your suggestions and recommendations about the low-power double-edge triggered flip-flops (DETFFs) used to realize shift registers. As presented in the provided literature [b-e], a DETFF has two parallel data paths that operate in opposite phases of a single clock to retain the throughput at a lower clock frequency. The design in the provided literature [a] focuses on the clock pulse generation of shift registers.

However, we propose a timing error detection and correction (EDAC) design based on the structure of the conventional TGFF [28] to reduce the design margins and thereby improve the energy efficiency. In other words, the proposed EDAC circuit, namely the ETFF, is designed so that circuits can retain a high clock frequency without suffering from delay violations. We have revised the introduction and added more detail, as follows:

“Conventional integrated circuit designs avoid the PVT-induced timing errors by reserving voltage and timing margins as a guard band. However, the conservative guard band causes the reduction of throughput and excessive cost of energy wasting [5], because a circuit does not always work in the worst case. Timing error-tolerant techniques based on the error detection and correction (EDAC) circuits have emerged as a promising solution [2-27]. The EDAC designs use the timing error detection (TED) circuits to monitor the timing conditions of circuits at run time. The timing error correction (TEC) circuits are designed to recover the timing errors resulting from the delay violations. Thus, the high operation frequency can be retained under the lower supply voltages. Moreover, the EDAC design can be used with the adaptive voltage frequency scaling technique to eliminate the excessive voltage and timing margins, further saving the energy consumption [3, 25, 26].” in Section 1 of our revised manuscript.

 

Thus, it may be unsuitable to directly compare the proposed EDAC design with the provided references [a-e], which design DETFFs to realize shift registers. We appreciate your valuable suggestion and will earnestly consider the feasibility of using our proposed EDAC design in a shift register, as in the provided literature [a].

 

Author Response File: Author Response.pdf

Reviewer 2 Report

The paper presents a light-weight timing error detection and correction circuit to increase energy efficiency by scaling supply voltages to the near-threshold voltage region.

The proposed approach is explained. An NTSD and an ETFF circuit were designed (and simulated) on a SMIC 40 nm process. The results are discussed and compared to other work.

The results are convincing and I recommend the publication.

Author Response

Response letter to reviewer 2

Comments:

The paper presents a light-weight timing error detection and correction circuit to increase energy efficiency by scaling supply voltages to the near-threshold voltage region.

The proposed approach is explained. An NTSD and an ETFF circuit were designed (and simulated) on a SMIC 40 nm process. The results are discussed and compared to other work.

The results are convincing and I recommend the publication.

Answer: We would like to thank the reviewer for the comments and for the recognition of our work in this manuscript. In the revision, we have added some references, revised the experiments, discussion, and evaluation, and improved the English language and style.

 

Reviewer 3 Report

The first criticism is that the literature review is not comprehensive enough. Section I comprises many lumped references and the works are not even cited in an orderly manner. The authors must discuss each work in detail and eliminate such occurrences. Please, update the state-of-the-art analysis because most works are old.

Flip-flops have been thoroughly explored before. So, what can you say about your proposal compared with more conservative designs like [R1] or enhanced ones? Please, elaborate.

[R1] Sato T, Kunitake Y. A simple flip-flop circuit for typical-case designs for DFM. In 8th International Symposium on Quality Electronic Design (ISQED'07) 2007 Mar 26 (pp. 539-544). IEEE.

[R2] Jain A, Veggetti AM, Crippa D, Benfante A, Gerardin S, Bagatin M. Radiation Tolerant Multi-Bit Flip-Flop System With Embedded Timing Pre-Error Sensing. IEEE Journal of Solid-State Circuits. 2022 Feb 23.

Other qualitative and quantitative issues like implementation complexity, delay, and power dissipation could be added to Table 5. A figure showing the logic signals applied to the devices is also missing.

Please, consider the assessment of throughput while taking into account the workload. Another important aspect is if possible to add a die photo of the test chip.

Author Response

Response letter to reviewer 3

We would like to thank the reviewers for the constructive comments on our manuscript. We have addressed all of the comments and revised the re-submitted manuscript accordingly. The following is our response to Reviewer 3 (comments are italicized):

  1. The first criticism is that the literature review is not comprehensive enough. Section I comprises many lumped references and the works are not even cited in an orderly manner. The authors must discuss each work in detail and eliminate such occurrences. Please, update the state-of-the-art analysis because most works are old.

Answer: Thank you for your valuable comments and suggestions. We have added some state-of-the-art references and revised the literature review in Sections 1, 2.2, and 2.3 of our revised manuscript. We have also reordered the references according to the order in which they are cited. Moreover, the analysis and discussion of the related works have been added mainly in Sections 2.2 and 2.3 of our revised manuscript. The added references are as follows:

 

  1. Jain, A.; Veggetti, A.M.; Crippa, D.; et al. Radiation Tolerant Multi-Bit Flip-Flop System with Embedded Timing Pre-Error Sensing. IEEE Journal of Solid-State Circuits 2022, 57, 2878-2890.
  2. Uytterhoeven, R.; Dehaene, W. Design Margin Reduction Through Completion Detection in a 28-nm Near-Threshold DSP Processor. IEEE Journal of Solid-State Circuits 2022, 57, 651-660.
  3. Sharma, P.; Das, B.P. Design and Analysis of Leakage-Induced False Error Tolerant Error Detecting Latch for Sub/Near-Threshold Applications. IEEE Transactions on Device and Materials Reliability 2020, 20, 366-375.
  4. Fan, X.; Wang, R.; et al. A Simple Steady Timing Resilient Sample Based on Delay Data Sense Detection. In Proceedings of IEEE International Conference on ASIC, Chongqing, China, 29 October-1 November 2019.
  5. Hao, Z.; Xiang, X.; Chen, C.; et al. EDSU: Error detection and sampling unified flip-flop with ultra-low overhead. IEICE Electronics Express 2016, 13, 1-11.

 

  2. Flip-flops have been thoroughly explored before. So, what can you say about your proposal compared with more conservative designs like [R1] or enhanced ones? Please, elaborate.

[R1] Sato T, Kunitake Y. A simple flip-flop circuit for typical-case designs for DFM. In 8th International Symposium on Quality Electronic Design (ISQED'07) 2007 Mar 26 (pp. 539-544). IEEE.

[R2] Jain A, Veggetti AM, Crippa D, Benfante A, Gerardin S, Bagatin M. Radiation Tolerant Multi-Bit Flip-Flop System With Embedded Timing Pre-Error Sensing. IEEE Journal of Solid-State Circuits. 2022 Feb 23.

Answer: The literature [R1] is cited as reference [18] in our revised manuscript. The design in [18] uses the error prediction (EP) method to perform timing error detection (TED). It introduces a margin Δt by adding some buffers to anticipate possible delay violations. This design uses a main flip-flop that samples the data first and a canary flip-flop [18] that samples the data a time Δt later. When the two values are compared by an XOR gate, the error prediction assumes that the last sample of the flip-flop is correct. If the input data arrive late, the two values will differ and the XOR gate will generate an error signal. This method does not need a timing error correction (TEC) operation, which reduces the implementation complexity. However, it also has some limitations: it still contains redundant input copies, delay buffers, and a comparator, which are not needed in the proposed ETFF design. Moreover, the delay of the buffers has to satisfy timing constraints, limiting the increase in energy efficiency. Furthermore, voltage and timing margins are still needed, because the main part is assumed never to cause timing errors. Thus, the power reduction achieved by the EP method is less than that achieved by the double sampling comparison (DSC) method [15].
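
The canary-style error prediction described above can be illustrated with a small behavioral sketch. This is not the circuit from [18]; it is a simplified timing model under stated assumptions: both flip-flops are ideal and share one clock edge, the canary sees the data through a buffer delay of Δt, and setup and hold times are ignored. The function names and the picosecond values are hypothetical.

```python
def sample(old_value, new_value, arrival_ps, clock_edge_ps):
    """Ideal flip-flop: captures new_value only if it arrived by the clock edge."""
    return new_value if arrival_ps <= clock_edge_ps else old_value


def canary_check(old_value, new_value, arrival_ps, clock_edge_ps, delta_t_ps):
    """Return (main, canary, warning) for one clock edge.

    Assumption: the canary flip-flop samples a copy of the data that is
    delayed by delta_t_ps (the buffer margin), at the same clock edge.
    """
    main = sample(old_value, new_value, arrival_ps, clock_edge_ps)
    canary = sample(old_value, new_value, arrival_ps + delta_t_ps, clock_edge_ps)
    warning = main ^ canary  # XOR comparator: 1 when the two samples differ
    return main, canary, warning


if __name__ == "__main__":
    # Hypothetical numbers: clock edge at 1000 ps, margin of 200 ps.
    for arrival in (700, 900, 1050):
        print(arrival, canary_check(0, 1, arrival, 1000, 200))
```

In this model, data arriving inside the Δt window before the edge raises the warning while the main flip-flop still holds the correct value. Data arriving after the edge is missed by both flip-flops and raises no warning, which is one way to see why the EP method still relies on the main path keeping its margins.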

The literature [R2] is cited as reference [20] in our revised manuscript. The design in [20] simplifies the design in [18], using just a delay chain and an XOR gate to reduce the power consumption. However, it still has the same characteristics as the design in [18]. We have qualitatively discussed the similarities and differences of the three TED methods and the different methods for timing error correction in Sections 2.2.2 and 2.2.3 of our revised manuscript, as follows:

“The EP method used in [18-21] does not need the operation of the TEC, which reduce the implement complexity. Thus, this method is fundamentally different from the DSC method used in [2-8, 10, 13-17] and the DDTD method used in [10, 12, 22-24]. However, it still contains redundant input copies, delay buffers and a comparator the same as the DSC method, which are not needed in the DDTD method. Thus, comparing with other two methods, the DDTD method generally has the smallest implement complexity for the TED operation. Moreover, the delay of buffers in the EP-based designs has to be satisfy timing constraints, limiting the increase of energy-efficiency. Furthermore, voltage and timing margins are still needed in the EP-based designs, because the main part never causes timing errors. Thus, power reduction caused by the EP method is less than ones resulting from the DSC method [15]. However, the clock collator is required in the DSD-based designs to generate the CLK-d signal, which increase the implement complexity and power dissipation.”

 

  3. Other qualitative and quantitative issues like implementation complexity, delay, and power dissipation could be added to Table 5. A figure showing the logic signals applied to the devices is also missing.

Answer: The implementation complexity and power dissipation of the different EDAC designs have been qualitatively discussed in Section 2.2. Quantitative comparisons of the number of transistors, the propagation delay, the detection delay, the area, and the switching energy of the TGFF [28], the RFF design [13], and the proposed ETFF design are presented in Table 2 of our revised manuscript.

The characteristics of the proposed ETFF design and other EDAC designs applied in NN accelerators are presented in Table 5. Since these EDAC designs have been applied in different NN accelerator systems using various process technologies, it would be unfair to compare their area, delay, and power dissipation directly and quantitatively.

Figure 1 presents the insertion of a proposed ETFF into the processing element (PE) circuit of a CNN accelerator. We have added a detailed description of this application in Section 4.1, as follows:

“Each PE circuit is composed of a 16-bit fixed multiplier and adder (1/3/12 fixed) and the input and output registers built based on the structure of the TGFF. The proposed ETFF has been inserted in the circuit of data paths by replacing an original TGFF, as shown in Figure 1.”.
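
The quoted text does not spell out what "1/3/12 fixed" means. As a rough illustration only, the sketch below assumes it denotes a 16-bit two's-complement word with 1 sign bit, 3 integer bits, and 12 fractional bits, and models a saturating multiply-accumulate on such values. The helper names, the truncating shift, and the saturation behavior are assumptions, not details taken from the manuscript.

```python
FRAC_BITS = 12                        # assumed number of fractional bits
WORD_BITS = 16                        # 1 sign + 3 integer + 12 fractional (assumed)
SCALE = 1 << FRAC_BITS
Q_MIN = -(1 << (WORD_BITS - 1))       # most negative 16-bit code
Q_MAX = (1 << (WORD_BITS - 1)) - 1    # most positive 16-bit code


def saturate(value):
    """Clamp an integer to the signed 16-bit range."""
    return max(Q_MIN, min(Q_MAX, value))


def to_fixed(x):
    """Quantize a real number to the assumed 1/3/12 fixed-point code."""
    return saturate(round(x * SCALE))


def to_float(q):
    """Convert a fixed-point code back to a real number."""
    return q / SCALE


def fixed_mul(a, b):
    """Multiply two codes; the raw product has 24 fractional bits, so shift back by 12."""
    return saturate((a * b) >> FRAC_BITS)


def fixed_mac(acc, a, b):
    """One multiply-accumulate step of a PE-like datapath, with saturation."""
    return saturate(acc + fixed_mul(a, b))


if __name__ == "__main__":
    acc = fixed_mac(to_fixed(0.5), to_fixed(1.5), to_fixed(2.0))
    print(to_float(acc))
```

Under this assumed format the representable range is roughly -8 to just under 8 with a resolution of 1/4096, so a product such as 4.0 × 4.0 saturates instead of wrapping around.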

 

  4. Please, consider the assessment of throughput while taking into account the workload. Another important aspect is if possible to add a die photo of the test chip.

 

Answer: Thank you for your valuable comments. We have considered the workload and a realistic operating environment by adding buffers for the input signals and a fanout-of-4 (FO4) inverter load at the output. The FO4 output load is also considered in the power and delay evaluation. We have added some details about the loading capacitance and hardware implementation in Section 4.1 of our revised manuscript.

Throughput is closely related to the delay and the supply voltage. We have revised the analysis of these metrics in Section 4.2.

We have performed pre-layout and post-layout simulations of the proposed ETFF design and applied it in the circuits of a CNN accelerator. However, tape-out and silicon testing have not yet been carried out; we will consider them in our future work.

 

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

The authors already revised their paper based on the comments.
