1. Introduction
To develop reliable circuits, a traditional synchronous circuit must include a large timing margin to ensure correct operation under worst-case timing conditions; that is, a margin is added to the clock period to cover the worst-case propagation delays. However, for most of the circuit's operating time this margin is not fully used, since the worst case rarely occurs in practice. Consequently, the worst-case timing margin causes throughput loss and lower energy efficiency in typical- or best-case conditions.
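To make the cost of a worst-case margin concrete, the sketch below compares a clock period sized for a path delay plus a worst-case margin with one sized for the typical delay alone. The delay and margin values are hypothetical placeholders chosen for illustration, not measurements from this work.

```python
def max_frequency_mhz(path_delay_ns: float, margin_ns: float = 0.0) -> float:
    """Highest clock frequency (MHz) whose period covers the path delay plus a margin."""
    period_ns = path_delay_ns + margin_ns
    return 1000.0 / period_ns


# Hypothetical delays for one critical path (ns), for illustration only.
TYPICAL_DELAY_NS = 10.0
WORST_CASE_MARGIN_NS = 5.0  # extra slack reserved for worst-case PVT corners

f_worst = max_frequency_mhz(TYPICAL_DELAY_NS, WORST_CASE_MARGIN_NS)
f_typical = max_frequency_mhz(TYPICAL_DELAY_NS)

# Fraction of throughput given up in typical conditions by keeping the margin.
throughput_loss = 1.0 - f_worst / f_typical
print(f"worst-case clock: {f_worst:.1f} MHz, typical-case clock: {f_typical:.1f} MHz")
print(f"throughput lost in typical conditions: {throughput_loss:.0%}")
```

With these assumed numbers, one third of the achievable throughput is spent on a margin that is needed only in rare corner cases.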
To minimize timing margins, many techniques have been proposed that tolerate the timing errors which occur when a circuit runs with a reduced margin. With these techniques, timing margins can be reduced significantly. They can generally be categorized into two groups: timing error prediction (TEP) and error detection and correction (EDAC).
TEP circuits [1,2,3,4,5,6] predict a potential error by monitoring data signals. They flag a warning signal whenever a delayed data signal enters an error-prone timing window defined relative to the clock. The designer can then adjust the supply voltage or clock frequency so that the circuit keeps operating correctly just short of failure. As a result, the main flip-flop (FF) always captures the correct data, and no correction is needed. However, with this technique the timing margin can only be reduced to the point where enough margin remains for correct operation of the main FF. In contrast, EDAC techniques [7,8,9,10,11,12,13,14,15] detect actual timing errors by monitoring critical paths for late-arriving data transitions, and then use extra correction circuits to correct the errors that have occurred.
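The difference between prediction and detection can be sketched as a classification of when a data transition arrives relative to the capturing clock edge. The sketch below is a simplified model under stated assumptions: setup/hold times and metastability are ignored, and the guard window (`guard_ns`) and detection window (`detect_ns`) are hypothetical parameters. A real design would normally use only one of the two schemes; both windows are combined here only for comparison.

```python
from enum import Enum


class Outcome(Enum):
    SAFE = "safe"                # data settles well before the clock edge
    TEP_WARNING = "tep_warning"  # inside the prediction (guard) window: data still correct
    EDAC_ERROR = "edac_error"    # after the edge but inside the detection window
    UNDETECTED = "undetected"    # too late for either scheme to catch


def classify_arrival(arrival_ns: float, period_ns: float,
                     guard_ns: float, detect_ns: float) -> Outcome:
    """Classify a data transition relative to the capturing clock edge at period_ns."""
    if arrival_ns <= period_ns - guard_ns:
        return Outcome.SAFE
    if arrival_ns <= period_ns:
        # TEP: the main FF still captures correct data; the warning lets the
        # designer raise the supply voltage or lower the clock frequency.
        return Outcome.TEP_WARNING
    if arrival_ns <= period_ns + detect_ns:
        # EDAC: the main FF has already sampled stale data, so correction is needed.
        return Outcome.EDAC_ERROR
    return Outcome.UNDETECTED


for t in (6.0, 9.5, 10.5, 13.0):
    print(t, classify_arrival(t, period_ns=10.0, guard_ns=1.0, detect_ns=2.0).value)
```

The sketch shows why TEP must keep some margin (the guard window lies before the edge), whereas EDAC can let transitions cross the edge as long as they land inside the detection window.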
In summary, a timing error resilience technique offers higher performance and energy efficiency at a small area cost, thanks to the reduced timing margin, as long as few errors occur at that margin. As timing errors become more frequent, however, these benefits gradually diminish because of the correction overhead. In this paper, the term “performance” denotes average-case timing performance. Average-case performance is higher than worst-case performance because the circuit runs at a clock frequency optimized for typical operating conditions, while the worst-case process-voltage-temperature (PVT) conditions that can induce errors occur rarely.
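Why the benefit shrinks as errors become more frequent can be illustrated with a simple average-case model. This is an assumption for illustration, not the paper's analysis: errors are treated as independent per cycle, and each detected error costs a fixed number of extra (stall) cycles. The usage lines plug in the 62 MHz and 88 MHz figures reported in Section 5 and a one-cycle penalty.

```python
def effective_throughput_mhz(f_clk_mhz: float, error_rate: float,
                             penalty_cycles: int = 1) -> float:
    """Average operations per microsecond when each error costs extra cycles.

    Simplified model (an assumption): each cycle fails independently with
    probability error_rate, and every failure adds penalty_cycles stall cycles.
    """
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be in [0, 1]")
    return f_clk_mhz / (1.0 + error_rate * penalty_cycles)


def break_even_error_rate(f_clk_mhz: float, f_worst_mhz: float,
                          penalty_cycles: int = 1) -> float:
    """Error rate at which the faster clock is no better than the worst-case clock."""
    return (f_clk_mhz / f_worst_mhz - 1.0) / penalty_cycles


print(effective_throughput_mhz(88.0, 0.0))   # no errors: full clock rate
print(effective_throughput_mhz(88.0, 0.10))  # about 80: 10% of cycles pay one stall
print(break_even_error_rate(88.0, 62.0))     # about 0.42 under this model
```

A larger `penalty_cycles` (for example, a replay-based scheme) lowers the break-even error rate proportionally, which is one way to see why a one-cycle penalty is attractive.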
Most EDAC circuits are designed in a custom design style and implemented in ASICs. Their physical designs must be optimized to satisfy tight timing constraints, and porting them to a new technology requires redesign effort, which increases design time and cost. In FPGA-based designs, on the other hand, they are difficult to implement because of the predefined circuit structures and the limited control over place-and-route (P&R). For instance, designing a metastability detector in an FPGA is a critical problem. In addition, replacing a traditional FF with a latch in a datapath complicates timing closure in FPGAs, because latch-based designs are difficult to close with commercial timing tools. Therefore, replacing FFs with latches is not recommended in FPGA designs.
In this paper, we propose a metastability-immune error detection and correction flip-flop (EDACFF) with a one-clock-cycle penalty for variation-tolerant FPGA designs. The metastability problem is resolved by imposing a proper timing constraint on the design. The proposed EDACFF is fully supported by standard cells and is based on the traditional FF, so it is well suited to commercial synthesis tools for FPGA design. Consequently, it can be ported to other process technologies with much less design effort than other timing error resilience techniques.
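One common way to reason about why a timing constraint that guarantees extra settling time helps against metastability is the first-order synchronizer model, MTBF = exp(t_r / τ) / (T_w · f_clk · f_data), where t_r is the resolution time, τ the regeneration time constant, and T_w the metastability window. The sketch evaluates this textbook model only; it is not necessarily the analysis used for the EDACFF in Section 3, and the values of τ, T_w, and the two frequencies are hypothetical.

```python
import math


def synchronizer_mtbf_s(resolution_time_ns: float, tau_ns: float,
                        window_ns: float, f_clk_hz: float, f_data_hz: float) -> float:
    """First-order MTBF (seconds) of a flip-flop given resolution_time_ns to settle.

    MTBF = exp(t_r / tau) / (T_w * f_clk * f_data). tau and T_w are
    device-dependent; the values used below are hypothetical.
    """
    window_s = window_ns * 1e-9
    return math.exp(resolution_time_ns / tau_ns) / (window_s * f_clk_hz * f_data_hz)


for t_r in (1.0, 2.0, 4.0):
    mtbf = synchronizer_mtbf_s(t_r, tau_ns=0.1, window_ns=0.05,
                               f_clk_hz=90e6, f_data_hz=10e6)
    print(f"t_r = {t_r} ns -> MTBF ~ {mtbf:.2e} s")
```

Because MTBF grows exponentially with t_r, reserving even a few nanoseconds of resolution time through a timing constraint can move the failure rate from frequent to negligible.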
The remainder of this paper is organized as follows. Section 2 discusses related work on EDAC circuits. Section 3 presents the proposed EDACFF and also addresses the metastability issue. Section 4 describes the test structure used to verify the functional correctness of the proposed EDACFF. Section 5 provides simulation and measurement results from an implementation of the proposed EDACFF on a Spartan-6 FPGA device (XC6SLX45). Section 6 discusses the experimental results and possible future work. Finally, Section 7 concludes the paper.
2. Related Works
In general, traditional EDAC approaches use ASIC-style implementations. They can be grouped into two categories: (a) FF-based designs and (b) latch-based designs.
(a) FF-based Designs [7,8,9,10,11]: Razor [7] detects an error and re-executes the computation at a reduced clock rate to recover the correct result, with some minor performance degradation. It consists of a main FF, a shadow latch, a multiplexer, and an XOR gate. The XOR gate compares the output of the main FF, which samples data at the rising clock edge, with that of the shadow latch, which is clocked by a delayed clock. Because the output of the main FF can become metastable, the output of the XOR gate can become metastable as well. Therefore, Razor needs a metastability detector at the output of the FF to guarantee a stable output after detection.
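The Razor detection principle described above can be sketched behaviorally for one bit and one clock edge. This is a simplified model under stated assumptions: a value counts as captured if it has arrived by the sampling instant, and setup/hold windows and metastability are ignored, which is exactly the effect the metastability detector has to handle in real hardware. The names `RazorSample` and `razor_capture` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RazorSample:
    main_ff: int       # value captured at the rising clock edge
    shadow_latch: int  # value captured by the delayed clock
    error: bool        # XOR of the two captures
    corrected: int     # value the multiplexer restores into the main FF


def razor_capture(old_value: int, new_value: int, arrival_ns: float,
                  edge_ns: float, clock_delay_ns: float) -> RazorSample:
    """Behavioral sketch of a Razor-style FF for one bit and one clock edge."""
    main = new_value if arrival_ns <= edge_ns else old_value
    shadow = new_value if arrival_ns <= edge_ns + clock_delay_ns else old_value
    error = (main ^ shadow) == 1
    # On an error, the multiplexer selects the (correct) shadow-latch value.
    corrected = shadow if error else main
    return RazorSample(main, shadow, error, corrected)


# A 0 -> 1 transition arriving 0.4 ns after the edge, within a 1 ns delayed clock.
print(razor_capture(0, 1, arrival_ns=10.4, edge_ns=10.0, clock_delay_ns=1.0))
```

Transitions that arrive after the delayed clock are missed by both the main FF and the shadow latch, so the XOR reports no error; the delayed-clock offset therefore bounds how far the clock period can be shortened.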
In [8], Razor-Lite, a lightweight error detection register using virtual supply rails, incurs a small area overhead, since it requires only eight extra transistors in addition to a traditional FF. However, Razor-Lite relies on instruction replay to correct errors, which leads to a high performance penalty of up to 11 clock cycles per correction. In [9], a low-overhead transition detector (TD) with a 9-transistor current-sensing circuit is proposed. TDs are inserted at the half-path points of critical paths and predict possible timing errors from the timing behavior observed at these midpoints, so that a timing error in the current clock cycle can be prevented before a real timing violation occurs at the endpoint of the critical path. Thus, no error correction circuit is needed. However, this design incurs a large area overhead and requires significant design effort because of the large number of half-path points on critical paths.
In [10], a timing error tolerant (TET) flip-flop was proposed. It consists of a transition detection unit for error detection and an FF with preset/clear options for error correction. Whenever an error is detected, the FF output is preset to “1” or cleared to “0” depending on the FF input value. However, this design incurs area overhead for the circuits that generate the preset and clear signals. In particular, it cannot be implemented in an FPGA, since an FPGA D-FF has only one signal line for either presetting or clearing. In [11], an EDAC technique based on a new bit-flipping FF is proposed. Whenever a timing error is detected, it is corrected by complementing the output of the corresponding FF. However, this design requires a metastability detector to detect the metastability that can occur at the FF output. It was prototyped in a MIPS microprocessor core on an FPGA, but the metastability detector was not implemented in that demonstration.
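For a single bit, the two correction mechanisms above reach the same value but need different control: TET [10] must drive either a preset or a clear input, whereas the bit-flipping FF [11] only toggles its output. The sketch below illustrates this under the assumption that the error flag is exact and the FF output is stable (not metastable) when the correction is applied; the function names are hypothetical.

```python
def tet_correct(q: int, d: int, error: bool) -> int:
    """TET-style correction: preset Q to 1 or clear it to 0 according to D."""
    if not error:
        return q
    return 1 if d == 1 else 0  # preset when D=1, clear when D=0


def bit_flip_correct(q: int, error: bool) -> int:
    """Bit-flipping correction: complement the FF output on an error."""
    return q ^ 1 if error else q


# For a single-bit late transition the stale output Q differs from D,
# so both mechanisms restore the same value.
for d in (0, 1):
    stale_q = d ^ 1
    print(d, tet_correct(stale_q, d, True), bit_flip_correct(stale_q, True))
```

The bit-flip variant depends entirely on Q being a valid logic level; if Q is metastable, complementing it is meaningless, which is why [11] needs a metastability detector.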
(b) Latch-based Designs [12,13,14,15]: Razor II [12] is a variant of Razor that uses a transition detector to detect errors. Like Razor, it detects timing errors after they actually occur, and it corrects the timing violations using an architectural replay mechanism. A current-based timing error detector was proposed in [13]. It incurs a very small area overhead, since it requires only three additional transistors embedded in the FFs located on potential critical paths. Bubble Razor [14] replaces all FFs in the pipeline stages with latches. It corrects an error within one clock cycle by sending stall signals to neighboring stages, but its complex control logic significantly increases design complexity. In [15], a simple error detection latch (EDL) consisting of a positive latch and a transition detector is proposed for variation-tolerant ultra-low-voltage circuits. It can recover from an error within one clock cycle, but it incurs area overhead for padding buffers on short paths.
A survey of the literature shows that most traditional EDAC approaches are custom designed at the transistor level. An FPGA, by contrast, is a predefined LUT-based fabric, and the designer mainly focuses on functional design without controlling detailed layouts, which makes fine-grained timing control difficult. Moreover, FF-based EDAC designs require a metastability detector, a component that is hard to implement in an FPGA, while latch-based EDAC designs are not recommended in FPGAs because timing closure is difficult to achieve for latch-based designs. Therefore, traditional EDAC approaches are not well suited to current FPGAs.
4. Circuit Structure for Testing the EDACFF
Figure 6 shows a block diagram of the circuit used to test the EDACFF. It consists of three parts: a 5-stage error-free pipeline as a reference, a 5-stage pipeline with EDACFFs as the proposed design, and a control circuit. The 5-stage reference pipeline uses traditional FFs with no combinational logic between stages, so all of its FFs always sample correct values under PVT variations. In the 5-stage pipeline with EDACFFs, a 3-bit counter generates the 3-bit input data. The datapaths of the first, second, and fourth stages are made critical paths in this pipeline.
In Figure 6, CP and NCP denote “critical path” and “non-critical path” circuits, respectively. Because the circuit in Figure 6 is intended only for testing the proposed EDACFF, the CP and NCP circuits are implemented as dummy logic built from delay elements, which are formed by cascading multiple look-up table (LUT) resources on the FPGA. The CP circuits naturally contain more LUT delay elements than the NCP circuits. EDACFFs are then inserted at the endpoints of the critical paths in the first, second, and fourth stages.
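How a chain of LUT delay elements turns into a critical path can be sketched with a simple static delay model: FF clock-to-output delay, plus one LUT delay and one routing delay per cascaded LUT, plus setup time. All delay values and the chain lengths below are hypothetical; real values depend on the device, placement, and routing reported by the vendor timing tools.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LutTiming:
    # Hypothetical per-element delays in ns; real values depend on device and P&R.
    t_lut: float = 0.25
    t_route: float = 0.50
    t_clk_q: float = 0.50
    t_setup: float = 0.30


def chain_delay_ns(n_luts: int, timing: LutTiming = LutTiming()) -> float:
    """FF-to-FF delay through a cascade of n_luts LUTs used as delay elements."""
    per_stage = timing.t_lut + timing.t_route
    return timing.t_clk_q + n_luts * per_stage + timing.t_setup


def meets_timing(n_luts: int, f_clk_mhz: float,
                 timing: LutTiming = LutTiming()) -> bool:
    """True if the chain settles before the next rising edge at f_clk_mhz."""
    return chain_delay_ns(n_luts, timing) <= 1000.0 / f_clk_mhz


CP_LUTS, NCP_LUTS = 20, 4  # hypothetical chain lengths (CP longer than NCP)
for f in (60.0, 90.0):
    print(f, meets_timing(CP_LUTS, f), meets_timing(NCP_LUTS, f))
```

Under these assumptions, raising the clock fails only the longer CP chain first, which is the behavior the test circuit relies on to provoke errors at the EDACFF endpoints while the NCP stages remain correct.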
During operation, PVT variations may cause a timing violation, which is then detected by the EDACFFs; the other stages use traditional FFs. Finally, the output of the last pipeline stage is compared with the always-correct value of the 5-stage reference pipeline. If the two are equal, the proposed EDACFF is functioning correctly and the error monitoring signal is not flagged. The control circuit comprises the control and clock generator circuits, whose operation is explained in detail in Section 3.2. In our approach, the clock generator uses the direct frequency synthesis feature of a digital clock manager (DCM) to provide a programmable frequency that can be configured with 2 MHz granularity.
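The comparison scheme can be sketched as a cycle-level simulation of the two pipelines. The sketch rests on assumptions not spelled out here: a single gated clock holds every register (including the counter and the reference pipeline) during the one-cycle recovery, at most one error occurs per cycle, and errors are injected at hypothetical (cycle, stage) points rather than produced by real path delays. The function name `run_test_pipeline` is hypothetical, and the actual control circuit is the one described in Section 3.2.

```python
DEPTH = 5
EDAC_STAGES = (0, 1, 3)  # first, second and fourth stages end in EDACFFs (0-indexed)


def run_test_pipeline(n_cycles: int, errors: set, correct: bool = True) -> int:
    """Return how many cycles the output comparison (error monitoring) would flag.

    errors: hypothetical (cycle, stage) pairs at which an EDACFF samples stale data.
    """
    counter = 0            # 3-bit input counter shared by both pipelines
    ref = [0] * DEPTH      # reference pipeline (always correct)
    dut = [0] * DEPTH      # pipeline with EDACFFs
    pending = None         # (stage, settled value) awaiting recovery
    flagged = 0
    for cycle in range(n_cycles):
        if pending is not None:
            # Recovery cycle: the clock is gated for all other registers, and the
            # EDACFF that captured stale data reloads the settled value.
            stage, value = pending
            dut[stage] = value
            pending = None
        else:
            ref = [counter] + ref[:-1]
            nxt = [counter] + dut[:-1]
            for stage in EDAC_STAGES:
                if (cycle, stage) in errors:
                    if correct:
                        pending = (stage, nxt[stage])
                    nxt[stage] = dut[stage]  # late data: stale value captured
                    break                    # sketch: one error per cycle
            dut = nxt
            counter = (counter + 1) % 8
        if dut[-1] != ref[-1]:
            flagged += 1  # the error monitoring signal would go high
    return flagged


print(run_test_pipeline(20, {(5, 1), (10, 3)}))                 # corrected errors
print(run_test_pipeline(20, {(5, 1)}, correct=False))           # uncorrected error
```

Disabling correction lets the stale value propagate to the last stage, where it is flagged; with correction enabled the final outputs of both pipelines never differ, mirroring the role of the monitoring signal.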
5. Experimental Results
The experimental setup for measuring the performance and power consumption of the proposed design is presented in Figure 7. It consists of an oscilloscope, a DC power supply, and a Spartan-6 FPGA board.
Figure 8 shows post-layout simulation results for the design. The signals “data1”, “data2”, “data3”, and “data4” in the waveform viewer are the values captured by the EDACFFs and the traditional FF at the first, second, third, and fourth pipeline stages, respectively. Recall that the datapaths of the first, second, and fourth stages are critical paths, so EDACFFs are inserted at their endpoints, while a traditional FF is used at the third stage.
As the clock frequency increases, the clock cycle time decreases. Eventually, a timing error occurs and an EDACFF samples wrong data (e.g., “data2” samples the wrong value “3” instead of “2”). The “Final_Error” signal is then flagged, and the “ERR_Signal” and “Switch_Signal” go high to gate (stall) one clock cycle for error recovery. As a result, all FFs sample the correct data in the next clock cycle, as shown in Figure 8. Notably, the “Error_monitoring” signal remains low throughout the simulation, which means that the proposed EDACFF works correctly even when a timing error occurs inside the circuit.
Figure 9 shows the measured signal waveforms of the proposed design. The baseline design runs at 62 MHz with a 1.2 V supply. The proposed EDACFF can operate at the point of first failure (PoFF), at a frequency of 90 MHz under typical operating conditions (1.2 V supply and room temperature). As shown in Figure 9, at the PoFF a timing error occurs, and the ERR signal goes high to gate one clock cycle for error correction. The data are then sampled correctly in the next clock cycle. In the measurement as well, the “Error_Monitoring” signal stays low, showing that the proposed EDACFF works correctly on real hardware. Recall that the “Error_Monitoring” signal goes high when the outputs of the 5-stage reference pipeline (error-free) and the 5-stage pipeline with EDACFFs differ.
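Locating the PoFF amounts to sweeping the clock frequency upward in the clock generator's 2 MHz steps until the first error appears. The sketch below replaces the hardware measurement with a static delay model, which is an assumption: the critical-path delay of 11.5 ns is a hypothetical value, and `point_of_first_failure` is a hypothetical helper, not part of the measured setup.

```python
def point_of_first_failure(critical_delay_ns: float, start_mhz: float,
                           stop_mhz: float, step_mhz: float = 2.0):
    """Lowest swept frequency at which the critical path no longer fits in one period.

    Returns None if no failure occurs up to stop_mhz.
    """
    f = start_mhz
    while f <= stop_mhz:
        if critical_delay_ns > 1000.0 / f:
            return f
        f += step_mhz
    return None


# Sweep from the 62 MHz baseline with a hypothetical 11.5 ns critical path.
print(point_of_first_failure(11.5, start_mhz=62.0, stop_mhz=120.0))
```

In hardware, the failure criterion would be the ERR signal going high rather than a delay comparison, but the sweep structure is the same.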
Figure 10 shows the benefit of the proposed design compared with a conventional worst-case design. The operating frequencies of both designs are compared at the same supply voltage of 1.2 V, and their power consumption is compared at the same frequency of 62 MHz. At 1.2 V, the baseline design runs at 62 MHz (worst-case performance). By employing the proposed EDACFF in the pipeline, however, the design can run at 88 MHz (best performance with EDACFF), a 42% performance improvement. We also measured the dynamic power consumption of the proposed design and the worst-case design at the same frequency of 62 MHz. As shown on the right axis of Figure 10, at 1.2 V the worst-case design consumes 10.8 mW (9 mA current consumption). In contrast, the proposed design can operate at a lower supply voltage of 1.05 V and consume 7.35 mW (7 mA current consumption) at the same performance, saving 32% of the power.
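The two reported percentages follow directly from the measured frequencies and supply currents; the short check below reproduces them, computing power as supply voltage times measured current, as in the figures quoted above.

```python
F_WORST_MHZ = 62.0  # baseline (worst-case) design at 1.2 V
F_EDAC_MHZ = 88.0   # best performance with EDACFF at 1.2 V

P_WORST_MW = 1.2 * 9.0   # 1.2 V x 9 mA = 10.8 mW at 62 MHz
P_EDAC_MW = 1.05 * 7.0   # 1.05 V x 7 mA = 7.35 mW at 62 MHz

speedup = F_EDAC_MHZ / F_WORST_MHZ - 1.0
power_saving = 1.0 - P_EDAC_MW / P_WORST_MW
print(f"performance improvement: {speedup:.0%}")  # 42%
print(f"power saving: {power_saving:.0%}")        # 32%
```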
Table 1 compares the proposed EDACFF with previous EDAC techniques. The main benefits of the proposed design are that it is fully supported by standard cells and based on the traditional FF, which makes it suitable for commercial FPGA synthesis tools. The “Design Effort” row in Table 1 shows that our design requires less design effort than other timing error resilience techniques, which were designed in a custom style using a typical ASIC design flow. Moreover, the works in [8,12] incur more than one clock cycle of penalty per detected timing error, whereas our design incurs only a one-cycle penalty. The work in [13] also needs one clock cycle, but it is a latch-based design, and latches are not recommended in FPGAs because the timing constraints of latch-based designs are difficult to meet. The works in [9,11] also incur only a one-cycle penalty. However, the work in [9] incurs a large area overhead due to the large number of half-path points on critical paths, and the work in [11] needs a metastability detector.
6. Discussion
It is generally hard to compare our FPGA-based EDACFF directly with previous EDAC techniques, because most of them have been implemented in ASICs. However, since the previous EDAC circuits [8,9,11,12,13] are designed in a custom style using an ASIC design flow, they require extensive manual circuit optimization. Moreover, porting them to a new technology requires substantial redesign effort, which increases design time and cost. Furthermore, implementing a metastability detector in an FPGA is very difficult, and using latches instead of FFs complicates timing closure with commercial timing tools. Our work, in contrast, focuses on implementing a timing error resilience technique on an FPGA, and we propose an EDACFF architecture suited to FPGAs that can be automatically synthesized, placed, and routed with commercial computer-aided design (CAD) tools.
Future work includes developing an industrial digital SoC application with the proposed scheme and exploring the power-performance design space in depth with real applications. Automating such exploration and optimization is another possible direction.