*5.2. Reliability Analysis*

Various sources of delay variation, such as process variation, aging, and temperature and voltage fluctuations, can lead to timing failures. Timing failure is therefore treated as a stochastic metric that depends on the additional timing margin and hence on the allocated clock period. Consequently, statistical information about the circuit delay can be used to evaluate reliability: *Reliability* is the probability of not having a timing failure due to variation effects.

In a functional unit such as an ALU, the reliability of the circuit with respect to timing can be modeled as a function of the time allowed for instruction execution *T*, based on the delay distributions of the instructions:

$$\text{Reliability} = \text{CDF}\_{ALL}(T) = \prod\_{\texttt{INST}}^{\text{\{all instructions\}}} \text{CDF}\_{delay, \texttt{INST}}(T), \tag{14}$$

where $\text{CDF}\_{delay, \texttt{INST}}$ is the cumulative distribution function (CDF) of the delay of instruction INST. Accordingly, the *Failure Probability* is obtained as:

$$\text{Failure Probability} = 1 - \text{Reliability}.\tag{15}$$

Equations (14) and (15) expose a trade-off between reliability and performance: a larger allowed time *T* (i.e., a longer clock period) yields higher reliability and lower failure probability at the cost of speed.
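As a minimal sketch of Equations (14) and (15), the snippet below evaluates the reliability and failure probability for a few values of *T*. It assumes Gaussian instruction delays, and the instruction names and the (mean, standard deviation) values in `DELAYS` are hypothetical placeholders rather than results from this work.

```python
from math import prod
from statistics import NormalDist

# Hypothetical per-instruction delay distributions in ns (illustrative values
# only; in practice these would come from the extracted delay CDFs).
DELAYS = {
    "ADD": NormalDist(mu=1.00, sigma=0.05),
    "SUB": NormalDist(mu=1.05, sigma=0.05),
    "MUL": NormalDist(mu=1.80, sigma=0.10),
}


def reliability(t, delays=DELAYS):
    """Equation (14): product of the per-instruction delay CDFs evaluated at t."""
    return prod(dist.cdf(t) for dist in delays.values())


def failure_probability(t, delays=DELAYS):
    """Equation (15): complement of the reliability."""
    return 1.0 - reliability(t, delays)


if __name__ == "__main__":
    # A longer allowed time T lowers the failure probability.
    for t in (1.9, 2.0, 2.1):
        print(f"T = {t:.1f} ns  failure probability = {failure_probability(t):.3e}")
```

With these placeholder values the slowest instruction (`MUL`) dominates the product, so increasing *T* past its mean delay is what drives the failure probability down.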

In a multi-cycling scenario, the allowed time for instruction execution *T* depends on the number of cycles allocated to each instruction. For a single-cycle instruction, *T* equals the clock period $T\_{clk}$, whereas a two-cycle instruction may execute for $2T\_{clk}$. Therefore, Equation (14) is modified as follows:

$$\text{CDF}\_{ALL}(T\_{clk}) = \prod\_{\text{INST}}^{\text{\{all instructions\}}} \text{CDF}\_{delay, \text{INST}}(n\_{\text{INST}} \times T\_{clk}). \tag{16}$$

In the above equation, $n\_{\text{INST}}$ is the number of cycles allocated to instruction INST, obtained from the distribution of the instruction delays:

$$n\_{\text{INST}} = \left\lceil \frac{d\_{\text{INST}}}{T\_{clk}} \right\rceil. \tag{17}$$

Here, $d\_{\text{INST}}$ is the instruction delay including variation, i.e., a point in the tail of the delay distribution that corresponds to a very low failure probability. Note that slow instructions have a large logic depth, i.e., their critical paths contain many gates. Because the path delay is the sum of many gate delays, the *Central Limit Theorem* [111] implies that the delay distribution of such instructions is approximately normal (*Gaussian*). In that case, the mean (*μ*) and standard deviation (*σ*) of the instruction delay suffice to approximate its CDF. We may therefore choose *μ* + 3*σ* of the instruction delay as $d\_{\text{INST}}$, which corresponds to a failure probability of at most about 0.135% for that instruction.
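The cycle allocation of Equation (17) and the multi-cycle reliability of Equation (16) can be sketched as follows. This is an illustration under stated assumptions, not the tool flow used here: delays are taken to be Gaussian, the tail point is *μ* + *kσ* with *k* = 3, and the instruction names and distribution parameters in `DELAYS` are hypothetical.

```python
from math import ceil, prod
from statistics import NormalDist

# Hypothetical per-instruction delay distributions in ns (illustrative only).
DELAYS = {
    "ADD": NormalDist(mu=1.00, sigma=0.05),
    "SUB": NormalDist(mu=1.05, sigma=0.05),
    "MUL": NormalDist(mu=1.80, sigma=0.10),
}


def allocated_cycles(dist, t_clk, k=3.0):
    """Equation (17): n = ceil(d / T_clk), with d taken as mu + k * sigma."""
    d_inst = dist.mean + k * dist.stdev
    return ceil(d_inst / t_clk)


def multicycle_reliability(t_clk, delays=DELAYS, k=3.0):
    """Equation (16): product of the CDFs evaluated at n_INST * T_clk."""
    return prod(
        dist.cdf(allocated_cycles(dist, t_clk, k) * t_clk)
        for dist in delays.values()
    )


if __name__ == "__main__":
    t_clk = 1.25  # hypothetical clock period in ns
    for name, dist in DELAYS.items():
        print(name, "cycles:", allocated_cycles(dist, t_clk))
    print("failure probability:", 1.0 - multicycle_reliability(t_clk))
```

Because each instruction is granted at least *μ* + 3*σ* of execution time, every factor in the product is at least Φ(3), where 1 − Φ(3) ≈ 0.00135 for the standard normal distribution; the rounding up to whole cycles usually leaves additional margin.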

We perform SSTA to evaluate the impact of process variation [112] on the timing of the circuit. The SSTA tool reads cell-level variation information from the variation library (see [71]) and extracts an accurate delay distribution for each instruction of the ALU, represented by its CDF (i.e., $\text{CDF}\_{delay, \text{INST}}$). Based on these distribution functions and Equation (16), we extract the reliability of the ALU.
