Standard Cell Sizing for Worst-Case Performance Optimization Considering Process Variation in Subthreshold Region

Cao, Peng; Guo, Jingjing

doi:10.3390/electronics13224477


by Peng Cao ¹,* and Jingjing Guo ²

¹ National ASIC System Engineering Center, Southeast University, Nanjing 210096, China
² College of Integrated Circuit Science and Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
* Author to whom correspondence should be addressed.

Electronics 2024, 13(22), 4477; https://doi.org/10.3390/electronics13224477

Submission received: 3 October 2024 / Revised: 11 November 2024 / Accepted: 14 November 2024 / Published: 14 November 2024

(This article belongs to the Section Microelectronics)


Abstract

Ultra-low-voltage design delivers considerable power reduction and energy-efficiency improvement at the cost of performance degradation and uncertainty. Conventional standard cell design methodology cannot guarantee optimal performance for subthreshold operation because it does not consider process variation. In this paper, an effective subthreshold cell sizing method is proposed to minimize the worst-case propagation delay by analytically deriving the optimal pMOS-to-nMOS width ratio (β), which reveals the relation between the minimal worst-case delay and the process parameters and provides clear guidance for standard cell library design. The proposed method shows good agreement with Monte Carlo SPICE simulation results and was validated at both the cell level and the circuit level. At the cell level, the logic cells designed with the proposed method show at least 8.6% and 7.4% average improvement in worst-case delay and energy-delay product (EDP), respectively, with an additional 3.2% energy overhead compared with prior approaches. At the circuit level, the proposed method improves the worst-case performance and worst-case EDP of the ring oscillator by at least 15.5% and 15.0%, respectively, with a 0.9% energy penalty. Moreover, the ISCAS’89 and OpenCores circuits synthesized with the optimized cells achieve at least 6.6% worst-case performance enhancement, 6.9% power reduction, and 9.4% area saving.

Keywords:

low-voltage design; performance optimization; process variation; standard cell sizing

1. Introduction

State-of-the-art ultra-low-voltage design lowers the supply voltage down to or below the threshold voltage and is a promising candidate for meeting stringent power budgets in many applications [1,2]. However, because of the small gate voltage drive, subthreshold circuits face severe challenges compared with super-threshold operation, including 500–1000× performance degradation [3] and increased uncertainty, which could be mitigated with customized standard cells. Commercial cell libraries are designed and characterized for super-threshold operation [4,5] and therefore require special modifications for the subthreshold region to improve performance and reduce power consumption and variability.

Plenty of research has addressed subthreshold cell design [6,7,8,9,10,11,12,13,14]. Minimum-width cell design was proposed in [6,7], where wider transistors are broken into multiple fingers to mitigate the inverse narrow width effect (INWE) or the narrow width effect (NWE) and improve performance. The optimal pMOS-to-nMOS width ratio (β) for the subthreshold domain was reevaluated in [8] to achieve equal rise and fall times. The concept of logical effort was adopted in [9,10] to size transistors in standard cells with stacked structures, whose behavior in subthreshold operation diverges from that in the super-threshold region. However, these works consider neither process variation nor the statistical delay distribution. An analytical expression for the optimal pMOS-to-nMOS width ratio in the subthreshold region considering process variation was derived in [11]. The work in [12] introduced a subthreshold cell sizing methodology that balances the mean values of the pMOS and nMOS transistor currents, but the variance of the current distribution is neglected. In [13], although the optimized solution was ultimately verified with Monte Carlo (MC) simulations, process variation is not considered during cell sizing. In [14], a near-threshold digital cell library was presented that obtains both high energy efficiency and optimal performance with an asymmetric gate length scheme and forward body biasing. A multi-threshold-voltage, multi-channel-length standard cell library was developed in [15] to provide fine-grained driving strength for near-threshold and subthreshold circuit design with minimal power and area overhead. The impact of the Reverse Short Channel Effect (RSCE) and the Inverse Narrow Width Effect (INWE) on device I-V characteristics in the subthreshold region was studied in [16] for standard cell library design. The best switching efficiency was used in [17] as the indicator for optimal channel length design targeting ultra-low voltages. In [18], the standard cell pMOS-to-nMOS width ratio was sized to maximize performance under the constraint of a full-diffusion layout structure, improving circuit performance at the cost of higher energy consumption.

In most prior cell sizing methods for ultra-low-voltage design, the cell delay variation caused by process mismatch is not taken into account, leading to suboptimal cell sizing. To demonstrate the impact of delay variation on the sizing solution, Figure 1 plots the nominal and worst-case delays versus β, obtained from MC simulations of an inverter driving an identical inverter in the TSMC 28 nm process. The worst-case delay is defined as the 3σ percentile point of the delay distribution. In the super-threshold region (Figure 1a), the nominal and worst-case delays reach their minima at nearly the same β. In the subthreshold region (Figure 1b), however, the optimal β for the nominal delay deviates from that for the worst-case delay, so sizing for the nominal delay cannot guarantee the minimal worst-case delay and incurs a 26.2% performance degradation.
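To make the worst-case metric concrete, the following Python sketch estimates the 3σ percentile point from a set of Monte Carlo delay samples as the Φ(3) ≈ 99.865% quantile. The log-normal samples are a synthetic stand-in for MC SPICE results with assumed parameters, and the helper name `three_sigma_point` is hypothetical; this illustrates the definition only and is not the authors' simulation flow.

```python
import math
import random
from statistics import NormalDist


def three_sigma_point(samples):
    """Empirical 3-sigma percentile point of a delay distribution.

    The 3-sigma point is the Phi(3) ~= 0.99865 quantile, i.e. the delay
    exceeded by roughly 0.135% of the samples.
    """
    q = NormalDist().cdf(3.0)
    ordered = sorted(samples)
    idx = min(len(ordered) - 1, max(0, math.ceil(q * len(ordered)) - 1))
    return ordered[idx]


# Synthetic stand-in for MC SPICE delays: log-normal with assumed mu, sigma
rng = random.Random(1)
mu, sigma = 0.0, 0.3
delays = [rng.lognormvariate(mu, sigma) for _ in range(200_000)]
print(f"empirical 3-sigma point:   {three_sigma_point(delays):.3f}")
print(f"log-normal exp(mu+3sigma): {math.exp(mu + 3 * sigma):.3f}")
```

Only about 0.135% of the samples (roughly 270 of 200,000) lie beyond this point, so the empirical estimate needs many trials to be stable.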

In this work, a standard cell sizing technique is proposed that analytically derives the optimal pMOS-to-nMOS width ratio (β) for worst-case performance optimization in the subthreshold domain, modeling process variation with random variables.

The main contributions of this work are summarized as follows:

The optimal β targeting worst-case performance was derived analytically by minimizing the 3σ percentile of the propagation delay distribution and was validated under various process technologies, showing good agreement with MC SPICE simulation results.
The analytical expression of the optimal β reveals the relation between the optimal worst-case cell delay and the process parameters with physical insight. Specifically, the mobility ratio, together with the ratios of the means and variances of the nMOS and pMOS threshold voltages, determines the optimal β for minimal worst-case cell delay, which provides clear guidance for standard cell design in a specific process without time-consuming MC SPICE simulations.
The standard logic cells designed with the proposed optimization method were validated in TSMC 28 nm technology and outperform competing approaches, with significant worst-case performance improvement and worst-case energy-delay product (EDP) reduction at both the cell level and the circuit level.

This paper is organized as follows: Section 2 derives the subthreshold worst-case delay model analytically considering process variation, and the optimal β for minimal worst-case delay is derived in Section 3. Validation results are given and compared in Section 4. Section 5 draws the conclusions.

2. Subthreshold Worst-Case Propagation Delay Model

The propagation delay (t_p) in the subthreshold region can be modeled by an inverter driving an identical cell, as shown in Figure 2, where the channel widths of the nMOS and pMOS transistors are denoted as W_n and W_p, respectively, and the channel lengths of all transistors are equal to L. The pMOS-to-nMOS width ratio is defined in (1).

β = W_{p} / W_{n}

(1)

The load capacitance of the first-stage inverter in Figure 2 is denoted as C_L, which represents all capacitances at node ZN, including the total drain and gate capacitances associated with the nMOS and pMOS transistors, C_n and C_p, and the wire capacitance, C_w. Since C_n and C_p are both proportional to the transistor channel area, and hence to the channel width because all channel lengths equal L, C_p is β times C_n, and C_L can be expressed as

C_{L} = C_{n} + C_{p} + C_{w} = (1 + β) C_{n} + C_{w}

(2)

The propagation delay of the first inverter in Figure 2 can be expressed as [4]

t_{p} = \frac{t_{p H L} + t_{p L H}}{2} = \frac{V_{D D} C_{L}}{4} (\frac{1}{I_{n}} + \frac{1}{I_{p}})

(3)

where t_pHL and t_pLH are the delays of the high-to-low and low-to-high transitions at node ZN, and V_DD is the supply voltage. I_n and I_p are the subthreshold drain currents of the nMOS and pMOS transistors of the first inverter; they are proportional to the width-to-length ratio, depend exponentially on the threshold voltage, and can be expressed as [11]

\begin{cases} I_{n} = I_{0} μ_{n} \frac{W_{n}}{L} e^{\frac{V_{g s} - V_{t h n}}{n ϕ_{t}}} (1 - e^{\frac{- V_{d s}}{ϕ_{t}}}) \\ I_{p} = I_{0} μ_{p} \frac{W_{p}}{L} e^{\frac{V_{g s} - V_{t h p}}{n ϕ_{t}}} (1 - e^{\frac{- V_{d s}}{ϕ_{t}}}) \end{cases}

(4)

with

I_{0} = C_{o x} (n - 1) ϕ_{t}^{2}

(5)

where I₀ is a process-dependent parameter, C_ox is the gate oxide capacitance per unit area, n is the subthreshold slope factor, V_gs and V_ds are the gate-source and drain-source voltages, respectively, μ_n (μ_p) is the electron (hole) mobility, V_thn (V_thp) is the nMOS (pMOS) threshold voltage, and ϕ_t is the thermal voltage.

By substituting the subthreshold drain currents in (4) into (3) for a step input signal (V_gs = V_DD) and approximating the term 1 − e^{−V_ds/ϕ_t} as 1, which holds when V_ds is several times ϕ_t, the propagation delay in the subthreshold region can be written as

t_{p} = k \times [(1 + β) \times α_{n} + (1 + \frac{1}{β}) \times Λ \times α_{p}]

(6)

where the related parameters are defined as

α_{n} = e^{\frac{V_{t h n}}{n ϕ_{t}}}, α_{p} = e^{\frac{V_{t h p}}{n ϕ_{t}}}, Λ = \frac{μ_{n}}{μ_{p}}, k = \frac{V_{D D} e^{- \frac{V_{D D}}{n ϕ_{t}}}}{4 I_{0} \frac{W_{n}}{L} μ_{n}} (C_{n} + \frac{C_{w}}{1 + β})

(7)

Given the process-related parameters α_n, α_p, and Λ, (6) shows that the subthreshold propagation delay is closely related to the pMOS-to-nMOS width ratio β.

As reported in prior publications [19,20], the fluctuations of current and propagation delay in the subthreshold region are dominated by threshold voltage variation, which enters (6) through the parameters α_n and α_p. Since the threshold voltages V_thn and V_thp are Gaussian-distributed [8,12], the random variables α_n and α_p follow log-normal (LN) distributions, whose means and variances can be expressed as

\begin{cases} E (α_{n}) = e^{E (V_{t h n}^{'}) + \frac{D (V_{t h n}^{'})}{2}}, \quad D (α_{n}) = (e^{D (V_{t h n}^{'})} - 1) E^{2} (α_{n}) \\ E (α_{p}) = e^{E (V_{t h p}^{'}) + \frac{D (V_{t h p}^{'})}{2 β}}, \quad D (α_{p}) = (e^{\frac{D (V_{t h p}^{'})}{β}} - 1) E^{2} (α_{p}) \end{cases}

(8)

with

\begin{cases} E (V_{t h n}^{'}) = \frac{E (V_{t h n})}{n ϕ_{t}}, \quad E (V_{t h p}^{'}) = \frac{E (V_{t h p})}{n ϕ_{t}} \\ D (V_{t h n}^{'}) = \frac{D (V_{t h n})}{{(n ϕ_{t})}^{2}}, \quad D (V_{t h p}^{'}) = \frac{D (V_{t h p})}{{(n ϕ_{t})}^{2}} \end{cases}

(9)

where E(V_thp)/E(V_thn) and D(V_thp)/D(V_thn) are the means and variances of the threshold voltages of minimum-sized pMOS/nMOS transistors, respectively. According to Pelgrom’s law [21], the threshold-voltage variance of the pMOS transistor is inversely proportional to β, which gives the 1/β factors in (8). Therefore, the mean and variance of t_p can be analytically derived as

\begin{aligned} E (t_{p}) &= k (1 + β) e^{E (V_{t h n}^{'}) + \frac{D (V_{t h n}^{'})}{2}} + k (1 + \frac{1}{β}) Λ e^{E (V_{t h p}^{'}) + \frac{D (V_{t h p}^{'})}{2 β}} \\ D (t_{p}) &= k^{2} {(1 + β)}^{2} (e^{D (V_{t h n}^{'})} - 1) e^{2 E (V_{t h n}^{'}) + D (V_{t h n}^{'})} + k^{2} {(1 + \frac{1}{β})}^{2} Λ^{2} (e^{\frac{D (V_{t h p}^{'})}{β}} - 1) e^{2 E (V_{t h p}^{'}) + \frac{D (V_{t h p}^{'})}{β}} \end{aligned}

(10)

which indicates that both the mean and variance of t_p are highly dependent on β, as well as process-related parameters.
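The moments in (10) can be checked numerically by sampling (6) directly: the normalized threshold voltages are drawn as independent Gaussians, with the pMOS variance scaled by 1/β as in (8), so α_n and α_p are log-normal. The Python sketch below assumes illustrative normalized parameters (not values from this paper) and holds k constant, which corresponds to neglecting C_w; the function names are hypothetical.

```python
import math
import random
import statistics

# Assumed normalized parameters, V' = V_th / (n * phi_t); not data from the paper
E_N, E_P = 12.0, 12.4      # E(V'_thn), E(V'_thp)
D_N, D_P = 0.08, 0.10      # D(V'_thn), D(V'_thp) of minimum-size devices
LAM = 2.0                  # Lambda = mu_n / mu_p
K = 1.0                    # prefactor k from (7), held constant (C_w neglected)


def delay_moments(beta):
    """Analytic mean and variance of t_p from (10)."""
    mean = K * ((1 + beta) * math.exp(E_N + D_N / 2)
                + (1 + 1 / beta) * LAM * math.exp(E_P + D_P / (2 * beta)))
    var = K**2 * ((1 + beta) ** 2 * (math.exp(D_N) - 1) * math.exp(2 * E_N + D_N)
                  + (1 + 1 / beta) ** 2 * LAM**2
                  * (math.exp(D_P / beta) - 1) * math.exp(2 * E_P + D_P / beta))
    return mean, var


def sample_delay(beta, rng):
    """One Monte Carlo draw of t_p from (6); pMOS variance scaled by 1/beta."""
    alpha_n = math.exp(rng.gauss(E_N, math.sqrt(D_N)))
    alpha_p = math.exp(rng.gauss(E_P, math.sqrt(D_P / beta)))
    return K * ((1 + beta) * alpha_n + (1 + 1 / beta) * LAM * alpha_p)


rng = random.Random(42)
beta = 2.0
draws = [sample_delay(beta, rng) for _ in range(100_000)]
mean, var = delay_moments(beta)
print(f"E(t_p): analytic {mean:.4g}, Monte Carlo {statistics.fmean(draws):.4g}")
print(f"D(t_p): analytic {var:.4g}, Monte Carlo {statistics.pvariance(draws):.4g}")
```

The variance term uses the independence of α_n and α_p, which is what allows the two contributions in (10) to be added.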

By approximating the distribution of the propagation delay in (6) as log-normal, the worst-case propagation delay, defined as the 3σ percentile point of the delay distribution, can be represented as

t_{p}^{\max} = e^{μ (t_{p}) + 3 σ (t_{p})}

(11)

where the distribution parameters μ and σ can be expressed as (12) and (13), respectively, in terms of E(t_p) and D(t_p) from (10), assuming E(V'_thn) ≫ D(V'_thn) ≈ 0 and E(V'_thp) ≫ D(V'_thp) ≈ 0,

μ (t_{p}) = \ln (\frac{E (t_{p})}{\sqrt{1 + \frac{D (t_{p})}{E^{2} (t_{p})}}}) = \ln k + \ln e^{E (V_{t h n}^{'})} + \ln (β + \frac{Λ Γ}{β} + 1 + Λ Γ)

(12)

σ (t_{p}) = \sqrt{\ln (1 + \frac{D (t_{p})}{E^{2} (t_{p})})} = \sqrt{\ln (1 + D (V_{t h n}^{'}) \frac{β^{2} + \frac{Λ^{2} Γ^{2} Ψ^{2}}{β}}{{(β + Λ Γ)}^{2}})}

(13)

with

Γ = \frac{e^{E (V_{t h p}^{'})}}{e^{E (V_{t h n}^{'})}}, Ψ = \sqrt{\frac{D (V_{t h p}^{'})}{D (V_{t h n}^{'})}}

(14)
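To make (11) to (14) concrete, the Python sketch below computes the worst-case delay e^{μ+3σ} in two ways: by log-normal moment matching from E(t_p) and D(t_p) in (10), which corresponds to the first equalities in (12) and (13), and from the simplified closed forms on their right-hand sides. The parameter values are assumptions for illustration, k is taken as 1, and C_w is neglected.

```python
import math

# Assumed illustrative parameters; C_w is neglected and k is taken as 1
K = 1.0                     # delay prefactor k from (7)
E_N, E_P = 12.0, 12.4       # E(V'_thn), E(V'_thp)
D_N, D_P = 0.01, 0.012      # D(V'_thn), D(V'_thp) of minimum-size devices
LAM = 2.0                   # Lambda = mu_n / mu_p
GAM = math.exp(E_P - E_N)   # Gamma from (14)
PSI = math.sqrt(D_P / D_N)  # Psi from (14)


def mu_sigma_moment_matched(beta):
    """mu, sigma of a log-normal matched to E(t_p), D(t_p) in (10)."""
    mean = K * ((1 + beta) * math.exp(E_N + D_N / 2)
                + (1 + 1 / beta) * LAM * math.exp(E_P + D_P / (2 * beta)))
    var = K**2 * ((1 + beta) ** 2 * (math.exp(D_N) - 1) * math.exp(2 * E_N + D_N)
                  + (1 + 1 / beta) ** 2 * LAM**2
                  * (math.exp(D_P / beta) - 1) * math.exp(2 * E_P + D_P / beta))
    cv2 = var / mean**2
    return math.log(mean / math.sqrt(1 + cv2)), math.sqrt(math.log(1 + cv2))


def mu_sigma_closed_form(beta):
    """Simplified right-hand sides of (12) and (13)."""
    lg = LAM * GAM
    mu = math.log(K) + E_N + math.log(beta + lg / beta + 1 + lg)
    f_sigma = (beta**2 + lg**2 * PSI**2 / beta) / (beta + lg) ** 2
    return mu, math.sqrt(math.log(1 + D_N * f_sigma))


def worst_case_delay(mu, sigma):
    """3-sigma percentile point of a log-normal delay, eq. (11)."""
    return math.exp(mu + 3 * sigma)


for beta in (1.0, 2.0, 3.0):
    exact = worst_case_delay(*mu_sigma_moment_matched(beta))
    approx = worst_case_delay(*mu_sigma_closed_form(beta))
    print(f"beta={beta}: moment-matched {exact:.5g}, closed form {approx:.5g}")
```

For these small assumed variances the two agree to well under 1%; because (12) and (13) drop the variance terms in the exponents and linearize e^x − 1, the closed forms should be expected to lose accuracy as the threshold-voltage variances grow.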

3. Optimization Method for Subthreshold Worst-Case Propagation Delay

According to the worst-case propagation delay model in (11), the minimal worst-case delay is achieved by minimizing μ + 3σ, i.e., at the optimal β_opt where the derivative of μ + 3σ with respect to β equals zero:

{\frac{\partial (μ + 3 σ)}{\partial β}|}_{β = β_{o p t}} = 0

(15)

However, because μ + 3σ depends on β in a complicated way, as shown in (12) and (13), it is almost impossible to solve (15) for β_opt analytically. To simplify the problem, the minimization of μ + 3σ is replaced by solving for the optimal β_opt^μ and β_opt^σ that minimize μ and σ, respectively, as formulated in (16) and derived in Section 3.1 and Section 3.2. Section 3.3 then proves in detail that β_opt lies between β_opt^μ and β_opt^σ, so that it can be estimated as their average, as shown in (17).

{\frac{\partial μ}{\partial β}|}_{β = β_{o p t}^{μ}} = 0, {\frac{\partial σ}{\partial β}|}_{β = β_{o p t}^{σ}} = 0

(16)

β_{o p t} = \frac{β_{o p t}^{μ} + β_{o p t}^{σ}}{2}

(17)

3.1. Optimal β Derivation for Minimal μ of Delay Distribution

To achieve the minimal μ, note from (12) that β affects μ only through the last term and through ln k, since k in (7) contains the wire-capacitance factor C_n + C_w/(1 + β). Minimizing μ is therefore equivalent to minimizing the product of these two β-dependent factors, which is represented as f_μ(β) in (18).

f_{μ} (β) = (C_{n} + \frac{C_{w}}{1 + β}) (β + \frac{Λ Γ}{β} + (1 + Λ Γ))

(18)

Setting the derivative of f_μ(β) in (18) to zero gives the optimal β for the minimal μ:

{\frac{\partial f_{μ} (β)}{\partial β}|}_{β = β_{o p t}^{μ}} = 0 \Rightarrow β_{o p t}^{μ} = \sqrt{Λ Γ (1 + \frac{C_{w}}{C_{n}})}

(19)

It is worth noting that the derived β_opt^μ for the minimal μ is the same as that derived in [11], where it was used to minimize the nominal delay without considering process variation. It can be seen from (19) that the wire capacitance C_w increases the optimal β for the minimal μ. If C_w is negligible compared with C_n, the optimal β for the minimal μ simplifies to

β_{o p t}^{μ} = \sqrt{Λ Γ}

(20)
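As a numerical sanity check of (19), the Python sketch below minimizes f_μ(β) from (18) with a golden-section search and compares the result with the closed form. The values of ΛΓ, C_n, and C_w are assumptions for illustration, and golden-section search is simply one convenient choice of 1-D minimizer for this unimodal objective.

```python
import math


def golden_section_min(f, lo, hi, tol=1e-8):
    """Minimize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    while b - a > tol:
        if f(c) < f(d):          # minimum lies in [a, d]
            b, d = d, c
            c = b - inv_phi * (b - a)
        else:                    # minimum lies in [c, b]
            a, c = c, d
            d = a + inv_phi * (b - a)
    return (a + b) / 2


def f_mu(beta, lam_gam, c_n, c_w):
    """Objective (18)."""
    return (c_n + c_w / (1 + beta)) * (beta + lam_gam / beta + 1 + lam_gam)


def beta_opt_mu(lam_gam, c_n, c_w=0.0):
    """Closed form (19); reduces to (20) when c_w = 0."""
    return math.sqrt(lam_gam * (1 + c_w / c_n))


LAM_GAM = 3.0          # assumed product Lambda * Gamma
C_N, C_W = 1.0, 0.5    # assumed capacitances, arbitrary units

numeric = golden_section_min(lambda b: f_mu(b, LAM_GAM, C_N, C_W), 0.1, 10.0)
print(f"golden-section {numeric:.6f} vs (19) {beta_opt_mu(LAM_GAM, C_N, C_W):.6f}")
```

Setting C_W to 0 in the same script reproduces (20), i.e., the optimum moves to √(ΛΓ).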

3.2. Optimal β Derivation for Minimal σ of Delay Distribution

Because the square root and the logarithm in (13) are monotonically increasing and D(V'_thn) does not depend on β, minimizing σ is equivalent to minimizing f_σ(β):

f_{σ} (β) = \frac{β^{2} + \frac{Λ^{2} Γ^{2} Ψ^{2}}{β}}{{(β + Λ Γ)}^{2}}

(21)

Setting the derivative of f_σ(β) in (21) to zero shows that the optimal β for the minimal σ is the solution of

{\frac{\partial f_{σ} (β)}{\partial β}|}_{β = β_{o p t}^{σ}} = 0 \Rightarrow g_{σ} (β_{o p t}^{σ}) = h_{σ} (β_{o p t}^{σ})

(22)

where

\{\begin{array}{l} g_{σ} (β) = β^{3} \\ h_{σ} (β) = \frac{Λ Γ Ψ^{2}}{2} (3 β + Λ Γ) \end{array}

(23)

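It can be seen from (23) that the optimal β_opt^σ for minimizing σ is the x-coordinate of the intersection of the cubic curve g_σ(β) and the straight line h_σ(β), where g_σ(β) is a process-independent function of β, while h_σ(β) depends on the process-dependent parameters Λ, Γ, and Ψ. Since g_σ(β) − h_σ(β) = β³ − (3ΛΓΨ²/2)β − (ΛΓ)²Ψ²/2 has exactly one sign change in its coefficients, Descartes' rule of signs guarantees that this intersection is unique for β > 0.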

3.3. Proof of Estimation of Optimal β for Worst-Case Delay with Optimal β for μ and σ of Delay Distribution

Since the derivative of μ + 3σ is a continuous function of β, the optimal β_opt for the minimal worst-case delay is guaranteed to lie between β_opt^μ and β_opt^σ if the derivatives at β_opt^μ and β_opt^σ have opposite signs, as in (24) or (25).

{\frac{\partial (μ + 3 σ)}{\partial β}|}_{β = β_{o p t}^{μ}} > 0 \quad \text{and} \quad {\frac{\partial (μ + 3 σ)}{\partial β}|}_{β = β_{o p t}^{σ}} < 0

(24)

{\frac{\partial (μ + 3 σ)}{\partial β}|}_{β = β_{o p t}^{μ}} < 0 \quad \text{and} \quad {\frac{\partial (μ + 3 σ)}{\partial β}|}_{β = β_{o p t}^{σ}} > 0

(25)

The derivatives ∂(μ + 3σ)/∂β at β_opt^μ and β_opt^σ can be written as

\{\begin{array}{l} \begin{array}{l} {\frac{\partial (μ + 3 σ)}{\partial β}|}_{β = β_{o p t}^{μ}} = [\frac{2}{Ψ^{2}} - (β_{o p t}^{μ} + 3)] \times \frac{3 {σ_{p}}^{2}}{2 (1 + β_{o p t}^{μ})} \\ \times \frac{1}{[{(1 + β_{o p t}^{μ})}^{2} + (σ_{n}^{2} + σ_{p}^{2} β_{o p t}^{μ})] \sqrt{\ln (1 + \frac{1 + σ_{p}^{2} β_{o p t}^{μ}}{{(1 + β_{o p t}^{μ})}^{2}})}} \end{array} \\ {\frac{\partial (μ + 3 σ)}{\partial β}|}_{β = β_{o p t}^{σ}} = \frac{1 - {(\frac{β_{o p t}^{μ}}{β_{o p t}^{σ}})}^{2}}{β_{o p t}^{σ} + \frac{{(β_{o p t}^{μ})}^{2}}{β_{o p t}^{σ}} + 1 + {(β_{o p t}^{μ})}^{2}} \end{array}

(26)

From (26), the signs of the derivatives at β_opt^μ and β_opt^σ are the same as the signs of S_μ and S_σ defined in (27), respectively, which can be proven to be opposite by analyzing the relation between g_σ(β) and h_σ(β), as illustrated in Figure 3.

\begin{cases} S_{μ} = \frac{2}{Ψ^{2}} - (β_{o p t}^{μ} + 3) \\ S_{σ} = 1 - {(\frac{β_{o p t}^{μ}}{β_{o p t}^{σ}})}^{2} \end{cases}

(27)

Figure 3 plots the β-dependent functions h_σ(β) and g_σ(β) as a blue line and a red cubic curve, respectively. Comparing (23) and (27) shows that S_μ and S_σ have forms similar to h_σ(β) and g_σ(β); thus, Figure 3 can be used to show that the signs of S_μ and S_σ, i.e., the signs of the derivatives at β_opt^μ and β_opt^σ, are always opposite. To cover the possible relations between h_σ(β) and g_σ(β) for different process-dependent parameters Λ, Γ, and Ψ, three blue lines are drawn in Figure 3, corresponding to h_σ(β) being larger than, equal to, and smaller than g_σ(β) at β = β_opt^μ. Taking the upper blue line as an example, for which h_σ(β_opt^μ) > g_σ(β_opt^μ), the signs of S_μ and S_σ can be proven to be negative and positive, respectively, as follows.

First, S_μ is negative when h_σ(β_opt^μ) > g_σ(β_opt^μ). Substituting the expressions of h_σ(β) and g_σ(β) from (23) into this condition and using ΛΓ = (β_opt^μ)² from (20) gives (Ψ²/2)(β_opt^μ)²(3β_opt^μ + (β_opt^μ)²) > (β_opt^μ)³, i.e., 2/Ψ² < β_opt^μ + 3, which means that S_μ is negative according to (27).

Second, S_σ is positive when h_σ(β_opt^μ) > g_σ(β_opt^μ). As Figure 3 shows, in this case the cubic g_σ(β) crosses the line h_σ(β) only to the right of β_opt^μ, so the x-coordinate of their intersection, i.e., β_opt^σ as defined in (22), is larger than β_opt^μ, which means that S_σ is positive according to (27).

Similarly, taking the lower blue line for h_σ(β) as an example, the signs of S_μ and S_σ can be proven to be positive and negative, respectively. In all cases, therefore, the derivatives at β_opt^μ and β_opt^σ have opposite signs, so the optimal β_opt for the minimal worst-case delay is guaranteed to lie between β_opt^μ and β_opt^σ, or to coincide with both of them in the case of the middle blue line; thus, β_opt can be estimated as (17).

Several useful conclusions can be drawn from the above analytical derivation, revealing the relation between the optimal β_opt and the process parameters with physical insight.

Firstly, whether the optimal β_opt for the minimal worst-case propagation delay is larger or smaller than β_opt^μ is determined by Ψ, the ratio of the pMOS to nMOS threshold-voltage standard deviations. As can be seen in (27), the magnitude of Ψ sets the signs of S_μ and S_σ and hence the position of β_opt relative to β_opt^μ: β_opt is larger than β_opt^μ when S_μ < 0, i.e., 2/Ψ² < β_opt^μ + 3, and smaller when S_μ > 0.

Secondly, Ψ also sets the slope and intercept of h_σ(β) and therefore determines the impact of process variation on the optimal β_opt. Specifically, the smaller Ψ is, the smaller the slope and intercept of h_σ(β) are, and the larger the deviation of the optimal β_opt from β_opt^μ.

Thirdly, the optimal β_opt for the worst-case propagation delay depends only on the mobility ratio and on the ratios of the means and variances of the nMOS and pMOS threshold voltages. In other words, it is independent of the supply voltage and valid for any corner in the subthreshold domain.
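The first and third observations can also be illustrated numerically. In the Python sketch below, which reuses the closed forms (12) and (13) with C_w neglected and assumed values of ΛΓ and D(V'_thn), Ψ is swept and the sign of S_μ from (27) is compared with the position of the grid optimum relative to β_opt^μ. The final line rescales k through a hypothetical `log_k` offset, standing in for a change of V_DD, which only adds a constant ln k to μ.

```python
import math

LAM_GAM = 3.0                                  # assumed product Lambda * Gamma
D_N = 0.05                                     # assumed D(V'_thn)
GRID = [i / 1000 for i in range(100, 10001)]   # beta from 0.1 to 10.0


def worst_case_exponent(beta, psi, log_k=0.0):
    """ln k + mu + 3*sigma from (12)-(13), C_w neglected, E(V'_thn) dropped."""
    mu = log_k + math.log(beta + LAM_GAM / beta + 1 + LAM_GAM)
    f_sigma = (beta**2 + LAM_GAM**2 * psi**2 / beta) / (beta + LAM_GAM) ** 2
    return mu + 3 * math.sqrt(math.log(1 + D_N * f_sigma))


def grid_optimum(psi, log_k=0.0):
    """Brute-force minimizer of the worst-case exponent over GRID."""
    return min(GRID, key=lambda b: worst_case_exponent(b, psi, log_k))


beta_mu = math.sqrt(LAM_GAM)                   # (20)
for psi in (0.4, 1.0, 1.5):
    s_mu = 2 / psi**2 - (beta_mu + 3)          # S_mu from (27)
    b_opt = grid_optimum(psi)
    predicted = "above" if s_mu < 0 else "below"
    actual = "above" if b_opt > beta_mu else "below"
    print(f"Psi={psi}: S_mu={s_mu:+.3f}, beta_opt={b_opt:.3f} is {actual} "
          f"beta_opt^mu={beta_mu:.3f} (S_mu predicts {predicted})")

# A different V_DD rescales k, which only shifts ln k; the optimum does not move.
print(grid_optimum(1.0, log_k=0.0), grid_optimum(1.0, log_k=5.0))
```

In this model V_DD appears only in k of (7), so it shifts μ by a constant and leaves σ in (13) unchanged, which is why the optimum is insensitive to the supply voltage.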

4. Validation Results and Discussion

4.1. Validation of the Proposed Method at Gate Level

The analytically derived optimal β_opt for worst-case subthreshold operation was validated against MC SPICE simulation results for various process technologies. Compared with the competing approaches in [4,11,18], which neglect the impact of process variation in the subthreshold region, the optimal β_opt derived in this work is highly consistent with the MC simulation results for all validated processes, as shown in Table 1. For each process, 10K MC SPICE trials were run in HSPICE at the TT corner with a supply voltage of 0.35 V and a temperature of 25 °C to evaluate the worst-case propagation delay of the inverter for each β, which was swept upward from an initial value of 1.0. For most processes, the proposed sizing solution requires a higher β to compensate for the impact of process variation in the subthreshold region. Only for TSMC 40 nm is the optimal β_opt smaller than that of subthreshold optimization without considering process variation [11], indicating that cell area can be saved while minimizing the worst-case propagation delay. Table 1 also compares the optimal β_opt^μ and β_opt^σ for the minimal μ and σ with the optimal β_opt, where the former is the optimal solution adopted in [11]. The divergence between the optimal β_opt and β_opt^μ/β_opt^σ ranges from 19% to 33% across the processes.

The proposed subthreshold cell sizing method was applied to standard cell design in TSMC 28 nm, together with the approaches in [4,11,12,18]. For all designed cells, the transistor channel lengths were kept at the minimum, and the same layout area constraint was applied to each cell for a fair comparison of worst-case propagation delay, energy consumption, and energy-delay product (EDP).

To validate the improvement achieved by the optimal β_opt derived in this work for various logic structures, Table 2 shows the validation results for standard cells designed with different methods at 0.35 V, 25 °C, and the TT corner with 10K MC SPICE simulations, where Ave. Incr. in the last row indicates the average increase of our method compared with the others. Compared with the method derived for the super-threshold region [4], the proposed statistical optimization method reduces the worst-case propagation delay, energy consumption, and EDP by 15.7%, 10.5%, and 26.6% on average, respectively. Compared with the method for the subthreshold region without considering process variation [11], the proposed method reduces the worst-case propagation delay and EDP by 8.6% and 7.4% on average, with a slight 2.2% increase in energy consumption. Compared with the method in [12], which balances the means of the pMOS and nMOS transistor current distributions in the subthreshold region, the proposed method reduces the worst-case propagation delay and worst-case EDP by 12.1% and 11.9% at the cost of an additional 3.2% worst-case energy consumption. Compared with the method in [18], which improves circuit performance under the constraint of a full-diffusion layout structure, the proposed method still reduces the worst-case propagation delay, energy consumption, and EDP by 5.6%, 15.8%, and 26.7% on average, respectively.

To validate the improvement achieved by the optimal β_opt across subthreshold corners with different voltages and temperatures, the standard cells designed with different methods were further compared by MC SPICE simulation at supply voltages between 0.25 V and 0.35 V and temperatures from −40 °C to 125 °C, as shown in Table 3. The proposed method outperforms the others in worst-case propagation delay at these corners, as it does at the 0.35 V, 25 °C corner.

4.2. Validation of the Proposed Method at Circuit Level

The standard logic cells designed in TSMC 28 nm technology with different optimization methods were validated and compared at the circuit level using a ring oscillator and several ISCAS’89 and OpenCores benchmark circuits.

The ring oscillator was implemented with nine identically sized inverters; its worst-case period, worst-case energy consumption, and worst-case EDP are listed in Table 4 and show a tendency similar to the standard cell results. In detail, compared with [4], [11], [12], and [18], the worst-case period (worst-case EDP) of the ring oscillator using the cells designed in this work is reduced by 21.6% (25.2%), 15.5% (15.0%), 25.8% (22.9%), and 5.2% (16.3%), respectively, indicating significant performance improvement over prior solutions when the nontrivial impact of process variation in the worst case is considered. Moreover, the worst-case energy consumption is reduced by 4.5%, increased by 0.9%, and reduced by 1.1% and 11.6% compared with [4], [11], [12], and [18], respectively, showing that the energy overhead paid for the optimal β_opt is acceptable.

The standard cell libraries were validated and compared in terms of frequency, power, and area using the synthesis results of the ISCAS’89 and OpenCores benchmark circuits, as shown in Table 5, where the number of cells (# Cells) in the synthesized netlist indicates the complexity of each circuit. Ave. Impr. in the last row indicates the average improvement of our method over the others, counting higher frequency and lower power and area as improvements. The proposed subthreshold cell sizing method outperforms the competing methods with at least 6.6% performance improvement, 6.9% power reduction, and 9.4% area reduction on average, indicating an overall performance, power, and area (PPA) enhancement of standard cells optimized with the proposed sizing solution. With the standard cell library designed by the proposed method, the synthesized circuits achieve a good balance among performance, power, and area, improving subthreshold circuit performance while saving power and area compared with prior methods.

5. Conclusions

Improving the worst-case performance is critical for subthreshold standard cell and circuit design when the impact of process variation cannot be neglected. Taking process variation into account, the optimal β_opt is derived analytically to minimize the 3σ percentile point of the delay distribution, revealing the relation between the optimal worst-case cell delay and the process parameters with physical insight. Validation results show significant improvement in worst-case delay, energy, and EDP at the gate and circuit levels. In future work, the statistical impact of more layout-dependent effects, such as the Reverse Short Channel Effect (RSCE) and the Inverse Narrow Width Effect (INWE), will be investigated in depth to improve the robustness of standard cell design in the subthreshold domain.

Author Contributions

P.C. and J.G. organized this work. P.C. and J.G. performed the modeling, simulation, and experiment work. The manuscript was written and edited by P.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China under Grant 62174031, in part by the Natural Science Foundation of Jiangsu Province under Grant BK20240637, and in part by the Fundamental Research Funds for the Central Universities.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

References

[1] Paul, S.; Honkote, V.; Kim, R.; Majumder, T.; Aseron, P.; Grossnickle, V.; Sankman, R.; Mallik, D.; Jain, S.; Vangal, S.; et al. An energy harvesting wireless sensor node for IoT systems featuring a near-threshold voltage IA-32 microcontroller in 14 nm tri-gate CMOS. In Proceedings of the 2016 IEEE Symposium on VLSI Circuits (VLSI-Circuits), Honolulu, HI, USA, 15–17 June 2016; pp. 1–2.
[2] Shi, W.; Pan, A.; Yu, S.; Choy, C.-S. A Subthreshold Baseband Processor Core Design With Custom Modules and Cells for Passive RFID Tags. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 2018, 37, 159–167.
[3] Dreslinski, R.G.; Wieckowski, M.; Blaauw, D.; Sylvester, D.; Mudge, T. Near-threshold computing: Reclaiming Moore’s law through energy efficient integrated circuits. Proc. IEEE 2010, 98, 253–266.
[4] Rabaey, J.M.; Chandrakasan, A.; Nikolic, B. Digital Integrated Circuits: A Design Perspective; Prentice-Hall: Upper Saddle River, NJ, USA, 2003.
[5] Singh, K.; De Gyvez, J.P. Twenty years of near/sub-threshold design trends and enablement. IEEE Trans. Circuits Syst. II: Express Briefs 2020, 68, 5–11.
[6] Muker, M.; Shams, M. Designing digital subthreshold CMOS circuits using parallel transistor stacks. Electron. Lett. 2011, 47, 372.
[7] Zhou, J.; Jayapal, S.; Busze, B.; Huang, L.; Stuyt, J. A 40 nm Dual-Width Standard Cell Library for Near/Sub-Threshold Operation. IEEE Trans. Circuits Syst. I Regul. Pap. 2012, 59, 2569–2577.
[8] Liu, B.; Ashouei, M.; Gemmeke, T.; de Gyvez, J.P. Sub-threshold custom standard cell library validation. In Proceedings of the Fifteenth International Symposium on Quality Electronic Design, Santa Clara, CA, USA, 3–5 March 2014; pp. 257–262.
[9] Keane, J.; Eom, H.; Kim, T.-H.; Sapatnekar, S.; Kim, C. Subthreshold logical effort: A systematic framework for optimal subthreshold device sizing. In Proceedings of the 43rd Annual Conference on Design Automation, San Francisco, CA, USA, 24–28 July 2006; p. 425.
[10] Lin, X.; Wang, Y.; Pedram, M. Joint sizing and adaptive independent gate control for FinFET circuits operating in multiple voltage regimes using the logical effort method. In Proceedings of the 2013 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), San Jose, CA, USA, 8–21 November 2013; pp. 444–449.
[11] Nabavi, M.; Ramezankhani, F.; Shams, M. Optimum PMOS-to-NMOS Width Ratio for Efficient Subthreshold CMOS Circuits. IEEE Trans. Electron Devices 2016, 63, 916–924.
[12] Liu, B.; Ashouei, M.; Huisken, J.; De Gyvez, J.P. Standard cell sizing for subthreshold operation. In Proceedings of the 49th Annual Design Automation Conference, San Francisco, CA, USA, 3–7 June 2012; p. 962.
[13] Kim, T.-H.; Keane, J.; Eom, H.; Kim, C.H. Utilizing Reverse Short-Channel Effect for Optimal Subthreshold Circuit Design. IEEE Trans. VLSI Syst. 2007, 15, 821–829.
[14] Jun, J.; Song, J.; Kim, C. A Near-Threshold Voltage Oriented Digital Cell Library for High-Energy Efficiency and Optimized Performance in 65nm CMOS Process. IEEE Trans. Circuits Syst. I Regul. Pap. 2018, 65, 1567–1580.
[15] Zhang, H.; He, W.; Sun, Y.; Seok, M.M. An energy-efficient logic cell library design methodology with fine granularity of driving strength for near-and sub-threshold digital circuits. In Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS), Daegu, Republic of Korea, 22–28 May 2021; pp. 1–5.
[16] Sasipriya, P. Design and Characterization of Standard Cell Libraries for Optimal Subthreshold Circuits. In Proceedings of the Innovations in Power and Advanced Computing Technologies (i-PACT), Kuala Lumpur, Malaysia, 8–10 December 2023; pp. 1–5.
[17] Chen, Y.; Nie, Y.; Jiao, H. An ultralow-power 65-nm standard cell library for near/subthreshold digital circuits. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 2022, 30, 676–680.
[18] Lim, Y.W.; Kamsani, N.A.; Sidek, R.M.; Hashim, S.J.; Rokhani, F.Z. Energy-Performance Optimization via P/N Ratio Sizing With Full Diffusion Layout Structure and Standard Cell Height Tuning in Near-Threshold Voltage Operation. IEEE Access 2022, 11, 12536–12546.
[19] Zhai, B.; Hanson, S.; Blaauw, D.; Sylvester, D. Analysis and mitigation of variability in subthreshold design. In Proceedings of the 2005 International Symposium on Low Power Electronics and Design (ISLPED ’05), San Diego, CA, USA, 8–10 August 2005; pp. 20–25.
[20] Drego, N.; Chandrakasan, A.; Boning, D. Lack of Spatial Correlation in MOSFET Threshold Voltage Variation and Implications for Voltage Scaling. IEEE Trans. Semicond. Manufact. 2009, 22, 245–255.
[21] Pelgrom, M.J.M.; Duinmaijer, A.C.J.; Welbers, A.P.G. Matching properties of MOS transistors. IEEE J. Solid-State Circuits 1989, 24, 1433–1439.

Figure 1. SPICE simulation results of the nominal and worst-case propagation delay of an inverter in the TSMC 28 nm process in (a) the super-threshold region (1.1 V) and (b) the subthreshold region (0.35 V).

Figure 2. Inverter driving an identical inverter.

Figure 3. Derivation of the signs of $S_{\mu}$ and $S_{\sigma}$ from the relation between $h_{\sigma}(\beta)$ and $g_{\sigma}(\beta)$.

Table 1. Comparison of optimal β between analytical models and MC SPICE simulation results for various process technologies.

β	TSMC 28 nm	TSMC 40 nm	SMIC 40 nm	TSMC 65 nm
MC SPICE Sim.	2.6 (−2%)	1.7 (−5%)	2.7 (−2%)	2.2 (3%)
[4]	1.25 (−53%)	1.51 (−16%)	2.06 (−27%)	1.58 (−26%)
$\beta_{opt}^{\mu}$ [11]	1.81 (−31%)	2.38 (33%)	1.98 (−30%)	1.72 (−19%)
$\beta_{opt}^{\sigma}$	3.47 (31%)	1.20 (−33%)	3.66 (30%)	2.54 (19%)
[18]	1.51 (−43%)	1.40 (−22%)	1.72 (−39%)	1.35 (−37%)
This work	2.64	1.79	2.82	2.13
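As a consistency check on Table 1, the "This work" row equals, for every process, the arithmetic mean of $\beta_{opt}^{\mu}$ and $\beta_{opt}^{\sigma}$ (e.g., (1.81 + 3.47)/2 = 2.64 for TSMC 28 nm), and the parenthesized percentages of the analytical-model rows are the signed deviations from the "This work" value. The Python sketch below reproduces both observations from the tabulated numbers only. The helper names `mean_beta` and `deviation_pct` are hypothetical, and the midpoint is offered as an observation about the table, not as a restatement of the estimator derived in Section 3.3.

```python
# Consistency check on Table 1 (optimal pMOS-to-nMOS width ratio beta).
# Values are transcribed from the table. mean_beta() is a hypothetical helper
# that happens to reproduce the "This work" row; it is not claimed to be the
# paper's derivation.

BETA_MU = {"TSMC 28 nm": 1.81, "TSMC 40 nm": 2.38, "SMIC 40 nm": 1.98, "TSMC 65 nm": 1.72}
BETA_SIGMA = {"TSMC 28 nm": 3.47, "TSMC 40 nm": 1.20, "SMIC 40 nm": 3.66, "TSMC 65 nm": 2.54}
BETA_THIS_WORK = {"TSMC 28 nm": 2.64, "TSMC 40 nm": 1.79, "SMIC 40 nm": 2.82, "TSMC 65 nm": 2.13}


def mean_beta(beta_mu: float, beta_sigma: float) -> float:
    """Midpoint of the mean-optimal and sigma-optimal width ratios."""
    return (beta_mu + beta_sigma) / 2.0


def deviation_pct(beta: float, reference: float) -> float:
    """Signed deviation of `beta` from `reference`, in percent."""
    return (beta - reference) / reference * 100.0


for node, ref in BETA_THIS_WORK.items():
    mid = mean_beta(BETA_MU[node], BETA_SIGMA[node])
    print(
        f"{node}: midpoint {mid:.2f} vs. table {ref:.2f}; "
        f"beta_mu {deviation_pct(BETA_MU[node], ref):+.0f}%, "
        f"beta_sigma {deviation_pct(BETA_SIGMA[node], ref):+.0f}%"
    )
```

Because the tabulated value sits at the midpoint, the deviations of $\beta_{opt}^{\mu}$ and $\beta_{opt}^{\sigma}$ are equal in magnitude and opposite in sign for each process, which is what the table shows.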

Table 2. Comparison of worst-case propagation delay, energy consumption, and energy-delay product for standard logic cells operating at 0.35 V, 25 °C, and the TT corner in the TSMC 28 nm process.

Cell	Worst-Case Propagation Delay (ps)					Worst-Case Energy Consumption (fJ)					Worst-Case Energy-Delay Product (fJ × ps)
Cell	[4]	[11]	[12]	[18]	Ours	[4]	[11]	[12]	[18]	Ours	[4]	[11]	[12]	[18]	Ours
INV	76.4	71.0	71.3	68.3	64.0	0.211	0.185	0.186	0.213	0.189	15.7	12.7	13.1	14.9	11.6
NAND2	98.6	93.7	102.2	96.4	90.6	0.206	0.179	0.177	0.222	0.182	20.0	16.4	18.0	21.9	16.0
NOR2	198.1	177.8	167.0	162.4	155.0	0.223	0.192	0.181	0.231	0.193	42.3	31.6	31.6	38.4	28.0
AOI21D	215.6	198.0	202.2	195.9	183.6	0.341	0.297	0.291	0.349	0.302	71.9	56.7	62.5	69.7	53.4
OAI21D	93.7	85.1	99.6	81.1	77.0	0.087	0.078	0.081	0.102	0.082	7.7	6.1	6.3	8.5	5.6
Ave. Incr. (%)	15.7	8.6	12.1	5.6	0.0	10.5	−2.2	−3.2	15.8	0.0	26.6	7.4	11.9	26.7	0.0
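The "Ave. Incr. (%)" entries of Table 2 are reproduced, to within 0.1 percentage points, by averaging over the five cells the per-cell relative reduction (baseline − ours)/baseline, i.e., with the difference normalized to the baseline cell rather than to the proposed one. The sketch below, assuming that convention, recomputes the worst-case delay entries from the tabulated values; names such as `avg_reduction_pct` are illustrative and not taken from the paper.

```python
# Recompute the worst-case delay part of the "Ave. Incr. (%)" row of Table 2.
# Cell order: INV, NAND2, NOR2, AOI21D, OAI21D (values in ps, from the table).

DELAY_PS = {
    "[4]": [76.4, 98.6, 198.1, 215.6, 93.7],
    "[11]": [71.0, 93.7, 177.8, 198.0, 85.1],
    "[12]": [71.3, 102.2, 167.0, 202.2, 99.6],
    "[18]": [68.3, 96.4, 162.4, 195.9, 81.1],
}
OURS_PS = [64.0, 90.6, 155.0, 183.6, 77.0]


def avg_reduction_pct(baseline: list[float], ours: list[float]) -> float:
    """Mean over cells of (baseline - ours) / baseline, in percent."""
    if len(baseline) != len(ours):
        raise ValueError("baseline and ours must list the same cells")
    ratios = [(b - o) / b for b, o in zip(baseline, ours)]
    return 100.0 * sum(ratios) / len(ratios)


for ref, delays in DELAY_PS.items():
    print(f"{ref}: {avg_reduction_pct(delays, OURS_PS):.1f}%")
```

The same per-value formula matches the parenthesized percentages in Table 4; for example, (4.31 − 3.64)/4.31 ≈ 15.5% for the worst-case period relative to [11].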

Table 3. Comparison of worst-case propagation delay for standard logic cells at different supply-voltage and temperature corners in the TSMC 28 nm process (unit: ps).

Cell	0.35 V, −40 °C					0.35 V, 125 °C
Cell	[4]	[11]	[12]	[18]	Ours	[4]	[11]	[12]	[18]	Ours
INV	287	268	265	247	228	33	34	34	35	32
NAND2	346	320	333	304	282	39	39	39	38	36
NOR2	972	892	853	724	756	191	171	177	169	154
AOI21D	1010	903	959	806	767	217	196	205	194	174
OAI21D	456	405	426	377	343	97	89	99	86	80
Ave. Incr. (%)	22.0	14.5	16.1	4.9	0.0	13.5	8.8	12.7	8.2	0.0
Cell	0.25 V, −40 °C					0.25 V, 125 °C
Cell	[4]	[11]	[12]	[18]	Ours	[4]	[11]	[12]	[18]	Ours
INV	4914	4844	4954	4928	4791	154	153	156	158	152
NAND2	8898	8307	8897	6926	6877	129	118	126	121	112
NOR2	22,939	21,278	20,797	18,194	17,212	2023	1855	1919	1505	1589
AOI21D	19,177	17,248	18,517	14,716	14,310	1682	1542	1584	1400	1340
OAI21D	8106	7350	7669	6267	6119	716	644	760	609	554
Ave. Incr. (%)	20.0	14.2	17.2	2.8	0.0	15.8	9.4	14.7	3.8	0.0

Table 4. Comparison of worst-case period, energy consumption, and energy-delay product for the ring oscillator operating at 0.35 V, 25 °C, and the TT corner in the TSMC 28 nm process.

Ring Oscillator	[4]	[11]	[12]	[18]	Ours
Worst-case period (ns)	4.64 (21.6%)	4.31 (15.5%)	4.91 (25.8%)	3.84 (5.2%)	3.64
Worst-case energy consumption (fJ)	1.12 (4.5%)	1.06 (−0.9%)	1.08 (1.1%)	1.21 (11.6%)	1.07
Worst-case energy-delay product (ns × fJ)	5.08 (25.2%)	4.47 (15.0%)	4.93 (22.9%)	4.53 (16.3%)	3.80

Table 5. Comparison of frequency, power consumption, and area for benchmark circuits operating at 0.35 V, 25 °C, and the TT corner in the TSMC 28 nm process.

Ckt	# Cells	Frequency (MHz)					Power (µW)					Area (µm²)
Ckt	# Cells	[4]	[11]	[12]	[18]	Ours	[4]	[11]	[12]	[18]	Ours	[4]	[11]	[12]	[18]	Ours
s27	19	117	129	120	115	142	0.38	0.37	0.37	0.37	0.36	9.99	9.61	9.8	9.4	9.21
s382	179	109	114	112	110	122	4.95	4.53	4.73	4.33	4.01	206.6	174.4	178.9	167.4	151.3
s5378	1294	96	101	97	100	106	35.7	33.2	35.10	32.90	30.4	1381.7	1317.2	1342	1298	1140.2
s13207	1219	84	89	85	87	99	102.2	100.1	100.90	98.10	95.2	3575.9	3363.2	3427	3286	3138.1
s38417	8278	81	83	81	78	87	365.6	324.0	332.40	311.39	277.8	13,542	11,479	11,501	10,685	9605
s38584	8324	80	82	80	80	86	373.7	345.7	367.90	321.34	297.2	13,685	11,945	12,501	11,204	10,138
aes_ip	20,795	93	109	97	98	111	220.3	210.50	215.80	186.84	171.90	16,924	14,409	15,809	14,417	12,286
tv80	7161	103	105	103	104	114	109.5	106.30	105.40	85.26	81.00	9698	8330	8893	7878	7000
vga_lcd	124,031	119	121	120	122	128	2786.2	2675.1	2690.1	2638.35	2375.2	140,459	120,708	128,375	115,247	103,291
Ave. Impr. (%)	-	12.7	6.6	11.1	11.2	0.0	17.0	12.1	14.2	6.9	0.0	19.9	12.7	15.9	9.4	0.0
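Table 5 appears to use a metric-dependent sign convention: the frequency entries of the "Ave. Impr. (%)" row match the mean of (ours − baseline)/baseline over the nine circuits, whereas the power and area entries match the mean of (baseline − ours)/baseline. The sketch below checks the frequency row for all four baselines and the power entry against [18]. It is a reconstruction from the published numbers under that assumption, and the function names are hypothetical.

```python
# Recompute frequency and power entries of the "Ave. Impr. (%)" row of Table 5.
# Circuit order: s27, s382, s5378, s13207, s38417, s38584, aes_ip, tv80, vga_lcd.

FREQ_MHZ = {
    "[4]": [117, 109, 96, 84, 81, 80, 93, 103, 119],
    "[11]": [129, 114, 101, 89, 83, 82, 109, 105, 121],
    "[12]": [120, 112, 97, 85, 81, 80, 97, 103, 120],
    "[18]": [115, 110, 100, 87, 78, 80, 98, 104, 122],
}
OURS_FREQ_MHZ = [142, 122, 106, 99, 87, 86, 111, 114, 128]

POWER_UW_REF18 = [0.37, 4.33, 32.90, 98.10, 311.39, 321.34, 186.84, 85.26, 2638.35]
OURS_POWER_UW = [0.36, 4.01, 30.4, 95.2, 277.8, 297.2, 171.90, 81.00, 2375.2]


def mean_pct(values: list[float]) -> float:
    """Arithmetic mean of fractional values, expressed in percent."""
    return 100.0 * sum(values) / len(values)


def freq_gain_pct(baseline: list[float], ours: list[float]) -> float:
    """Higher is better: mean of (ours - baseline) / baseline."""
    return mean_pct([(o - b) / b for b, o in zip(baseline, ours)])


def power_saving_pct(baseline: list[float], ours: list[float]) -> float:
    """Lower is better: mean of (baseline - ours) / baseline."""
    return mean_pct([(b - o) / b for b, o in zip(baseline, ours)])


for ref, freqs in FREQ_MHZ.items():
    print(f"frequency gain vs. {ref}: {freq_gain_pct(freqs, OURS_FREQ_MHZ):.1f}%")
print(f"power saving vs. [18]: {power_saving_pct(POWER_UW_REF18, OURS_POWER_UW):.1f}%")
```

Keeping the two helpers separate makes the sign convention explicit: each returns a positive value when the proposed cells are better on that metric.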

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Cao, P.; Guo, J. Standard Cell Sizing for Worst-Case Performance Optimization Considering Process Variation in Subthreshold Region. Electronics 2024, 13, 4477. https://doi.org/10.3390/electronics13224477

