Article

Low Power Clock Network Design

1
Department of Electrical and Computer Engineering, University of Rochester, Rochester, NY 14627, USA
2
Department of Electrical Engineering, Technion–Israel Institute of Technology, Haifa 32000, Israel
*
Author to whom correspondence should be addressed.
J. Low Power Electron. Appl. 2011, 1(1), 219-246; https://doi.org/10.3390/jlpea1010219
Submission received: 14 December 2010 / Revised: 8 April 2011 / Accepted: 30 April 2011 / Published: 19 May 2011
(This article belongs to the Special Issue Selected Topics in Low Power Design - From Circuits to Applications)

Abstract

Power is a primary concern in modern circuits. Clock distribution networks, in particular, are an essential element of a synchronous digital circuit and a significant power consumer. Clock distribution networks are subject to clock skew due to process, voltage, and temperature (PVT) variations and load imbalances. A target skew between sequentially-adjacent registers can be obtained in a balanced low power clock tree using techniques such as buffer and wire sizing. Existing skew mitigation techniques in tree-based clock distribution networks, however, are not efficient in coping with post design variations; whereas the latest non-tree mesh-based solutions reliably handle skew variations, albeit with a significant increase in dissipated power. Alternatively, crosslink-based methods provide low power and variation-efficient skew solutions. Existing crosslink-based methods, however, only address skew at the network topology level and do not target low power consumption. Different methods to manage skew and skew variations within tree and non-tree clock distribution networks are reviewed and compared in this paper. Guidelines for inserting crosslinks within a buffered low power clock tree are provided. Metrics to determine the most power efficient technique for a given circuit are discussed and verified with simulation.

1. Introduction

On-chip clock distribution networks toggle the global clock signal between high and low voltages at up to several gigahertz frequencies in modern circuits, dissipating a significant portion of the total power. These networks deliver a clock signal to the sequential elements within an integrated circuit. Accurate circuit operation is therefore highly dependent on the clock skew characteristics [1]. The clock skew within a clock distribution network is, in particular, an important factor that affects timing margins and circuit operation. Thus, the distribution of the clock signal is a critical design issue that affects overall system timing and reliability, and requires power efficiency.
A clock distribution tree can be designed based on specified timing constraints, while using existing skew mitigation techniques such as buffer insertion and sizing [2–4] and wire sizing [3–5] to produce the target skews. Localized clock skew scheduling [1] and clock gating techniques [6] can also be applied in tree-based clock topologies for lower power. Clock skew, however, is subject to process, voltage and temperature (PVT) variations that affect the clock skew schedule, limiting the performance and functionality. Furthermore, skew variations have become increasingly significant with smaller clock periods, requiring low power solutions. Non-tree topologies [6–26] have been introduced for variation-tolerant design of high performance clock distribution networks. The density of the non-tree elements in these topologies may vary from a few additional connections (or crosslinks) [20–26] to a completely dense mesh structure [6–19], covering the entire network with crosslinks. The crosslink connections between the clock tree segments provide alternative paths for the clock signal, maintaining delay balance while mitigating both the skew caused by imbalances and PVT variations between the connected segments. Thus, tolerance to variations increases with a larger number of crosslinks. The dynamic power dissipated by the inserted crosslinks is however also proportional to the number of connections. In addition, short-circuit currents [21] flow between the connected segments, dissipating short-circuit power that also increases with a larger number of crosslinks. Note that clock gating for low power is not applicable in non-tree networks, limiting the local control of the clock distribution network and, therefore, the ability to manage the power consumption.
A qualitative comparison of crosslink-based topologies with different crosslink densities is shown in Figure 1 in terms of power dissipation and skew variations. The power dissipated by the non-tree clock distribution networks can therefore be traded off for skew tolerance. In some integrated circuits, an efficient power-skew tradeoff can be achieved with a mesh-based topology, while in other circuits a crosslink-based network is preferable to produce a variation-tolerant, low power clock distribution network.
In this paper, different clock network topologies to mitigate skew variations under specific skew and power constraints are reviewed and compared. Skew variations and power consumption in crosslink-based clock distribution networks are analyzed based on a simplified clock tree model. The conclusions are generalized and guidelines for inserting crosslinks within a buffered clock tree are provided. Analytic expressions for the upper and lower bound of the energy consumed by a crosslink-based network with specific skew constraints are also provided. The power efficiency of variation-tolerant crosslink-based and mesh-based topologies is compared based on closed-form expressions and simulation results.
The rest of the paper is organized as follows. Skew and power tradeoffs are reviewed in Section 2 for different clock distribution networks in moderate and high speed, low power circuits. Metrics to determine the most power efficient non-tree topology are provided in Section 3 and discussed in Section 4 based on simulation results. The paper is summarized in Section 5. Closed-form expressions for the energy consumed by a clock tree section with a crosslink and the optimum crosslink parameters are derived, respectively, in Appendices A and B.

2. Skew Mitigation Techniques

A clock tree is a common clock distribution topology. Existing design solutions, such as buffer insertion and sizing, and wire sizing are used to balance the propagation delays and skew between sequentially-adjacent registers [6] within a clock tree based on satisfying the permissible range constraints [6]. A buffered clock tree is comprised of a source buffer that drives the trunk of the clock tree, the internal buffer-interconnect-buffer segments, and the sequential gates at the sinks of the clock tree, as shown in Figure 2.
Clock gating techniques can be applied to tree-based clock topologies, producing efficient, low power clock networks. Clock trees are also simpler to model and analyze. Nevertheless, clock trees are sensitive to skew variations that limit performance and may cause circuit malfunctions.
Existing skew variation mitigation techniques include non-tree clock distribution topologies [6–26], where alternative paths for the clock signal are provided to manage the local skew, thereby maintaining a temporal balance. A crosslink-based topology [20–26] is a non-uniform asymmetric tree-based structure with a varying density of wire segments, each connecting two segments within a clock tree. The design of a crosslink-based clock network depends on three characteristics: the location of the crosslinks within a clock tree (in terms of the crosslink connected segments), the specific crosslink location between the connected segments, and the size of the crosslink. Alternatively, crosslinks may connect all or a specific group of adjacent segments within a specific level of a clock tree, forming a regular symmetric mesh-based [6–19] clock network (see Figure 3).
Skew and power related terms in a clock distribution network are described in Section 2.1. Tree-based methods for skew mitigation are presented in Section 2.2. Mesh- and crosslink-based topologies are discussed, respectively, in Sections 2.3 and 2.4.

2.1. Skew and Power—Definitions and Background

The skew between sequentially-adjacent registers within a clock distribution network is an important design issue. The skew affects the timing margins within the data paths, changing the speed and functional behavior of a circuit. The skew is affected by load imbalances and interconnect coupling within a clock network, which can be controlled and mitigated during the design process [1]. This skew, however, is subject to post-design PVT variations that can significantly change the skew within a balanced clock network, adversely affecting circuit operation. Useful clock skew is only relevant between sequentially-adjacent registers, and can be positive, negative, or zero [1]. A large negative clock skew can cause a race condition between sequentially-adjacent registers; while a large positive clock skew may limit circuit performance. The network should therefore be carefully designed to ensure that the local skew is within the permissible skew range [1]. In current circuits, skew variations can be of the same order of magnitude as the clock period [27]. Thus, post-design skew variations should be mitigated to ensure that the nominal skew with variations is within the permissible skew range.
Low power has recently become a primary design objective. In particular, non-tree clock networks, which trade off skew and skew variations for power, are compared to more efficient, low power clock distribution networks. The dynamic power consumption $\alpha C V_{DD}^2 f$ in clock distribution networks is proportional to the total capacitance $C$ of the clock network and load, where $\alpha$ is the switching activity and $f$ is the clock frequency. Adding crosslinks with a total capacitance $C_{Crosslinks}$ to a clock tree increases the dynamic power of the clock network by $\alpha C_{Crosslinks} V_{DD}^2 f$. Since the wire capacitance is linearly proportional to the wire length $l$ and increases with larger wire width $w$ and thickness $t$ [28], the dynamic energy consumption increases with longer and wider crosslinks.
Furthermore, the short-circuit current between the crosslink connected segments dissipates additional short-circuit energy. The wire resistance $\rho l/(wt)$ is linearly proportional to the wire length $l$, and inversely proportional to the wire width $w$ and thickness $t$, where $\rho$ is the resistivity of the line. Long and narrow resistive crosslinks are therefore less conductive and limit the short-circuit current, mitigating the short-circuit power dissipation.
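As a rough numerical illustration of these two relations, the following sketch evaluates the wire resistance $\rho l/(wt)$ and the added dynamic power $\alpha C V_{DD}^2 f$. The function names and all numerical values (resistivity, wire dimensions, crosslink capacitance, clock frequency) are illustrative assumptions and are not taken from this paper.

```python
def wire_resistance(rho, length, width, thickness):
    """Resistance of a rectangular wire, R = rho * l / (w * t), in ohms."""
    return rho * length / (width * thickness)


def added_dynamic_power(alpha, c_crosslinks, vdd, freq):
    """Extra dynamic power alpha * C * VDD^2 * f due to crosslink capacitance, in watts."""
    return alpha * c_crosslinks * vdd ** 2 * freq


# Assumed example values: copper-like resistivity and a 100 um long crosslink.
rho = 2.2e-8  # ohm * m (assumption)
r_narrow = wire_resistance(rho, 100e-6, 0.1e-6, 0.2e-6)
r_wide = wire_resistance(rho, 100e-6, 0.2e-6, 0.2e-6)

# Assumed 20 fF of crosslink capacitance switching every cycle at 2 GHz.
p_extra = added_dynamic_power(alpha=1.0, c_crosslinks=20e-15, vdd=1.0, freq=2e9)

print(f"narrow crosslink: {r_narrow:.1f} ohm, wide crosslink: {r_wide:.1f} ohm")
print(f"added dynamic power: {p_extra * 1e6:.1f} uW")
```

Halving the width doubles the resistance, which limits the short-circuit current, while the added dynamic power scales linearly with the crosslink capacitance.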

2.2. Clock Tree Topology

Many guidelines and algorithms have been introduced for designing balanced power efficient clock distribution trees in synchronous integrated circuits [2–6]. Many different clock tree topologies are used, ranging from asymmetric structures to symmetric trees, such as H-trees and X-trees [6]. A buffered tree is a common approach to distribute a clock signal to the sequential gates to satisfy a specific clock skew schedule. Enhanced control and accuracy of the distributed clock signal waveforms can be obtained by buffer and wire sizing. A tree-based topology can also be accurately modeled by closed-form analytic expressions [6]. Various techniques, such as localized clock skew scheduling and clock gating [6], have been developed to reduce the power consumed by a clock tree. In high performance circuits, however, post-design skew variations adversely affect the nominal target skew, decreasing the reliability of tree-based clock networks. Thus, non-tree alternatives such as a mesh should be considered to mitigate skew variations in high performance circuits.

2.3. Mesh-Based Clock Topology

Mesh structures balance clock delays and effectively lower the skew between nearby segments, mitigating skew variations [6–19]. Mesh-based clock distribution networks have been utilized in a variety of commercial high performance microprocessors, such as the Power4 [16], Digital Alpha [17], Intel Pentium 4 [18], and Xeon [19], effectively addressing the issue of clock skew and skew variations.
Mesh topologies, however, utilize significant wire length, resulting in a large capacitance and, consequently, significant dynamic power consumption. Additional power is dissipated due to the short-circuit currents flowing within the buffers driving the crosslinks. The short-circuit power is a linear function of the skew between the buffers driving the crosslinks [21], and can dissipate more than 80% of the total power in highly unbalanced mesh networks [9]. Both uniform and non-uniform mesh topologies have been recently investigated, demonstrating lower skew and higher variation tolerance in dense grids. The number of crosslinks and the mesh wire length, however, increase with mesh density, resulting in higher short-circuit and dynamic power [6–19]. Thus, dynamic and short-circuit power can be traded off for skew. Mesh reduction [10], sizing of the buffers driving the crosslinks [9], and cost function-based algorithms to reduce power consumption have been presented [8–10]. High power consumption, however, remains the primary disadvantage of mesh-based clock distribution networks.
Modeling mesh-based clock distribution networks is complicated due to the inherent feedback within the topology. Accurate analytic expressions characterizing a mesh are highly complex and require significant computational time. Several techniques, such as the Skew Bound method in [8] and the Sliding Window Scheme in [9], have been recently proposed to estimate the skew and power of mesh-based clock networks. Modeling the buffers driving the crosslinks for low computational complexity in the analysis process has also been considered [9]. To improve the scalability of the clock mesh analysis process, reduced order modeling and port sliding can be used [11,12]. In addition, decomposition of the clock mesh into linear and nonlinear subsystems, and a dynamic time step rounding technique [13] are employed to reduce the number of macromodels required to represent a mesh system.
Connecting the nodes within a clock mesh affects the local clock delays, balancing the skew and skew variations between the sinks. Only a portion of the affected sinks, however, are sequentially-adjacent registers which are sensitive to clock skew and skew variations [1]. Thus, crosslinks between non-sequentially-adjacent sinks do not affect circuit operation and unnecessarily dissipate dynamic and short-circuit power. The regularity of mesh-based topologies however prevents these crosslinks from being removed. An example of the excessive redundancy of mesh-based solutions is illustrated in Figure 4b. For a clock tree with two sequentially-adjacent sinks, Reg1 and Reg2, the tolerance to variations can be improved while dissipating little power by inserting a crosslink connecting Reg1 to Reg2 (see Figure 4a). This crosslink efficiently mitigates variations within the highlighted paths, shown in Figure 4a. A sink-level mesh is depicted in Figure 4b with a crosslink between Reg1 and Reg2 and the additional redundant crosslinks that connect the non-adjacent sinks. The total wirelength of the mesh shown in Figure 4b is therefore significantly greater than the crosslink length, as depicted in Figure 4a. A sink-level mesh-based solution therefore reduces variations; however, at significantly higher power. Alternatively, an intermediate-level mesh mitigates PVT variations primarily between the upper clock tree segments, resulting in higher skew variations between the sequentially-adjacent sinks, Reg1 and Reg2, as depicted in Figure 4b. Additional degrees of design freedom are therefore available in crosslink-based topologies, while potentially dissipating significantly less power.

2.4. Crosslink-Based Clock Topology

Multiple techniques to maintain useful skew in clock distribution trees have been described [20–26], exhibiting resource efficient and low power skew solutions. The sensitivity of clock distribution trees to PVT variations, however, increases with circuit speed and technology scaling, resulting in large skew variations. Given a clock tree that satisfies useful skew constraints, crosslinks can be inserted that maintain a useful skew schedule while lowering variations in the skew. Guidelines, however, should be established regarding (1) the selection of which clock tree segments should be connected with a crosslink, (2) the crosslink location between the selected segments, and (3) the crosslink physical characteristics. This topic is considered in this section. Power and skew tradeoffs are reviewed in a simplified clock network (see Figure 5) in Section 2.4.1, where two clock tree segments with the inputs ClkIn1 and ClkIn2, and outputs ClkOut1 and ClkOut2 are connected with a crosslink X, modeled as a lumped RC wire. These results are later generalized in Section 2.4.2 to provide guidelines for multiple crosslink insertion.

2.4.1. Power and Skew Tradeoffs in Simplified Crosslink-Based Clock Networks

Inserting a crosslink within a clock tree reduces the skew between the crosslink connected segments, while consuming additional power. Closed-form expressions for the clock skew and power consumed by two clock tree segments with a crosslink are described in this section based on the simplified clock network shown in Figure 5.
An ideal step input signal driving each CMOS inverter is assumed in these analytic expressions. Under this assumption, a large portion of the transistor operation occurs within the linear region [29], permitting the driver to be modeled as a linear resistor RON. Furthermore, the input capacitance CG1 and CG2 of the output drivers is included in the capacitance model. The wires within the clock tree segments, depicted in Figure 5, are modeled as a lumped RC impedance. A model of the section impedance is depicted in Figure 6. The input resistance of segment 1 (2), represented by R1 (R2) shown in Figure 6, is composed of the wire resistance connected in series with the transistor. The load capacitance, represented by C1 (C2), shown in Figure 6, is composed of the wire capacitance connected in parallel with the input gate capacitance.
The skew at the output of the section, shown in Figure 6b, is caused by the skew $T$ between the inputs ClkIn1 and ClkIn2 of the section plus the difference between the propagation delays $\tau_1$ and $\tau_2$ between ClkIn1 and ClkOut1, and ClkIn2 and ClkOut2, respectively (due to different RC loads). Assuming $V_{OUT} = \tfrac{1}{2}V_{DD}$ [14], $\ln 2\,|\tau_1 - \tau_2| = 0.693\,|R_1C_1 - R_2C_2|$. The energy consumed by two clock tree segments forming a section without a crosslink, shown in Figure 6b, is
$$E = (C_1 + C_2)\,V_{DD}^2 \tag{1}$$
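The two quantities above can be evaluated directly. The sketch below computes the 50% delay difference $\ln 2\,|R_1C_1 - R_2C_2|$ and the section energy in (1). The function names and the resistance and capacitance values are hypothetical and chosen only for illustration.

```python
import math


def delay_mismatch(r1, c1, r2, c2):
    """50% delay difference ln(2) * |R1*C1 - R2*C2| between the two segments (seconds)."""
    return math.log(2) * abs(r1 * c1 - r2 * c2)


def section_energy(c1, c2, vdd):
    """Energy (C1 + C2) * VDD^2 of a section without a crosslink, as in (1) (joules)."""
    return (c1 + c2) * vdd ** 2


# Assumed example: slightly unbalanced segments at a 1 V supply.
mismatch = delay_mismatch(100.0, 50e-15, 120.0, 45e-15)
energy = section_energy(50e-15, 45e-15, 1.0)

print(f"delay mismatch = {mismatch * 1e12:.3f} ps")
print(f"section energy = {energy * 1e15:.1f} fJ")
```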
An ideal crosslink matches the propagation delay from the source of the clock tree to the crosslink connected segments, minimizing the skew between these segments. Inserting a crosslink between two non-zero skew segments may, however, affect the skew between the remaining clock tree segments [21]. Alternatively, zero skew between segments with a crosslink can be effectively maintained by inserting a crosslink between the zero skew segments, ensuring the skews remain unchanged between all of the clock tree segments with and without a crosslink.
A heuristic for inserting crosslinks should therefore be employed in a balanced clock tree: to preserve the useful skews within a balanced clock tree, the crosslinks between the zero skew segments need to be considered. These crosslinks would mitigate post-design skew variations, while producing similar propagation delays to the crosslink connected segments and, therefore, similar time constants,
$$\tau = R_1\left(\tfrac{1}{2}C_X + C_1\right) \approx R_2\left(\tfrac{1}{2}C_X + C_2\right) \tag{2}$$
A crosslink X can be modeled as a lumped RC impedance, exhibiting a non-zero resistance RX and capacitance CX, thereby dissipating dynamic power to charge the crosslink capacitance. Additional power is further dissipated by the short-circuit current ISC through the crosslink when the inputs are at different polarities (e.g., ClkIn1 = 0 and ClkIn2 = 1), as illustrated by the dotted line shown in Figure 7.
The total current flowing through R2, shown in Figure 7, is composed of two currents, one charging the capacitors ½CX + C1 and ½CX + C2, and the other current connected to ground through R1. The short-circuit current with a crosslink increases with lower crosslink resistance RX. As long as the inputs are skewed in time, as shown in Figure 8, the voltage at the output is lower and the transistor (represented by R1) dissipates short-circuit energy. Crosslinks with high resistance RX between low skew segments should therefore be inserted to lower the power dissipation. The current through R1 for a step input and slow input ramp, and for different values of RX is illustrated in Figure 8.
At the open-circuit limit (RX →∞), however, the crosslink does not balance the delay to the connected segments, yet dissipates dynamic power. A circuit model of a simplified network with a crosslink, shown in Figure 6a, is presented in Figure 9 for t ≤ T and t > T. Waveforms of the voltage at the output of the clock tree section with and without a crosslink are illustrated in Figure 10, exhibiting a significant reduction in skew with a crosslink.
The total energy consumption once the first input (ClkIn2) switches and until the output capacitors are charged, based on the circuit models depicted in Figure 9, is derived in Appendix A and is
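$$E_X^{Total} = V_{DD}^2\left(\frac{1}{R_1+R_2+R_X}\right)T + \tau V_{DD}^2\left[\frac{R_1+R_2}{R_1R_2} - \frac{1}{R_1+R_2}\left(1 - e^{-\frac{T}{\tau}}\right) + \frac{1}{R_1+R_2}\left(\frac{R_X}{R_1+R_2+R_X}\right)^2\left(1 - e^{-\left(\frac{R_1+R_2+R_X}{R_X}\right)\frac{T}{\tau}}\right)\right] \tag{3}$$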
The first term in (3) describes the short-circuit energy $E_X^{SH}$, which increases linearly with $T$. The derivative of the second term, which is the dynamic energy $E_X^{DYN}$ to charge the output capacitance, is negative, yielding the maximum dynamic power consumption at $T = 0$ and the upper bound of the total energy, $E_{X,MAX}^{Total}$,
$$E_X^{Total} \le E_{X,MAX}^{Total} = E_X^{SH}(T) + E_X^{DYN}(T=0) = \left(\frac{T}{R_1+R_2+R_X}\right)V_{DD}^2 + \tau\left(\frac{R_1+R_2}{R_1R_2}\right)V_{DD}^2 = \left(\frac{T}{R_1+R_2+R_X}\right)V_{DD}^2 + \frac{1}{2}\left[\left(1+\frac{R_1}{R_2}\right)C_1 + \left(1+\frac{R_2}{R_1}\right)C_2 + \left(1+\frac{R_1}{2R_2}+\frac{R_2}{2R_1}\right)C_X\right]V_{DD}^2, \quad \forall T \tag{4}$$
The exponential terms in (3) range between [0,1], exhibiting the lower energy bound $E_{X,MIN}^{Total}$,
$$E_X^{Total} \ge E_{X,MIN}^{Total} = \left(\frac{T}{R_1+R_2+R_X}\right)V_{DD}^2 + \tau\left(\frac{R_1+R_2}{R_1R_2} - \frac{1}{R_1+R_2}\right)V_{DD}^2 = \left(\frac{T}{R_1+R_2+R_X}\right)V_{DD}^2 + \frac{R_1^2+R_1R_2+R_2^2}{2R_1R_2}\left(\frac{R_1}{R_1+R_2}C_1 + \frac{R_2}{R_1+R_2}C_2 + \frac{1}{2}C_X\right)V_{DD}^2, \quad \forall T \tag{5}$$
Note that not all of this dynamic energy consumed during t ≤ T is useful; the total current (shown in Figure 11) comprises the current that charges the output capacitors (the solid arrow in the figure), the current that discharges the output capacitors (the dashed arrows), and the short-circuit current (the crossed arrows).
The short-circuit energy $E_X^{SH}$ increases as $R_X$ is reduced, while the dynamic energy $E_X^{DYN}$ increases with increasing $C_X$ (decreasing $R_X$). The derivative $\partial E/\partial R_X$, therefore, is negative, exhibiting lower energy for higher $R_X$.
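The energy expression (3) and its bounds (4) and (5) can be checked numerically. The sketch below is a minimal illustration under two assumptions that are not stated explicitly in the text: the time constant $\tau$ is taken as the mean of the two nearly equal time constants in (2), and both exponents in (3) are negative. All function names and component values are hypothetical.

```python
import math


def mean_tau(r1, r2, c1, c2, cx):
    """Assumption: tau is the mean of the two nearly equal time constants in (2)."""
    return 0.5 * (r1 * (0.5 * cx + c1) + r2 * (0.5 * cx + c2))


def short_circuit_energy(skew, r1, r2, rx, vdd):
    """First term of (3): grows linearly with the input skew T."""
    return vdd ** 2 * skew / (r1 + r2 + rx)


def total_energy(skew, r1, r2, rx, c1, c2, cx, vdd):
    """Total energy of (3): short-circuit term plus dynamic term."""
    tau = mean_tau(r1, r2, c1, c2, cx)
    rsum = r1 + r2
    rtot = rsum + rx
    bracket = (
        rsum / (r1 * r2)
        - (1.0 - math.exp(-skew / tau)) / rsum
        + (rx / rtot) ** 2 * (1.0 - math.exp(-(rtot / rx) * skew / tau)) / rsum
    )
    return short_circuit_energy(skew, r1, r2, rx, vdd) + tau * vdd ** 2 * bracket


def energy_bounds(skew, r1, r2, rx, c1, c2, cx, vdd):
    """Lower bound (5) and upper bound (4) on the total energy."""
    tau = mean_tau(r1, r2, c1, c2, cx)
    rsum = r1 + r2
    e_sh = short_circuit_energy(skew, r1, r2, rx, vdd)
    e_max = e_sh + tau * vdd ** 2 * (rsum / (r1 * r2))
    e_min = e_sh + tau * vdd ** 2 * (rsum / (r1 * r2) - 1.0 / rsum)
    return e_min, e_max


# Assumed example: balanced segments, 300 ohm crosslink, 5 ps input skew.
params = dict(r1=100.0, r2=100.0, rx=300.0, c1=50e-15, c2=50e-15, cx=10e-15, vdd=1.0)
e_total = total_energy(5e-12, **params)
e_min, e_max = energy_bounds(5e-12, **params)
print(f"E_min = {e_min * 1e15:.2f} fJ, E = {e_total * 1e15:.2f} fJ, E_max = {e_max * 1e15:.2f} fJ")
```

For a balanced section ($R_1 = R_2$, $C_1 = C_2 = C$) and $T = 0$, the total energy reduces to $(2C + C_X)V_{DD}^2$, which is a convenient sanity check on the reconstruction.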
Similar to $T_X = T \cdot 2^{-2R/R_X}$ [15], where the expression for the skew with a crosslink assumes $R_1 = R_2 = R$ and $C_1 = C_2 = C$, the skew $T_X$ based on an assumption of equal propagation delays, $\tau = R_1(\tfrac{1}{2}C_X + C_1) \approx R_2(\tfrac{1}{2}C_X + C_2)$ (2), is
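$$T_X = \frac{V_1(t = t_{50\%}) - V_2(t = t_{50\%})}{V_2'(t = t_{50\%})} = \frac{V_1\left(t = \ln 2\, R_1(\tfrac{1}{2}C_X + C_1)\right) - V_2\left(t = \ln 2\, R_1(\tfrac{1}{2}C_X + C_1)\right)}{V_2'\left(t = \ln 2\, R_1(\tfrac{1}{2}C_X + C_1)\right)} = T \cdot 2^{-\frac{R_1+R_2}{R_X}} \tag{6}$$
where $t_{50\%}$ is the time at which $V_1(t = t_{50\%}) = \tfrac{1}{2}V_{DD}$, and $V_2'$ denotes the time derivative of $V_2$.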
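The closed-form result $T_X = T \cdot 2^{-(R_1+R_2)/R_X}$ in (6) is easy to explore numerically. The following sketch sweeps the crosslink resistance for an assumed input skew; the function name and all values are hypothetical.

```python
def skew_with_crosslink(skew, r1, r2, rx):
    """Residual skew T_X = T * 2**(-(R1 + R2) / RX) from (6)."""
    return skew * 2.0 ** (-(r1 + r2) / rx)


# Assumed example: 10 ps input skew, 100 ohm segment resistances.
input_skew = 10e-12
for rx in (100.0, 200.0, 400.0, 1e9):
    tx = skew_with_crosslink(input_skew, 100.0, 100.0, rx)
    print(f"RX = {rx:>12.0f} ohm -> T_X = {tx * 1e12:.3f} ps")
```

A low resistance crosslink strongly attenuates the skew, while a very resistive crosslink leaves the skew nearly unchanged, consistent with the open-circuit limit discussed above.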

2.4.2. Guidelines for Crosslink Insertion in a Clock Distribution Network

To design an efficient crosslink-based network, three decisions regarding the crosslinks should be made: (1) which pairs of clock tree segments should be connected by a crosslink, (2) where within each pair of segments the crosslink should be placed, and (3) the physical characteristics of the crosslinks. Guiding principles for crosslink insertion are provided in this section based on the analytic expressions described in Section 2.4.1.

Rule 1: Location of Crosslinks within a Clock Tree

The first design issue is determining between which segments a crosslink should be inserted to reduce skew variations between sequentially-adjacent registers, while preserving useful skew in balanced clock trees. Any two clock tree segments located upstream to a pair of sequentially-adjacent registers, Reg1 and Reg2, can be connected with a crosslink to mitigate skew variations between the sequentially-adjacent sinks, as depicted in Figure 12. Inserting a crosslink between two segments lowers the delay variations within the clock signal paths in the upper levels (the dashed lines, shown in Figure 12), and, as a result, reduces skew variations between the registers (the shaded nodes at the sink level, shown in Figure 12). Segments connected with a crosslink at the upper clock tree levels affect the clock delay to all of the downstream registers, mitigating skew variations within a larger group of sequentially-adjacent registers, as illustrated in Figure 12a. Alternatively, lower skew variations at the sinks are observed in those segments with crosslinks connected close to the sinks (see Figure 12b). However, by applying the heuristic for crosslink insertion (see Section 2.4.1), only zero skew segments should be connected to preserve the skew between sequentially-adjacent registers. Thus, to minimize skew variations while preserving useful skews, crosslinks should be inserted close to the sinks between zero skew segments with expected skew variations greater than the allowed skew variation threshold $T_{TH}$.
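Rule 1 can be read as a simple filter over candidate segment pairs. The sketch below is one possible interpretation, not an algorithm given in this paper: the `SegmentPair` record, its fields, and the ordering by tree level are all hypothetical and serve only to illustrate the selection criteria (zero nominal skew, expected variation above the threshold, preference for pairs close to the sinks).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SegmentPair:
    """Hypothetical record describing a candidate pair of clock tree segments."""
    name: str
    level: int                 # tree depth; a larger value is closer to the sinks
    nominal_skew: float        # designed skew between the two segments (seconds)
    expected_variation: float  # predicted skew variation (seconds)


def select_crosslink_pairs(pairs, t_th, zero_tol=0.0):
    """Keep zero skew pairs whose expected variation exceeds t_th, deepest level first."""
    eligible = [
        p for p in pairs
        if abs(p.nominal_skew) <= zero_tol and p.expected_variation > t_th
    ]
    return sorted(eligible, key=lambda p: p.level, reverse=True)


# Assumed candidates for illustration.
pairs = [
    SegmentPair("a-b", level=2, nominal_skew=0.0, expected_variation=8e-12),
    SegmentPair("c-d", level=4, nominal_skew=0.0, expected_variation=6e-12),
    SegmentPair("e-f", level=4, nominal_skew=3e-12, expected_variation=9e-12),
    SegmentPair("g-h", level=3, nominal_skew=0.0, expected_variation=1e-12),
]
chosen = select_crosslink_pairs(pairs, t_th=5e-12)
print([p.name for p in chosen])
```

In this example the pair with a non-zero nominal skew and the pair with a small expected variation are both excluded.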

Rule 2: Location of Crosslink within a Clock Tree Section

The second design issue is determining the location of the crosslink between two zero skew segments. Skew variations between two zero skew nodes can be regulated by inserting a crosslink between the nodes. Thus, the primary objective in choosing the specific location of the crosslink within a clock tree section is to lower the total energy consumption. The additional energy from the crosslink is the sum of the dynamic energy due to the added wire capacitance and the short-circuit energy dissipated between the crosslink-connected segments. The additional dynamic energy is not significantly affected by the specific crosslink location within the clock tree segment. Alternatively, inserting a crosslink far from the input driver of a section increases the short-circuit path resistance, decreasing the total energy consumption.

Rule 3: Crosslink Parameters

The third design issue is the type of crosslink to place between segments. Given a crosslink X of specific length l and resistivity ρ, an increase in either the width w or thickness t results in a higher capacitance CX and lower resistance RX (see Appendix B). A higher RX and lower CX should therefore be used to reduce both the short-circuit and total power consumption. Thus, crosslinks with a smaller width and thickness, and therefore higher resistance, should be inserted in low power circuits. Alternatively, a lower RX and therefore a higher CX should be used to reduce skew at the expense of higher power. The crosslink characteristics for efficient crosslink-based networks are described quantitatively in Section 3 under specific skew and power constraints.

3. Metrics for Power Efficient Clock Networks: Crosslink vs. Mesh-Based Topologies

Low wirelength utilization, the availability of efficient techniques for locally controlling skew, and lower power are important advantages of tree-based clock distribution networks as compared to non-tree topologies. The reliability of clock trees in high performance variation-sensitive circuits is however reduced. Thus, in moderate and low performance circuits with aggressive power and area constraints, clock trees are preferable. Alternatively, in high performance circuits, non-tree topologies are preferred.
Non-tree clock networks are shown here to be an efficient alternative to a tree topology for coping with skew variations within clock distribution trees. Two zero skew segments upstream from a sequentially-adjacent variation sensitive pair of registers should be connected with a crosslink to mitigate these variations. Thus, to attenuate predicted skew variations between N pairs of registers, at most N pairs of zero skew segments should be connected by N crosslinks, as shown in Figure 13.
At the limit, for large values of N, a crosslink-based topology utilizes longer wirelength as compared to a mesh, dissipating higher power, as illustrated in Figure 14. Alternatively, mitigation of skew variations at lower power can be achieved by inserting crosslinks in those circuits with fewer sequentially-adjacent registers (smaller N) (see Figure 14). A comparison between crosslink and mesh-based topologies is discussed in this section.
Given an energy consumption of a clock tree $E_{Tree}$ and an energy consumed by a mesh-based network $E_{Mesh}^{Total}$, the differential energy consumption due to the mesh is $E_{Mesh} = E_{Mesh}^{Total} - E_{Tree}$. Thus, the energy budget available for adding crosslinks should not exceed $E_{Mesh}$. Skew and skew variations between connected segments are reduced with smaller $R_X$ and, therefore, with increasing $C_X$ (see (6)), thereby dissipating more power (see (3)). To minimize the power dissipated by low power clock networks, crosslinks with the largest possible $R_X$ and smallest $C_X$ should be used under the zero skew $T_X \le T_{TH}$ constraint, yielding, based on (6),
$$R_X = \frac{R_1+R_2}{\log_2(T/T_X)} \overset{T_X \le T_{TH}}{\le} \frac{R_1+R_2}{\log_2(T/T_{TH})} = R_{X,OPT}^{T_X \le T_{TH}} \tag{7}$$
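Expression (7) gives the largest crosslink resistance that still satisfies the skew threshold. The sketch below evaluates it and verifies the result against (6). The function name and values are hypothetical, and returning infinity when the input skew is already within the threshold is an assumption made here for convenience (no crosslink would be required in that case).

```python
import math


def optimal_crosslink_resistance(skew, t_th, r1, r2):
    """Largest RX meeting T_X <= T_TH, RX_OPT = (R1 + R2) / log2(T / T_TH), from (7)."""
    if skew <= t_th:
        # Assumption: the skew already meets the threshold, so no crosslink is needed.
        return math.inf
    return (r1 + r2) / math.log2(skew / t_th)


# Assumed example: 10 ps input skew reduced to a 2.5 ps threshold.
rx_opt = optimal_crosslink_resistance(10e-12, 2.5e-12, 100.0, 100.0)
residual = 10e-12 * 2.0 ** (-(100.0 + 100.0) / rx_opt)  # skew from (6) at RX_OPT
print(f"RX_OPT = {rx_opt:.1f} ohm, residual skew = {residual * 1e12:.2f} ps")
```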
Given a crosslink X of specific length $l$ and resistivity $\rho$, the width $w$ and thickness $t$ are the only factors that affect the crosslink resistance $R_X$. Since $R_X = \rho l/(wt)$, the constraint $wt = \rho l / R_{X,OPT}^{T_X \le T_{TH}}$ should therefore be considered. Applying the Lagrange multipliers method for determining the constrained minima of closed-form formulae [28], the minimum crosslink capacitance $C_{X,OPT}^{T_X \le T_{TH}}$ can be determined (see Appendix B). Crosslinks with the maximum crosslink resistance $R_{X,OPT}^{T_X \le T_{TH}}$ and minimum capacitance $C_{X,OPT}^{T_X \le T_{TH}}$ should be used in low power clock networks, while satisfying the zero skew $T_X \le T_{TH}$ constraint, as described by (8)–(14). Crosslinks up to hundreds of micrometers long are routed in the lower metal layers. The capacitance of these crosslinks is determined from local and intermediate interconnect models, (9), that consider wire coupling between the upper and lower metal layers [28]. Alternatively, crosslinks that connect distant segments (thousands of micrometers and longer) should be routed on the top metal layer, and modeled as a global interconnect, (11), that only couples with the lower metal layer [28].

Local and intermediate interconnect

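$$\begin{cases} R_{X,OPT}^{T_X \le T_{TH}} = (R_1+R_2)\Big/\log_2\left(\dfrac{T}{T_{TH}}\right) & (8) \\[2ex] C_{X,OPT}^{T_X \le T_{TH}} = 2\varepsilon l\left[\dfrac{w_{OPT}}{h} + 2.04\left(\dfrac{s}{s+0.54h}\right)^{1.77}\left(\dfrac{t_{OPT}}{t_{OPT}+0.453h}\right)^{0.07} + 1.41\,\dfrac{t_{OPT}}{s}\,e^{-\frac{4s}{s+8.01h}} + 2.37\left(\dfrac{w_{OPT}}{w_{OPT}+0.31s}\right)^{0.28}\left(\dfrac{h}{h+8.96s}\right)^{0.76}e^{-\frac{2s}{s+6h}}\right] & (9) \end{cases}$$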

Global interconnect

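$$\begin{cases} R_{X,OPT}^{T_X \le T_{TH}} = (R_1+R_2)\Big/\log_2\left(\dfrac{T}{T_{TH}}\right) & (10) \\[2ex] C_{X,OPT}^{T_X \le T_{TH}} = 2\varepsilon l\left[\dfrac{w_{OPT}}{2h} + 1.11\left(\dfrac{s}{s+0.70h}\right)^{3.19} + 0.58\left(\dfrac{s}{s+1.51h}\right)^{0.76}\left(\dfrac{t_{OPT}}{t_{OPT}+4.53h}\right)^{0.12} + 1.14\,\dfrac{t_{OPT}}{s}\left(\dfrac{h}{h+2.06s}\right)^{0.09} + 0.74\left(\dfrac{w_{OPT}}{w_{OPT}+1.59s}\right)^{1.14} + 1.16\left(\dfrac{w_{OPT}}{w_{OPT}+1.87s}\right)^{0.16}\left(\dfrac{h}{h+0.98s}\right)^{1.18}\right] & (11) \end{cases}$$
where $w_{OPT}$ and $t_{OPT}$ are the crosslink width and thickness, respectively, that exhibit minimum power under the specific timing constraint.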
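The two capacitance models in (9) and (11) can be coded directly for numerical exploration. The sketch below uses the coefficients as printed above, assumes negative exponents in the exponential terms of the local model, and assumes a relative permittivity of 3.9 for the dielectric. The function names, the permittivity, and the example dimensions are illustrative assumptions.

```python
import math

EPS0 = 8.854e-12  # vacuum permittivity (F/m)


def cap_local(w, t, s, h, length, eps_r=3.9):
    """Local/intermediate interconnect capacitance following (9), in farads.

    w, t: wire width and thickness; s, h: horizontal and vertical spacing (meters).
    eps_r is an assumed relative permittivity of the dielectric.
    """
    eps = eps_r * EPS0
    ground = w / h + 2.04 * (s / (s + 0.54 * h)) ** 1.77 * (t / (t + 0.453 * h)) ** 0.07
    coupling = (
        1.41 * (t / s) * math.exp(-4.0 * s / (s + 8.01 * h))
        + 2.37 * (w / (w + 0.31 * s)) ** 0.28 * (h / (h + 8.96 * s)) ** 0.76
        * math.exp(-2.0 * s / (s + 6.0 * h))
    )
    return 2.0 * eps * length * (ground + coupling)


def cap_global(w, t, s, h, length, eps_r=3.9):
    """Global interconnect capacitance following (11), in farads."""
    eps = eps_r * EPS0
    ground = (
        w / (2.0 * h)
        + 1.11 * (s / (s + 0.70 * h)) ** 3.19
        + 0.58 * (s / (s + 1.51 * h)) ** 0.76 * (t / (t + 4.53 * h)) ** 0.12
    )
    coupling = (
        1.14 * (t / s) * (h / (h + 2.06 * s)) ** 0.09
        + 0.74 * (w / (w + 1.59 * s)) ** 1.14
        + 1.16 * (w / (w + 1.87 * s)) ** 0.16 * (h / (h + 0.98 * s)) ** 1.18
    )
    return 2.0 * eps * length * (ground + coupling)


# Assumed geometry: 0.1 um width, thickness, and spacings; 100 um long crosslink.
c_loc = cap_local(0.1e-6, 0.1e-6, 0.1e-6, 0.1e-6, 100e-6)
c_glob = cap_global(0.1e-6, 0.1e-6, 0.1e-6, 0.1e-6, 100e-6)
print(f"local model: {c_loc * 1e15:.2f} fF, global model: {c_glob * 1e15:.2f} fF")
```

Both models grow with width and thickness, which is the behavior that Rule 3 relies on when recommending narrow, thin crosslinks for low power.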

Optimum width and thickness (wOPT, tOPT)

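$$(w_{OPT}, t_{OPT}) = \left(\sqrt{\frac{\rho l}{R_{X,OPT}^{T_X \le T_{TH}}}\cdot\frac{\beta(\rho l) - \gamma\left(R_{X,OPT}^{T_X \le T_{TH}}\right)}{\delta(\rho l) - \alpha\left(R_{X,OPT}^{T_X \le T_{TH}}\right)}},\ \sqrt{\frac{\rho l}{R_{X,OPT}^{T_X \le T_{TH}}}\cdot\frac{\delta(\rho l) - \alpha\left(R_{X,OPT}^{T_X \le T_{TH}}\right)}{\beta(\rho l) - \gamma\left(R_{X,OPT}^{T_X \le T_{TH}}\right)}}\right)$$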
if (wOPT,tOPT) ∈ ([wmin,M/tmin], [M/tmin,tmin]), and Cx(wOPT, tOPT) is the minimum value of the crosslink capacitance within that interval. Otherwise,
( w OPT , t OPT ) = { ( w min , t max ) = ( w min , R X , OPT T X T TH ρ l 1 w min ) , if C X ( w min , t max ) C X ( w max , t min ) ( w max , t min ) = ( R X , OPT T X T TH ρ l 1 t min , t min ) , if C X ( w min , t max ) > C X ( w max , t min )
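The selection between an interior optimum and the two boundary points can be cross-checked numerically. The sketch below is not the closed-form Lagrange solution of the paper; it is a brute-force scan along the constraint curve w·t = M, assuming a user-supplied capacitance function `cap(w, t)`. The name `best_width_thickness` and the `steps` parameter are hypothetical.

```python
def best_width_thickness(cap, m, w_min, t_min, steps=2000):
    """Scan w over [w_min, M / t_min] with t = M / w and return the
    (w, t, capacitance) triple with the smallest capacitance."""
    w_max = m / t_min
    if w_max < w_min:
        raise ValueError("no feasible (w, t) pair satisfies w * t = M")
    best = None
    for i in range(steps + 1):
        w = w_min + (w_max - w_min) * i / steps
        t = m / w  # stay on the constraint curve w * t = M
        c = cap(w, t)
        if best is None or c < best[2]:
            best = (w, t, c)
    return best
```

Because the end points of the scan are (w_min, M/w_min) and (M/t_min, t_min), the scan returns one of the two boundary choices whenever no interior point has lower capacitance.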
The variables α, β, γ, and δ vary with technology-dependent parameters, such as the interconnect resistivity ρ, horizontal spacing s, and vertical spacing h, as described in Appendix B. The upper bound on the minimum total energy in a clock tree section with a crosslink under the $T_X \le T_{TH}$ zero skew constraint is determined by substituting $R_{X,OPT}^{T_X \le T_{TH}}$ and $C_{X,OPT}^{T_X \le T_{TH}}$ in (3),
$$
E_{X,MAX}^{Total} = \left[ \frac{T}{R_1 + R_2 + R_{X,OPT}^{T_X \le T_{TH}}} \right] V_{DD}^2 + \frac{1}{2} \left[ \left( 1 + \frac{R_1}{R_2} \right) C_1 + \left( 1 + \frac{R_2}{R_1} \right) C_2 + \left( 1 + \frac{R_1}{2 R_2} + \frac{R_2}{2 R_1} \right) C_{X,OPT}^{T_X \le T_{TH}} \right] V_{DD}^2
\tag{15}
$$
The upper bound on the additional energy $E_{X,MAX}$ from inserting a crosslink is determined by subtracting the energy consumed by a clock tree section without a crosslink, $E_{Tree} = (C_1 + C_2) V_{DD}^2$, from (15), yielding
$$
E_{X,MAX} = \left[ \frac{T}{R_1 + R_2 + R_{X,OPT}^{T_X \le T_{TH}}} \right] V_{DD}^2 + \frac{1}{2} \left[ \left( \frac{R_1}{R_2} - 1 \right) C_1 + \left( \frac{R_2}{R_1} - 1 \right) C_2 + \left( 1 + \frac{R_1}{2 R_2} + \frac{R_2}{2 R_1} \right) C_{X,OPT}^{T_X \le T_{TH}} \right] V_{DD}^2
\tag{16}
$$
Finally, the total additional energy from inserting N crosslinks within a clock tree,
$$
E_{X,MAX} = V_{DD}^2 \sum_{i=1}^{N} \left[ \frac{T_i}{R_{1,i} + R_{2,i} + R_{X,OPT,i}^{T_X \le T_{TH}}} + \frac{1}{2} \left( \frac{R_{1,i}}{R_{2,i}} - 1 \right) C_{1,i} + \frac{1}{2} \left( \frac{R_{2,i}}{R_{1,i}} - 1 \right) C_{2,i} + \frac{1}{2} \left( 1 + \frac{R_{1,i}}{2 R_{2,i}} + \frac{R_{2,i}}{2 R_{1,i}} \right) C_{X,OPT,i}^{T_X \le T_{TH}} \right]
\tag{17}
$$
is compared with the additional mesh energy EMesh.
The expression in (17) can be further simplified for R1 = R2 = R and C1 = C2 = C, yielding
$$
E_{X,MAX} = V_{DD}^2 \sum_{i=1}^{N} \left[ \frac{T_i}{2 R_i + R_{X,OPT,i}^{T_X \le T_{TH}}} + C_{X,OPT,i}^{T_X \le T_{TH}} \right]
\tag{18}
$$
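As a brief check of this simplification (a worked step added for illustration, not part of the original derivation), substituting $R_{1,i} = R_{2,i} = R_i$ into the bracketed coefficients of (17) gives

$$
\begin{aligned}
\frac{1}{2} \left( \frac{R_i}{R_i} - 1 \right) C_{1,i} &= 0, \qquad
\frac{1}{2} \left( \frac{R_i}{R_i} - 1 \right) C_{2,i} = 0, \\
\frac{1}{2} \left( 1 + \frac{R_i}{2 R_i} + \frac{R_i}{2 R_i} \right) C_{X,OPT,i}^{T_X \le T_{TH}} &= \frac{1}{2} \left( 1 + \frac{1}{2} + \frac{1}{2} \right) C_{X,OPT,i}^{T_X \le T_{TH}} = C_{X,OPT,i}^{T_X \le T_{TH}},
\end{aligned}
$$

so the section capacitances drop out and only the short-circuit term $T_i / (2 R_i + R_{X,OPT,i}^{T_X \le T_{TH}})$ and the crosslink capacitance remain, as in (18). The cancellation depends only on the equal resistances.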
A crosslink-based topology should therefore be used to provide low power while mitigating skew variations when EX,MAX < EMesh. Otherwise, a mesh-based clock distribution network is preferable.
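The decision rule can be sketched in a few lines. This is a minimal illustration, assuming the general per-section expression (17) and SI units; the `CrosslinkSection` container and the function names are hypothetical, and the mesh energy is taken as a given input rather than computed.

```python
from dataclasses import dataclass


@dataclass
class CrosslinkSection:
    t_diff: float  # T_i, time during which the two inputs differ (s)
    r1: float      # R_1,i (ohm)
    r2: float      # R_2,i (ohm)
    r_x: float     # optimum crosslink resistance (ohm)
    c1: float      # C_1,i (F)
    c2: float      # C_2,i (F)
    c_x: float     # optimum crosslink capacitance (F)


def crosslink_energy_upper_bound(vdd, sections):
    """Upper bound on the additional energy of N crosslinks, per (17)."""
    total = 0.0
    for k in sections:
        total += k.t_diff / (k.r1 + k.r2 + k.r_x)
        total += 0.5 * (k.r1 / k.r2 - 1.0) * k.c1
        total += 0.5 * (k.r2 / k.r1 - 1.0) * k.c2
        total += 0.5 * (1.0 + k.r1 / (2 * k.r2) + k.r2 / (2 * k.r1)) * k.c_x
    return vdd ** 2 * total


def preferred_topology(e_x_max, e_mesh):
    """Crosslinks are preferred only when E_X,MAX < E_Mesh."""
    return "crosslink" if e_x_max < e_mesh else "mesh"
```

With equal resistances in every section, `crosslink_energy_upper_bound` reduces to the simplified form (18).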

4. Simulation Results

Several examples of moderate and large skew variations that effectively exploit non-tree topologies are described in this section for a zero skew clock tree and a clock tree with certain useful non-zero skew constraints. Different mesh- and crosslink-based topologies are considered. The decision regarding a preferable non-tree topology is based on the energy efficiency metric (EX,MAX < EMesh) and is corroborated with SPICE simulations. Crosslink-based networks have been designed based on the analytic expressions for the optimal crosslink parameters, (8)–(11), and validated with simulations. A portion of a clock tree with four levels of buffers and 16 sequentially-adjacent registers in a 180 nm CMOS technology is considered. The source of the clock distribution network is driven by a 1 GHz clock signal. Transistor and interconnect parameters from [28] are used to model the drivers and wires within the clock network. The wires at the topmost and lowest clock tree levels are modeled, respectively, by the global and local interconnect parameters [28]. The interconnect parameters for the intermediate layers [28] are used to model the clock lines within the second and third clock tree levels.
The threshold for the allowed skew variations is set to 5% of the clock period (TTH = 5%·TP). The transistor and wire widths within the clock tree are varied by between 20% and 50% of the nominal value. As a result, skew variations as high as 10% of the clock period (TP) are observed at the registers, exceeding the 5% threshold, TTH. To mitigate skew variations between sequentially-adjacent registers, crosslink- and mesh-based solutions are compared. The crosslinks are inserted according to the guidelines provided in Sections 2.4.1 and 3. Both intermediate- and sink-level sparse and dense meshes [8] are used in the zero skew clock tree.
For the clock tree with a specific useful skew, all of the meshes and crosslinks are restricted to the upper clock tree levels to maintain the non-zero skew between the specific registers. To determine the preferred non-tree solution for the example networks, the power efficiency of the proposed methods is evaluated based on (18).
For the zero skew clock tree, the largest skew, number of skew violations between sequentially-adjacent registers, and additional energy due to the inserted crosslinks or mesh connections are listed in Tables 1 and 2 for, respectively, moderate (up to 20%) and large (up to 50%) skew variations. Analogous results for the non-zero skew clock tree are listed in Tables 3 and 4, respectively, for moderate and large skew variations. In each example, locally and globally routed crosslinks are considered, respectively, for close and distant crosslink connected segments. Both uniform sparse and dense meshes are considered. Typical mesh parameters are based on [8]. For crosslink-based topologies, the crosslink parameters are based on (8)–(11), exhibiting skew variations slightly below the allowed threshold TTH, while satisfying the zero skew constraint between the crosslink connected nodes. High correlation is observed between the analytic expressions and the simulation results.
Based on SPICE simulations for the case of moderate skew variations (Tables 1 and 3), skew mitigation with crosslinks and with a mesh is similar. However, higher power is consumed by the mesh-based clock distribution network. Alternatively, in clock trees with larger skew variations (Tables 2 and 4), the target skew cannot always be achieved with an intermediate- or sink-level sparse mesh (Table 2). Thus, a dense mesh is used at the expense of higher power. Specifically, in the case of the zero skew clock tree (Table 2), the crosslink-based solution is preferred due to the lower power dissipated by the crosslinks as compared to the dense sink-level mesh (compliant with EX,MAX < EMESH). Alternatively, for the clock tree with non-zero skew constraints (Table 4), the maximum skew with crosslinks exceeds the required 50 ps threshold. Hence, an intermediate-level mesh is preferable.
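The threshold arithmetic behind these comparisons is simple enough to sketch. The example below assumes the 1 GHz clock and the 5% threshold stated above, which give the 50 ps limit; the function names are hypothetical, and the skew values in the usage note are the maximum skews listed in Tables 2 and 4.

```python
def skew_threshold_ps(clock_hz, fraction=0.05):
    """Allowed skew variation T_TH as a fraction of the clock period, in ps."""
    return fraction / clock_hz * 1e12


def meets_threshold(max_skew_ps, threshold_ps):
    """True when the largest observed skew does not exceed T_TH."""
    return max_skew_ps <= threshold_ps
```

For instance, with `t_th = skew_threshold_ps(1e9)`, the dense sink-level mesh of Table 2 (46.16 ps) passes `meets_threshold`, whereas the sparse sink-level mesh (53.27 ps) and the local crosslinks of Table 4 (61.86 ps) do not.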
An analytic estimate of the energy is also listed in Tables 1–4, which is used to determine the preferred clock topology for the specific examples. In Tables 1–3, the upper bound for the energy consumed by the additional crosslinks EX,MAX is lower than EMESH, demonstrating that a crosslink-based solution is preferable in these specific cases. In Table 4, however, the skew requirements cannot be satisfied with the proposed crosslink-based solution. Additional crosslinks would increase the dissipated power so that eventually EX,MAX would be greater than EMESH. The decision regarding the choice of topology based on the energy efficiency metric is thereby confirmed by SPICE simulations in these examples.
Note that a more efficient solution may be achieved by either a mesh or a crosslink-based topology in certain clock networks, as shown in the aforementioned examples. The purpose of this work, as demonstrated by the example networks, is to provide metrics for determining the more efficient non-tree method to mitigate skew variations rather than suggest a general topology for any clock network.

5. Summary

Different topologies and techniques to design power efficient clock distribution networks at several operating frequencies are reviewed in this paper. For low power circuits that operate at moderate and low frequencies, a buffered clock tree may be the preferable method. To satisfy a specific set of timing constraints, a balanced, low power clock tree can be efficiently designed using existing techniques. Existing skew solutions in tree-based networks, however, are not efficient in mitigating manufacturing induced variations. Thus, in modern circuits with aggressive timing requirements, non-tree topologies should be considered to cope with skew variations. Mesh-based solutions have recently been shown to reliably mitigate skew variations through the use of a symmetric mesh structure, albeit at significantly higher power. Alternatively, mesh redundancy can be avoided in crosslink-based topologies to mitigate skew variations at lower power.
Guidelines for crosslink insertion in a balanced clock tree are presented in this paper. To maintain a target skew between sequentially-adjacent registers, a heuristic is proposed for inserting crosslinks within a balanced clock tree between upstream zero skew segments to those sequentially-adjacent registers that violate timing constraints. In addition, the crosslink should be inserted as far as possible from the section drivers for enhanced tolerance to variations at lower power. The optimum crosslink parameters under zero skew constraints are also presented. Tradeoffs between energy consumption and skew variations in crosslink-based topologies are investigated in this paper based on analytic expressions and simulations, demonstrating that crosslinks with lower resistance should be used to enhance the tolerance of a circuit to manufacturing induced variations; whereas crosslinks with high resistance and therefore low capacitance should be used in low power clock networks. Analytic expressions are also described to determine the most power efficient clock network topology under specific timing constraints. Simulation results are presented, confirming the conclusions of the theoretical analysis regarding the choice of topology for low power, variation-tolerant clock networks.
Figure 1. Power vs. clock skew variations for different clock network topologies.
Figure 2. Clock tree composed of the source, trunk, segments, and sinks.
Figure 3. Non-tree clock network topologies, (a) crosslink insertion and (b) intermediate-level and sink-level meshes.
Figure 4. An example of the excessive wirelength and power of a mesh as compared to a crosslink-based topology.
Figure 5. Two clock tree segments (a) with an impedance model of the crosslink and (b) without a crosslink.
Figure 6. Two clock tree segments with impedance model (a) with a crosslink and (b) without a crosslink.
Figure 7. Two clock tree segments connected with a crosslink. The dotted line illustrates the short-circuit current path for ClkIn1 = “0” and ClkIn2 = “1”.
Figure 8. Current through R1 for (a) step input and (b) slow ramp input. The negative currents prior to T = 500 ps degrade the performance.
Figure 9. Circuit model of clock tree section for (a) t ≤ T (ClkIn1 = “0”, ClkIn2 = “1”) and (b) t > T (ClkIn1 = “1”, ClkIn2 = “1”).
Figure 10. Output voltage waveforms VClkOut1(t) and VClkOut2(t) with and without a crosslink.
Figure 11. Current components for t ≤ T (ClkIn1 = 1, ClkIn2 = 0).
Figure 12. Mitigation of skew variations between Reg1 and Reg2 with a crosslink. A crosslink should be inserted at (a) the upper clock tree level to reduce variations within a larger group of four registers or (b) closer to the sinks to effectively cancel variations between Reg1 and Reg2.
Figure 13. Mitigating skew variations within three pairs of registers, (Reg1, Reg2), (Reg1, Reg4), and (Reg2, Reg3), with (a) three crosslinks and (b) two crosslinks.
Figure 14. Power efficient non-tree topologies to mitigate skew variations (a) between four sequentially-adjacent registers with a crosslink, (b) within a large group of sequentially-adjacent registers with a mesh, as opposed to (c) an inefficient crosslink-based clock network.
Table 1. Comparison of different non-tree approaches to mitigate moderate (up to 20%) skew variations within a zero skew clock tree.

| Topology | Maximum skew (ps) | Maximum skew (% of TP) | Skew violations (#) | Skew violations (%) | Energy per cycle added by non-tree elements, SPICE (%) | Energy per cycle added by non-tree elements, analytic (%) |
|---|---|---|---|---|---|---|
| Clock tree | 51.56 | 5.16 | 64 | 53.33 | 0.00 | 0.00 |
| With local crosslinks | 31.26 | 3.13 | 0 | 0.00 | 0.07 | 0.23 (EX,MAX) |
| With global crosslinks | 32.03 | 3.20 | 0 | 0.00 | 1.20 | 2.53 (EX,MAX) |
| With intermediate-level sparse mesh | 34.91 | 3.49 | 0 | 0.00 | 3.76 (EMESH) | N/A |
| With intermediate-level dense mesh | 35.62 | 3.56 | 0 | 0.00 | 5.97 (EMESH) | N/A |
Table 2. Comparison of different non-tree approaches to mitigate large (up to 50%) skew variations within a zero skew clock tree.

| Topology | Maximum skew (ps) | Maximum skew (% of TP) | Skew violations (#) | Skew violations (%) | Energy per cycle added by non-tree elements, SPICE (%) | Energy per cycle added by non-tree elements, analytic (%) |
|---|---|---|---|---|---|---|
| Clock tree | 71.28 | 7.13 | 64 | 53.33 | 0.00 | 0.00 |
| With local crosslinks | 35.96 | 3.60 | 0 | 0.00 | 0.08 | 0.24 (EX,MAX) |
| With global crosslinks | 34.61 | 3.46 | 0 | 0.00 | 1.34 | 2.80 (EX,MAX) |
| With intermediate-level sparse mesh | 67.18 | 6.72 | 28 | 23.33 | 3.75 (EMESH) | N/A |
| With intermediate-level dense mesh | 66.49 | 6.65 | 28 | 23.33 | 5.91 (EMESH) | N/A |
| With sink-level sparse mesh | 53.27 | 5.33 | 2 | 1.67 | 4.07 (EMESH) | N/A |
| With sink-level dense mesh | 46.16 | 4.62 | 0 | 0.00 | 6.28 (EMESH) | N/A |
Table 3. Comparison of different non-tree approaches to mitigate moderate (up to 20%) skew variations within a clock tree with a useful skew schedule.

| Topology | Maximum skew (ps) | Maximum skew (% of TP) | Skew violations (#) | Skew violations (%) | Energy per cycle added by non-tree elements, SPICE (%) | Energy per cycle added by non-tree elements, analytic (%) |
|---|---|---|---|---|---|---|
| Clock tree | 77.81 | 7.78 | 64 | 53.33 | 0.00 | 0.00 |
| With local crosslinks | 45.01 | 4.50 | 0 | 0.00 | 0.80 | 0.82 (EX,MAX) |
| With global crosslinks | 43.64 | 4.36 | 0 | 0.00 | 0.98 | 2.64 (EX,MAX) |
| With intermediate-level sparse mesh | 43.00 | 4.30 | 0 | 0.00 | 3.45 (EMESH) | N/A |
| With intermediate-level dense mesh | 43.00 | 4.30 | 0 | 0.00 | 5.48 (EMESH) | N/A |
Table 4. Comparison of different non-tree approaches to mitigate large (up to 50%) skew variations within a clock tree with a useful skew schedule.

| Topology | Maximum skew (ps) | Maximum skew (% of TP) | Skew violations (#) | Skew violations (%) | Energy per cycle added by non-tree elements, SPICE (%) | Energy per cycle added by non-tree elements, analytic (%) |
|---|---|---|---|---|---|---|
| Clock tree | 96.83 | 9.68 | 64 | 53.33 | 0.00 | 0.00 |
| With local crosslinks | 61.86 | 6.19 | 16 | 13.33 | >0.79 | >1.39 (EX,MAX) |
| With global crosslinks | 60.77 | 6.08 | 16 | 13.33 | >1.07 | >3.31 (EX,MAX) |
| With intermediate-level sparse mesh | 47.84 | 4.78 | 0 | 0.00 | 3.43 (EMESH) | N/A |
| With intermediate-level dense mesh | 39.39 | 3.94 | 0 | 0.00 | 5.44 (EMESH) | N/A |

Appendix A: Total Energy Consumed in a Clock Tree Section with a Crosslink

The voltage at the output of a clock tree section and energy expressions for t > 0 are derived in this section based on the circuit model shown in Figure 9. The circuit model of a simplified clock tree section with a crosslink is shown in Figure 9 for 0 < t ≤ T with each input at a different polarity (ClkIn1 ≠ ClkIn2) and for t > T with identical inputs (ClkIn1 = ClkIn2).
Differential equations are determined from Figure 9a, for ClkIn1 ≠ ClkIn2, t ≤ T,
$$
\frac{V_{DD} - V_2^{t \le T}(t)}{R_2} = \frac{V_2^{t \le T}(t) - V_1^{t \le T}(t)}{R_X} + C_2 \frac{d V_2^{t \le T}(t)}{dt}
\tag{A.1}
$$
$$
\frac{V_2^{t \le T}(t) - V_1^{t \le T}(t)}{R_X} = \frac{V_1^{t \le T}(t)}{R_1} + C_1 \frac{d V_1^{t \le T}(t)}{dt}
\tag{A.2}
$$
with initial conditions, $V_1^{t \le T}(t = 0) = V_2^{t \le T}(t = 0) = 0$. The solution of (A.1)–(A.2) for ClkIn1 ≠ ClkIn2, t ≤ T with these initial conditions is
$$
V_1^{t \le T}(t) = V_{DD} \left[ \frac{R_1}{R_1 + R_2 + R_X} - \frac{R_1}{R_1 + R_2} e^{-\frac{t}{\tau}} + \frac{R_1 R_X}{(R_1 + R_2)(R_1 + R_2 + R_X)} e^{-\left( \frac{R_1 + R_2 + R_X}{R_X} \right) \frac{t}{\tau}} \right]
\tag{A.3}
$$
$$
V_2^{t \le T}(t) = V_{DD} \left[ \frac{R_1 + R_X}{R_1 + R_2 + R_X} - \frac{R_1}{R_1 + R_2} e^{-\frac{t}{\tau}} - \frac{R_2 R_X}{(R_1 + R_2)(R_1 + R_2 + R_X)} e^{-\left( \frac{R_1 + R_2 + R_X}{R_X} \right) \frac{t}{\tau}} \right]
\tag{A.4}
$$
where $V_1^{t \le T}(t) = V_{ClkOut1}(t)$, $V_2^{t \le T}(t) = V_{ClkOut2}(t)$, and $\tau = R_1 (C_X + C_1) \approx R_2 (C_X + C_2)$. The total energy consumed during the time interval [0,T] is
$$
E^{t \le T} = V_{DD}^2 \left( \frac{1}{R_1 + R_2 + R_X} \right) T + \frac{\tau V_{DD}^2}{R_1 + R_2} \left[ \frac{R_1}{R_2} \left( 1 - e^{-\frac{T}{\tau}} \right) + \left( \frac{R_X}{R_1 + R_2 + R_X} \right)^2 \left( 1 - e^{-\left( \frac{R_1 + R_2 + R_X}{R_X} \right) \frac{T}{\tau}} \right) \right]
\tag{A.5}
$$
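To determine the additional energy consumed for t > T, two differential equations from Figure 9b for ClkIn1 = ClkIn2, t > T are
$$
\frac{V_{DD} - V_2^{t > T}(t)}{R_2} = \frac{V_2^{t > T}(t) - V_1^{t > T}(t)}{R_X} + C_2 \frac{d V_2^{t > T}(t)}{dt}
\tag{A.6}
$$
$$
\frac{V_2^{t > T}(t) - V_1^{t > T}(t)}{R_X} + \frac{V_{DD} - V_1^{t > T}(t)}{R_1} = C_1 \frac{d V_1^{t > T}(t)}{dt}
\tag{A.7}
$$
with initial conditions, $V_1^{t > T}(t = 0) = V_1^{t \le T}(t = T)$ and $V_2^{t > T}(t = 0) = V_2^{t \le T}(t = T)$.
The solution of (A.6)–(A.7) for ClkIn1 = ClkIn2, t > T with these initial conditions is
$$
V_1^{t > T}(t) = V_{DD} \left[ 1 - \left( \frac{R_2 + R_1 e^{-\frac{T}{\tau}}}{R_1 + R_2} \right) e^{-\frac{t}{\tau}} - \frac{R_1 R_X}{(R_1 + R_2)(R_1 + R_2 + R_X)} \left( 1 - e^{-\left( \frac{R_1 + R_2 + R_X}{R_X} \right) \frac{T}{\tau}} \right) e^{-\left( \frac{R_1 + R_2 + R_X}{R_X} \right) \frac{t}{\tau}} \right]
\tag{A.8}
$$
$$
V_2^{t > T}(t) = V_{DD} \left[ 1 - \left( \frac{R_2 + R_1 e^{-\frac{T}{\tau}}}{R_1 + R_2} \right) e^{-\frac{t}{\tau}} + \frac{R_2 R_X}{(R_1 + R_2)(R_1 + R_2 + R_X)} \left( 1 - e^{-\left( \frac{R_1 + R_2 + R_X}{R_X} \right) \frac{T}{\tau}} \right) e^{-\left( \frac{R_1 + R_2 + R_X}{R_X} \right) \frac{t}{\tau}} \right]
\tag{A.9}
$$
where $V_1^{t > T}(t) = V_{ClkOut1}(t)$, $V_2^{t > T}(t) = V_{ClkOut2}(t)$, and $\tau = R_1 (C_X + C_1) \approx R_2 (C_X + C_2)$.
The energy consumed for t > T is dynamic energy and converges to
$$
E^{t > T} = \tau V_{DD}^2 \left( \frac{1}{R_1} + \frac{1}{R_2} e^{-\frac{T}{\tau}} \right) \left( 1 - e^{-\frac{t}{\tau}} \right) \xrightarrow{\; t \to \infty \;} \tau V_{DD}^2 \left( \frac{1}{R_1} + \frac{1}{R_2} e^{-\frac{T}{\tau}} \right)
\tag{A.10}
$$
The total energy consumed from the time the first input ClkIn2 switches until the output capacitors are fully charged is
$$
E_X^{Total} = V_{DD}^2 \left( \frac{1}{R_1 + R_2 + R_X} \right) T + \tau V_{DD}^2 \left[ \frac{R_1 + R_2}{R_1 R_2} - \frac{1}{R_1 + R_2} \left( 1 - e^{-\frac{T}{\tau}} \right) + \frac{1}{R_1 + R_2} \left( \frac{R_X}{R_1 + R_2 + R_X} \right)^2 \left( 1 - e^{-\left( \frac{R_1 + R_2 + R_X}{R_X} \right) \frac{T}{\tau}} \right) \right]
\tag{A.11}
$$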
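The output waveforms of this appendix can be evaluated numerically to confirm that they satisfy the stated initial conditions. The sketch below assumes that, for t > T, the time variable is measured from the instant T (as implied by the initial conditions of the second interval) and that a single time constant τ applies to both outputs; the function names `v_before` and `v_after` are hypothetical.

```python
import math


def v_before(t, vdd, r1, r2, rx, tau):
    """Output voltages (V1, V2) for 0 <= t <= T, inputs at opposite polarity."""
    rs = r1 + r2 + rx
    k = rs / rx
    e1 = math.exp(-t / tau)
    e2 = math.exp(-k * t / tau)
    a = r1 * rx / ((r1 + r2) * rs)
    b = r2 * rx / ((r1 + r2) * rs)
    v1 = vdd * (r1 / rs - r1 / (r1 + r2) * e1 + a * e2)
    v2 = vdd * ((r1 + rx) / rs - r1 / (r1 + r2) * e1 - b * e2)
    return v1, v2


def v_after(t, big_t, vdd, r1, r2, rx, tau):
    """Output voltages (V1, V2) for identical inputs; t is measured from T."""
    rs = r1 + r2 + rx
    k = rs / rx
    e1 = math.exp(-t / tau)
    e2 = math.exp(-k * t / tau)
    g = (r2 + r1 * math.exp(-big_t / tau)) / (r1 + r2)
    d = 1.0 - math.exp(-k * big_t / tau)
    a = r1 * rx / ((r1 + r2) * rs)
    b = r2 * rx / ((r1 + r2) * rs)
    v1 = vdd * (1.0 - g * e1 - a * d * e2)
    v2 = vdd * (1.0 - g * e1 + b * d * e2)
    return v1, v2
```

Evaluating `v_before` at t = 0 returns zero for both outputs, `v_after` at t = 0 reproduces `v_before` at t = T, and both outputs approach VDD for large t, which is consistent with the waveforms of Figure 10.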

Appendix B: Crosslink Parameters for Low Power Design under the Zero Skew Constraint

The optimum crosslink resistance under the zero skew $T_X \le T_{TH}$ constraint is described in Section 3, permitting the optimum crosslink capacitance to be determined based on the wire capacitance. Capacitive coupling should, however, be considered to produce an accurate wire model, complicating the analytic expressions for the total wire capacitance. The optimum crosslink capacitance is derived in this section for minimum power under the $T_X \le T_{TH}$ constraint and for a specific crosslink resistance.
Skew variations between crosslink connected segments decrease with lower crosslink resistance $R_X = \rho l / (wt)$, where l, w, and t are the wire length, width, and thickness, respectively, as illustrated in Figure B1. However, when mitigating variations between zero skew segments, the zero skew constraint $T_X \le T_{TH}$ should be enforced with a crosslink, yielding the optimum crosslink resistance $R_{X,OPT}^{T_X \le T_{TH}}$ that satisfies $T_X \le T_{TH}$ at the minimum power, as shown in Figure B2.
Figure B1. Interconnect parameters.
Figure B2. Optimum crosslink resistance $R_{X,OPT}^{T_X \le T_{TH}}$ under the $T_X \le T_{TH}$ constraint.
Given the length l and resistivity ρ of a wire, the product of the crosslink width w and thickness t is constant. The optimum crosslink capacitance $C_{X,OPT}^{T_X \le T_{TH}}$ is derived in this section under the $T_X \le T_{TH}$ constraint,
$$
R_{X,OPT}^{T_X \le T_{TH}} = \frac{\rho l}{wt} \;\Rightarrow\; g(w, t) = w \cdot t = \frac{\rho l}{R_{X,OPT}^{T_X \le T_{TH}}} = M
\tag{B.1}
$$
where w·t = M, as illustrated in Figure B3.
Figure B3. Optimum crosslink capacitance $C_{X,OPT}^{T_X \le T_{TH}}$ under the $T_X \le T_{TH}$ constraint for w·t = M.
Note that w and t range over $[w_{min}, M/t_{min}]$ and $[t_{min}, M/w_{min}]$, respectively, as illustrated in Figure B4, where $w_{min}$ and $t_{min}$ are determined from the minimum geometric feature size. Thus, based on the Weierstrass extreme value theorem [30], the crosslink capacitance $C_X$ attains a minimum value within the closed and bounded interval $[(w_{min}, t_{min}), (M/t_{min}, M/w_{min})]$. Based on technology parameters from [28] and the interconnect geometry (see Figure B1), the crosslink capacitance is $C_{X1} = 2 C_{g1} + 2 C_{C1}$ for the local and intermediate layers and $C_{X2} = C_{g2} + 2 C_{C2}$ for the global interconnect, where
$$
C_{g1} = \varepsilon l \left[ \frac{w}{h} + 2.04 \left( \frac{s}{s + 0.54 h} \right)^{1.77} \left( \frac{t}{t + 4.53 h} \right)^{0.07} \right]
\tag{B.2}
$$
and
$$
C_{g2} = \varepsilon l \left[ \frac{w}{h} + 2.22 \left( \frac{s}{s + 0.70 h} \right)^{3.19} + 1.17 \left( \frac{s}{s + 1.51 h} \right)^{0.76} \left( \frac{t}{t + 4.53 h} \right)^{0.12} \right]
\tag{B.3}
$$
are, respectively, the area and fringe capacitance to the underlying plane for the local and intermediate ($C_{g1}$) and global ($C_{g2}$) layers, and
$$
C_{C1} = \varepsilon l \left[ 1.41 \frac{t}{s} e^{-\frac{4 s}{s + 8.01 h}} + 2.37 \left( \frac{w}{w + 0.31 s} \right)^{0.28} \left( \frac{h}{h + 8.96 s} \right)^{0.76} e^{-\frac{2 s}{s + 6 h}} \right]
\tag{B.4}
$$
and
$$
C_{C2} = \varepsilon l \left[ 1.41 \frac{t}{s} \left( \frac{h}{h + 2.06 s} \right)^{0.09} + 0.74 \left( \frac{w}{w + 1.59 s} \right)^{1.14} + 1.16 \left( \frac{w}{w + 1.87 s} \right)^{0.16} \left( \frac{h}{h + 0.98 s} \right)^{1.18} \right]
\tag{B.5}
$$
are, respectively, the coupling capacitance for the local and intermediate ($C_{C1}$) and global ($C_{C2}$) interconnect. Thus, given an interconnect length l, spacing s, and distance to the ground h, the crosslink capacitance as a function of the width w and thickness t is
$$
f(w, t) = C_X(w, t) =
\begin{cases}
2 C_{g1} + 2 C_{C1}, & \text{for local and intermediate interconnect} \\
C_{g2} + 2 C_{C2}, & \text{for global interconnect}
\end{cases}
\tag{B.6, B.7}
$$
Figure B4. Bounds of the crosslink width and thickness based on the minimum feature size and w·t = M constraint.
Note that the derivatives ∂CX/∂w and ∂CX/∂t are always positive. Therefore, the crosslink capacitance CX increases with wider and thicker crosslinks. Furthermore, the optimum crosslink capacitance $C_{X,OPT}^{T_X \le T_{TH}}$ can be derived based on the Lagrange method for determining the minimum f(w, t) under the constraint g(w, t) = M. To optimize y = f(w, t) subject to M = g(w, t), the auxiliary function L(w, t, λ) = f(w, t) + λ(M − g(w, t)), with the constraint expressed in the equivalent form $R_{X,OPT}^{T_X \le T_{TH}} / (\rho l) = 1/(wt)$, is
$$
L(w, t, \lambda) =
\begin{cases}
2 C_{g1}(w, t) + 2 C_{C1}(w, t) + \lambda \left( \dfrac{R_{X,OPT}^{T_X \le T_{TH}}}{\rho l} - \dfrac{1}{wt} \right), & \text{for local and intermediate interconnect} \\[2ex]
C_{g2}(w, t) + 2 C_{C2}(w, t) + \lambda \left( \dfrac{R_{X,OPT}^{T_X \le T_{TH}}}{\rho l} - \dfrac{1}{wt} \right), & \text{for global interconnect}
\end{cases}
\tag{B.8, B.9}
$$
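The partial derivative of L is determined with respect to each of the variables, assuming w ≈ s and t = η·h, and set to zero, yielding
$$
\frac{\partial L(w, t)}{\partial \lambda} = \frac{R_{X,OPT}^{T_X \le T_{TH}}}{\rho l} - \frac{1}{wt} = 0
\tag{B.10}
$$
$$
\frac{\partial L(w, t)}{\partial w} \underset{w \approx s}{\approx} l \left( \alpha + \frac{\beta}{w^2} \right) + \frac{\lambda}{w^2 t} = 0
\tag{B.11}
$$
$$
\frac{\partial L(w, t)}{\partial t} \underset{t = \eta h}{=} l \left( \gamma + \frac{\delta}{t^2} \right) + \frac{\lambda}{w t^2} = 0
\tag{B.12}
$$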
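where
$$
\alpha =
\begin{cases}
\dfrac{2 \varepsilon}{h}, & \text{for local and intermediate interconnect} \\[1.5ex]
\dfrac{\varepsilon}{h}, & \text{for global interconnect}
\end{cases}
\tag{B.13, B.14}
$$
$$
\beta =
\begin{cases}
2 \varepsilon \dfrac{0.1456}{s} \left( \dfrac{h}{h + 8.96 s} \right)^{0.76} e^{-\frac{2 s}{s + 6 h}}, & \text{for local and intermediate interconnect} \\[1.5ex]
2 \varepsilon \left[ \dfrac{0.175}{s} + \dfrac{0.102}{s} \left( \dfrac{h}{h + 0.98 s} \right)^{1.18} \right], & \text{for global interconnect}
\end{cases}
\tag{B.15, B.16}
$$
$$
\gamma =
\begin{cases}
2 \varepsilon \cdot 1.41 \dfrac{1}{s} e^{-\frac{4 s}{s + 8.01 h}}, & \text{for local and intermediate interconnect} \\[1.5ex]
2 \varepsilon \cdot 1.41 \dfrac{1}{s} \left( \dfrac{h}{h + 2.06 s} \right)^{0.09}, & \text{for global interconnect}
\end{cases}
\tag{B.17, B.18}
$$
and
$$
\delta =
\begin{cases}
2 \varepsilon \dfrac{0.6469\, h}{(1 + 4.53 \eta)^{1.07}} \left( \dfrac{s}{s + 0.54 h} \right)^{1.77}, \text{ where } 0.7 \le \eta \le 1.75, & \text{for local and intermediate interconnect} \\[1.5ex]
2 \varepsilon \dfrac{0.3173\, h}{(1 + 4.53 \eta)^{1.07}} \left( \dfrac{s}{s + 1.51 h} \right)^{0.76}, \text{ where } 2 \le \eta \le 6, & \text{for global interconnect}
\end{cases}
\tag{B.19, B.20}
$$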
Higher (lower) values of η within the ranges of (B.19) and (B.20) should be used for thicker (thinner) wires. ∂L/∂w = 0, ∂L/∂t = 0, and ∂L/∂λ = 0 are solved based on (B.1) and (B.10)–(B.12), producing
$$
\frac{w}{t} = \frac{\gamma + \delta / t^2}{\alpha + \beta / w^2} = \frac{w^2}{t^2} \cdot \frac{\gamma t^2 + \delta}{\alpha w^2 + \beta}
\;\Rightarrow\;
\begin{cases}
\dfrac{t}{w} = \dfrac{\gamma t^2 + \delta}{\alpha w^2 + \beta} \\[2ex]
wt = \dfrac{\rho l}{R_{X,OPT}^{T_X \le T_{TH}}}
\end{cases}
$$
$$
\left[ \alpha \left( \frac{\rho l}{R_{X,OPT}^{T_X \le T_{TH}}\, t} \right)^2 + \beta \right] \frac{R_{X,OPT}^{T_X \le T_{TH}}\, t^2}{\rho l} = \alpha \left( \frac{\rho l}{R_{X,OPT}^{T_X \le T_{TH}}} \right) + \beta \left( \frac{R_{X,OPT}^{T_X \le T_{TH}}}{\rho l} \right) t^2 = \gamma t^2 + \delta
$$
$$
\left[ \beta \left( \frac{R_{X,OPT}^{T_X \le T_{TH}}}{\rho l} \right) - \gamma \right] t^2 = \delta - \alpha \left( \frac{\rho l}{R_{X,OPT}^{T_X \le T_{TH}}} \right)
$$
and the stationary point $(w_{STAT}, t_{STAT})$, such that the gradient of $C_X(w_{STAT}, t_{STAT})$ equals zero, is
$$
\begin{cases}
w_{STAT} = \dfrac{\rho l}{R_{X,OPT}^{T_X \le T_{TH}}} \dfrac{1}{t_{STAT}} = \sqrt{ \dfrac{\rho l}{R_{X,OPT}^{T_X \le T_{TH}}} \cdot \dfrac{\beta R_{X,OPT}^{T_X \le T_{TH}} - \gamma \rho l}{\delta R_{X,OPT}^{T_X \le T_{TH}} - \alpha \rho l} } \\[3ex]
t_{STAT} = \sqrt{ \dfrac{\delta - \alpha \left( \rho l / R_{X,OPT}^{T_X \le T_{TH}} \right)}{\beta \left( R_{X,OPT}^{T_X \le T_{TH}} / \rho l \right) - \gamma} } = \sqrt{ \dfrac{\rho l}{R_{X,OPT}^{T_X \le T_{TH}}} \cdot \dfrac{\delta R_{X,OPT}^{T_X \le T_{TH}} - \alpha \rho l}{\beta R_{X,OPT}^{T_X \le T_{TH}} - \gamma \rho l} }
\end{cases}
$$
where α, β, γ, and δ are based on the technology dependent parameters, such as the interconnect resistivity ρ, horizontal spacing s, and vertical spacing h [28], as described by (B.13)–(B.20). If the stationary point $(w_{STAT}, t_{STAT})$ lies within the intervals $[w_{min}, M/t_{min}]$ and $[t_{min}, M/w_{min}]$, and $C_X(w_{STAT}, t_{STAT})$ is the minimum value of the crosslink capacitance within that interval, $(w_{OPT}, t_{OPT}) = (w_{STAT}, t_{STAT})$; otherwise,
$$
\left( w_{OPT}, t_{OPT} \right) =
\begin{cases}
\left( w_{min}, t_{max} \right) = \left( w_{min}, \dfrac{\rho l}{R_{X,OPT}^{T_X \le T_{TH}}} \dfrac{1}{w_{min}} \right), & \text{if } C_X(w_{min}, t_{max}) < C_X(w_{max}, t_{min}) \\[2ex]
\left( w_{max}, t_{min} \right) = \left( \dfrac{\rho l}{R_{X,OPT}^{T_X \le T_{TH}}} \dfrac{1}{t_{min}}, t_{min} \right), & \text{if } C_X(w_{min}, t_{max}) \ge C_X(w_{max}, t_{min})
\end{cases}
$$
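The stationary point can be computed directly from the relation $[\beta (R/\rho l) - \gamma]\, t^2 = \delta - \alpha (\rho l / R)$ together with $wt = \rho l / R$. The following is a minimal sketch under those two relations only; the function name `stationary_point` and its argument names are hypothetical, and the technology coefficients α, β, γ, and δ are assumed to be supplied by the caller.

```python
import math


def stationary_point(alpha, beta, gamma, delta, rho_l, r_opt):
    """Return (w_stat, t_stat), or None when no real stationary point exists."""
    numerator = delta - alpha * rho_l / r_opt
    denominator = beta * r_opt / rho_l - gamma
    if denominator == 0 or numerator / denominator <= 0:
        return None  # fall back to the boundary points (w_min, t_max) or (w_max, t_min)
    t_stat = math.sqrt(numerator / denominator)
    w_stat = rho_l / (r_opt * t_stat)  # enforce w * t = rho * l / R
    return w_stat, t_stat
```

A returned point still has to be checked against the feasible width and thickness ranges before it is accepted as $(w_{OPT}, t_{OPT})$.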
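Finally, $w_{OPT}$ and $t_{OPT}$ are substituted into $C_X$ to determine the optimum capacitance $C_{X,OPT}^{T_X \le T_{TH}}$ under the constraint $T_X \le T_{TH}$ for a crosslink of specific resistance $R_{X,OPT}^{T_X \le T_{TH}}$ and length l, yielding
$$
C_{X,OPT}^{T_X \le T_{TH}} =
\begin{cases}
\text{For local and intermediate interconnect:} \\[1ex]
2 \varepsilon l \left[ \dfrac{w_{OPT}}{h} + 2.04 \left( \dfrac{s}{s + 0.54 h} \right)^{1.77} \left( \dfrac{t_{OPT}}{t_{OPT} + 4.53 h} \right)^{0.07} + 1.41 \dfrac{t_{OPT}}{s} e^{-\frac{4 s}{s + 8.01 h}} + 2.37 \left( \dfrac{w_{OPT}}{w_{OPT} + 0.31 s} \right)^{0.28} \left( \dfrac{h}{h + 8.96 s} \right)^{0.76} e^{-\frac{2 s}{s + 6 h}} \right] \\[3ex]
\text{For global interconnect:} \\[1ex]
2 \varepsilon l \left[ \dfrac{w_{OPT}}{2 h} + 1.11 \left( \dfrac{s}{s + 0.70 h} \right)^{3.19} + 0.58 \left( \dfrac{s}{s + 1.51 h} \right)^{0.76} \left( \dfrac{t_{OPT}}{t_{OPT} + 4.53 h} \right)^{0.12} + 1.14 \dfrac{t_{OPT}}{s} \left( \dfrac{h}{h + 2.06 s} \right)^{0.09} + 0.74 \left( \dfrac{w_{OPT}}{w_{OPT} + 1.59 s} \right)^{1.14} + 1.16 \left( \dfrac{w_{OPT}}{w_{OPT} + 1.87 s} \right)^{0.16} \left( \dfrac{h}{h + 0.98 s} \right)^{1.18} \right]
\end{cases}
$$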

References

  1. Kourtev, I.S.; Friedman, E.G. Timing Optimization through Clock Skew Scheduling, 2nd ed.; Springer Science + Business Media: Boston, MA, USA, 2009.
  2. Xi, J.G.; Dai, W.W.M. Buffer Insertion and Sizing under Process Variations for Low Power Clock Distribution. Proceedings of the 32nd Conference on Design Automation, San Francisco, CA, USA, 12–16 June 1995; pp. 491–496.
  3. Tsai, J.L.; Chen, T.H.; Chen, C.C.P. Zero skew clock-tree optimization with buffer insertion/sizing and wire sizing. IEEE Trans. Comput. Aid. Des. Int. 2004, 23, 565–572.
  4. Pullela, S.; Menezes, N.; Omar, J.; Pillage, L.T. Skew and delay optimization for reliable buffered clock trees. Proceedings of the 1993 IEEE/ACM International Conference on Computer-Aided Design, Santa Clara, CA, USA, 7–11 November 1993; pp. 556–562.
  5. Li, Z.; Zhou, Y.; Shi, W. Wire sizing for non-tree topology. IEEE Trans. Comput. Aided Des. Int. 2007, 26, 872–880.
  6. Friedman, E.G. Clock distribution networks in synchronous digital integrated circuits. Proc. IEEE 2001, 89, 665–692.
  7. Abdelhadi, A.; Ginosar, R.; Kolodny, A.; Friedman, E.G. Timing-Driven Variation-Aware Nonuniform Clock Mesh Synthesis. Proceedings of the 20th ACM Great Lakes Symposium on VLSI 2009, Providence, RI, USA; 2010; pp. 250–257.
  8. Rajaram, A.; Pan, D.Z. MeshWorks: A comprehensive framework for optimized clock mesh networks synthesis. IEEE Trans. Comput. Aid. Des. Int. 2010, 29, 1945–1958.
  9. Wilke, G.R. Analysis and Optimization of Mesh-Based Clock Distribution Architectures. Ph.D. Thesis, Federal University of Rio Grande do Sul, Porto Alegre, Brazil, 2008.
  10. Venkataraman, G.; Feng, Z.; Hu, J.; Li, P. Combinatorial Algorithms for Fast Clock Mesh Optimization. Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, San Jose, CA, USA, 7–11 November 2010; pp. 563–567.
  11. Ye, X.; Li, P.; Zhao, M.; Panda, R.; Hu, J. Scalable analysis of mesh-based clock distribution networks using application-specific reduced order modeling. IEEE Trans. Comput. Aided Des. Int. 2010, 29, 1342–1353.
  12. Feng, Z.; Li, P.; Hu, J. Efficient Model Update for General Link-Insertion Networks. Proceedings of the 7th IEEE International Symposium on Quality Electronic Design, San Jose, CA, USA, 27–29 March 2006; pp. 43–50.
  13. Ye, X.; Zhao, M.; Panda, R.; Li, P.; Hu, J. Accelerating Clock Mesh Simulation Using Matrix-Level Macromodels and Dynamic Time Step Rounding. Proceedings of the 9th IEEE International Symposium on Quality Electronic Design, San Jose, CA, USA, 17–19 March 2008; pp. 627–632.
  14. Sobczyk, A.L.; Łuczyk, A.W.; Pleskacz, W.A. Power Dissipation in Basic Global Clock Distribution Networks. Proceedings of the 10th IEEE Workshop Design and Diagnostics of Electronic Circuits and Systems, Kraków, Poland, 11–13 April 2007; pp. 1–4.
  15. Mori, M.; Chen, H.; Yao, B.; Cheng, C.K. A Multiple Level Network Approach for Clock Skew Minimization with Process Variations. Proceedings of the IEEE Asia and South Pacific Design Automation Conference, Yokohama, Japan, 27–30 January 2004; pp. 263–268.
  16. Restle, P.J.; Carter, C.A.; Eckhardt, J.P.; Krauter, B.L.; McCredie, B.D.; Jenkins, K.A.; Weger, A.J.; Mule, A.V. The Clock Distribution of the Power4 Microprocessor. Proceedings of the IEEE International Solid-State Circuits Conference, San Francisco, CA, USA, 4–6 February 2002; pp. 1.144–1.145.
  17. Xanthopoulos, T.; Bailey, D.W.; Gangwar, A.K.; Gowan, M.K.; Jain, A.K.; Prewitt, B.K. The Design and Analysis of the Clock Distribution Network for a 1.2 GHz Alpha Microprocessor. Proceedings of the IEEE International Solid-State Circuits Conference, San Francisco, CA, USA, 5–7 February 2001; pp. 402–403.
  18. Kurd, N.A.; Barkarullah, J.S.; Dizon, R.O.; Fletcher, T.D.; Madland, P.D. A multigigahertz clocking scheme for the pentium 4 microprocessor. IEEE J. Solid-State Circuits 2001, 36, 1647–1653.
  19. Tam, S.; Leung, J.; Limaye, R.; Choy, S.; Vora, S.; Adachi, M. Clock Generation and Distribution of a Dual-Core Xeon Processor with 16MB L3 Cache. Proceedings of the IEEE International Solid-State Circuits Conference, San Francisco, CA, USA, 6–9 February 2006; pp. 1512–1521.
  20. Rajaram, A.; Pan, D.Z. Variation Tolerant Buffered Clock Network Synthesis with CrossLinks. Proceedings of the ACM International Symposium on Physical Design, San Jose, CA, USA, 9–12 April 2006; pp. 157–164.
  21. Vaisband, I.; Ginosar, R.; Kolodny, A.; Friedman, E.G. Power Efficient Tree-Based Crosslinks for Skew Reduction. Proceedings of the 19th ACM Great Lakes Symposium on VLSI, Boston, MA, USA, 10–12 May 2009; pp. 285–290.
  22. Venkataraman, G.; Jayakumar, N.; Hu, J.; Li, P.; Sunil, K.; Anand, R.; McGuinness, P.; Alpert, C. Practical Techniques to Reduce Skew and its Variations in Buffered Clock Networks. Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, San Jose, CA, USA, 6–10 November 2005; pp. 592–596.
  23. Hu, S.; Li, Q.; Hu, J.; Li, P. Utilizing redundancy for timing critical interconnect. IEEE Trans. Very Large Scale Integr. Syst. 2007, 15, 1067–1080.
  24. Samanta, R.; Hu, J.; Li, P. Discrete buffer and wire sizing for link-based non-tree clock networks. IEEE Trans. Very Large Scale Integr. Syst. 2010, 18, 1025–1035.
  25. Rajaram, A.; Pan, D.Z.; Hu, J. Improved Algorithms for Link Based Non-Tree Clock Network for Skew Variability Reduction. Proceedings of the ACM International Symposium on Physical Design, San Francisco, CA, USA, 3–6 April 2005; pp. 55–62.
  26. Rajaram, A.; Hu, J.; Mahapatra, R. Reducing Clock Skew Variability via CrossLinks. Proceedings of the 41st ACM/IEEE Design Automation Conference, San Diego, CA, USA, 7–11 June 2004; pp. 18–23.
  27. Mehrotra, V.; Boning, D. Technology Scaling Impact of Variation on Clock Skew and Interconnect Delay. Proceedings of the IEEE International Interconnect Technology Conference, Burlingame, CA, USA, 3–6 June 2001; pp. 122–124.
  28. Predictive Technology Model. Available online: http://ptm.asu.edu (accessed on 14 May 2011).
  29. Adler, V.; Friedman, E.G. Delay and Power Expressions for a CMOS Inverter Driving a Resistive-Capacitive Load. Analog Integr. Circuits Signal Process. 1997, 14, 29–39.
  30. Keisler, H.J. Elementary Calculus: An Infinitesimal Approach, 2nd ed.; Prindle, Weber & Schmidt: Boston, MA, USA, 1986.

Share and Cite

MDPI and ACS Style

Vaisband, I.; Friedman, E.G.; Ginosar, R.; Kolodny, A. Low Power Clock Network Design. J. Low Power Electron. Appl. 2011, 1, 219-246. https://doi.org/10.3390/jlpea1010219

