1. Introduction
Process variation during manufacturing affects the oxide thickness, threshold voltage, and other parameters of the transistors in integrated circuits, resulting in path delay fluctuations [1,2]. Due to these fluctuations, chips differ in performance; hence, chips (microprocessors, DSPs, microcontrollers, and sometimes even ASICs) can be placed into different speed bins [3,4]. Chips with higher performance are placed into higher speed bins, which bring more profit. For instance, the price of the fastest Intel Prescott™ and AMD64 Venice™ devices is about three times higher than that of the slowest parts [5]. To obtain more profit, it is necessary to increase the proportion of faster chips. Hence, efficient and accurate tests are required to prevent high-performance chips from being placed into low bins. Another approach is to promote low-end chips to higher bins, thereby increasing the proportion of high-end chips.
Generally, speed binning can be achieved by performing at-speed tests [6], which can be divided into functional, structural (scan-based), and sensor-based tests. A functional test finds the maximum operating frequency at which the chip operates correctly by applying test patterns at different clock frequencies in functional mode [7]. This requires high-end automated test equipment (ATE) to apply and analyze a large number of test patterns at high speed, resulting in high test overhead. To reduce the cost of testing, some works adopt Software-Based Self-Test (SBST) [8,9,10] to store and analyze test patterns on chip, which relaxes the requirement for high-end ATE. However, SBST usually requires a large on-chip memory.
Structural (scan-based) tests include LOC (launch on capture), LOS (launch on shift), and LOES (launch on extra shift) tests [11,12,13]. During a structural test, at-speed test patterns are shifted through the scan chains with a low-speed scan clock, and the test is then performed with one or two high-speed functional clock cycles [14,15,16]. The test responses are scanned out and analyzed to determine the actual speed bins. Ref. [17] investigates the correlation between the functional test frequency and the frequency of structural test patterns. In [18], a formula relating the structural critical path testing frequency to the system operating frequency is given, which shows that structural tests can be employed to reduce the dependency of speed binning on functional tests. The use of on-chip programmable PLL circuitry to generate high-frequency clocks for at-speed scan testing is presented in [19]. Ref. [20] focuses on generating high-quality structural binning patterns that account for process variations and proposes a new pattern-generation methodology for speed binning.
Generally, the speed bin of a chip is determined by the delay of its critical paths. Hence, on-chip sensors that measure the delay of critical paths [21,22,23,24,25,26], or monitor the worst slack of critical paths [27,28,29], are adopted to infer the speed bin of the chip. A low-overhead solution for characterizing the maximum operating frequency of a circuit is proposed in [30]; it chooses a small set of representative paths and dynamically configures them into ring oscillators to compute this frequency. Sensors combined with machine learning [31,32,33] and data mining are also applied to speed binning. Refs. [34,35] predict the maximum operating frequency through data mining of measurements from on-chip performance monitors. In addition, to increase profit, a methodology that adjusts the bin boundaries according to the test results is proposed in [36,37]. A speed-binning test based on on-chip sensors places fewer demands on high-end external equipment than a functional test [38] and takes less time than a structural test. This type of test has become increasingly popular in recent years.
Due to process variations and noise, a path fails at a binning frequency with a certain statistical probability, and the failing paths differ from one chip to another. Therefore, to increase the yield (here, increasing the yield means minimizing the number of chips misplaced into lower bins and increasing the number of chips placed into higher bins), it is important to accurately identify and adapt the paths that fail at a binning frequency. Hence, this paper proposes a novel on-chip adaptive speed-binning system for yield optimization, which can be used to optimize the yield of a digital circuit that has redundant timing in its clock tree. Its advantages are summarized below:
Based on the test results, the proposed system can improve the yield and increase the overall profit by promoting chips from lower speed bins to higher speed bins.
The proposed system can work seamlessly with existing tests, including functional, structural, and sensor-based tests.
The proposed on-chip adaptive speed-binning system is all-digital, with negligible area and test overhead.
Note that this paper is an extended version of our paper published at the 2017 2nd IEEE International Conference on Integrated Circuits and Microsystems (ICICM) [29]. The ICICM 2017 paper [29] focuses only on a novel binning sensor for low-cost and accurate speed binning, whereas this paper focuses on promoting chips from lower bins into higher bins. This paper improves the binning sensor architecture of the ICICM paper and proposes a novel on-chip adaptive binning and yield optimization system. Moreover, it presents a novel adaptive methodology for yield optimization, in which the paths limiting the speed bin of a specific IC are identified and adapted by the proposed on-chip Binning Checker and Binning Adaptor. As a result, some ICs placed in lower bins can be promoted into higher bins, increasing the overall profit. Because of their different design purposes and test methods, the two papers provide different experimental results. In summary, there is about a 70–80% difference between the ICICM paper [29] and this paper.
The rest of the paper is organized as follows. The architecture of the proposed system is described in Section 2. The system implementation and yield optimization flow are presented in Section 3. Section 4 shows the experimental and measurement results. Finally, Section 5 gives the concluding remarks.
2. Architecture
The proposed on-chip adaptive binning and yield optimization system is composed of Binning Checkers, Binning Adaptors, and some on-chip flash memory, as shown in Figure 1.
2.1. The Binning Checker
Generally, the speed bin of a chip is determined by some of its longest paths, which are named Binning Critical Paths. The Binning Critical Path selection methodology is presented in detail in Section 4.3. Due to process variations and noise, the delay of a Binning Critical Path may exceed the bin boundary, which means that the output of the path switches after the capture clock edge. The capture flip-flop therefore obtains wrong data, causing the chip to drop to a lower bin during response analysis. The proposed Binning Checker exploits this behavior to monitor whether the delay of a critical path is longer than the applied clock period. The detailed structure of the Binning Checker is shown in Figure 2. As the Binning Checker only includes a few standard gates, a dedicated Binning Checker can be attached at the end of each Binning Critical Path. The Binning Checker has two tasks:
Task 1: It locates, on silicon, the paths that cause binning failure at the frequency boundary between a speed bin and its next higher bin.
Task 2: It evaluates whether the located Binning Critical Paths can be adapted by the proposed Binning Adaptor.
If both conditions are met, i.e., the path fails at the bin boundary (Task 1) and can be adapted (Task 2), the output of the Binning Checker (Adapt_EN), shown in Figure 2, switches to 1, which initiates adaptive binning and yield optimization.
As shown in Figure 2, the Binning Checker monitors the output of a Binning Critical Path, which is clocked at the binning frequency. Assume that the adaptable margin equals the delay of the delay element inside the Binning Checker. To achieve Task 1, the two inputs of the transition-detection gate come from the output of the critical path (the monitored node in Figure 2) and from the same output signal after it passes through the delay element, respectively. Thus, the gate outputs 1 if the data transition from 0 to 1 or from 1 to 0 within the adaptable margin. This gate and the storage element that follows it together form a "sticky" structure: once the gate outputs 1, the stored value remains 1 until it is reset. Before binning optimization, all the sticky storage elements in the Binning Checkers are reset. A second delay chain, composed of several buffer cells, has a delay equal to the sum of the three delays marked in Figure 2. Hence, the Binning Checker can monitor data transitions that occur within the adaptable margin after the capture clock edge. If the output of the Binning Critical Path makes a transition within the adaptable margin after the capture edge, the output of the Binning Checker (Adapt_EN) turns to 1. Adapt_EN = 1 means that (i) the delay of the Binning Critical Path is greater than the clock period, and (ii) the delay of the Binning Critical Path is smaller than the clock period plus the adaptable margin. To make the detected failing path recoverable, the delay element inside the Binning Checker needs to be the same as that inside the Binning Adaptor (Figure 2). The Binning Adaptor is introduced in detail in Section 2.2. With this design, Tasks 1 and 2 are both achieved.
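Conditions (i) and (ii) amount to a simple timing window. The Python sketch below is an idealized model, not the circuit of Figure 2: it assumes zero setup time, no clock skew, and a delay element exactly equal to the adaptable margin. The names `PathStatus`, `classify_path`, and `adapt_en` are hypothetical.

```python
from enum import Enum


class PathStatus(Enum):
    PASS = "pass"                # path meets the binning clock period
    ADAPTABLE = "adaptable"      # fails, but within the adaptable margin (Adapt_EN = 1)
    UNADAPTABLE = "unadaptable"  # fails by more than the adaptable margin


def classify_path(delay_ps: float, period_ps: float, margin_ps: float) -> PathStatus:
    """Idealized Binning Checker decision for one Binning Critical Path.

    Assumption: setup time and clock skew are ignored, so a path passes when
    its delay does not exceed the clock period.
    """
    if delay_ps <= period_ps:
        return PathStatus.PASS
    # Conditions (i) and (ii): period < delay < period + adaptable margin.
    if delay_ps < period_ps + margin_ps:
        return PathStatus.ADAPTABLE
    return PathStatus.UNADAPTABLE


def adapt_en(delay_ps: float, period_ps: float, margin_ps: float) -> int:
    """Hypothetical helper returning the Binning Checker output as a bit."""
    return int(classify_path(delay_ps, period_ps, margin_ps) is PathStatus.ADAPTABLE)


if __name__ == "__main__":
    for delay in (990, 1005, 1015):
        print(delay, classify_path(delay, period_ps=1000, margin_ps=10).value)
```

With a 1000 ps period and a 10 ps margin, a 1005 ps path is the only one of the three that would raise Adapt_EN.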
Figure 3 shows the output of the Binning Checker under different transition conditions of the monitored path output. In Figure 3a, the monitored signal turns to 1 before the capture clock edge, which means that the monitored path does not cause binning failure at the binning frequency. Hence, the Binning Checker output stays at 0. In Figure 3b, the monitored signal turns to 1 within the adaptable margin after the capture edge, which means that the monitored path is an adaptable Binning Critical Path. Thus, the Binning Checker output turns to 1 to initiate adaptive binning. In Figure 3c, there is a glitch in the monitored signal after the capture edge; the Binning Checker output stays at 0 to avoid misjudgment and wrong adaptation. In Figure 3d, the monitored signal turns to 1 outside the adaptable range, which means that the monitored path is an unadaptable Binning Critical Path. Thus, the Binning Checker output stays at 0.
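A waveform-level model helps explain the four cases of Figure 3. The sketch below rests on an assumption that is not taken from Figure 2: the Binning Checker is treated as comparing the monitored signal at the capture edge with its value one adaptable margin later, which is what sampling the XOR of the signal and its delayed copy at that later instant would give. Under this assumption it reproduces cases (a) to (d), and the sticky behavior is modeled as an OR over test patterns. All function names are hypothetical.

```python
def value_at(transitions_ps, t_ps, initial=0):
    """Logic value of a signal that starts at `initial` and toggles at each listed time."""
    value = initial
    for t in transitions_ps:
        if t <= t_ps:
            value ^= 1
    return value


def checker_sample(transitions_ps, capture_ps, margin_ps, initial=0):
    """1 if the value at the capture edge differs from the value one margin later.

    Equivalent to sampling (signal XOR delayed signal) at capture + margin.
    """
    at_capture = value_at(transitions_ps, capture_ps, initial)
    after_margin = value_at(transitions_ps, capture_ps + margin_ps, initial)
    return at_capture ^ after_margin


def sticky_adapt_en(patterns, capture_ps, margin_ps):
    """Sticky output: once any pattern sets it to 1, it stays 1 until reset."""
    flag = 0
    for transitions in patterns:
        flag |= checker_sample(transitions, capture_ps, margin_ps)
    return flag


CAPTURE, MARGIN = 1000, 10
cases = {
    "a: rises before capture": [990],
    "b: rises within margin": [1005],
    "c: glitch after capture": [1003, 1006],
    "d: rises after margin": [1020],
}
for name, transitions in cases.items():
    print(name, checker_sample(transitions, CAPTURE, MARGIN))  # expected: 0, 1, 0, 0
```

In this abstraction the glitch of case (c) is ignored because the signal returns to its captured value before the second sample is taken.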
2.2. The Binning Adaptor
The Binning Adaptor is designed to recover the identified adaptable failing Binning Critical Paths. The circuit of the Binning Adaptor is also shown in Figure 2. Binning Adaptors are inserted into the selected Binning Critical Paths and move the launching clock edge ahead to obtain a longer effective clock cycle. In other words, the Binning Adaptor can borrow redundant margin from the upper stream path when necessary. The MUX gate of the Binning Adaptor is inserted at the end of the original clock network of the launch flip-flop. To minimize the impact of Binning Adaptor insertion on the already closed timing, a number of buffers in the original clock tree are removed to cancel the delay introduced by the MUX. As Figure 2 shows, there are two configurable routes for the clock going through the Binning Adaptor, namely the Timing Closure Clock Route and the After Adaptation Clock Route. The clock cycle provided by the After Adaptation Clock Route is longer than that of the Timing Closure Clock Route by the adaptable margin. The route selection is controlled by the Binning Checker inserted into the same path. When the Binning Checker makes the adaptation decision, it switches the launch clock to the After Adaptation Clock Route. Thus, the failed adaptable Binning Critical Path can be recovered. Then, Adapt_EN is written into a directly accessible flash memory to guarantee the function of the path at subsequent power-ups.
Figure 4 shows the timing of a Binning Critical Path and the margin borrowed from its upper stream path. To make margin borrowing possible, the upper stream path must still function after lending the adaptable margin to the failed Binning Critical Path. Hence, the upper stream path should have a slack larger than the adaptable margin. In that case, both the upper stream path and the Binning Critical Path can support the binning boundary frequency. Note that when the upper stream path has sufficient margin, only one Binning Adaptor needs to be inserted, as shown in Figure 5a. However, the timing margin of the upper stream path sometimes does not meet this requirement; multiple Binning Adaptors are then needed, as shown in Figure 5b.
Binning Adaptor I is inserted at the launch flip-flop of the Binning Critical Path, and Binning Adaptor II is inserted at the launch flip-flop of the upper stream path. Binning Adaptor II borrows extra slack from the next path further upstream, which ensures that the upper stream path has abundant margin to lend to the Binning Critical Path. In addition, if the timing margins of both upstream paths are smaller than the adaptable margin, the adaptable margin should be reduced.
Note that more than one upper stream path may end at FF0 in Figure 2. Therefore, we must ensure that the longest of these paths has a slack larger than the adaptable margin.
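The margin-borrowing rule of Figures 4 and 5 can be checked with a per-path time budget: advancing the clock of a flip-flop adds that amount to the path it launches and removes the same amount from the path it captures. The Python sketch below assumes a linear chain of flip-flops, ignores setup and hold times, and represents each stage by its longest path (as required when several upper stream paths end at the same flip-flop). The function names and delay values are illustrative only.

```python
def path_budget(period_ps, launch_advance_ps, capture_advance_ps):
    """Time available to a path when its launch/capture clocks are advanced."""
    # An earlier launch edge lends time to the path; an earlier capture edge takes it away.
    return period_ps + launch_advance_ps - capture_advance_ps


def chain_meets_timing(delays_ps, advances_ps, period_ps):
    """Check a chain FF_0 -> FF_1 -> ... -> FF_N.

    delays_ps[k]   : longest delay from FF_k to FF_{k+1}
    advances_ps[k] : clock advance at FF_k (0 when no Binning Adaptor is enabled)
    """
    if len(advances_ps) != len(delays_ps) + 1:
        raise ValueError("need one clock advance per flip-flop")
    return all(
        delay <= path_budget(period_ps, advances_ps[k], advances_ps[k + 1])
        for k, delay in enumerate(delays_ps)
    )


# Illustrative chain: higher upper stream path, upper stream path, Binning Critical Path.
DELAYS = [900, 995, 1005]
T, MARGIN = 1000, 10

no_adaptor = [0, 0, 0, 0]
one_adaptor = [0, 0, MARGIN, 0]        # Binning Adaptor I only (as in Figure 5a)
two_adaptors = [0, MARGIN, MARGIN, 0]  # Binning Adaptors I and II (as in Figure 5b)

for name, adv in [("none", no_adaptor), ("one", one_adaptor), ("two", two_adaptors)]:
    print(name, chain_meets_timing(DELAYS, adv, T))  # expected: False, False, True
```

Here a single adaptor is not enough because the upper stream path has only 5 ps of slack, less than the 10 ps it would have to lend; the second adaptor restores its budget by borrowing from the path further upstream.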
2.3. Utilized Flash Memory
To permanently place the chip into the promoted bin, it is essential to store Adapt_EN in non-volatile memory, such as flash, after binning adaptation, as shown in Figures 1 and 2. Note that flash memory is not easy to integrate on the same die as digital logic. Because most chips load some configuration information at boot time, the adaptation results (the values of Adapt_EN) can be included in this configuration information, which is stored in external memory such as flash. The flash should be directly accessible by the proposed scheme, so that the Binning Adaptors can read the proper adaptation results from flash through on-chip registers after powering on or rebooting. Note also that the flash addresses used for this purpose should be writable only during the optimization process, which means that the output of the Binning Checker no longer controls the Binning Adaptor after optimization. In addition, if cost allows, the designer can integrate one-time programmable (OTP) memory directly into the chip to store the Adapt_EN signals.
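The write-during-optimization, read-at-boot behavior described above can be illustrated with a small behavioral model. The class below is hypothetical: it stands in for the directly accessible flash (or OTP) region, allows Adapt_EN bits to be written or cleared only before it is locked at the end of optimization, and returns the stored bits at boot so that the Binning Adaptors can restore their clock routes.

```python
class AdaptationStore:
    """Hypothetical model of the non-volatile region that holds Adapt_EN bits."""

    def __init__(self, num_paths: int):
        self._bits = [0] * num_paths
        self._locked = False

    def _check_writable(self) -> None:
        if self._locked:
            raise PermissionError("Adapt_EN region is read-only after optimization")

    def record(self, path_index: int, adapt_en: int) -> None:
        """Store a Binning Checker output during the optimization process."""
        self._check_writable()
        self._bits[path_index] = adapt_en

    def clear(self) -> None:
        """Discard stored results, e.g., when re-binning fails."""
        self._check_writable()
        self._bits = [0] * len(self._bits)

    def lock(self) -> None:
        """End of optimization: the Binning Checkers no longer control the adaptors."""
        self._locked = True

    def load_at_boot(self) -> list:
        """Values the Binning Adaptors read after power-on or reboot."""
        return list(self._bits)


store = AdaptationStore(num_paths=4)
store.record(2, 1)
store.lock()
print(store.load_at_boot())  # [0, 0, 1, 0]
```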
2.4. The Limitation and Yield Optimization Rate Estimation
Bin promotion may not succeed for all devices. Figure 6 shows the silicon delay distributions of all speed-paths of a device, which can be binned at the binning frequency. In other words, every speed-path has some probability of lying on the right side of the binning boundary. Some Gaussian curves lie almost entirely on the right side of the binning boundary; they represent paths with little probability of causing binning failure at the binning frequency. Other Gaussian curves have a non-negligible part on the left side of the binning boundary; they represent paths that frequently cause binning failure. The shaded Gaussian curves in Figure 6 represent the selected Binning Critical Paths, and an additional boundary marks the adaptable range of the Binning Adaptor under process variations.
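The Gaussian picture of Figure 6 translates directly into per-path probabilities. Assuming, for illustration, that each path delay is normally distributed with a known mean and standard deviation, the probability that a path fails at the binning boundary is the tail beyond the clock period, and the probability that it is out of the adaptable range is the tail beyond the period plus the adaptable margin. The sketch uses Python's standard `statistics.NormalDist`; the path names and numbers are hypothetical, and ranking paths by failure probability is only one simple selection rule, not necessarily the one used in Section 4.3.

```python
from statistics import NormalDist


def path_probabilities(mean_ps, sigma_ps, period_ps, margin_ps):
    """Return (P[fail at boundary], P[fail beyond adaptable range]) for one path."""
    delay = NormalDist(mean_ps, sigma_ps)
    p_fail = 1.0 - delay.cdf(period_ps)
    p_unadaptable = 1.0 - delay.cdf(period_ps + margin_ps)
    return p_fail, p_unadaptable


def select_binning_critical_paths(paths, period_ps, k):
    """Pick the k paths most likely to fail; paths maps name -> (mean_ps, sigma_ps)."""
    def p_fail(name):
        mean_ps, sigma_ps = paths[name]
        return 1.0 - NormalDist(mean_ps, sigma_ps).cdf(period_ps)

    return sorted(paths, key=p_fail, reverse=True)[:k]


PATHS = {"p1": (950, 10), "p2": (1000, 10), "p3": (990, 5)}
print(select_binning_critical_paths(PATHS, period_ps=1000, k=2))  # ['p2', 'p3']
print(path_probabilities(1000, 10, period_ps=1000, margin_ps=10))
```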
Hence, there are two cases in which bin promotion may fail:
Case 1: A chip can be promoted to the higher bin only if all silicon failure paths are successfully adapted. However, the selected Binning Critical Paths may not cover all silicon failure paths. If the delay of an unselected path exceeds the binning boundary on silicon, as for the unselected path marked in Figure 6, the chip cannot be promoted to the higher bin.
Case 2: Even if all silicon failure paths are selected as Binning Critical Paths and equipped with Binning Checkers and Binning Adaptors, the bin promotion of the device still fails if a Binning Critical Path violates timing by more than the adaptable margin, as for the path marked in Figure 6, i.e., if some failing paths are out of the adaptation range.
Hence, if the Yield Optimization Rate is defined as the probability of successfully promoting a device to the higher bin, it can be calculated by Equation (1), where m is the number of unselected paths that have a probability of silicon failure (Case 1), and n is the number of selected paths that have a probability of being unadaptable (Case 2). Here, n depends on the manufacturing technology, which cannot be controlled at the design stage. Reducing m and adjusting the adaptable margin are the best ways to improve the Yield Optimization Rate, as introduced in Section 4.3.
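Equation (1) itself is not reproduced in this text. As a hedged illustration only: if path outcomes are assumed statistically independent, a device is promotable when none of the m unselected paths fails (Case 1) and none of the n selected paths is unadaptable (Case 2), so the promotion probability is the product of (1 - p_i) over the unselected paths and (1 - q_j) over the selected paths. The Python sketch below evaluates this product and cross-checks it by Monte Carlo sampling; the probabilities and function names are hypothetical, and the paper's Equation (1) may differ in form.

```python
import math
import random


def yield_optimization_rate(p_unselected_fail, q_selected_unadaptable):
    """Promotion probability under an independence assumption.

    YOR = prod_i (1 - p_i) * prod_j (1 - q_j)
    p_i: failure probability of unselected path i (Case 1), i = 1..m
    q_j: probability that selected path j is unadaptable (Case 2), j = 1..n
    """
    return math.prod(1.0 - p for p in p_unselected_fail) * math.prod(
        1.0 - q for q in q_selected_unadaptable
    )


def yor_monte_carlo(p_unselected_fail, q_selected_unadaptable, trials=100_000, seed=1):
    """Estimate the same quantity by sampling independent path outcomes."""
    rng = random.Random(seed)
    promoted = 0
    for _ in range(trials):
        case1 = any(rng.random() < p for p in p_unselected_fail)
        case2 = any(rng.random() < q for q in q_selected_unadaptable)
        promoted += not (case1 or case2)
    return promoted / trials


P = [0.01, 0.02]  # hypothetical Case 1 probabilities (m = 2)
Q = [0.05]        # hypothetical Case 2 probability (n = 1)
print(yield_optimization_rate(P, Q))
print(yor_monte_carlo(P, Q))
```

In this model, reducing m removes factors smaller than one, and each q_j shrinks as the adaptable margin grows, which mirrors the two levers named in the text.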
Note that process variations may also affect the Binning Checkers and Binning Adaptors. The impacts include (i) the range of critical paths that the Binning Checker can recognize and (ii) the extra margin that the Binning Adaptor can provide. Both (i) and (ii) are designed to equal the adaptable margin; however, variations may cause them to deviate from it. Hence, to reduce the impact of process variations, it is suggested to build the Binning Checkers and Binning Adaptors from the cells with the lowest variation (LVT large cells). Section 4 provides the simulated and measured yield optimization results considering process variations; the measurement results show a binning promotion rate of 7–16%.
2.5. Application Scenarios
The proposed method reduces the impact of process variations on chip performance through post-manufacturing adaptation. During the design phase, the designer can optimize the timing of the circuit as much as possible. For example, by inserting registers, a path with a large delay can be split into a two-stage or multi-stage pipeline. However, due to design and optimization constraints, there will always be some paths with greater delay than most others, namely the critical paths. These are the paths that require attention. Even if the clock distribution tree is highly optimized, the upper stream path of a critical path may still have sufficient margin.
To ensure that a critical path can run at the target frequency when there is redundant timing in the clock tree, the clock tree can be adjusted so that the critical path uses this redundant timing. This step can be performed during the design phase or, with the proposed method, after manufacturing.
However, process variations are known to cause a gap between design and manufacturing. Even if the designer has highly optimized the timing in the design phase, a critical path, or even a non-critical path, may still cause the manufactured chip to fall into a lower bin. If the designer improperly utilizes the redundant timing of the clock distribution tree in the design phase, the upper stream paths of critical paths may fail at the binning boundary after manufacturing, which shows that timing optimization in the design phase has inherent limitations.
For example, suppose that the delay of a critical path is 980 ps, the delay of its upper stream path is 950 ps, the adaptable margin is 10 ps, and the target clock period is 1000 ps. The adaptable margin is the margin that the upper stream path can lend to the critical path. Considering process variations, after manufacturing, the actual delay of the critical path may be 990 ps, the delay of its upper stream path may be 990 ps, and the adaptable margin may be 11 ps. This can happen if the critical path is built with low-variation cells (LVT large cells) for small delay, and the upper stream path is built with higher-variation cells (HVT small cells) for low area and power overhead. If timing adaptation is performed at the design phase, both the critical path and its upper stream path work at the target clock period in simulation. For the actual chip, however, the upper stream path fails at the target clock period, because only 1000 - 11 = 989 ps remain for its 990 ps delay. In contrast, if the designer adopts the proposed binning optimization method, adaptation is performed according to the actual test result. In this case, the test shows that no critical path adjustment is needed, and both the critical path and its upper stream path work at the target clock period. Hence, timing adaptation based on manufacturing test results can effectively improve chip performance (speed bins).
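The arithmetic of this example can be checked in a few lines. The sketch below uses the numbers from the paragraph above and a simplified slack model in which setup time and clock skew are ignored; the function name `slacks` is hypothetical.

```python
def slacks(critical_ps, upstream_ps, margin_ps, period_ps, adapted):
    """Return (critical-path slack, upper-stream-path slack) in ps.

    When adapted, the launch clock of the critical path is advanced by the
    adaptable margin, which the upper stream path must lend.
    """
    advance = margin_ps if adapted else 0
    return period_ps + advance - critical_ps, period_ps - advance - upstream_ps


PERIOD = 1000

# Design phase: adaptation decided in simulation.
print("design, adapted:    ", slacks(980, 950, 10, PERIOD, adapted=True))   # (30, 40)
# Silicon, with the design-time adaptation kept.
print("silicon, adapted:   ", slacks(990, 990, 11, PERIOD, adapted=True))   # (21, -1)
# Silicon, decided from the test: 990 ps <= 1000 ps, so no adaptation is needed.
print("silicon, test-based:", slacks(990, 990, 11, PERIOD, adapted=False))  # (10, 10)
```

The negative upper stream slack in the second case is the failure the paragraph describes; deciding from the test result avoids it.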
In a digital circuit, optimizing and balancing the pipeline stages is a common way to improve chip performance. In that case, the upper stream path of the critical path may not have enough timing margin for adaptation, and multiple Binning Adaptors are needed, as shown in Figure 5b. Note that the proposed method is not applicable if the circuit to be optimized contains a large number of deep pipelines with similar delays per stage.
In addition, some circuits use as few pipeline stages as possible to reduce power consumption. Such circuits contain some critical paths that have a large impact on the speed bin. In such designs, retiming is limited by the area and power overhead it introduces. As a result, there will be some redundant timing in the clock distribution tree. At the same time, design-stage timing optimization, such as exploiting this redundant timing, is affected by process variations, as discussed above. In this case, the proposed method, which adjusts the critical path clock after manufacturing, can improve chip performance with low overhead. The proposed Binning Checker only operates during the speed binning test, so it adds little power consumption.
As discussed above, when process variations are considered, combining timing optimization with EDA tools during the design phase and timing adaptation with the proposed method after manufacturing is a better way to achieve high chip performance.
Note that the generality of the proposed adaptation methodology is limited. Specifically, the methodology can be used to optimize the yield of a digital circuit that has redundant timing in its clock tree, so that the critical paths can borrow sufficient timing margin from their upper stream paths for adaptation. If a single upper stream path does not provide enough timing margin, multiple Binning Adaptors can be employed, as shown in Figure 5b. The methodology is not applicable if the circuit to be optimized has little redundant timing in its clock tree, such as a circuit with a large number of deep pipelines with similar delays per stage, in which it would be hard for one stage to obtain sufficient timing margin from the previous stage.
3. The Flow for Binning and Yield Optimization
The binning and yield optimization flow based on the adaptation system is shown in Figure 7.
Step 1: Binning Critical Path Selection. As discussed above, the size of the Binning Critical Path group is limited by the acceptable overhead. At the same time, to maximize the Yield Optimization Rate, the Binning Critical Path group should cover the paths most likely to cause speed binning failure. Therefore, designers should perform statistical timing analysis (STA) after layout generation and timing closure to select the most critical paths. The Binning Critical Path selection details and results for the implementation in this paper are shown in Section 4.3.
Step 2: Binning Checker and Binning Adaptor Insertion. The Binning Checkers and Binning Adaptors are inserted at the selected Binning Critical Paths. As discussed in Section 2, because the cells required by the Binning Adaptor replace clock buffers of the original clock tree, the insertion should not affect the already closed timing and requires minimal layout adjustment. The small overhead of the Binning Checker and Binning Adaptor keeps the overall area overhead of the adaptation system low.
Step 3: Binning the Chip Under Test at the Binning Frequency. In this step, the fabricated chip is binned at the binning frequency using functional, structural, or sensor-based speed binning methodologies. At the same time, the proposed system identifies the adaptable Binning Critical Paths.
Step 4: Get the Preliminary Binning Result. If the chip under test passes the test at the binning frequency, the binning frequency is increased further. If the chip fails at the binning frequency, the Binning Checkers identify the adaptable Binning Critical Paths.
Step 5: Perform Binning Adaptation. In this step, the Adapt_EN signals, which are the outputs of the Binning Checkers, are piped to a directly accessible non-volatile memory (DMA), and binning adaptation is performed at the same time. The adaptable failing Binning Critical Paths identified in Step 4 are thereby adapted.
Step 6: Re-Binning. In this step, the chips under test are binned again at the frequency at which they failed in Step 4. If all of the silicon failure paths of a device have been adapted successfully, the re-binning passes. Hence, a percentage of the chips that fell into the lower bin can be promoted to higher bins. If the re-binning fails, the data in DMA should be cleared to ensure that the chip still functions in the lower bin it has already passed.
Step 7: Speed Bin Decision and Yield Optimization Rate Calculation. The speed bin of the chip under test is decided according to whether the re-binning after adaptation succeeds. By comparing the binning yields of Step 6 and Step 3, the Yield Optimization Rate can be obtained.
Step 8: Label Chips with the Marketing Frequency. Aging and other factors lead to performance degradation. Therefore, the marketing frequency of a device should include some additional margin relative to the measured frequency.
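Steps 3 to 7 can be summarized in a behavioral sketch. The Python code below is an illustration under explicit assumptions, not the authors' flow implementation: path delays on each chip are drawn independently from Gaussian distributions, upper stream paths are assumed to have enough slack to lend, setup time is ignored, and the path names and statistics are hypothetical. It reports the binning yield before adaptation (Step 3) and after re-binning (Step 6), the two quantities compared in Step 7.

```python
import random


def bin_chip(delays_ps, selected, period_ps, margin_ps):
    """Outcome for one chip: 'pass', 'promoted', or 'lower'.

    delays_ps: dict path name -> silicon delay of that path on this chip
    selected:  set of paths equipped with Binning Checkers and Adaptors
    """
    failing = {p for p, d in delays_ps.items() if d > period_ps}  # Steps 3-4
    if not failing:
        return "pass"
    if failing - selected:  # Case 1: an unselected path fails and cannot be adapted
        return "lower"
    # Binning Checkers raise Adapt_EN only for failures within the adaptable margin.
    adapt_en = {p for p in failing if delays_ps[p] < period_ps + margin_ps}
    if adapt_en != failing:  # Case 2: re-binning (Step 6) fails, stored data is cleared
        return "lower"
    return "promoted"  # Steps 5-6: all failing paths adapted, re-binning passes


def simulate(n_chips, path_stats, selected, period_ps, margin_ps, seed=0):
    """Bin a synthetic population; path_stats maps name -> (mean_ps, sigma_ps)."""
    rng = random.Random(seed)
    counts = {"pass": 0, "promoted": 0, "lower": 0}
    for _ in range(n_chips):
        delays = {p: rng.gauss(mu, sigma) for p, (mu, sigma) in path_stats.items()}
        counts[bin_chip(delays, selected, period_ps, margin_ps)] += 1
    return counts


STATS = {"p1": (990, 8), "p2": (985, 8), "p3": (960, 8)}  # hypothetical paths
counts = simulate(10_000, STATS, selected={"p1", "p2"}, period_ps=1000, margin_ps=10)
n = sum(counts.values())
print("yield before adaptation (Step 3):", counts["pass"] / n)
print("yield after re-binning (Step 6): ", (counts["pass"] + counts["promoted"]) / n)
```

Leaving a frequently failing path out of `selected` moves its failures into the Case 1 branch, which is why Step 1 aims to cover the most failure-prone paths.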
As discussed above, the yield optimization flow can be integrated into existing speed binning tests, such as the methods mentioned in [7,14], and the binning adaptation system has little impact on these tests.