Incremental Placement Technology Based on Front-End Design

Zhang, Zihang; Chen, Gang

doi:10.3390/electronics13142745


by Zihang Zhang * and Gang Chen *

College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China

* Authors to whom correspondence should be addressed.

Electronics 2024, 13(14), 2745; https://doi.org/10.3390/electronics13142745

Submission received: 18 June 2024 / Revised: 10 July 2024 / Accepted: 11 July 2024 / Published: 12 July 2024

(This article belongs to the Special Issue Advanced Technologies and Applications in Computer Science and Engineering)


Abstract:

As the scale and complexity of chips continue to increase, chip design becomes increasingly challenging. Designers typically need multiple iterations to achieve satisfactory results, but the substantial time required for each modification exacerbates the time pressure in the chip design process. Incremental methods are an effective technique to shorten the development iteration time. Therefore, this paper proposes a module-based incremental layout technique utilizing the hierarchical structure of the unflattened netlist. We have developed an incremental EDA tool for mid-version evaluation, covering the process from RTL to placement DEF. This tool enables faster synthesis and layout, assisting designers in assessing the feasibility of the current RTL design, thereby accelerating the estimation of the PPA (Power, Performance, and Area) during version iterations. It aids in making better choices for RTL design and logic synthesis, consequently shortening the chip development iteration time.

Keywords:

design–technology co-optimization (DTCO); electronic design automation (EDA); incremental algorithms; physical design; placement

1. Introduction

As the scale and complexity of VLSI (Very-Large-Scale Integration) continue to grow, the time required for design, verification, and testing has correspondingly increased. Complex designs necessitate numerous revisions, and the extensive time consumed by these iterative modifications significantly impacts the design and production efficiency. Layout, being one of the most time-consuming steps in the VLSI design process, has prompted researchers and engineers to explore various strategies to enhance the efficiency in this phase, such as algorithm optimization, parallel computing, and design partitioning. However, the widely used placement techniques in the industry, whether heuristic algorithms or analytical algorithms [1], all flatten the logic synthesis netlist and solve for the optimal positions of individual devices based on net connections. Even though many partitioning algorithms have been proposed in the past to improve placement efficiency, their fundamental logic still relies on partitioning based on net connections.

Twenty years ago, Pongstorn Maidee [2] proposed a partition-based FPGA placement algorithm. In the following decade, such methods fell out of widespread use because their runtime and placement quality were inferior to those of analytical algorithms. Consequently, partitioning algorithms are now primarily used to divide the complete netlist into a series of sub-netlists before placement; placement is then applied to each sub-netlist sequentially (or in parallel), and the results are integrated to form the final placement. The module-based clustering partitioning method proposed in [3] achieves excellent partitioning results in a relatively short time. The approach in [4] attempts to find suitable clusters using machine learning. Both demonstrate that clustering methods can significantly reduce the placement time. However, for today’s problem scales, the clustering algorithms themselves require considerable time (as they are also NP-hard problems). If a readily available, well-founded partitioning can be utilized instead, this portion of the time can be saved, enabling faster placement.

In our search for a reasonable existing partitioning approach, we found that, although the concept of full-flow EDA tools [5,6] is emphasized in both academia and industry, most placement tools still choose to use the flattened Verilog netlist as the input. This approach completely discards the module hierarchy defined during the early design phase. While this flattening is performed to optimize the logic synthesis stage, preserving module characteristics during the placement and routing phase can offer significant benefits. Specifically, it helps ensure that components within the same module are placed closer together, which reduces the power consumption, minimizes the signal interference, and enhances the signal integrity. Additionally, this preservation can facilitate the development of modular layout-based incremental placement techniques.

Modular placement technology refers to a placement improvement approach where, during the logic synthesis stage, the netlist is not flattened. Instead, the unflattened netlist is used as the input for the placement and routing. During the placement stage, the netlist is partitioned based on the module structure, with the internal placement performed for each module first, followed by inter-module placement, and ultimately improving the layout for the entire project. Since the partitioning of modules is implemented at the RTL level, directly performing module partitioning at the netlist level requires significant computation, increasing the time cost. Additionally, users find it challenging to control the partitioning results, and pure netlist-level incremental tools lack module correlation analysis. Therefore, discussing modular incremental placement alone is unreasonable. By utilizing the incremental logic synthesis techniques that we previously developed [7], this paper explores the combined approach of incremental logic synthesis and placement. Our tool takes RTL-level code as input, allowing designers to iteratively modify the RTL code based on the results. Upon receiving the new version of the modified code, the proposed tool uses an integrated Verilog syntax analyzer to check the input, identify modifications, and perform incremental logic synthesis and single-module placement for the modified modules. It then performs inter-module placement and outputs a DEF file for designers to evaluate the PPA.

Incremental techniques have not yet been applied in open-source EDA for placement. One of the main reasons is that performing partial incremental placement after arbitrary modifications is challenging. Therefore, we propose a hierarchical modular placement method and advocate for designers to adopt good design practices. This includes reasonably partitioning modules at the RTL level and making modifications on a per-module basis during the modification phase. This allows for individual module placement and integration, thereby shortening the overall development time and standardizing the development process. Our experiments have demonstrated that such a design flow can significantly reduce time consumption, especially in large-scale layout problems where module partitioning is reasonable. For example, in the testing cases from the OpenLane source code [6], or processor chips designed in the “One Student One Chip” project [8] (with their RTL code used as input), the time taken by our proposed incremental placement technique is approximately 10% of that required by conventional placement methods. Based on the output DEF, designers can quickly assess the approximate area and wire length of the current design. Although the HPWL (Half-Perimeter Wire Length) may differ from the results obtained using a fully flattened method, it can still serve as a viable metric for mid-version evaluation. It can linearly demonstrate the design differences between various versions, aiding designers in selecting better design options during the early design stages. It is inevitable that module-based rapid placement will sacrifice some quality to save time. Therefore, we recommend that designers use our technique for quick design during the development process and adopt other advanced techniques for full synthesis and placement and routing before final production.

2. Previous Work and Our Contributions

VLSI placement is an NP-complete combinatorial optimization problem. To improve the efficiency and quality of solutions to this problem, related algorithm research can be divided into three development stages. In the first stage, heuristic algorithms such as simulated annealing [9] and genetic algorithms [10] are primarily employed. These algorithms gradually approach the optimal solution by exchanging component positions and locally moving components. However, the excellent solutions obtained by these algorithms come at the cost of long running times. As a result, they are gradually replaced by placement methods based on partitioning. During this stage of development, the placement problem is simplified by recursively dividing netlists and layout areas [11,12,13]. For example, commonly used methods include First Choice (FC) [14], Best Choice (BC) [15], and cut–overlay [16]. However, since these algorithms treat the entire netlist as a single hypergraph, it becomes challenging to evaluate the rationality of the partitioning during the placement stage. Crucially, as the problem size increases, the running time of the partitioning algorithm significantly impacts the overall algorithm’s runtime. Naturally, numerous techniques [3,17] have been proposed in recent years to address these two issues. While these methods have significantly reduced the time compared to past algorithms, the time overhead remains non-negligible.

The current mainstream approach in both academia and industry is the nonlinear analytical method. Wire length and density are individually modeled as smooth mathematical functions and combined into a single objective function through a weighting parameter. The layout optimizer then employs methods such as conjugate gradient descent to solve for the layout results. In the early stages, due to the high complexity of the modeling functions, nonlinear methods relied on multi-level cell clustering to simplify the problem and accelerate the algorithm [18]. In recent years, however, modeling methods based on electrostatic forces, such as FFTPL [19], ePlace [1], and RePlAce [20], have been more widely adopted in academia. In these methods, the clustering process is abandoned, and the entire netlist is modeled as a hypergraph for global placement. As the problem size continues to grow, improving the placement efficiency for very-large-scale problems remains a major concern in industry. In addition to algorithmic innovations, GPU acceleration methods have also been widely used to enhance efficiency [21,22].
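As a hedged illustration of the weighted objective described above, the generic form can be written as follows. Here W stands for a smooth wire-length model, D for a smooth density penalty, and λ for the weighting parameter; these symbol names are chosen for illustration and are not taken from the cited papers.

```latex
\min_{\mathbf{x}, \mathbf{y}} \; W(\mathbf{x}, \mathbf{y}) + \lambda \, D(\mathbf{x}, \mathbf{y})
```

A larger λ pushes the optimizer toward spreading cells evenly, while a smaller λ favors short wires at the cost of local congestion.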

Combining partitioning algorithms with analytical algorithms is an ideal approach to maximize placement efficiency. However, even advanced clustering algorithms [3,4] consume a significant amount of time for very-large-scale problems and can produce uncertain and uncontrollable partitioning results. Therefore, the best method is to fully utilize the information from previous steps for direct partitioning. The hierarchical partitioning method proposed in this paper not only eliminates the need for time-consuming partitioning but also makes the partitioning process controllable.

Today, both academia and industry are striving to establish RTL to Graphic Data System II (GDSII) design frameworks. Open-source tools like OpenLane [6] and IEDA [5] boast capabilities spanning from RTL to GDSII. However, these tools discard significant information between logic synthesis and physical design, particularly during the process from technology mapping to floorplanning. For example, the hierarchical netlist structure defined during the early design process is flattened into a hypergraph during the placement stage, resulting in the direct loss of all hierarchical information. The primary reason is that common RTL to GDSII design frameworks such as OpenLane directly integrate mature mainstream placement tools, which take a flattened netlist as the input. Discarding the hierarchy in this way is regrettable, because even advanced analytical algorithms require a significant amount of time to handle large-scale layout problems. Previous partitioning-based layout algorithms have demonstrated that partitioning layout problems can effectively reduce time consumption. Moreover, the front-end design stage can provide ready-made, reasonable module-based partitioning results for the subsequent physical design process. Compared to minimizing partitions, function module-based partitioning is more beneficial for subsequent production and manufacturing. Fully utilizing these results, encouraging designers to divide the design into appropriately sized modules based on functionality during the early design stages according to the Design–Technology Co-Optimization (DTCO) [23] concept, and then directly partitioning circuits based on these modules in the back-end design process will maximize the value of the RTL to GDSII design frameworks.

The design of VLSI is a complex process, often involving multiple iterations. Traditional placement methods require a complete redesign of the placement after each modification. Incremental programming techniques emphasize the gradual construction and improvement of the design, and introducing them into the physical design stage can help identify and address design issues earlier, thereby saving time spent on design modifications. To our understanding, currently, open-source EDA placers have not yet incorporated incremental placement technology, which is primarily due to numerous technical challenges. We have analyzed and addressed two of these challenges. Firstly, the incremental placement algorithm is difficult to implement for netlists that undergo arbitrary modifications. Secondly, without technical support from upstream stages, implementing incremental placement solely at the placement level lacks practical significance. Therefore, we propose a module-based incremental placement technique, which encourages designers to rationally partition their designs into modules during the initial design phase and make modifications by module during the subsequent iterations. The rationale behind this approach is that modular design not only facilitates the implementation of incremental techniques but also benefits various aspects of the project, including scalability, portability, manufacturing, and timing, making it a win–win proposition. To address the second issue, we suggest that the hierarchical netlist structure based on logic synthesis results can serve as a foundation for incremental placement techniques. Starting from RTL-level code, we implement change detection and incremental logic synthesis to support this. This approach ensures that only the affected modules undergo synthesis and placement, significantly reducing computational overhead and improving the overall efficiency. 
Our work is based on the OpenROAD [24] framework, with certain placement operations within modules leveraging the support of OpenROAD’s placement module. Our main technical contributions are as follows:

We have proposed a modular placement method and developed an incremental placement tool based on this method. This tool can save 90% of the time while ensuring that the HPWL difference is only around 5–20%. This technology is intended solely for the design evaluation of intermediate iteration versions, meaning it can save a significant amount of time during version modifications. After making repeated adjustments to the initial design to achieve a satisfactory final design, users can choose conventional methods to obtain the final placement solution. This design process can save more than 50% of the time spent on performance evaluation during the intermediate design stages.
We advocate for the thorough utilization of intermediate results generated during the front-end design process for module-based layout. We hope that the RTL to GDSII flow EDA tools will refrain from flattening an increasing amount of information in the design process, thus maximizing the value of full-flow design. We have also developed an incremental logic synthesis tool in earlier work, which, when combined with the work presented in this paper, forms an incremental EDA tool for RTL to DEF/LEF.

3. Proposed Framework

3.1. Problem Definition

Based on the unflattened Verilog netlist output after logic synthesis, we partition the entire netlist and model each module in the netlist as a hypergraph $H_{i} = (V_{i}, E_{i})$. In this hypergraph, the vertex set $V_{i} = \{v_{i} \mid 1 \leq i \leq a\}$ represents all device instances within this module, totaling $a$ instances of devices. The internal hyperedge set $E_{i} = \{e_{i} \mid 1 \leq i \leq b\}$, with $e_{i} = \{v_{e_{i}1}, v_{e_{i}2}, \ldots, v_{e_{i}n} \mid v_{e_{i}1}, v_{e_{i}2}, \ldots, v_{e_{i}n} \in V_{i}\}$, represents all nets within this module, with each net being modeled as a hyperedge. The physical placement of edges is meaningless during the placement phase; therefore, a topological structure is used to represent an edge. Each net is represented as a collection of device instances to which it connects. Based on this foundation, the entire circuit is modeled as a hypergraph, $C = \{(H_{i}, T_{i}) \mid 1 \leq i \leq n\}$. In Verilog netlists, module invocations establish a hierarchical relationship between modules, resulting in a tree-like structure. Therefore, each module carries a list of its instantiated submodules, $T_{i} = \{t_{k} \mid 1 \leq k \leq c\}$, which represents all submodules called by module $i$. The definition of $t_{k}$ comprises the following two parts: the first part defines the hypergraph of the called submodule, and the second part defines the interconnection of nets between modules, $t_{k} = (H_{j}, \{(\mathit{call}_{i}, \mathit{called}_{i}) \mid 1 \leq i \leq m\})$. Each binary tuple in $t_{k}$ is used to represent a hyperedge between modules, indicating the connection relationship between the calling module and the called module. That is to say, $\mathit{call}_{1}, \mathit{call}_{2}, \ldots, \mathit{call}_{m} \in E_{i}$ and $\mathit{called}_{1}, \mathit{called}_{2}, \ldots, \mathit{called}_{m} \in E_{j}$.
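The model above can be sketched as a small data structure. This is a minimal illustration, assuming Python dataclasses; the class names (`Hypergraph`, `SubmoduleCall`, `Module`), the helper `cross_module_pins`, and the two-module example are hypothetical and are not taken from the authors' tool.

```python
from dataclasses import dataclass, field


@dataclass
class Hypergraph:
    """H_i = (V_i, E_i): device instances and intra-module nets of one module."""
    vertices: set
    edges: dict  # net name -> set of device instances the net connects

    def is_well_formed(self):
        # A net of this module may only reference devices of this module.
        return all(pins <= self.vertices for pins in self.edges.values())


@dataclass
class SubmoduleCall:
    """t_k = (H_j, {(call_i, called_i)}): one instantiated submodule."""
    callee: "Module"
    net_pairs: list  # (net name in the caller, net name in the callee)


@dataclass
class Module:
    """(H_i, T_i): a module's hypergraph plus the submodules it instantiates."""
    name: str
    graph: Hypergraph
    calls: list = field(default_factory=list)


def cross_module_pins(parent, call):
    """Expand each (call, called) pair into the device sets on both sides."""
    return [(parent.graph.edges[a], call.callee.graph.edges[b])
            for a, b in call.net_pairs]


# Hypothetical two-module circuit: net e1 in the parent continues as e2 in the child.
child = Module("MODULE2", Hypergraph({"u3", "u4"}, {"e2": {"u3", "u4"}}))
parent = Module("MODULE1", Hypergraph({"u1", "u2"}, {"e1": {"u1", "u2"}}),
                [SubmoduleCall(child, [("e1", "e2")])])
pairs = cross_module_pins(parent, parent.calls[0])
```

Keeping the inter-module connection as a pair of net names, rather than as a third hyperedge, mirrors the $(\mathit{call}_{i}, \mathit{called}_{i})$ tuples in the definition.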

Figure 1 is utilized to provide a clearer depiction of the problem definition. If the entire diagram is modeled as a hypergraph, $e_{1}$, $e_{2}$, and $e_{3}$ will be defined as the same hyperedge. However, in this context, $e_{1}$ will belong to the edge set of MODULE 1, $e_{2}$ will belong to the edge set of MODULE 2, and $e_{3}$ represents the net connection between the two modules. Therefore, $(e_{1}, e_{2})$ can be used here to denote $e_{1}$, $e_{2}$, and $e_{3}$ as a single hyperedge.

Using $x_{i}$ and $y_{i}$ to represent the bottom-left coordinates of module $i$, and given that placement within each module has already been completed, the coordinates of each device instance can be defined as the sum of the bottom-left coordinates of its module $i$ and its relative coordinates, $(x_{i} + x_{j}, y_{i} + y_{j})$, where $x_{j}$ represents the relative horizontal coordinate with respect to the bottom-left corner of its module, and the same applies for the vertical coordinate $y_{j}$. The module-based placement problem can be modeled as a constrained minimization problem, aiming to minimize the total wire length of all inter-module nets while ensuring the non-overlapping placement of modules.
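The coordinate rule and the non-overlap constraint can be illustrated with a short sketch. The function names and the sample coordinates are assumptions made for this example; the half-perimeter wire length (HPWL) helper uses the standard bounding-box definition of the metric mentioned in the Introduction.

```python
def absolute_position(module_origin, relative):
    """Device coordinate = module bottom-left corner + offset inside the module."""
    (xi, yi), (xj, yj) = module_origin, relative
    return (xi + xj, yi + yj)


def hpwl(points):
    """Half-perimeter wire length of one net: width plus height of its bounding box."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def modules_overlap(a, b):
    """a and b are (x, y, w, h) rectangles; touching edges do not count as overlap."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


# Two hypothetical modules placed side by side, one device in each.
dev_a = absolute_position((0, 0), (2, 3))
dev_b = absolute_position((10, 0), (1, 4))
net_length = hpwl([dev_a, dev_b])
legal = not modules_overlap((0, 0, 10, 5), (10, 0, 4, 4))
```

Because intra-module offsets are fixed after the internal placement, moving a module only changes its origin, so every device coordinate can be recomputed with one addition.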

3.2. Incremental Placement Flow

The incremental placement technique proposed in this paper resembles conventional placement tools in that it accepts the netlist after technology mapping as the input. The distinction lies in the initial design instance, where the complete netlist is segmented according to the hierarchical structure presented in the front-end output results. Subsequently, an inter-module placement algorithm (introduced in Section 3.4) is employed for the placement, generating a placed DEF file. Naturally, this approach imposes requirements on the front-end design phase. To facilitate the use of the incremental placement technique and save iteration time, designers need to divide the complete netlist into several modules of a relatively uniform size from the outset and, whenever possible, modify the individual modules one at a time during the modification process.

In addition to relying on conventional logic synthesis and the incremental placement that divides the netlist produced by it, we also propose an incremental logic synthesis and placement technique. As illustrated in Figure 2, this technique positions the incremental placement as a downstream process of the incremental logic synthesis. Following synthesis completion [7], instead of flattening, we directly proceed with the intra-module placement of the synthesized modules, followed by inter-module placement. During each iteration, only the modified modules need to undergo individual synthesis, placement, and inter-module positioning. This enables the rapid evaluation of the PPA within a significantly reduced time frame. Consequently, designers can spend less time iterating over RTL code and can quickly assess the design outcomes using incremental synthesis and placement.
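To make the idea of re-processing only the modified modules concrete, the following sketch detects changed modules by hashing the text of each `module ... endmodule` block. This is a deliberately simplified stand-in: the tool described here uses a Verilog syntax analyzer, whereas the regular expression, the function names, and the sample sources below are assumptions for illustration, and the pattern would be confused by the word "module" inside comments or strings.

```python
import hashlib
import re

# Hypothetical, simplified pattern: module name, then everything up to endmodule.
MODULE_RE = re.compile(r"\bmodule\s+(\w+)(.*?)\bendmodule\b", re.DOTALL)


def module_digests(verilog_source):
    """Map each module name to a digest of its whitespace-normalized text."""
    digests = {}
    for name, body in MODULE_RE.findall(verilog_source):
        normalized = " ".join(body.split())  # ignore whitespace-only edits
        digests[name] = hashlib.sha256(normalized.encode()).hexdigest()
    return digests


def changed_modules(old_source, new_source):
    """Names of modules that are new or whose text differs between versions."""
    old, new = module_digests(old_source), module_digests(new_source)
    return sorted(name for name in new if old.get(name) != new[name])


old_rtl = """
module adder(input a, input b, output s); assign s = a ^ b; endmodule
module top(input a, input b, output s); adder u0(a, b, s); endmodule
"""
new_rtl = old_rtl.replace("a ^ b", "a | b")
to_rebuild = changed_modules(old_rtl, new_rtl)
```

Only the modules returned by `changed_modules` would then need synthesis and intra-module placement again, followed by one inter-module placement pass.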

3.3. Definition and Usage of the Placeable Matrix Array

General placement techniques use density constraints to ensure the even distribution of devices, followed by legalization algorithms to prevent overlap. However, this approach is unnecessary for module placements because the number of modules is significantly smaller than the number of devices, and the structure is a tree rather than a hypergraph. Traditional methods would therefore waste time. To address this, we propose the placeable matrix array to satisfy non-overlapping placement constraints.

Pei-Ning Guo et al. [25] proposed that “An admissible placement is a compacted placement where all blocks can neither move down nor move left,” defining it as an acceptable legal placement. Our proposed placeable matrix array is based on this idea to minimize the placement area as much as possible. All rectangular blocks (each module is laid out within itself before the inter-module placement, so each module can be approximated as a rectangle with fixed dimensions after the internal placement) are compactly placed in the lower-left corner of the entire placement area. The placeable matrix array consists of a series of elements in the form of $((x, y), w, h)$. Each $((x, y), w, h)$ element is called a placeable matrix, representing a rectangular region with the lower-left corner coordinates $(x, y)$, width $w$, and height $h$.

Based on the placement strategy using a placeable matrix array, it is stipulated that, within the entire placement space, the lower-left corner coordinates of all matrices to be placed must correspond to the x and y coordinates of a current placeable matrix. In other words, the lower-left corner coordinates must align with those of the selected placeable matrix, and the width and height of the selected placeable matrix must be greater than or equal to those of the matrix being placed. Figure 3 illustrates these stipulations in detail, where the light-gray-shaded blocks represent a placeable matrix and the dark-gray rectangular blocks represent the matrices to be placed. Among the four placements, only (a) meets the requirements.
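The eligibility rule above can be expressed in a few lines. This is a minimal sketch; the `Placeable` type, the function names, and the sample values are hypothetical.

```python
from typing import NamedTuple


class Placeable(NamedTuple):
    """One placeable matrix ((x, y), w, h), flattened into four fields."""
    x: float
    y: float
    w: float
    h: float


def fits(slot, width, height):
    """A module fits when the slot is at least as wide and as tall as the module."""
    return width <= slot.w and height <= slot.h


def candidate_origins(slots, width, height):
    """Legal lower-left corners for the module: the corners of every slot it fits in."""
    return [(s.x, s.y) for s in slots if fits(s, width, height)]


slots = [Placeable(0, 4, 10, 6), Placeable(5, 0, 5, 10)]
origins = candidate_origins(slots, 6, 5)
```

Restricting candidates to slot corners is what keeps the number of positions to examine proportional to the length of the array.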

By maintaining the placeable matrix array, the optimal position for a given module can be found in linear time (this process primarily involves identifying within all placeable matrices the one that minimizes the objective function, which will be elaborated on later). Maintenance of the placeable matrix array involves adjusting other placeable matrices besides the chosen one (aligned with the lower-left corner coordinates) whenever a new module is placed within the current placement region. This is because each placeable matrix represents a potential placement area, and, in practical scenarios, these matrices may overlap, necessitating modifications to all affected matrices after each placement.

The maintenance process is illustrated using the following simple example in Figure 4: in (a), the entire green-shaded area represents the current placement region, initially comprising only one placeable matrix, depicted by the green rectangular block. Upon placing a module, the only available placement position aligns the module’s lower-left corner with that of the green-shaded placeable matrix. After completing this placement, the placeable matrix array requires maintenance through segmentation (the specific segmentation possibilities will be detailed later; here, the focus is solely on outlining the complete maintenance process). Post-segmentation, the placeable matrix array contains two elements, as depicted in (b): a green placeable matrix and a red placeable matrix. Subsequent placements offer two available positions; for instance, if the next module is placed to overlap with the lower-left corner of the green rectangular block, it only affects the green placeable matrix. Therefore, segmentation is required only for the green placeable matrix. After segmentation, as shown in (c), the placeable matrix array contains three elements, with the original green placeable matrix segmented into blue and purple placeable matrices, while the original red placeable matrix remains unchanged. This design aims to maintain the left-lower compactness of placements by updating the placeable matrix array after each placement. The intention is for the placeable matrix to precisely represent all viable placement positions under the constraint of left-lower compactness, with the width and height of the placeable matrix primarily restricting the size of the available placement positions, ensuring that only the blocks that fit can be placed. However, during the segmentation process, some placeable matrices may deviate from this premise.

For instance, in Figure 4, the placement of module C affects three current placeable matrices, resulting in the original purple placeable matrix being segmented into three matrices in (d1), the original red placeable matrix being segmented into two in (d2), and the original blue placeable matrix being segmented into three in (d3). Notably, many of these segmented placeable matrices do not conform to the requirement of preserving the left-lower compactness, such as the third matrix in (d1), which, if chosen for placement, would disrupt the lower-left alignment. Fortunately, it can be observed that all such illegal placeable matrices are parts of valid placeable matrices under the current segmentation, as substantiated by the definition of left-lower compactness.

Therefore, the maintenance algorithm for the placeable matrix array consists of two steps: segmentation and merging. Segmentation is necessary because each placeable matrix represents a potential placement area, and if it is partially occupied by the current placement, subsequent placements cannot occupy that portion of the area. Thus, the placeable matrix needs to be segmented to retain only the portion where future modules can be placed. Figure 5 illustrates all possible occupancy scenarios and their corresponding segmentation strategies. On the other hand, merging occurs after each segmentation to incorporate the illegal placeable matrices generated by the segmentation process into their corresponding legal ones. For example, in Figure 4, (e) depicts the result of merging (d1), (d2), and (d3).
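The two maintenance steps can be approximated with a maximal-rectangles style update. This sketch is not the case analysis of Figure 5: it assumes that segmentation keeps the maximal strips of a slot left, right, below, and above the placed module, and that merging simply drops any piece fully contained in another. All function names and the 10 x 10 region are assumptions.

```python
def overlaps(a, b):
    """True when rectangles (x, y, w, h) share interior area."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def split(slot, placed):
    """Segmentation: keep the maximal parts of `slot` not covered by `placed`."""
    if not overlaps(slot, placed):
        return [slot]
    sx, sy, sw, sh = slot
    px, py, pw, ph = placed
    pieces = []
    if px > sx:                       # strip to the left of the module
        pieces.append((sx, sy, px - sx, sh))
    if px + pw < sx + sw:             # strip to the right
        pieces.append((px + pw, sy, sx + sw - (px + pw), sh))
    if py > sy:                       # strip below
        pieces.append((sx, sy, sw, py - sy))
    if py + ph < sy + sh:             # strip above
        pieces.append((sx, py + ph, sw, sy + sh - (py + ph)))
    return pieces


def contains(outer, inner):
    ox, oy, ow, oh = outer
    ix, iy, iw, ih = inner
    return ox <= ix and oy <= iy and ix + iw <= ox + ow and iy + ih <= oy + oh


def merge(slots):
    """Merging: drop duplicates and pieces that lie entirely inside another piece."""
    unique = list(dict.fromkeys(slots))
    return [s for s in unique
            if not any(o != s and contains(o, s) for o in unique)]


def update(slots, placed):
    """Maintain the array after one module is placed."""
    return merge([piece for s in slots for piece in split(s, placed)])


after_a = update([(0, 0, 10, 10)], (0, 0, 4, 3))   # first module, 4 x 3
after_b = update(after_a, (4, 0, 3, 5))            # second module, 3 x 5
```

In this approximation a placement can touch more slots than in the example of Figure 4, because the strips are allowed to overlap; the containment test in `merge` is what keeps the array from growing with redundant pieces.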

3.4. Finding Optimal Module Positions with Placeable Matrix Arrays

The hierarchical structure in Verilog netlists consists of a main module that calls multiple submodules. The hierarchical relationships between these modules can be abstracted into a tree structure, where all modules in the netlist can be represented as a multi-branch tree. Figure 6 demonstrates how a Verilog netlist can be abstracted into a tree structure through a simple example.

The optimal module placement algorithm aims to minimize the wire length while being constrained by the area. It traverses the multi-branch tree layer by layer to find the optimal position for each module. B* Tree [26] and O-Tree [25] are commonly used data structures for minimizing the placement area of rectangular modules, but they are not suitable for the problem described in this paper. Therefore, we propose a new algorithm based on the placeable matrix array, which can maximize the reduction in the wire length between modules while ensuring the total area is as small and as close to a rectangle as possible.

The algorithm takes a Verilog netlist as the input, segmenting it by modules and constructing a corresponding multi-branch tree structure during the file reading process. Each module undergoes internal placement analysis. Once the internal placement of each module is completed, each module can be viewed as a rectangle, with the relative x and y coordinates of each device instance within the module determined relative to the lower-left corner of the module.

The algorithm then traverses the module tree layer by layer. For example, in the example shown in Figure 6, the traversal would proceed through root, root\x1, root\x2, root\x3, root\x2\x1, root\x2\x2, root\x3\x1, root\x2\x1\x1. (In practical cases, Verilog code allows for complex and potentially redefined module names. For algorithmic efficiency, the full path of a module, from the root node to the current node, is used as the module index. This method is also used in this paper for ease of description.) Based on the traversal order, the optimal position for each node is found using the placeable matrix array.
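The layer-by-layer traversal is a breadth-first walk over the module tree, with the full path used as the module index. In the sketch below, the tree shape is inferred from the traversal order quoted above (Figure 6 itself is not reproduced here), and the dictionary layout and function name are assumptions.

```python
from collections import deque


def level_order(children, root="root"):
    """Visit modules layer by layer; each module is indexed by its full path."""
    order = []
    queue = deque([root])
    while queue:
        path = queue.popleft()
        order.append(path)
        for child in children.get(path, []):
            queue.append(path + "\\" + child)
    return order


# Children keyed by the parent's full path, so repeated names such as x1 stay distinct.
children = {
    "root": ["x1", "x2", "x3"],
    "root\\x2": ["x1", "x2"],
    "root\\x3": ["x1"],
    "root\\x2\\x1": ["x1"],
}
order = level_order(children)
```

Because a parent always appears before its children in this order, the parent's position is already known when a child is placed, which the objective function in this section relies on.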

When placing each rectangular module, the algorithm sequentially searches for available positions within the placeable matrix array. For instance, given a placeable position $((x, y), w, h)$, the matrix to be placed must have a width no greater than $w$ and a height no greater than $h$ to be eligible for placement there. If a matrix is placed within this placeable matrix, its lower-left corner must be positioned at $(x, y)$, ensuring the compactness of the placement.

As for how to choose which placeable matrix to place a module in, all the eligible placeable matrices with the required dimensions are evaluated to find the position that minimizes the objective function. The current placement state is defined by the placeable matrix array $PM$. When $PM$ contains $pm\_num$ elements, we have the following:

$$PM = \{((x_{i}, y_{i}), w_{i}, h_{i}) \mid 1 \leq i \leq pm\_num\} \tag{1}$$

The following defines the objective function described in this paper. For a given placeable matrix $((x_{i}, y_{i}), w_{i}, h_{i})$, choosing to place the current module at this position means the lower-left corner coordinates of the current module will be $(x_{i}, y_{i})$. Since the placement follows a level-order traversal, the parent module of the current module (i.e., the parent node in the tree structure) has already been placed, with its lower-left corner coordinates set to $(x_{r}, y_{r})$. According to the definition in the Problem Definition section, let the current module be $(H_{t}, T_{t})$, where $H_{t} = (V_{t}, E_{t})$, and let the parent module be $(H_{r}, T_{r})$, where $H_{r} = (V_{r}, E_{r})$. Then, we have the following:

$$t = (H_{t}, \{(\mathit{call}_{i}, \mathit{called}_{i}) \mid 1 \leq i \leq n\}), \quad t \in T_{r} \tag{2}$$

$$\{\mathit{call}_{i} \mid 1 \leq i \leq n\} \subseteq E_{r}, \quad \{\mathit{called}_{i} \mid 1 \leq i \leq n\} \subseteq E_{t} \tag{3}$$

$$f(i) = \left( \sum_{m=1}^{n} \sum_{\mathit{insr} \in \mathit{call}_{m}} \sum_{\mathit{inst} \in \mathit{called}_{m}} \left( \left| (x_{r} + x_{\mathit{insr}}) - (x_{i} + x_{\mathit{inst}}) \right| + \left| (y_{r} + y_{\mathit{insr}}) - (y_{i} + y_{\mathit{inst}}) \right| \right) \right) \times \gamma \tag{4}$$

In (4), both $insr$ and $inst$ denote device instances. When these appear as subscripts of $x$ and $y$, they represent the relative horizontal and vertical distances of the device instance from the lower-left corner of its respective module. For example, $(x_r + x_{insr})$ indicates the x-coordinate of a device belonging to the parent module, and $(x_i + x_{inst})$ indicates the x-coordinate of a device belonging to the current module; the same logic applies to the y-coordinates. Thus, the meaning of this equation can be understood as follows: there exist multiple cross-module nets between $H_t$ and $H_r$. A cross-module net connects some devices in $H_t$ to some devices in $H_r$. For these devices, the sum of the Manhattan distances between devices that are not in the same module is termed the inter-module estimated wire length of this cross-module net. Here, (4) represents the sum of the inter-module estimated wire lengths for all cross-module nets between the current module and its parent module. Figure 7 provides a simple example to illustrate this definition.
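The triple sum in (4), before the scaling by the area parameter, can be sketched in a few lines of Python. This is an illustration under stated assumptions: each cross-module net is represented as a pair of lists holding device offsets relative to the lower-left corner of the parent module and of the current module, and the name `inter_module_wire_length` is hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
# One cross-module net: (device offsets in the parent module,
#                        device offsets in the current module).
CrossNet = Tuple[List[Point], List[Point]]


def inter_module_wire_length(parent_origin: Point,
                             module_origin: Point,
                             nets: List[CrossNet]) -> float:
    """Sum of Manhattan distances between devices in different modules."""
    x_r, y_r = parent_origin   # lower-left corner of the parent module
    x_i, y_i = module_origin   # candidate lower-left corner of the current module
    total = 0.0
    for parent_devices, module_devices in nets:
        for x_insr, y_insr in parent_devices:
            for x_inst, y_inst in module_devices:
                total += (abs((x_r + x_insr) - (x_i + x_inst))
                          + abs((y_r + y_insr) - (y_i + y_inst)))
    return total


# One net joining a parent device at offset (2, 3) to two devices of the current module.
nets = [([(2, 3)], [(1, 1), (4, 3)])]
print(inter_module_wire_length((0, 0), (10, 0), nets))  # 11 + 12 = 23.0
```

With the parent at the origin and the current module at $(10, 0)$, the two device pairs contribute $9 + 2$ and $12 + 0$, which gives 23.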

Here, the area parameter is represented by $\gamma$. If placing the module at this position increases the area of the placement (by extending it horizontally or vertically), the total inter-module estimated wire length is multiplied by a scaling factor based on the extension value. This design encourages the overall placement to be more rectangular rather than L-shaped. Here, (5) selects the placeable matrix that minimizes $f$; its lower-left corner is the optimal placement position for the current module, as follows:

$$\left( (x_{i^*}, y_{i^*}), w_{i^*}, h_{i^*} \right), \quad i^* = \arg\min_{i} f(i) \tag{5}$$
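The selection in (5) can be sketched as a scan over the eligible placeable matrices. The text does not give a closed form for $\gamma$, so the linear penalty below, with its coefficient `k`, is purely an assumption for illustration; `best_placeable_matrix` and the `wire_length` callback are likewise hypothetical names.

```python
from typing import Callable, List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (x, y, w, h)


def gamma(x: float, y: float, mod_w: float, mod_h: float,
          layout_w: float, layout_h: float, k: float) -> float:
    # ASSUMPTION: penalty grows linearly with how far the module
    # extends the current layout bounding box in each direction.
    ext_x = max(0.0, x + mod_w - layout_w)
    ext_y = max(0.0, y + mod_h - layout_h)
    return 1.0 + k * (ext_x + ext_y)


def best_placeable_matrix(pm_array: List[Rect],
                          module_size: Tuple[float, float],
                          layout_size: Tuple[float, float],
                          wire_length: Callable[[float, float], float],
                          k: float = 0.01) -> Optional[Rect]:
    """Return the eligible matrix with the smallest scaled wire length."""
    mod_w, mod_h = module_size
    layout_w, layout_h = layout_size
    best, best_cost = None, float("inf")
    for (x, y, w, h) in pm_array:
        if not (mod_w < w and mod_h < h):
            continue  # module does not fit in this placeable matrix
        cost = wire_length(x, y) * gamma(x, y, mod_w, mod_h, layout_w, layout_h, k)
        if cost < best_cost:
            best, best_cost = (x, y, w, h), cost
    return best


pm_array = [(0, 10, 20, 20), (10, 0, 20, 20)]
manhattan_to_origin = lambda x, y: abs(x) + abs(y)
print(best_placeable_matrix(pm_array, (5, 5), (20, 12), manhattan_to_origin))
```

Both candidates have the same raw wire length here, but the first would push the layout upward beyond its current height, so the second is chosen. With `k = 0` the penalty vanishes and the first candidate in array order wins the tie.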

4. Experimental Result

4.1. Experimental Setup

The experiments were run on Ubuntu 22.04 in a VMware virtual machine with 16 GB of memory. The experimental code was written in OCaml. While OCaml’s runtime efficiency does not match that of C, it offers better safety and is significantly more efficient than languages like Python. The test cases were drawn from the OpenLane repository [6] or from actual projects, such as the processor chip designed in the “One Student One Chip” project [8]. The technique proposed in this paper requires designers to reasonably partition modules during the initial design phase. We selected test cases that adhered to this requirement, choosing designs with similar module sizes, functional module partitioning, and an appropriate number of wires between modules.

We propose an incremental EDA tool that encompasses the RTL to Placed DEF flow, taking RTL-level designs as input. The incremental placement experiment is divided into the following two steps: initial processing and subsequent iterations. During the initial design phase, the complete netlist is synthesized, partitioned into modules, and each module is placed. This process is referred to as Initial Incremental Placement (IP). In subsequent iterations, as illustrated in Figure 2, modifications are identified directly before synthesis, and only the modified modules undergo synthesis and placement. The corresponding module DEF files are then replaced, followed by inter-module placement (IMP), culminating in the final DEF output. The process of identifying modifications and synthesizing the modified modules is collectively referred to as Iterative Incremental Synthesis (ISS). The process of placing the modified modules, performing inter-module placement, and generating the DEF file is collectively referred to as Iterative Incremental Placement (ITP). The entire process is collectively referred to as Iterative Incremental Synthesis and Placement (ISP).
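The iteration step can be pictured with a small sketch. The text does not specify how modifications are identified, so the content-hash comparison below is purely an assumption for illustration, as are the names `module_digest` and `modified_modules`; the actual mechanism builds on the incremental synthesis work cited as [7].

```python
import hashlib
from typing import Dict, Set


def module_digest(rtl_source: str) -> str:
    """Fingerprint of one module's RTL text (hypothetical change detector)."""
    return hashlib.sha256(rtl_source.encode("utf-8")).hexdigest()


def modified_modules(previous_digests: Dict[str, str],
                     current_sources: Dict[str, str]) -> Set[str]:
    """Names of modules that are new or whose RTL changed since the last run."""
    changed = set()
    for name, source in current_sources.items():
        if previous_digests.get(name) != module_digest(source):
            changed.add(name)
    return changed


# Digests stored after the initial pass (IP), then compared on the next iteration.
previous = {
    "alu": module_digest("module alu; endmodule"),
    "regfile": module_digest("module regfile; endmodule"),
}
current = {
    "alu": "module alu; wire x; endmodule",   # edited
    "regfile": "module regfile; endmodule",   # unchanged
    "csr": "module csr; endmodule",           # newly added
}
print(sorted(modified_modules(previous, current)))
```

Only the modules returned by such a check would be re-synthesized and re-placed; the DEF files of the remaining modules are reused before inter-module placement.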

In the comparative experiment, Yosys is used for global synthesis and flattening, followed by OpenROAD for global placement, legalization, detailed placement, and DEF file generation. This process is referred to as General Complete Logical Synthesis and Placement (GCSP). This section tests GCSP using a set of RISC-CPU projects.

ISS builds upon the techniques from our previous work [7]. However, the definition of ISS differs from that in [7]. The new ISS incorporates additional optimization steps within the original synthesis process and includes technology mapping to achieve direct connection with the placement stage. Consequently, the time consumption is slightly higher than in the previous version, but it results in superior placement outcomes. On the smaller cases (the first and second rows in Table 1), ISS offers little benefit, since its runtime is similar to synthesizing directly with Yosys [27]. For these cases, we therefore used Yosys for the synthesis and compared only the placement results, with both placement flows taking the same synthesized netlist as the input.

The fundamental goal of our placement algorithm is to enable designers to redesign, synthesize, and place only the modified modules during iterative design changes, and then integrate them back into the complete layout. This allows for the rapid evaluation of the current design results. Table 1 shows that, compared to the time required for a complete logic synthesis and placement using GCSP, the advantage and practical significance of ISP in small examples are not substantial. However, in larger examples, the time spent with ISP is significantly shorter than with GCSP, and the advantage grows with the example size: across the three largest cases, the ISP/GCSP time ratio falls from 22.567% to 12.735%. Although, measured by the HPWL, this placement method is slightly inferior to Yosys+RePlAce (OpenROAD) in some cases, it is sufficiently effective for evaluating intermediate iterative versions. After all, the most crucial aspect during iterations is quickly assessing the current design and providing direction for the next modifications.
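For readers unfamiliar with the metric, the half-perimeter wire length (HPWL) of a net is the half-perimeter of the bounding box enclosing its pins, summed over all nets. The short Python sketch below illustrates this standard definition; the pin-list representation and the name `hpwl` are assumptions for illustration and say nothing about how OpenROAD or the authors' tool report the value.

```python
from typing import List, Tuple

Pin = Tuple[float, float]


def hpwl(nets: List[List[Pin]]) -> float:
    """Total half-perimeter wire length over a list of nets."""
    total = 0.0
    for pins in nets:
        if not pins:
            continue  # a net without pins contributes nothing
        xs = [x for x, _ in pins]
        ys = [y for _, y in pins]
        # Width plus height of the bounding box of this net's pins.
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


nets = [
    [(0, 0), (3, 4), (1, 1)],  # bounding box 3 x 4 -> 7
    [(2, 2), (2, 5)],          # bounding box 0 x 3 -> 3
]
print(hpwl(nets))  # 10.0
```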

4.2. Discussion

We consider it reasonable and acceptable for the HPWL to be slightly behind industry-leading tools. The time savings are undeniable, allowing designers to focus more on RTL-level design modifications and reducing the time pressure in the chip design process. For the current version, we recommend that designers use our technology for quick evaluations of wire length and area during intermediate version iterations, and then use appropriate tools for the complete flow after the final design completion. This approach saves time while achieving excellent layout results. Based on the concept of DTCO, for the practical application of the incremental placement technology, designers should perform the logical partitioning of modules early in the design phase. For example, the number of modules should be kept within a reasonable range, and efforts should be made to ensure similar sizes for each module. This helps avoid situations where one module is disproportionately larger than others or where a large design contains only a few modules.

5. Conclusions

Currently, most open-source RTL to GDSII flow EDA tools loosely link the tools for each step together without achieving complete data transfer and utilization. However, fully leveraging all stored information within the database in integrated design tools is essential to harnessing their full potential, thereby enhancing the efficiency and quality across all stages of the design process. This paper demonstrates through placement stage experiments that directly accessing front-end information can save time in subsequent algorithm design stages. As demonstrated in our experiments, the quality of incremental placement results may not yet rival the industry’s best placers. However, as an interim evaluation metric, it is sufficient. In the future, we will also explore better clustering algorithms to achieve superior placement results.

Incremental development methods are already well-established in the software field. Promoting such methods in the open-source EDA domain could undoubtedly shorten chip development cycles and alleviate the current time pressures in chip design.

Currently, we are continuing our research in two directions. First, we plan to optimize the intra-module placement algorithm. By guiding devices with related I/Os to be placed at fixed positions around the module periphery based on inter-module placement results, we aim to further reduce the wire length and approach the layout results of advanced industry tools. Second, we plan to integrate the various stages of chip design, aiming to develop an open-source incremental EDA toolchain from RTL to GDSII.

Author Contributions

Methodology, Z.Z. and G.C.; software, Z.Z.; validation and experiment, Z.Z.; writing, Z.Z.; writing—review and editing, G.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study are available in OpenLane at https://github.com/The-OpenROAD-Project/OpenLane (accessed on 17 June 2024). Part of the data in this paper comes from the RTL code designed by Zhengpu Shi et al. in the ysyx project, which can be obtained by contacting the corresponding authors.

Acknowledgments

We would like to express our gratitude to Xiangli Chen and Zhengpu Shi for the technical guidance and experimental assistance provided.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Lu, J.; Chen, P.; Chang, C.C.; Sha, L.; Huang, D.J.H.; Teng, C.C.; Cheng, C.K. ePlace: Electrostatics-based placement using fast Fourier transform and Nesterov’s method. ACM Trans. Des. Autom. Electron. Syst. (TODAES) 2015, 20, 1–34.
2. Maidee, P.; Ababei, C.; Bazargan, K. Fast timing-driven partitioning-based placement for island style FPGAs. In Proceedings of the 40th Annual Design Automation Conference, Anaheim, CA, USA, 2–6 June 2003; pp. 598–603.
3. Fogaça, M.; Kahng, A.B.; Reis, R.; Wang, L. Finding placement-relevant clusters with fast modularity-based clustering. In Proceedings of the 24th Asia and South Pacific Design Automation Conference, Tokyo, Japan, 21–24 January 2019; pp. 569–576.
4. Lu, Y.C.; Yang, T.; Lim, S.K.; Ren, H. Placement optimization via PPA-directed graph clustering. In Proceedings of the 2022 ACM/IEEE Workshop on Machine Learning for CAD, Snowbird, UT, USA, 12–13 September 2022.
5. Li, X.; Huang, Z.; Tao, S.; Huang, Z.; Zhuang, C.; Wang, H.; Bao, Y. iEDA: An Open-source infrastructure of EDA. In Proceedings of the 2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC), Incheon, Republic of Korea, 22–25 January 2024; pp. 77–82.
6. Ghazy, A.; Shalan, M. Openlane: The open-source digital ASIC implementation flow. In Proceedings of the Workshop on Open-Source EDA Technology (WOSET), 2020. Available online: https://woset-workshop.github.io/WOSET2020.html#article-21 (accessed on 17 June 2024).
7. Chen, X.; Chen, G. A topology-flattening-based automated incremental synthesis method. In Proceedings of the International Symposium of Electronics Design Automation (ISEDA), Xi’an, China, 10–13 May 2024.
8. YSYX. Available online: https://ysyx.oscc.cc/ (accessed on 1 April 2024).
9. Su, L.; Buntine, W.; Newton, A.R.; Peters, B.S. Learning as applied to stochastic optimization for standard-cell placement. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 2001, 20, 516–527.
10. Yoshikawa, M.; Terai, H. A novel performance-driven placement based on hybrid genetic algorithm. IEEE Int. Conf. Mechatron. Autom. 2005, 3, 1203–1208.
11. Roy, J.A.; Adya, S.N.; Papa, D.A.; Markov, I.L. Min-cut floorplacement. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 2006, 25, 1313–1326.
12. Taghavi, T.; Yang, X.; Choi, B.K. Dragon2005: Large-scale mixed-size placement tool. In Proceedings of the International Symposium on Physical Design, San Francisco, CA, USA, 3–6 April 2005.
13. Agnihotri, A.R.; Ono, S.; Li, C.; Yildiz, M.C.; Khatkhate, A.; Koh, C.K.; Madden, P.H. Mixed block placement via fractional cut recursive bisection. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 2005, 24, 377–390.
14. Karypis, G.; Kumar, V. Multilevel K-Way Hypergraph Partitioning. In Proceedings of the Design Automation Conference (DAC), New Orleans, LA, USA, 21–25 June 1999; pp. 343–348.
15. Alpert, C.J.; Kahng, A.B.; Nam, G.-J.; Reda, S.; Villarrubia, P. A Semi-Persistent Clustering Technique for VLSI Circuit Placement. In Proceedings of the International Symposium on Physical Design (ISPD), San Francisco, CA, USA, 3–6 April 2005; pp. 200–207.
16. Bustany, I.; Kahng, A.B.; Koutis, Y.; Pramanik, B.; Wang, Z. SpecPart: A Supervised Spectral Framework for Hyper-graph Partitioning Solution Improvement. In Proceedings of the International Conference on Computer-Aided Design (ICCAD), San Diego, CA, USA, 29 October–3 November 2022; pp. 1–9.
17. Mirhoseini, A.; Goldie, A.; Yazgan, M.; Jiang, J.W.; Songhori, E.; Wang, S.; Lee, Y.J.; Johnson, E.; Pathak, O.; Nazi, A.; et al. A graph placement methodology for fast chip design. Nature 2021, 594, 207–212.
18. Chen, T.C.; Hsu, T.C.; Jiang, Z.W.; Chang, Y.W. NTUplace: A ratio partitioning based placement algorithm for large-scale mixed-size designs. In Proceedings of the 2005 International Symposium on Physical Design, San Francisco, CA, USA, 3–6 April 2005; pp. 236–238.
19. Lu, J.; Chen, P.; Chang, C.C.; Sha, L.; Huang, D.J.; Teng, C.C.; Cheng, C.K. FFTPL: An Analytic Placement Algorithm Using Fast Fourier Transform for Density Equalization. In Proceedings of the Asia and South Pacific Design Automation Conference (ASPDAC), Shenzhen, China, 28–31 October 2013.
20. Cheng, C.K.; Kahng, A.B.; Kang, I.; Wang, L. Replace: Advancing solution quality and routability validation in global placement. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 2018, 38, 1717–1730.
21. Huang, T.W. Machine learning system-enabled GPU acceleration for EDA. In Proceedings of the 2021 International Symposium on VLSI Design, Automation and Test (VLSI-DAT), Hsinchu, Taiwan, 19–22 April 2021; p. 1.
22. Lin, Y.; Dhar, S.; Li, W.; Ren, H.; Khailany, B.; Pan, D.Z. Dreamplace: Deep learning toolkit-enabled GPU acceleration for modern VLSI placement. In Proceedings of the 56th Annual Design Automation Conference, Las Vegas, NV, USA, 2–6 June 2019; pp. 1–6.
23. Yeric, G.; Cline, B.; Sinha, S.; Pietromonaco, D.; Chandra, V.; Aitken, R. The past, present and future of design-technology co-optimization. In Proceedings of the IEEE 2013 Custom Integrated Circuits Conference, San Jose, CA, USA, 22–25 September 2013; pp. 1–8.
24. Kahng, A.B.; Spyrou, T. The OpenROAD project: Unleashing hardware innovation. In Proceedings of the Government Microcircuit Applications and Critical Technology, Virtual, 29 March–1 April 2021; pp. 1–6.
25. Guo, P.N.; Cheng, C.K.; Yoshimura, T. An O-tree representation of non-slicing floorplan and its applications. In Proceedings of the 36th Annual ACM/IEEE Design Automation Conference, New Orleans, LA, USA, 21–25 June 1999; pp. 268–273.
26. Chen, T.C.; Chang, Y.W. Modern floorplanning based on B*-tree and fast simulated annealing. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 2006, 25, 637–650.
27. Wolf, C.; Glaser, J.; Kepler, J. Yosys—A free Verilog synthesis suite. In Proceedings of the 21st Austrian Workshop on Microelectronics (Austrochip), Linz, Austria, 10 October 2013; Volume 97.

Figure 1. An example of an explicit problem definition for inter-module calls, where module 2 is a module that is called by module 1.

Figure 2. Incremental placement process.

Figure 3. (a) is legal. (b) is illegal: the width of the placed block exceeds the width of the current placeable matrix. (c) is illegal: the lower-left corner coordinates of the placed block do not align with the lower-left corner coordinates of the current placeable matrix. (d) is illegal: the alignment must be with the lower-left corner, not any other corner.

Figure 4. Maintain instances of the placeable matrix array. (a) is the initial available layout area. (b) is the layout space after placing the first module A, where the available layout area is represented by two rectangles, green and red, which overlap with each other. (c) is the layout space after placing the second module B, where the green available layout area in (b) is no longer fully available due to the influence of B, and is divided into two smaller available layout areas, purple and blue. The original red available area is not affected. (d1–d3) collectively show the affected and divided situation of the blue, purple, and red available layout areas in (c) after placing the C module. (e) shows the overall available layout area after placing the C module.

Figure 5. The green square represents a specific placeable matrix within the placeable matrix array. When a rectangular module is placed at the lower-left corner of another placeable matrix, it may affect the current placeable matrix in one of sixteen possible ways. This diagram illustrates how the placeable matrix will be segmented for each type of impact.

Figure 6. A simple example demonstrating how Verilog netlists with a hierarchical structure can be abstracted into a tree-like structure. The Verilog code presented here may not be strictly correct, but it suffices to illustrate the conceptual logic at play.

Figure 7. (a) An example of a cross-module net. The left diagram represents the parent module, while the right diagram represents the submodule. The tricolor lines together form a net, with each color indicating its respective segment within the parent module, the submodule, and between the modules. (b) The gray lines connect groups of device instances that are not within the same module. The red lines indicate the Manhattan distance of these connections. Consequently, the total length of all the red lines constitutes the inter-module estimated wire length of this net.

Table 1. Experimental results presentation.

Case	Number of Components (after technology mapping)	Time (ISS [7])	Time (IMP)	Time (ITP)	Time (ISP)	Time (Yosys [27])	Time (OpenROAD [24])	Time (GCSP)	Time (ITP) / Time (OpenROAD)	Time (ISP) / Time (GCSP)	HPWL with GCSP (u)	HPWL with ISP/ITP (u)
1	984	/	0.092 s	0.164 s	/	/	0.590 s	/	16.949%	/	16,548.1	15,368.2
2	12,444	/	0.272 s	2.186 s	/	/	5.126 s	/	42.645%	/	370,964.0	405,214.5
3	219,367	17.931 s	9.363 s	12.110 s	33.041 s	29.008 s	1 m 57.400 s	2 m 26.408 s	10.320%	22.567%	5,999,533.9	7,138,129.1
4	263,381	19.289 s	11.721 s	15.795 s	35.084 s	32.812 s	2 m 11.594 s	2 m 44.406 s	9.607%	21.339%	6,872,721.6	7,929,372.2
5	368,739	20.635 s	20.234 s	24.528 s	45.163 s	31.198 s	5 m 23.425 s	5 m 54.623 s	6.916%	12.735%	9,600,852.9	12,318,737.1
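As a worked check of how the derived columns of Table 1 relate to the measured ones, the sketch below recomputes the ISP/GCSP ratio for case 5 from the reported times, taking ISP as ISS plus ITP and GCSP as Yosys plus OpenROAD. The helper `to_seconds` and the variable names are illustrative assumptions; the input values are copied from the table.

```python
def to_seconds(text: str) -> float:
    """Parse a duration such as '5 m 23.425 s' or '24.528 s' into seconds."""
    parts = text.split()
    total = 0.0
    # Tokens alternate between a number and its unit ('m' or 's').
    for value, unit in zip(parts[0::2], parts[1::2]):
        total += float(value) * (60.0 if unit == "m" else 1.0)
    return total


# Case 5 of Table 1.
iss = to_seconds("20.635 s")
itp = to_seconds("24.528 s")
yosys = to_seconds("31.198 s")
openroad = to_seconds("5 m 23.425 s")

isp = iss + itp          # incremental synthesis plus incremental placement
gcsp = yosys + openroad  # complete synthesis plus complete placement
ratio_percent = 100.0 * isp / gcsp

print(f"ISP  = {isp:.3f} s")
print(f"GCSP = {gcsp:.3f} s")
print(f"ISP / GCSP = {ratio_percent:.3f}%")
```

The two sums reproduce the reported 45.163 s and 5 m 54.623 s, and the quotient matches the 12.735% listed for this case.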


© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
