Article

AiMap+: Guiding Technology Mapping for ASICs via Learning Delay Prediction †

1 State Key Laboratory of Software Development Environment, School of Computer Science and Engineering, Beihang University, Beijing 100191, China
2 School of Artificial Intelligence and Big Data, Hefei University, Jinxiu Avenue 99, Hefei 230092, China
* Author to whom correspondence should be addressed.
† This paper is an extended version of our paper presented at the 2023 IEEE International Conference on Computer Design (ICCD), Washington, DC, USA, 6–8 November 2023, under the title “AiMap: Learning to Improve Technology Mapping for ASICs via Delay Prediction”.
Electronics 2024, 13(18), 3614; https://doi.org/10.3390/electronics13183614
Submission received: 9 August 2024 / Revised: 3 September 2024 / Accepted: 10 September 2024 / Published: 11 September 2024

Abstract

Technology mapping is an essential process in the Electronic Design Automation (EDA) flow that aims to find an optimal implementation of a logic network from a technology library. In application-specific integrated circuit (ASIC) designs, the non-linear delay behaviors of the cells in the library essentially guide the search direction of technology mappers. Existing methods for cell delay estimation, however, rely on approximate simplifications that significantly compromise accuracy, thereby limiting the achievable Quality-of-Result (QoR). To address this challenge, we propose formulating cell delay estimation as a regression learning task that incorporates features from multiple perspectives, such as the structure of the logic network and the non-linear cell delays, to guide the mapper's search. We design a learning model that incorporates a customized attention mechanism to be aware of the pin delay and jointly learns the hierarchy between the logic network and the library through a Neural Tensor Network, with the help of proposed parameterizable strategies for generating learning labels. Experimental results show that, compared with the well-known mapper ABC, our proposed method (i) noticeably improves area by 9.3% and delay by 1.5%, and (ii) improves area by 12.0% for delay-oriented mapping.

1. Introduction

The explosive growth of machine learning has significantly propelled its critical role across various application domains, notably accelerating the development of Electronic Design Automation (EDA) [1,2,3]. Leveraging machine learning techniques to aid the EDA flow, including classification, prediction, and design space exploration, has emerged as a contemporary research trend within both the academic and industrial communities [4,5,6,7,8,9,10,11].
Technology mapping is an essential process in the EDA flow for digital systems, e.g., application-specific integrated circuits (ASICs) [12,13] and field-programmable gate arrays (FPGAs). It transforms a technology-independent logic network into a circuit implemented by primitive gates chosen from a technology library, with the objective of achieving better performance, power, and area (PPA) [14].
The approaches for technology mapping in both FPGAs and ASICs have witnessed a shift from tree-based mapping [15] to cut-based mapping [4,6,14,16,17,18,19,20,21,22,23,24], as cut-based methods enable better exploration of different logic structures and optimization with the scalability to large-scale designs. Due to the enumeration of cuts for each node, they mainly focus on ranking, filtering, and merging the cut candidates [16,17,20] and designing heuristic area estimation to facilitate area optimization [6,21].
One important distinction between ASIC and FPGA technology mapping is that the former maps the logic network to a predefined set of standard cells, while the latter maps it to lookup tables [4,22,25]. Accordingly, driven by the properties of standard cells, works on ASIC technology mapping fall into two main categories: (1) supergate generation to mitigate structural bias [26,27,28], and (2) accurate delay estimation of the cells (supergates), whose delays are unknown before mapping [14,29,30]. For the first category, the studies of [26,27,28] build a supergate library by combining gates from the standard-cell library, which enables matching with different logic graph structures. For the second category, the studies of [14,30] present load-independent timing models that employ simple gain-based delay models to approximate the delay by assuming the same input slew and output load for all gates. The work of [29] introduces a load-dependent timing model that approximates the loads using fanout estimation to account for the impact of loads on supergate delay during the mapping process.
However, the estimated delay of the supergate has a significant impact on the Quality-of-Result (QoR) of the mapped network, since all existing cut-based methods follow the dynamic programming (DP) paradigm, in which the delay of the supergate w.r.t. the cut directly guides the search direction of the DP. Unfortunately, all existing approaches to estimating the supergate delay rely on coarse approximations that are far from the accurate delay. They either assume that all supergates have the same input slew and output load [14,20,25] or approximate the delay merely from the fanout [29]. Take the well-known ASIC technology mapper ABC [14] as an example: due to the accumulation of inaccurate supergate delay estimates, the estimated circuit delays on six circuits from the EPFL benchmark [31] exhibit an average discrepancy exceeding 140%, as shown in Table 1. This limits the existing mapping algorithms from achieving better QoR.
In this work, we formulate supergate delay estimation as a regression learning task. The node–cut–supergate tuples partially reveal the graph structure of the mapped network, encompassing elements such as node and cut fanouts as well as the structure of cuts. Furthermore, these tuples capture the non-linear delay behavior of supergates, which is instrumental in deriving the input slew and output load, thereby revealing the delay characteristics of the supergates. However, learning to predict the supergate delay is not trivial, as the following two issues need to be dealt with.
First, how to generate supergate delays corresponding to an improved circuit delay as the learning labels, since the training labels set the upper bound for model performance and generalization capability. Second, how to build a learning model that is aware of the varying contributions of different pins to supergate delay estimation and of the hierarchy of node–cut–supergate tuples.
To this end, we present AiMap+, a novel data-driven framework for ASIC mapping that leverages the learned supergate delay to guide the supergate selection of technology mapping. The main contributions are listed as follows.
  • Rather than employing a heuristic strategy that models only the fanout feature for supergate delay estimation, we formulate it as a regression learning task. This approach integrates both the load-independent gate delay and the learned supergate delay to jointly guide the mapper's search process.
  • To facilitate the capability of the learning model, we further propose three parameterizable strategies, from the perspectives of load-independent delay estimation, load-dependent delay estimation, and cut sampling, to generate supergate delays corresponding to a better circuit delay as learning labels.
  • Benefiting from the learned labels, we design a learning model to predict the supergate delay. It perceives the pin delay through a customized attention mechanism and jointly learns the cut node and cut supergate features through a Neural Tensor Network.
  • Experimental results from a wide range of benchmarks show that (i) AiMap+ noticeably improves area by 9.3% and delay by 1.5% compared with ABC [14], and improves area by 12.3% with a 9.7% delay penalty compared with SLAP [20]; AiMap+ also improves delay by an average of 2.6% without any area penalty compared with AiMap [4]; (ii) in terms of delay-oriented mapping, AiMap+ performs considerably better than ABC, with an area improvement of 12%; (iii) our training strategies for label generation achieve delay improvements of 26.6% on average.
The remainder of this paper is organized as follows: Section 2 presents the necessary background and the problem definition. Section 3 discusses the proposed strategies to generate learning labels to facilitate the training process. Section 4 presents the proposed learning-based framework AiMap+ and its details. Section 5 presents the results achieved by our method, followed by the conclusion in Section 6.

2. Preliminary and Related Works

In this section, we present the preliminaries on Boolean networks, technology mapping for ASICs, and timing analysis, followed by the problem definition.

2.1. Boolean Networks

A Boolean network is a directed acyclic graph (DAG) $G(V, E)$, where each node $v \in V$ corresponds to a Boolean function and each edge $(u, v) \in E$ corresponds to the wire connecting the nodes $u$ and $v$. The incoming/outgoing edges of node $v$ are referred to as its fanins/fanouts. Nodes with zero fanin are identified as primary inputs (PIs), and similarly, nodes with zero fanout are identified as primary outputs (POs). Typically, we refer to a Boolean network as an And-Inverter Graph (AIG) when the Boolean function of the DAG is implemented using AND gates, with the edges denoting inverted or non-inverted signals.
  • k-feasible Cut. A feasible cut $C$ of a node $v \in V$ of the AIG is a set of nodes in its transitive fanin such that every path from the PIs to $v$ passes through at least one node in $C$. Each node $u \in C$ is called a leaf of the cut $C$, and a cut is considered k-feasible if it has at most $k$ leaves. The trivial cut of node $v$ consists solely of $v$ itself. Thus, for a node $n$ with two fanins $u$ and $v$, the set of k-feasible cuts $\Phi(n)$ can be computed by combining the sets of cuts of $u$ and $v$ [32]:
    $\Phi(n) = \{\{n\}\} \cup \{\, x \cup y \mid x \in \Phi(u),\ y \in \Phi(v),\ |x \cup y| \le k \,\}$    (1)
    That is, the k-feasible cuts of every $v \in V$ can be computed iteratively by traversing from the PIs to the POs. In addition, Figure 1a shows an AIG example, and the set $\{\{5\}, \{3,4\}, \{3,c,d\}, \{1,2,4\}, \{4,a,b\}\}$ is the set of 3-feasible cuts of node 5.
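To make the cut computation of Equation (1) concrete, the following is a minimal Python sketch of bottom-up k-feasible cut enumeration over an AIG given as a topologically ordered node list; the data layout (a `fanins` dictionary and simple node IDs) is an illustrative assumption, not the data structure used by ABC.

```python
def enumerate_k_feasible_cuts(topo_nodes, fanins, k=3):
    """Bottom-up k-feasible cut enumeration following Equation (1).

    topo_nodes: node IDs in topological order (PIs first).
    fanins:     dict mapping an AND node to its two fanin node IDs;
                PIs do not appear as keys.
    Returns a dict: node -> set of cuts, each cut a frozenset of leaves.
    """
    cuts = {}
    for n in topo_nodes:
        # Every node has its trivial cut {n}.
        node_cuts = {frozenset([n])}
        if n in fanins:                      # internal AND node
            u, v = fanins[n]
            for x in cuts[u]:
                for y in cuts[v]:
                    merged = x | y
                    if len(merged) <= k:     # keep only k-feasible merges
                        node_cuts.add(merged)
        cuts[n] = node_cuts
    return cuts


# Toy AIG: "a", "b", "c", "d" are PIs; 1 = AND(a, b), 2 = AND(c, d), 3 = AND(1, 2).
fanins = {1: ("a", "b"), 2: ("c", "d"), 3: (1, 2)}
topo = ["a", "b", "c", "d", 1, 2, 3]
print(enumerate_k_feasible_cuts(topo, fanins, k=3)[3])
# -> cuts such as {3}, {1,2}, {1,c,d}, {a,b,2}
```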

2.2. Technology Mapping for ASICs

Technology mapping transforms a technology-independent Boolean network, e.g., an AIG, into a network composed of primitive gates selected from a technology library. For ASIC mapping, the library comprises a set of standard cells (predefined and reusable logic gates, e.g., NAND, INV, OR, and AOI) and their electrical and physical properties (e.g., functionality, pins, timing information, and power consumption).
The goal of ASIC mapping could be minimizing the area of the mapped netlist, minimizing delay, or minimizing area subject to delay constraints [33,34]. The details of the timing (delay) analysis are explained in Section 2.3. Note that, prior to mapping, the delays of the gates cannot be determined because the input slews and output capacitances are unknown. Nevertheless, during the technology mapping process, estimating gate delays is essential to guide gate selection [14].
  • Supergate Generation. Before the mapping method is invoked, supergates are typically generated by combining several standard cells into a single-output gate [14,27]. Specifically, the supergate, a composition of standard cells, is proposed to mitigate the structural bias problem, and the generated library contains many gates with diverse functionality [26]. Thus, after supergate generation, there are many supergates with the same functionality (usually more than 10), and the available supergates with the same functionality are sorted by their maximum pin-to-pin delay. These supergates are utilized to match cuts with different graph structures during mapping. Note that the term supergate delay encompasses cell delay, as the set of standard cells is a subset of the generated supergates.
  • Mapping Flow in ABC. We next review the four main steps of the well-known technology mapper for ASICs in ABC [14], which follows a dynamic programming paradigm based on pre-computed k-feasible cuts and supergates. (1) ABC first computes all the k-feasible cuts for each node with k = 5. (2) For each cut, the truth table is computed to check whether it can be implemented by the pre-computed supergates using Boolean matching. (3) It then updates the best arrival time for each node from PIs to POs, where the arrival time of each cut implemented by a supergate is estimated based on the standard-cell library (i.e., using estimated gate delays without knowing the actual arrival times and output loads). (4) Finally, the best cover is chosen from POs to PIs. Subsequently, area recovery is performed on the non-critical paths using area flow and exact-area heuristics. Figure 1b depicts a mapped circuit of the AIG with Boolean function f.
  • Summary of related work. After introducing the preliminaries for ASIC mapping, we next review the related work in ASIC technology mapping. Table 2 summarizes and compares the various methods in terms of gate delay estimation, method keyword, optimization objective, and main contribution.
The first algorithmic approach to technology mapping is DAGON [15], which partitions the circuit (DAG) into a forest of trees and uses tree pattern matching to match individual trees. Subsequent methods have proposed more precise load estimation techniques aimed at minimizing delays, e.g., Touati [35] and Huang [29]. In recent years, the focus of ASIC technology mapping has increasingly shifted towards the application of decision intelligence, e.g., SLAP [20], MapTune [23] and AiMap [4]. This aims to reduce manual labor and facilitate the design convergence process within modern tool flows.
Table 2. Comparison of existing ASIC mappers.

Paper | Gate Delay Estimation | Method Keyword | Optimization Objective | Main Contribution
Rudell [36] | LD-Delay | dynamic programming | delay minimization | first to solve tree mapping in linear time
Touati [35] | LD-Delay | load estimation; dynamic programming | delay minimization | piecewise linear functions for considering any load values
Huang [29] | LD-Delay | adaptive load estimation | delay minimization | more accurate load estimation using piecewise linear functions
DAGON [15] | LI-Delay | tree covering problem; dynamic programming | delay minimization | partitions the DAG into a forest of trees; matches individual trees via tree pattern matching
ABC [14] | LI-Delay | iterative gate selection; priority cuts | delay minimization | delay/area-oriented flow to address the structural bias problem on AIGs
SLAP [20] | LI-Delay | supervised learning; priority cut classification | QoR improvement | uses a CNN to learn to classify priority cuts
MapTune [23] | LI-Delay | reinforcement learning | QoR improvement | uses reinforcement learning to guide mapping library tuning
E-Syn [24] | LI-Delay | equivalence graph; regression for cost model | QoR improvement | uses E-Graphs to rewrite the AIG with a technology-aware cost
AiMap [4] | LI-Delay | regression learning; gate delay prediction | QoR improvement | learns to improve mapping via delay prediction
AiMap+ (Ours) | LI-Delay | regression learning; gate delay prediction | QoR improvement | guides gate selection via learned delay prediction

Note: LI-Delay and LD-Delay refer to load-independent and load-dependent delay estimation, respectively.

2.3. Timing Analysis

After mapping, timing analysis of the mapped network is performed to analyze and verify the timing characteristics of the designed circuit. This analysis involves various models that capture the behavior of different circuit elements, e.g., (i) the non-linear delay model (NLDM), arising due to factors such as input slew, output loads, capacitances, and voltage variations, and (ii) the wire load model (WLM), modeling the impact of resistance, capacitance, and inductance effects on signal propagation delay in interconnections.
Note that, during the DP of technology mapping, the delay of the cells (supergates) is usually computed by load-independent or load-dependent linear timing models [14,29,30] to approximate the non-linear behavior, as the actual input slew and output load of the gate are both unknown. A load-independent model for the gate delay estimation $d(g)$ in ABC is defined as follows [14]:
$d(g) = \mathrm{LD} \times \mathrm{Ga} + P$    (2)
where $P$ is the load-independent parasitic delay w.r.t. the gate, $\mathrm{LD}$ is the induced delay per unit load, and $\mathrm{Ga}$ is a pre-defined gain factor for the load. The estimated delay has a significant impact on the QoR of the mapped circuit, as the search direction of the dynamic programming in the mapping process is guided by Equation (2), which amounts to selecting suitable supergates, i.e., the gate with the minimum estimated delay.
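As a concrete illustration of Equation (2) and of how the estimated delay drives gate selection, below is a small Python sketch; the gate attributes (`parasitic`, `delay_per_load`), the fixed gain value, and the candidate names are illustrative assumptions rather than the exact fields of the ABC library representation.

```python
from dataclasses import dataclass

@dataclass
class Gate:
    name: str
    parasitic: float        # P: load-independent parasitic delay (ps)
    delay_per_load: float   # LD: induced delay per unit load (ps per unit load)

def load_independent_delay(gate: Gate, gain: float) -> float:
    # Equation (2): d(g) = LD * Ga + P, with a pre-defined gain factor Ga.
    return gate.delay_per_load * gain + gate.parasitic

def select_gate(candidates: list[Gate], gain: float = 4.0) -> Gate:
    # During mapping, the matched gate (supergate) with the minimum estimated
    # delay is preferred, which guides the DP search direction.
    return min(candidates, key=lambda g: load_independent_delay(g, gain))

# Hypothetical candidates matched to the same cut.
candidates = [Gate("NAND2_X1", 6.0, 3.1), Gate("AOI21_X2", 9.5, 1.8)]
print(select_gate(candidates).name)   # -> AOI21_X2 for this gain
```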

2.4. Problem Formulation

Problem: 
Given a Boolean network, its k-feasible cuts for each node, and the supergates matched to each cut, our goal is to accurately estimate the delays of the supergates w.r.t. their cuts so as to guide the mapper's search and achieve a better QoR of the mapped circuit.
The main notations are summarized in Table 3.

3. Learning Label Generation

In order to generate supergate delays corresponding to a better circuit delay as learning labels, and thereby facilitate the capability of the learning model, we propose three parameterizable strategies from the perspectives of load-independent supergate delay estimation, load-dependent supergate delay estimation, and cut sampling.
  • Strategy 1: Load-Independent Supergate Delay Estimation. We extend the load-independent supergate delay estimation by considering the impact of input slews. As illustrated in Equation (2), ABC provides a load-independent linear estimation of the non-linear delay behavior via a parameter $\mathrm{Ga}$ that measures the gain per unit load. However, Equation (2) implicitly assumes that the input slew w.r.t. the gate is already known and, therefore, fails to model the non-linear behavior of the input slew.
Hence, we model a linear delay estimation $d_s(g)$ as the input slew changes, analogously to $d(g)$:
$d_s(g) = \mathrm{LD}_s \times \mathrm{Ga}_s + P_s$
$\bar{d}(g) = \alpha \cdot d(g) + (1 - \alpha) \cdot d_s(g)$    (3)
where $P_s$ is the load-independent parasitic delay from the slew perspective, $\mathrm{LD}_s$ is the induced delay per unit slew, and $\mathrm{Ga}_s$ is a pre-defined gain factor for the slew. Further, we assign weights $\alpha$ and $1-\alpha$ to $d(g)$ and $d_s(g)$, respectively, in order to assess the delay sensitivity of different supergates to slews and loads. Finally, our load-independent supergate delay estimation $\bar{d}(g)$ is obtained.
  • Strategy 2: Load-dependent Supergate Delay Estimation. In order to enhance the accuracy of delay estimation by incorporating the impact of load, we propose a parameterizable strategy that takes into account the number of supergate fanouts w.r.t. the cut.
Intuitively, the load of a supergate is expected to be positively correlated with the number of its fanouts. However, during the dynamic programming from PIs to POs, we are unable to calculate the fanout of the currently matching supergate in advance. We therefore approximate the fanout of the cut's root node as the fanout of the supergate, since a larger fanout of the cut root node usually implies a higher fanout of the supergate. Thus, the delay $\bar{d}(g, l)$ of supergate $g$ w.r.t. the cut under output capacitive load $l$ is defined as follows:
$\bar{d}(g, l) = \bar{d}(g) \cdot \left( \beta \cdot \gamma \cdot \log \mathrm{RF} + (1 - \beta) \cdot \tau \cdot \log \mathrm{LF} \right)$    (4)
where $\bar{d}(g)$ is our load-independent delay estimation, and $\mathrm{RF}$ and $\mathrm{LF}$ are the fanouts of the root node and the leaf nodes of the cut, respectively. The parameters $\gamma$ and $\tau$ scale $\log \mathrm{RF}$ and $\log \mathrm{LF}$ to the same range, and $\beta$ weighs the influence of the root node and the leaf nodes on the output load. The term $(1-\beta) \cdot \tau \cdot \log \mathrm{LF}$ in Equation (4) can be interpreted as a penalty term associated with area: the mapped network often duplicates leaf nodes when there is more than one of them, and this duplication increases the overall area of the circuit. Hence, our load-dependent delay estimation also takes the balance between the area and the delay of the mapped circuit into account.
Benefiting from the load-dependent delay estimation, and assuming the arrival times of all cut leaves are already given, we can update the arrival time $\mathrm{Arr}(v)$ of node $v$:
$\mathrm{Arr}(v) = \max_{g \in \mathrm{CutLeaves}(v)} \mathrm{Arr}(g) + \bar{d}(g, l)$    (5)
  • Strategy 3: Cut Sampling. Due to the sophisticated estimation of supergate delay, increasing the number of cuts per node does not directly lead to improvements in area or delay optimization and often results in settling for a suboptimal solution [20]. This insight inspires us to enhance the sensitivity of the quality of results to the cut size by filtering the cuts of each node. Specifically, in this strategy we randomly sample a relatively small number of cuts (with parameter $r$) as priority cuts for each node. In practice, for the reproducibility of the best QoR, we use five ranking criteria (with parameter $t$) to sort the candidate cuts instead of random shuffling.
    $\mathrm{FilteredCuts} = \mathrm{sort}\left( \mathrm{select}\left( \mathrm{Cuts}(v), r \right), t \right)$    (6)
Finally, we combine the three strategies to generate supergate delays as learning labels under the best circuit delay (a sketch of this procedure is given at the end of this section). Specifically, predefined sets of values for the parameters $\mathrm{Ga}, \alpha, \beta, \gamma, \tau, r, t$ are established for the three strategies. Then, an iterative process is employed that performs a grid search over the parameters of each strategy. During this process, delay-oriented mapping (“map -r” in ABC) and static timing analysis (“stime” in ABC) are performed iteratively. Upon completion of the grid search, we obtain the optimal circuit delay of the mapped netlist. Consequently, the supergate delays corresponding to the optimal circuit delay can be determined by traversing the mapped netlist in topological order using Equation (5). That is, the delays w.r.t. the node–cut–supergate tuples are obtained.
Note that our parameterizable strategies can be utilized not only to generate a mapped netlist with better delay but also to produce better mappings in terms of area or Area-Delay Product (ADP), since the consideration of area has already been incorporated, especially in Equation (4). Thus, one can easily retain the best parameters w.r.t. area or ADP during the iterative parameter search.
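The following Python sketch summarizes the label-generation loop described above: a grid search over the parameters of the three strategies, where each configuration is evaluated by delay-oriented mapping and static timing analysis. The helper `run_abc_map_and_stime` is hypothetical (in our setup it would invoke ABC's “map -r; stime” with the given parameters and return the resulting circuit delay), and the parameter grids shown are illustrative rather than the exact values we searched.

```python
import itertools, math

def strategy1_delay(LD, Ga, P, LDs, Gas, Ps, alpha):
    # Equation (3): blend the load-view and slew-view linear estimates,
    # as applied per supergate inside the mapper.
    return alpha * (LD * Ga + P) + (1 - alpha) * (LDs * Gas + Ps)

def strategy2_delay(d_bar, beta, gamma, tau, root_fanout, leaf_fanout):
    # Equation (4): scale the load-independent estimate by root/leaf fanouts.
    return d_bar * (beta * gamma * math.log(root_fanout)
                    + (1 - beta) * tau * math.log(leaf_fanout))

def generate_labels(circuit, run_abc_map_and_stime):
    """Grid search over Strategies 1-3; returns the best circuit delay and
    the corresponding parameters.  The supergate delay labels are then
    recovered by a topological traversal of the mapped netlist (Eq. (5))."""
    grid = itertools.product(
        [3.0, 4.0, 5.0],          # Ga    (Strategy 1)
        [0.3, 0.5, 0.7],          # alpha (Strategy 1)
        [0.4, 0.6, 0.8],          # beta  (Strategy 2)
        [0.5, 1.0], [0.5, 1.0],   # gamma, tau (Strategy 2)
        [8, 12, 16],              # r: sampled cuts per node (Strategy 3)
        range(5),                 # t: ranking criterion    (Strategy 3)
    )
    best_delay, best_params = float("inf"), None
    for params in grid:
        delay = run_abc_map_and_stime(circuit, params)   # "map -r; stime"
        if delay < best_delay:
            best_delay, best_params = delay, params
    return best_delay, best_params
```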

4. Approach–AiMap+

We first overview our proposed AiMap+, then describe the details of the feature embedder and our learning model.

4.1. Framework Overview

The significant impact of supergate delay estimation on the Quality of Results of the mapped network motivates the prediction of supergate delay. As previously stated, the hierarchical, graph-structural, and physical features of node–cut–supergate tuples all affect the delay estimation in a sophisticated manner. Consequently, existing load-independent and load-dependent delay estimations (which only heuristically include fanout) limit the mapping QoR. Therefore, we present a regression learning-based framework, AiMap+, that predicts the supergate delay w.r.t. the cut by deeply fusing these fine-grained features and thus guides the search direction of the mapper, as shown in Figure 2.
Specifically, (1) we first generate training and testing node–cut–supergate tuples based on ABC, which include fine-grained features related to the delay. The delay labels for these node–cut–supergate tuples are determined under the condition of an improved circuit delay, as explained in Section 3. (2) We then construct a learning model by extracting hierarchical features of the tuples from two perspectives: cut–node and cut–supergate. (i) In the cut–node view, we generate the representation based on a customized attention mechanism that is aware of the pin delay. (ii) In the cut–supergate view, we jointly learn the physical features of the supergate related to delay and the structural features of the cut through a carefully designed Neural Tensor Network module. (iii) We finally predict the supergate delay w.r.t. the cut by fusing the cut–node and cut–supergate representations. (3) By updating the delay of the supergate w.r.t. the cut using the prediction and the load-independent supergate delay, we guide the search of the ASIC mapper.
Note that each cut of a node in the AIG can indeed match multiple supergates. However, the supergates of a cut are sorted in descending order based on their pre-computed delays, which already yields a relatively good cut implementation w.r.t. the delay. In practice, we restrict the number of supergates per cut to 1 and select the supergate with the minimum delay as the candidate, which reduces the computation cost of training and inference. Therefore, for each cut, AiMap+ predicts only one delay value for a specific supergate. That is, in the node–cut–supergate tuples, a node corresponds to multiple cuts, and a cut corresponds to one supergate.
The details of AiMap+ are introduced as follows.

4.2. Feature Embedder

Given an AIG structurally represented as a DAG, the candidate cuts for each node of the AIG, and the candidate supergates matched with each cut, we consider how to explore and encode the fine-grained features of the node–cut–supergate tuples that contribute to predicting the supergate delay w.r.t. the cut, including node/cut fanouts, the cut structure, and the non-linear delay behavior of supergates.
  • Node Embedding. The node feature extraction in AiMap+ mostly follows SLAP [20], which captures the structural information of an AIG node from the node itself and from its two children. For model efficiency, we opted not to employ graph representation techniques, e.g., Graph Convolutional Networks [37,38], and instead concentrated on the inherent features of the AIG related to the supergate delay. Specifically, the node $u$ itself contributes four features, i.e., the level of $u$, the fanout of $u$, whether an outgoing edge of $u$ has an inverter or not, and the relative level of $u$. The features of its two children $u_1$ and $u_2$ include only the first three features of $u$. The node features are summarized in Table 4.
As mentioned, the fanout and level of a node are relatively important, as they potentially reveal the characteristics of the output load. Thus, we encode the fanouts and levels of $u$, $u_1$, and $u_2$ into learnable embeddings (with four dimensions in practice) based on their values. After normalizing the remaining four node features, the embedding $\mathbf{h}_u \in \mathbb{R}^{1 \times 28}$ of node $u$ is obtained by concatenating them together.
  • Cut Embedding. The features related to supergate delay are also extracted at a fine granularity for the k-feasible cuts, where $k$ is typically equal to 5 in the ABC mapper. We collect 19 features for each cut, covering the cut–node and cut-structure aspects. In terms of cut–node, it includes six nodes as features, i.e., the root node of the cut and its five leaf nodes. These features are stacked to create a representation $C_n \in \mathbb{R}^{6 \times 28}$ by assembling the node embeddings $\mathbf{h}_u \in \mathbb{R}^{1 \times 28}$. In terms of cut structure, it contains 13 features, i.e., (i) the root fanout, (ii) the cut leaf number and the cut volume for the cut itself and its two parents, and (iii) the max, min, and gap of the cut levels and fanouts, respectively. Similar to the node embedding, the root-fanout feature in the cut-structure view is also encoded into a four-dimensional learnable embedding. The remaining 12 features of the cut-structure view are further normalized, and the resulting representation is denoted as $C_s \in \mathbb{R}^{1 \times 16}$. In conclusion, the cut embedding is the combination of the cut–node representation $C_n \in \mathbb{R}^{6 \times 28}$ and the cut-structure representation $C_s \in \mathbb{R}^{1 \times 16}$.
  • Supergate Embedding. We extract 60 features associated with the mapped supergate delay for each supergate from an open-source ASAP 7 nm PDK standard-cell library [39] and from ABC. They comprise (i) basic descriptions of the standard cell, e.g., the area, leakage power, and the number of inputs/outputs; (ii) features related to the delay of each pin, e.g., the load-independent delay estimated using Equation (2) and our estimation using Equation (3) for different pre-defined gain factors; and (iii) overall features of the supergate, e.g., the max and sum delay over all pins. The supergate embedding $S_g \in \mathbb{R}^{1 \times 60}$ is then obtained after normalization. A minimal sketch of how these embeddings can be assembled follows this list.
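Below is a minimal PyTorch sketch of how the three embedders could be assembled; the vocabulary size for the bucketed fanout/level embeddings and the exact bucketing of values are illustrative assumptions, while the dimensionalities (28 per node, 6 × 28 for the cut–node view, 16 for the cut-structure view, and 60 for the supergate) follow the text.

```python
import torch
import torch.nn as nn

class FeatureEmbedder(nn.Module):
    def __init__(self, max_bucket=64):
        super().__init__()
        # 4-dimensional learnable embeddings for bucketed fanout/level values.
        self.fanout_emb = nn.Embedding(max_bucket, 4)
        self.level_emb = nn.Embedding(max_bucket, 4)
        self.root_fanout_emb = nn.Embedding(max_bucket, 4)

    def node_embedding(self, fanouts, levels, other_feats):
        # fanouts, levels: LongTensor [3] for node u and its children u1, u2.
        # other_feats:     FloatTensor [4], normalized remaining node features.
        return torch.cat([self.fanout_emb(fanouts).flatten(),   # 3 * 4 = 12
                          self.level_emb(levels).flatten(),     # 3 * 4 = 12
                          other_feats])                          # -> [28]

    def cut_embedding(self, node_embs, root_fanout, struct_feats):
        # node_embs: FloatTensor [6, 28] (root + 5 leaves); struct_feats: [12].
        C_n = node_embs                                          # cut-node view
        C_s = torch.cat([self.root_fanout_emb(root_fanout).flatten(),
                         struct_feats])                          # [16] cut-structure view
        return C_n, C_s

    def supergate_embedding(self, gate_feats):
        # gate_feats: FloatTensor [60] of normalized library/delay features.
        return gate_feats
```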

4.3. Learning Supergate Delay

Taking advantage of the embedding of node–cut–supergate tuples and the generated supergate delay labels under the better circuit delay, we design the regression model to capture two inherent characteristics of supergates: (1) the varying contributions of different pins to the estimation of supergate delay and (2) the hierarchy of node–cut–supergate tuples.
  • Pin-delay-aware Attention based on Cut-Node. Due to various factors such as the fanout and load capacitance of the supergate, different pins should receive different attention when estimating the supergate delay. Essentially, the six rows of the cut–node embedding $C_n \in \mathbb{R}^{6 \times 28}$ correspond to the representations of one output pin and five input pins, covering fanout and other graph-structure features. Thus, we propose the following attention mechanism to learn different weights for the pins, reflecting their varying contributions to the supergate delay and guided by a better overall circuit delay, as shown in Figure 2.
First, a global context embedding $\mathbf{K} \in \mathbb{R}^{1 \times 28}$ is computed as the average of the cut–node embeddings $C_n$ followed by a nonlinear transformation:
$\mathbf{K} = \tanh\left( \mathrm{mean}(C_n)\, \mathbf{W} \right), \qquad \mathbf{CN} = \sigma\left( \mathbf{K}\, C_n^{\top} \right) C_n$    (7)
where $\mathbf{W} \in \mathbb{R}^{28 \times 28}$ is a learnable weight matrix and $\sigma$ is the sigmoid function. Second, based on the context $\mathbf{K}$, the pin-delay-aware supergate representation $\mathbf{CN}$ is obtained by a weighted summation of the cut–node embeddings using the aggregation coefficients $\sigma(\mathbf{K}\, C_n^{\top}) \in \mathbb{R}^{1 \times 6}$. Equation (7) implies that pins similar to the global supergate context receive higher attention weights.
  • Cut-Supergate Fusion. The cut-structure embedding $C_s$ provides structural features potentially related to the supergate delay, e.g., the root fanout of the cut, while the supergate embedding $S_g$ provides the physical characteristics of the standard cell directly related to the supergate delay.
Building upon the previously generated cut-structure embedding $C_s$ and supergate embedding $S_g$, a straightforward fusion method is to compute the inner product of the two embeddings or to model their relationship through a CNN [4]. However, as discussed in [40,41], such simple representation methods often result in insufficient or weakened interaction between the two elements. Following the findings of [40,41], we employ a Neural Tensor Network (NTN) to better jointly learn the structural features and the physical characteristics of supergates that contribute to the supergate delay. Note that the cut-structure embedding $C_s \in \mathbb{R}^{1 \times 16}$ is mapped by a linear feature transformation into $C_s \in \mathbb{R}^{1 \times 60}$ before the NTN module.
$\mathbf{CS} = f_2\left( C_s\, \mathbf{W}_2^{[1:K]}\, S_g^{\top} + \mathbf{V} \cdot \left[ C_s ; S_g \right]^{\top} + \mathbf{b}_2 \right)$    (8)
Equation (8) fundamentally defines a relational function that describes a learnable distance between $C_s$ and $S_g$ in the representation space, where $\mathbf{W}_2^{[1:K]} \in \mathbb{R}^{60 \times 60 \times K}$ is a weight tensor, $[\,\cdot\,;\,\cdot\,]$ denotes the concatenation operation, $\mathbf{V} \in \mathbb{R}^{K \times 120}$ is a weight matrix, $\mathbf{b}_2 \in \mathbb{R}^{K}$ is a bias vector, and $f_2(\cdot)$ is an activation function. The hyperparameter $K$, set to 3 in practice, controls the number of interaction (similarity) scores produced by the learning model.
  • Delay Prediction. Finally, the embedding for supergate delay estimation is obtained by concatenating the cut–node embedding $\mathbf{CN}$ and the cut–supergate embedding $\mathbf{CS}$. A multi-layer perceptron (MLP) regression head gradually reduces the concatenated embedding to predict the supergate delay w.r.t. the cut (a minimal sketch of the full model is given at the end of this subsection). That is, AiMap+ learns the delays of node–cut–supergate tuples from two perspectives, cut–node and cut–supergate, with labels corresponding to an improved circuit delay as described in Section 3. These predictions are then utilized to guide the mapper's search process, ensuring more accurate and efficient mapping by incorporating both the load-independent and the learned delay estimations.
The training process is guided by minimizing the Mean Squared Error (MSE) loss:
$\mathcal{L} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - p_i \right)^2$    (9)
where $y_i$ and $p_i$ are the generated supergate delay label and the predicted supergate delay, respectively.
Different from AiMap, in AiMap+, after using a learning model to predict the delay of supergates, we integrate these predictions with heuristic delay estimation (load-independent gate model) to collaboratively guide the ASIC mapper in gate selection. This approach is adopted because, as discussed by Wang et al. [9], utilizing learning models to generalize to unseen circuits often faces challenges, specifically the out-of-distribution (OOD) generalization problem. Therefore, by combining the load-independent gate model with predicted delays, we can achieve better experimental results on a broader range of unseen circuits, as discussed in Section 5.
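For concreteness, the sketch below assembles the pin-delay-aware attention (Equation (7)), the NTN fusion (Equation (8)), and the MLP regression head trained with the MSE loss (Equation (9)) in PyTorch. The activation choices, the MLP hidden size, and the toy tuple used in the training step are illustrative assumptions; only the dimensions stated in the text (28, 60, K = 3) are taken from the paper.

```python
import torch
import torch.nn as nn

class AiMapDelayModel(nn.Module):
    def __init__(self, d_node=28, d_gate=60, K=3, hidden=32):
        super().__init__()
        self.W = nn.Linear(d_node, d_node, bias=False)      # context transform (Eq. 7)
        self.cs_proj = nn.Linear(16, d_gate)                 # C_s: 16 -> 60 before the NTN
        self.W2 = nn.Parameter(torch.randn(K, d_gate, d_gate) * 0.01)  # NTN weight tensor
        self.V = nn.Linear(2 * d_gate, K, bias=True)         # V and b_2 of Eq. (8)
        self.mlp = nn.Sequential(nn.Linear(d_node + K, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))       # regression head

    def forward(self, C_n, C_s, S_g):
        # C_n: [6, 28] cut-node embeddings, C_s: [16], S_g: [60].
        K_ctx = torch.tanh(self.W(C_n.mean(dim=0)))           # global context, [28]
        attn = torch.sigmoid(C_n @ K_ctx)                     # per-pin coefficients, [6]
        CN = attn @ C_n                                        # weighted sum, [28]
        c = self.cs_proj(C_s)                                  # [60]
        bilinear = torch.einsum("i,kij,j->k", c, self.W2, S_g)     # K interaction scores
        CS = torch.relu(bilinear + self.V(torch.cat([c, S_g])))    # Eq. (8)
        return self.mlp(torch.cat([CN, CS]))                  # predicted delay, [1]

# One training step with the MSE loss of Equation (9), on a single toy tuple.
model = AiMapDelayModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=5e-3)
C_n, C_s, S_g = torch.randn(6, 28), torch.randn(16), torch.randn(60)
label = torch.tensor([35.0])                                   # generated delay label (ps)
loss = nn.functional.mse_loss(model(C_n, C_s, S_g), label)
opt.zero_grad(); loss.backward(); opt.step()
```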

5. Experiments

5.1. Experimental Settings

  • Datasets. We perform the evaluation on the EPFL benchmark [31] and the ISCAS'85 benchmark [42]. To evaluate our proposed approach AiMap+, we have chosen mostly arithmetic designs and some control-logic designs, 20 circuits in total, i.e., 17 arithmetic circuits and 3 control circuits, as shown in Table 5.
This benchmark set includes both arithmetic and control circuits, which exhibit significant differences in functionality and structure: the former focus on numerical computation while the latter handle logic and control flow. Furthermore, the tested circuits contain a higher proportion of arithmetic circuits. To mitigate the model's dependency on specific circuit types and to balance the learning load, we construct our training set from four classic arithmetic circuits (i.e., adder, bar, max, and sin) and three control circuits (i.e., i2c, priority, and router). This yields a total of almost 9000 node–cut–supergate tuples, as only the mapped supergates are selected for consideration. The labels of each circuit were obtained through 900 iterations of parameter search based on the three strategies we proposed. 80% of the tuples are used as the training set, while the remaining 20% constitute the test set. For a fair comparison, our three training strategies are not used during inference, though these strategies could easily be extended to improve the mapping QoR.
  • Implementation. We implement the AiMap+ framework based on the open-source logic synthesis tool ABC [14] and on PyTorch [43] for the learning model. In ABC, we implement two commands, “gen_train” and “gen_inf”, to generate the data collections for the training and testing phases, respectively. We also enhance the “map” command so that it supports updating the supergate delays with the predicted values and iteratively searching with the predicted delays. We employ the Adam optimizer with a learning rate of 0.001 to gradually adjust the weights during training. Additionally, we use a weight decay of 0.005 to regularize the model, mitigating the risk of overfitting by penalizing large weights.
After training the learning model in the AiMap+ framework on the dataset generated with the “gen_train” command, we perform technology mapping for the test circuits with the commands “gen_inf; map; topo; stime;”. Note that we do not perform post-mapping optimizations, e.g., fanout optimization/buffering [44,45,46], gate sizing [47,48,49], and gate replication [50], following the same setting as SLAP [20].
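The sketch below shows how such an ABC session could be driven from Python. The binary path, the library/netlist file names, and the exact library-loading command are hypothetical placeholders that may vary with the ABC version; the command sequences (“gen_train” and “gen_inf; map; topo; stime”) are the ones described above, and the `-c` batch-mode flag is ABC's standard command-line interface.

```python
import subprocess

def run_abc(commands: str, abc_binary: str = "./abc") -> str:
    """Run a semicolon-separated ABC command script and return its stdout."""
    result = subprocess.run([abc_binary, "-c", commands],
                            capture_output=True, text=True, check=True)
    return result.stdout

# Training-data generation for one circuit (paths are placeholders).
run_abc("read_lib asap7.lib; read_aiger adder.aig; gen_train")

# Inference-time mapping guided by the predicted supergate delays.
print(run_abc("read_lib asap7.lib; read_aiger mult64.aig; gen_inf; map; topo; stime"))
```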

5.2. The Results of Training Model

Figure 3 depicts the convergence of the loss during the training of AiMap+. The labels for the training dataset were obtained through the three proposed strategies (Section 3). The dataset was split into 80% for training and 20% for validation, used for model training and evaluation, respectively. After 750 training iterations (i.e., after three epochs), both the training loss and the validation loss have decreased and stabilized. During training, we select the parameters that avoid overfitting as the final inference parameters. The MSE on our training and testing sets is reduced to 7.0 and 9.1, respectively. This supports that AiMap+ captures the information of the node–cut–supergate tuples needed to learn the supergate delays corresponding to the best overall circuit delay we obtained.
Our primary focus, however, lies in evaluating how well the model's inference results guide the search of the mapper; these results are presented below.

5.3. QoR of Technology Mapping

We present the achieved QoR improvements w.r.t. area (μm²) and delay (ps) compared to the well-known technology mapper available in ABC [14,26], the recent work SLAP [20], which uses supervised learning to filter the candidate cuts of each node, and the preliminary version of this paper, AiMap [4].
Note that, in this test, area recovery is performed by ABC, AiMap, and AiMap+ to optimize the area with the same number of iterations.
Our main goal in this work is to improve the QoR of the mapped network. As previously stated, the delay of the supergate has a significant impact on the QoR. From Table 5, we have the following findings.
  • Comparison against ABC. AiMap+ noticeably improves area by 9.3% and delay by 1.5% on the 20 evaluation circuits, on average. As for the area, AiMap+ improves 16/20 circuits, achieving up to 29% on 64b_mult, and for most multiplier circuits, it can bring about a 20% improvement. Furthermore, our performance on arithmetic circuits is significantly better than on control logic circuits, primarily due to the fact that the majority (more than 70%) of node–cut–supergate tuples in the training dataset arise from arithmetic circuits. As for the delay, AiMap+ improves 9/20 circuits, achieving up to 24% on the ctrl logic. The relatively small improvement in delay is primarily attributed to the fact that both ABC and AiMap+ undergo area recovery in this test. However, area recovery can have unpredictable effects on delay for ASIC mappers due to the resulting changes in the graph structure, which in turn, impact the delay of supergates.
  • Comparison against SLAP. AiMap+ significantly improves the area by 12.3% with a 9.7% delay penalty on the 20 evaluation circuits, on average. The results for the circuits shared by SLAP and AiMap+ are taken from the original SLAP article; we were unable to replicate the results for the other circuits, which are denoted by “–”. As for the area, AiMap+ surpasses SLAP on almost all circuits, achieving up to 33% improvement on the bar circuit, with the exception of the adder (where it is behind by slightly less than 3%). As for the delay, the results of SLAP are superior to ours due to their exclusive focus on delay optimization. It should be noted that, although both AiMap+ and SLAP learn strategies from historical decision data to guide the mapper's search, their approaches differ significantly. SLAP learns to classify the priority cuts during mapping. However, the key issue in ASIC technology mapping is how to estimate the gate delay corresponding to a cut using structural information from the cut and process information from the library, a consideration absent in their method [4].
  • Comparison against AiMap. Compared to AiMap, AiMap+ improves delay by an average of 2.6% without any area penalty across 20 evaluation circuits. This also validates the rationale behind our AiMap revision, which includes (i) designing a more refined network to capture the latent graph structure and physical information of supergates and (ii) integrating heuristic supergate delay estimation with regression-based delay estimation to jointly guide the mapper’s search.
It is worth noting that there is an overlap between our training and testing circuits, but this does not result in overfitting. This is because the predicted supergate delays are not the final results but merely guide the search for the mapper.

5.4. QoR of Delay-Oriented Technology Mapping

We further compare the results obtained from conducting only delay-oriented mapping for AiMap+, AiMap, and ABC, i.e., they all perform mapping for only one iteration. The results are presented in Table 6, from which we can draw the following.
(1) In the test of delay-oriented mapping, AiMap+ is considerably better than ABC, with an average area improvement of 12%. Moreover, for the majority of circuits presented in the table, both the area and the delay metrics of AiMap+ are consistently superior to those of ABC. The results support that (i) the estimation of supergate delay has a significant impact on both the area and the delay of the mapped circuit, as it determines the search direction of the mapping process, and (ii) AiMap+ has indeed learned features related to the supergate delay, which can be used to guide the search direction of the mapper.
(2) An interesting result is observed for the adder during delay-oriented mapping, where AiMap achieves a remarkable delay improvement of 15% compared to ABC. Surprisingly, it even outperforms our parameterized strategy-generated labels by 7% (the label result being 2394.06 ps). This indicates that our learning-based approach for supergate delay estimation not only captures delay-related features but also opens up possibilities for exploring new mapping solution spaces. Note that the mapping results for the adder circuit were further verified by combinational equivalence checking.

5.5. QoR of Training Strategies

  • Training Strategies for delay optimization. We next highlight the QoR of the three strategies put forward in Section 3 for generating better overall circuit delay. Note that the three strategies are performed sequentially, e.g., the optimal parameters found in Strategy 1 are carried forward to conduct the search in Strategy 2. We record only the improvement in delay, denoted as ΔDelay, disregarding the changes in area. The results are presented in Table 7, from which we can draw the following findings.
(1) The three proposed strategies are all highly effective, achieving average delay improvements of 20.5%, 26.3%, and 26.6%, respectively, each building upon the previous strategy. Among them, the improvement on the bar circuit is particularly noteworthy, with increases of 51.8%, 54.8%, and 54.9%, respectively. This demonstrates the effectiveness of the three strategies for generating learning labels.
(2) Strategy 1 yields the largest gains, primarily because our load-independent supergate delay estimation covers a larger approximation range of the non-linear supergate delay model, which is particularly significant.
  • Training strategies for ADP optimization. In addition, our strategies can be extended to enhance the mapped network for both area and delay (i.e., Area-Delay-Product, ADP). Figure 4 presents the ADP distribution of different circuits under the three training strategies, from which we can derive the following insights.
(1) The three proposed strategies effectively explore the design space, each with its distinct characteristics. Typically, Strategy 1 has a more significant impact on the Quality of Results of the mapped netlist, as it directly influences supergate selection from the perspectives of load and slew. In contrast, Strategy 3 has a comparatively minor effect on the QoR of the netlist, since the filtered cuts only indirectly affect the choice of supergates.
(2) On average, the three strategies contribute to the area and delay improvements by 16.6%, 18.3%, and 18.8%, respectively. Notably, these strategies perform better on arithmetic circuits than on control circuits, likely due to the larger scale of arithmetic circuits, which offers a broader space for exploration.

5.6. Feature Importance Evaluation

We finally perform feature importance evaluation by randomly permuting the features on the test dataset, as shown in Figure 5. In general, a higher value for each feature indicates its greater importance in the learning model. It can be observed that the overall delay (including the max and sum delay of supergates) extracted from the supergate is the most crucial factor in our designed learning model, followed by the node/cut fanout feature. On the other hand, features such as leakage power and fanout gap are relatively less significant. It has been observed that the delay importance of pins 4 and 5 is notably low, primarily because, in the ASAP 7 nm library [39], the majority of standard cells have fewer than three input pins.
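Permutation importance of this kind can be computed with a few lines of Python. The sketch below assumes a fitted regression model exposed as a `predict_fn` over a feature matrix and a held-out test set stored as NumPy arrays; both the interface and the number of repeats are illustrative assumptions rather than our exact evaluation harness.

```python
import numpy as np

def permutation_importance(predict_fn, X_test, y_test, n_repeats=5, seed=0):
    """Importance of each feature = increase in MSE after permuting that feature."""
    rng = np.random.default_rng(seed)
    base_mse = np.mean((predict_fn(X_test) - y_test) ** 2)
    importance = np.zeros(X_test.shape[1])
    for j in range(X_test.shape[1]):
        scores = []
        for _ in range(n_repeats):
            X_perm = X_test.copy()
            rng.shuffle(X_perm[:, j])          # break the feature-label association
            scores.append(np.mean((predict_fn(X_perm) - y_test) ** 2))
        importance[j] = np.mean(scores) - base_mse
    return importance
```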

6. Conclusions and Future Work

In this work, we first formulated cell delay estimation as a regression learning task by incorporating multiple features, e.g., node/cut fanouts, cut structures, and the non-linear delay behavior of supergates, to guide the mapper search. We then designed a framework, AiMap+, incorporating pin-delay-aware attention based on the cut–node embedding and a cut–supergate embedding fusion module based on a Neural Tensor Network. The strong learning results were supported by the three proposed parameterizable strategies for generating learning labels. Experimental results showed that, compared with ABC, our proposed method noticeably improves area by 9.3% and delay by 1.5%, and improves area by 12% for delay-oriented mapping.
There are several interesting future directions in technology mapping that merit exploration. (1) Gate selection is a critical task of the technology mapping process, and the existing frameworks exhibit biases in the selection of certain gates, such as XOR and Majority gates. Exploring targeted improvements in the perceptual capabilities of these specific gates represents a valuable research direction. (2) Enhancing the efficiency of technology mapping is also noteworthy, as the mapping process often involves numerous repetitions of node–cut–supergate tuples. Reducing these repetitive delay estimations could significantly improve efficiency. (3) Another promising area of research is enhancing the layout-aware capabilities during the technology mapping phase (e.g., cell delay and wire delay). It is crucial to incorporate wire delays arising from the layout into the mapping process, as these delays predominantly stem from the capacitive loading of long wires. In this context, adapting AIGs to a feasible layout during the technology mapping stage warrants careful consideration.

Author Contributions

Conceptualization, methodology, experiment and validation, J.L.; writing, data analysis, J.L. and Q.Z.; funding acquisition Q.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Major Key Project of PCL (No. PCL2023A03), National Natural Science Foundation of China (No. 61925203) and National Natural Science Foundation of China (No. U22B2021).

Data Availability Statement

We will publish the source code of this work at https://gitee.com/jfkey/learn2map (accessed on 8 August 2024). Data integral to this study may be obtained by contacting the corresponding author, who will ensure accessibility upon a duly justified request.

Acknowledgments

This article is a revised and expanded version of a paper entitled “AiMap: Learning to Improve Technology Mapping for ASICs via Delay Prediction”, which was presented at the International Conference on Computer Design (ICCD) 7 November 2023, Washington, DC, USA.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Chen, L.; Chen, Y.; Chu, Z.; Fang, W.; Ho, T.Y.; Huang, Y.; Khan, S.; Li, M.; Li, X.; Liang, Y.; et al. The dawn of ai-native eda: Promises and challenges of large circuit models. arXiv 2024, arXiv:2403.07257. [Google Scholar]
  2. Li, X.; Huang, Z.; Tao, S.; Huang, Z.; Zhuang, C.; Wang, H.; Li, Y.; Qiu, Y.; Luo, G.; Li, H.; et al. iEDA: An Open-source infrastructure of EDA. In Proceedings of the IEEE/ACM Asia and South Pacific Design Automation Conference (ASPDAC), Incheon, Republic of Korea, 22–25 January 2024; pp. 77–82. [Google Scholar]
  3. Li, X.; Tao, S.; Chen, S.; Zeng, Z.; Huang, Z.; Wu, H.; Li, W.; Huang, Z.; Ni, L.N.; Zhao, X.; et al. iPD: An Open-source intelligent Physical Design Toolchain. In Proceedings of the IEEE/ACM Asia and South Pacific Design Automation Conference (ASPDAC), Incheon, Republic of Korea, 22–25 January 2024; pp. 83–88. [Google Scholar]
  4. Liu, J.; Ni, L.; Li, X.; Zhou, M.; Chen, L.; Li, X.; Zhao, Q.; Ma, S. AiMap: Learning to Improve Technology Mapping for ASICs via Delay Prediction. In Proceedings of the International Conference on Computer Design (ICCD), Washington, DC, USA, 6–8 November 2023; pp. 344–347. [Google Scholar]
  5. Pei, Z.; Liu, F.; He, Z.; Chen, G.; Zheng, H.; Zhu, K.; Yu, B. AlphaSyn: Logic synthesis optimization with efficient monte carlo tree search. In Proceedings of the IEEE/ACM International Conference on Computer-Aided Design (ICCAD), San Francisco, CA, USA, 29 October–2 November 2023; pp. 1–9. [Google Scholar]
  6. Wang, P.; Lu, A.; Li, X.; Ye, J.; Chen, L.; Yuan, M.; Hao, J.; Yan, J. Easymap: Improving technology mapping via exploration-enhanced heuristics and adaptive sequencing. In Proceedings of the ICCAD, San Francisco, CA, USA, 29 October–2 November 2023; pp. 1–9. [Google Scholar]
  7. Grosnit, A.; Zimmer, M.; Tutunov, R.; Li, X.; Chen, L.; Yang, F.; Yuan, M.; Bou-Ammar, H. Lightweight Structural Choices Operator for Technology Mapping. In Proceedings of the ACM/IEEE Design Automation Conference (DAC), San Francisco, CA, USA, 9–13 July 2023; pp. 1–6. [Google Scholar]
  8. Yu, L.; Guo, B. Timing-Driven Simulated Annealing for FPGA Placement in Neural Network Realization. Electronics 2023, 12, 3562. [Google Scholar] [CrossRef]
  9. Wang, Z.; Chen, L.; Wang, J.; Bai, Y.; Li, X.; Li, X.; Yuan, M.; Hao, J.; Zhang, Y.; Wu, F. A Circuit Domain Generalization Framework for Efficient Logic Synthesis in Chip Design. In Proceedings of the International Conference on Machine Learning (ICML), Vienna, Austria, 21–27 July 2024. [Google Scholar]
  10. Senhadji-Navarro, R.; Garcia-Vargas, I. Mapping arbitrary logic functions onto carry chains in FPGAs. Electronics 2021, 11, 27. [Google Scholar] [CrossRef]
  11. Chowdhury, A.B.; Tan, B.; Carey, R.; Jain, T.; Karri, R.; Garg, S. Bulls-Eye: Active few-shot learning guided logic synthesis. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 2022, 42, 2580–2590. [Google Scholar] [CrossRef]
  12. Kok, C.L.; Li, X.; Siek, L.; Zhu, D.; Kong, J.J. A switched capacitor deadtime controller for DC-DC buck converter. In Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS), Lisbon, Portugal, 24–27 May 2015; pp. 217–220. [Google Scholar]
  13. Liu, D.; Svensson, C. Power consumption estimation in CMOS VLSI chips. IEEE Solid-State Circuits 1994, 29, 663–670. [Google Scholar]
  14. Brayton, R.; Mishchenko, A. ABC: An academic industrial-strength verification tool. In Proceedings of the International Conference on Computer-Aided Verification (CAV), Edinburgh, UK, 15–19 July 2010; pp. 24–40. [Google Scholar]
  15. Keutzer, K. DAGON: Technology binding and local optimization by DAG matching. In Proceedings of the ACM/IEEE Design Automation Conference (DAC), Miami Beach, FL, USA, 20 June–1 July 1987; pp. 341–347. [Google Scholar]
  16. Cong, J.; Wu, C.; Ding, Y. Cut Ranking and Pruning: Enabling a General and Efficient FPGA Mapping Solution. In Proceedings of the International Symposium on Field Programmable Gate Arrays (FPGA), Monterey, CA, USA, 21–23 February 1999; pp. 29–35. [Google Scholar]
  17. Mishchenko, A.; Cho, S.; Chatterjee, S.; Brayton, R. Combinational and sequential mapping with priority cuts. In Proceedings of the IEEE/ACM International Conference on Computer-Aided Design (ICCAD), San Jose, CA, USA, 4–8 November 2007; pp. 354–361. [Google Scholar]
  18. Cong, J.; Ding, Y. FlowMap: An optimal technology mapping algorithm for delay optimization in lookup-table based FPGA designs. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 1994, 13, 1–12. [Google Scholar] [CrossRef]
  19. Cong, J.; Ding, Y. On area/depth trade-off in LUT-based FPGA technology mapping. IEEE Trans. Very Large Scale Integr. Syst. 1994, 2, 137–148. [Google Scholar] [CrossRef]
  20. Neto, W.L.; Moreira, M.T.; Li, Y.; Amarù, L.; Yu, C.; Gaillardon, P.E. SLAP: A supervised learning approach for priority cuts technology mapping. In Proceedings of the ACM/IEEE Design Automation Conference (DAC), San Francisco, CA, USA, 5–9 December 2021; pp. 859–864. [Google Scholar]
  21. Chen, D.; Cong, J. DAOmap: A depth-optimal area optimization mapping algorithm for FPGA designs. In Proceedings of the IEEE/ACM International Conference on Computer-Aided Design (ICCAD), San Jose, CA, USA, 7–11 November 2004; pp. 752–759. [Google Scholar]
  22. Calvino, A.T.; De Micheli, G. Technology Mapping Using Multi-output Library Cells. In Proceedings of the IEEE/ACM International Conference on Computer-Aided Design (ICCAD), San Francisco, CA, USA, 29 October–2 November 2023; pp. 1–9. [Google Scholar]
  23. Liu, M.; Robinson, D.; Li, Y.; Yu, C. MapTune: Advancing ASIC Technology Mapping via Reinforcement Learning Guided Library Tuning. arXiv 2024, arXiv:2407.18110. [Google Scholar]
  24. Chen, C.; Hu, G.; Zuo, D.; Yu, C.; Ma, Y.; Zhang, H. E-Syn: E-Graph Rewriting with Technology-Aware Cost Functions for Logic Synthesis. arXiv 2024, arXiv:2403.14242. [Google Scholar]
  25. Calvino, A.T.; Riener, H.; Rai, S.; Kumar, A.; De Micheli, G. A versatile mapping approach for technology mapping and graph optimization. In Proceedings of the IEEE Asia and South Pacific Design Automation Conference (ASP-DAC), Taipei, Taiwan, 17–20 January 2022; pp. 410–416. [Google Scholar]
  26. Chatterjee, S.; Mishchenko, A.; Brayton, R.K.; Wang, X.; Kam, T. Reducing structural bias in technology mapping. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 2006, 25, 2894–2903. [Google Scholar] [CrossRef]
  27. Wu, M.C.; Dao, A.Q.; Lin, M.P.H. A Novel Technology Mapper for Complex Universal Gates. In Proceedings of the IEEE/ACM Asia and South Pacific Design Automation Conference (ASPDAC), Tokyo, Japan, 18–21 January 2021; pp. 475–480. [Google Scholar]
  28. Cai, Y.; Yang, Z.; Ni, L.; Xie, B.; Li, X. Enhancing ASIC Technology Mapping via Parallel Supergate Computing. arXiv 2024, arXiv:2404.13614. [Google Scholar]
  29. Huang, S.C.; Jiang, J.H.R. A dynamic accuracy-refinement approach to timing-driven technology mapping. In Proceedings of the IEEE International Conference on Computer Design (ICCD), Lake Tahoe, CA, USA, 12–15 October 2008; pp. 538–543. [Google Scholar]
  30. Hu, B.; Watanabe, Y.; Kondratyev, A.; Marek-Sadowska, M. Gain-based technology mapping for discrete-size cell libraries. In Proceedings of the DAC, Anaheim, CA, USA, 2–6 June 2003; pp. 574–579. [Google Scholar]
  31. Amarú, L.; Gaillardon, P.E.; De Micheli, G. The EPFL combinational benchmark suite. In Proceedings of the IEEE/ACM International Workshop on Logic Synthesis, Washington, DC, USA, 1–23 September 2015. [Google Scholar]
  32. Mishchenko, A.; Chatterjee, S.; Brayton, R. Improvements to technology mapping for LUT-based FPGAs. In Proceedings of the International Symposium on Field Programmable Gate Arrays (FPGA), Monterey, CA, USA, 22–24 February 2006; pp. 41–49. [Google Scholar]
  33. Murgai, R. Technology-dependent logic optimization. Proc. IEEE 2015, 103, 2004–2020. [Google Scholar] [CrossRef]
  34. Chatterjee, S. On Algorithms for Technology Mapping; University of California: Berkeley, CA, USA, 2007. [Google Scholar]
  35. Touati, H.J. Performance-Oriented Technology Mapping; University of California: Berkeley, CA, USA, 1990. [Google Scholar]
  36. Rudell, R.L. Logic Synthesis for VLSI Design; University of California: Berkeley, CA, USA, 1989. [Google Scholar]
  37. Kipf, T.N.; Welling, M. Semi-Supervised Classification with Graph Convolutional Networks. arXiv 2016, arXiv:1609.02907. [Google Scholar]
  38. Liu, J.; Zhou, M.; Ma, S.; Pan, L. MATA*: Combining Learnable Node Matching with A* Algorithm for Approximate Graph Edit Distance Computation. In Proceedings of the International Conference on Information and Knowledge Management (CIKM), Birmingham, UK, 21–25 October 2023; pp. 1503–1512. [Google Scholar]
  39. Clark, L.T.; Vashishtha, V.; Shifren, L.; Gujja, A.; Sinha, S.; Cline, B.; Ramamurthy, C.; Yeric, G. ASAP7: A 7-nm finFET predictive process design kit. Microelectron. J. 2016, 53, 105–115. [Google Scholar] [CrossRef]
  40. Socher, R.; Chen, D.; Manning, C.D.; Ng, A. Reasoning with neural tensor networks for knowledge base completion. Adv. Neural Inf. Process. Syst. 2013, 26, 926–934. [Google Scholar]
  41. Bai, Y.; Ding, H.; Bian, S.; Chen, T.; Sun, Y.; Wang, W. SimGNN: A neural network approach to fast graph similarity computation. In Proceedings of the ACM International Conference on Web Search and Data Mining (WSDM), Melbourne, Australia, 11–15 February 2019; pp. 384–392. [Google Scholar]
  42. Hansen, M.C.; Yalcin, H.; Hayes, J.P. Unveiling the ISCAS-85 benchmarks: A case study in reverse engineering. IEEE Des. Test Comput. 1999, 16, 72–80. [Google Scholar] [CrossRef]
  43. Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. Pytorch: An imperative style, high-performance deep learning library. Adv. Neural Inf. Process. Syst. 2019, 32, 8026–8037. [Google Scholar]
  44. Berman, C.L.; Carter, J.L.; Day, K.F. The fanout problem: From theory to practice. In Proceedings of the Decennial Caltech Conference on VLSI on Advanced research in VLSI, Cambridge, MA, USA, 1 June 1989; pp. 69–99. [Google Scholar]
  45. Singh, K.J.; Sangiovanni-Vincentelli, A. A heuristic algorithm for the fanout problem. In Proceedings of the ACM/IEEE Design Automation Conference, San Francisco, CA, USA, 17–22 June 1991; pp. 357–360. [Google Scholar]
  46. Van Ginneken, L.P. Buffer placement in distributed RC-tree networks for minimal Elmore delay. In Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS), New Orleans, LA, USA, 1–3 May 1990; pp. 865–868. [Google Scholar]
  47. Fishburn, J.P. LATTIS: An iterative speedup heuristic for mapped logic. In Proceedings of the ACM/IEEE Design Automation Conference (DAC), Anaheim, CA, USA, 8–12 June 1992; pp. 488–491. [Google Scholar]
  48. Coudert, O.; Haddad, R.; Manne, S. New algorithms for gate sizing: A comparative study. In Proceedings of the ACM/IEEE Design Automation Conference (DAC), Las Vegas, NV, USA, 3–7 June 1996; pp. 734–739. [Google Scholar]
  49. Savoj, H. Technology dependent timing optimization. In Proceedings of the Workshop Note of IWLS, Tahoe City, CA, USA, 19–21 May 1997. [Google Scholar]
  50. Srivastava, A.; Kastner, R.; Sarrafzadeh, M. Timing driven gate duplication: Complexity issues and algorithms. In Proceedings of the IEEE/ACM International Conference on Computer Aided Design, San Jose, CA, USA, 5–9 November 2000; pp. 447–450. [Google Scholar]
Figure 1. An example of an AIG and its mapped circuit with the Boolean function f = abcd. The dashed lines on the edges indicate the wires with inverters.
Figure 2. Framework overview of our proposed AiMap+. The red arrows denote the data flow in both the training and testing phases, while the green arrows denote data flow used only in testing. "Feature Emb." refers to "Feature Embedder".
Figure 3. Training loss of the regression task in AiMap+.
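Figure 3 tracks the training loss of the delay-regression task. The sketch below only illustrates the kind of loss curve such a task produces; the placeholder model, random features, and the mean-squared-error objective are assumptions for illustration and are not the AiMap+ architecture or its actual loss, which are defined in the main text.

```python
import torch
import torch.nn as nn

# Minimal regression-training sketch (illustrative assumptions only):
# a tiny MLP over 10 placeholder node features, trained with MSE.
model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

features = torch.randn(256, 10)   # placeholder node/cut features
labels = torch.randn(256, 1)      # placeholder delay labels

for epoch in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(features), labels)  # regression objective
    loss.backward()
    optimizer.step()
```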
Figure 4. QoR distribution of different circuits under the three training strategies.
Figure 5. Feature Importance analysis.
Table 1. Motivation example: the significant disparity between estimated circuit delay and actual circuit delay.

Circuits | Estimated Area (μm²) | Estimated LI-Delay (ps) | Actual Area (μm²) | Actual Delay (ps) | ΔDelay
adder | 898.31 | 2613.78 | 898.13 | 3770.65 | 44%
bar | 2681.62 | 152.96 | 2680.39 | 1114.9 | 629%
log2 | 26,556.98 | 3891.66 | 26,561.26 | 6797.77 | 75%
cavlc | 463.27 | 185.07 | 463.29 | 93.2 | 50%
int2float | 158.61 | 174.27 | 158.63 | 91.7 | 47%
ctrl | 106.92 | 98.53 | 106.84 | 89.9 | 9%
LI-Delay refers to the load-independent circuit delay estimation in ABC [14]. The actual circuit delay is computed by the non-linear delay model (NLDM) after the mapping is completed [14].
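The ΔDelay column can be read as the relative gap between the actual NLDM delay and the LI-Delay estimate; the formula below is inferred from the tabulated values rather than stated in this section, so treat it as a minimal sketch.

```python
def delay_gap(estimated_ps: float, actual_ps: float) -> float:
    """Relative gap between the actual (NLDM) circuit delay and the
    load-independent (LI-Delay) estimate, as a fraction of the estimate."""
    return abs(actual_ps - estimated_ps) / estimated_ps

# Reproducing the adder row of Table 1: 2613.78 ps estimated vs. 3770.65 ps actual.
print(f"{delay_gap(2613.78, 3770.65):.0%}")  # -> 44%
```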
Table 3. Main notations.

Notations | Descriptions
d(g) (d_s(g)) | load-independent gate delay estimation (from the view of slew)
LD (LD_s) | induced delay per unit load (slew)
Ga (Ga_s) | pre-defined gain factor for load (slew)
P (P_s) | load-independent parasitic delay (from the view of slew)
d̄(g) | our designed load-independent gate delay estimation
d̄(g, l) | delay of supergate g w.r.t. the cut under output load l
RF | fanouts of the root nodes of the cut
LF | fanouts of the leaf nodes of the cut
Arr(v) | arrival time of the node v
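As an illustration of how the notations in Table 3 combine, the sketch below assumes the conventional gain-based form in which the unknown output load is approximated by the pre-defined gain factor Ga, i.e., d(g) ≈ P + LD·Ga. The exact estimate d̄(g) proposed in this work is defined in the main text; this snippet is an assumption-labeled illustration only.

```python
from dataclasses import dataclass

@dataclass
class GateTiming:
    LD: float  # induced delay per unit load (library units)
    P: float   # load-independent parasitic delay (ps)

def li_gate_delay(t: GateTiming, Ga: float) -> float:
    """Load-independent gate delay estimate d(g): the output load is unknown
    before mapping, so it is approximated by the gain factor Ga.
    (Assumed gain-based form; not the exact d̄(g) of AiMap+.)"""
    return t.P + t.LD * Ga
```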
Table 4. Summarization of node features.

Node Features | Child 1 Features | Child 2 Features
level(u) | level(u1) | level(u2)
fanout(u) | fanout(u1) | fanout(u2)
inverter(u) | inverter(u1) | inverter(u2)
re-level(u) | – | –
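A minimal sketch of how the node features of Table 4 could be assembled into a single vector for the learning model. The dictionary keys, the feature ordering, and the reading of re-level as the reverse level (distance to the primary outputs) are illustrative assumptions, not the paper's exact encoding.

```python
import numpy as np

def node_feature_vector(node, child1, child2, max_level: int) -> np.ndarray:
    """Concatenate the Table 4 features of an AIG node and its two children."""
    feats = [
        node["level"], node["fanout"], node["inverter"],
        max_level - node["level"],           # re-level (assumed: reverse level)
        child1["level"], child1["fanout"], child1["inverter"],
        child2["level"], child2["fanout"], child2["inverter"],
    ]
    return np.asarray(feats, dtype=np.float32)

# Example: a node at level 3 with fanout 2, driven through one inverted edge.
v = node_feature_vector(
    {"level": 3, "fanout": 2, "inverter": 1},
    {"level": 2, "fanout": 1, "inverter": 0},
    {"level": 2, "fanout": 3, "inverter": 1},
    max_level=10,
)
```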
Table 5. Results comparisons of ABC, SLAP, AiMap, and AiMap+.

Circuits | # Nodes | ABC Area (μm²) | ABC Delay (ps) | SLAP Area (μm²) | SLAP Delay (ps) | AiMap Area (μm²) | AiMap Delay (ps) | AiMap+ Area (μm²) | AiMap+ Delay (ps) | AiMap+/ABC Area | AiMap+/ABC Delay | AiMap+/AiMap Area | AiMap+/AiMap Delay
adder | 1020 | 898.1 | 3770.7 | 1031.3 | 3268.7 | 1061.0 | 3486.1 | 1066.6 | 3482.4 | 1.19 | 0.92 | 1.01 | 1.00
bar | 3336 | 2680.4 | 1114.9 | 3083.2 | 923.8 | 2059.4 | 1059.0 | 2072.7 | 1140.6 | 0.77 | 1.02 | 1.01 | 1.08
log2 | 32,060 | 26,561.3 | 6797.8 | – | – | 23,330.1 | 6855.8 | 23,413.9 | 6963.5 | 0.88 | 1.02 | 1.00 | 1.02
multiplier | 27,062 | 25,458.3 | 4649.1 | – | – | 20,106.4 | 4512.4 | 20,202.3 | 4456.0 | 0.79 | 0.96 | 1.00 | 0.99
sin | 5416 | 5207.0 | 3955.6 | 5087.6 | 3584.8 | 4503.0 | 3599.2 | 4451.0 | 3435.0 | 0.85 | 0.87 | 0.99 | 0.95
sqrt | 24,618 | 20,252.2 | 180,518.1 | – | – | 19,918.6 | 185,281.0 | 19,720.6 | 187,546.9 | 0.97 | 1.04 | 0.99 | 1.01
C6288 | 2335 | 2991.8 | 1248.6 | 3023.5 | 1236.6 | 2479.5 | 1385.4 | 2430.8 | 1314.9 | 0.81 | 1.05 | 0.98 | 0.95
C7552 | 2835 | 1978.5 | 797.5 | 2002.0 | 800.1 | 1758.9 | 913.8 | 1796.7 | 948.1 | 0.91 | 1.19 | 1.02 | 1.04
mul32-b | 13,711 | 9889.4 | 3229.5 | – | – | 8041.2 | 3410.5 | 7873.0 | 3386.5 | 0.80 | 1.05 | 0.98 | 0.99
mul64-b | 53,023 | 39,736.5 | 6715.1 | – | – | 31,087.4 | 7058.3 | 30,838.2 | 6371.0 | 0.78 | 0.95 | 0.99 | 0.90
64b_mult | 65,856 | 52,318.6 | 7922.6 | – | – | 37,275.1 | 9343.5 | 37,155.0 | 9527.1 | 0.71 | 1.20 | 1.00 | 1.02
aes | 21,119 | 18,190.0 | 656.6 | 16,489.6 | 594.6 | 15,792.1 | 625.6 | 15,938.6 | 643.9 | 0.88 | 0.98 | 1.01 | 1.03
max | 2865 | 2312.3 | 3809.0 | 2292.4 | 3710.5 | 2128.2 | 5035.8 | 2154.1 | 4120.7 | 0.93 | 1.08 | 1.01 | 0.82
s9234_1 | 776 | 670.9 | 310.8 | – | – | 652.7 | 341.3 | 669.1 | 318.6 | 1.00 | 1.03 | 1.03 | 0.93
s5378 | 1186 | 903.3 | 387.4 | – | – | 943.9 | 365.9 | 960.9 | 327.1 | 1.06 | 0.84 | 1.02 | 0.89
C5315 | 2004 | 1342.3 | 591.0 | – | – | 1279.3 | 617.8 | 1242.5 | 609.9 | 0.93 | 1.03 | 0.97 | 0.99
i2c | 1342 | 981.6 | 300.2 | – | – | 981.6 | 300.2 | 993.1 | 269.8 | 1.01 | 0.90 | 1.01 | 0.90
cavlc | 693 | 471.2 | 294.1 | – | – | 480.3 | 259.8 | 474.7 | 249.0 | 1.01 | 0.85 | 0.99 | 0.96
int2float | 260 | 159.6 | 160.0 | – | – | 162.1 | 163.3 | 162.1 | 172.9 | 1.02 | 1.08 | 1.00 | 1.06
ctrl | 174 | 107.5 | 134.0 | – | – | 107.3 | 102.0 | 107.3 | 102.0 | 1.00 | 0.76 | 1.00 | 1.00
Geomean | 3876 | 3073.9 | 1537.8 | – | – | 2789.2 | 1555.1 | 2789.4 | 1515.1 | 0.907 | 0.985 | 1.000 | 0.974
Note: We highlight AiMap+ where it beats the counterparts with respect to area and delay. The SLAP results are taken from the original publication; "–" marks experimental results that we were unable to replicate. Compared to ABC, AiMap+ improves 16 out of 20 cases; compared to AiMap, it improves 13 out of 20 cases.
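The Geomean rows in Tables 5 and 6 summarize the per-circuit results with the geometric mean. The sketch below shows this summary metric; the two area values are taken from Table 5 purely for illustration.

```python
import math
from typing import Sequence

def geomean(values: Sequence[float]) -> float:
    """Geometric mean, as used for the summary rows of Tables 5 and 6."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

# The geometric mean of per-circuit ratios equals the ratio of the geometric
# means, so the AiMap+/ABC columns can be summarized either way.
abc_area        = [898.1, 2680.4]   # adder and bar ABC areas from Table 5
aimap_plus_area = [1066.6, 2072.7]  # corresponding AiMap+ areas
ratios = [a / b for a, b in zip(aimap_plus_area, abc_area)]
assert abs(geomean(ratios) - geomean(aimap_plus_area) / geomean(abc_area)) < 1e-9
```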
Table 6. Delay-oriented mapping results of ABC, AiMap and AiMap+.

Circuits | ABC Area (μm²) | ABC Delay (ps) | AiMap Area (μm²) | AiMap Delay (ps) | AiMap+ Area (μm²) | AiMap+ Delay (ps) | AiMap+/ABC Area | AiMap+/ABC Delay | AiMap+/AiMap Area | AiMap+/AiMap Delay
adder | 2369.43 | 2618.21 | 1398.51 | 2225.75 | 1417.64 | 2540.38 | 0.60 | 0.97 | 1.01 | 1.14
bar | 4198.34 | 1936.9 | 2779.53 | 1428.83 | 3196.17 | 1915.18 | 0.76 | 0.99 | 1.15 | 1.34
sin | 9784 | 6456.01 | 8691.31 | 6996.77 | 8412.31 | 6942.23 | 0.86 | 1.08 | 0.97 | 0.99
sqrt | 48,105.13 | 303,498.88 | 37,596.11 | 301,290.41 | 40,166.38 | 296,842.06 | 0.83 | 0.98 | 1.07 | 0.99
C6288 | 3822.76 | 1191.86 | 2479.53 | 1385.40 | 3977.19 | 1371.81 | 1.04 | 1.15 | 1.60 | 0.99
C7552 | 3748.58 | 1605.98 | 3733.41 | 1163.79 | 3608.38 | 1221.21 | 0.96 | 0.76 | 0.97 | 1.05
aes | 29,727.57 | 939.01 | 25,064.54 | 791.39 | 23,618.2 | 793.37 | 0.79 | 0.84 | 0.94 | 1.00
max | 4079.6 | 4640.24 | 4250.59 | 5667.84 | 4333.64 | 5555.36 | 1.06 | 1.20 | 1.02 | 0.98
C5315 | 2605.5 | 721.18 | 2522.92 | 778.18 | 2390.19 | 758.59 | 0.92 | 1.05 | 0.95 | 0.97
router | 329.16 | 617.27 | 452.56 | 609.04 | 345.49 | 630.81 | 1.05 | 1.02 | 0.76 | 1.04
Geomean | 4834.76 | 2862.15 | 4126.13 | 2729.81 | 4235.29 | 2850.57 | 0.88 | 1.00 | 1.03 | 1.04
Table 7. Training results w.r.t. three training strategies.

Circuits | ABC Area (μm²) | ABC Delay (ps) | Strategy 1 Area (μm²) | Strategy 1 Delay (ps) | Strategy 1 ΔDelay | Strategy 2 Area (μm²) | Strategy 2 Delay (ps) | Strategy 2 ΔDelay | Strategy 3 Area (μm²) | Strategy 3 Delay (ps) | Strategy 3 ΔDelay
adder | 2371.12 | 2618.21 | 2325.37 | 2394.06 | 8.6% | 2325.37 | 2394.06 | 8.6% | 2325.37 | 2394.06 | 8.6%
bar | 4198.34 | 1936.90 | 2389.64 | 934.51 | 51.8% | 3169.06 | 876.30 | 54.8% | 3275.38 | 872.59 | 54.9%
max | 4092.32 | 4640.24 | 3448.38 | 4372.64 | 5.8% | 2724.51 | 2948.38 | 36.5% | 2731.99 | 2939.40 | 36.7%
sin | 9785.40 | 6456.01 | 10,840.20 | 4498.79 | 30.3% | 8465.33 | 4438.97 | 31.2% | 8467.88 | 4434.73 | 31.3%
i2c | 1167.24 | 244.35 | 1209.60 | 222.21 | 9.1% | 1241.32 | 208.55 | 14.7% | 1241.32 | 208.55 | 14.7%
priority | 1736.08 | 2856.16 | 1697.73 | 2669.83 | 6.5% | 1697.73 | 2669.83 | 6.5% | 1697.73 | 2669.83 | 6.5%
router | 452.56 | 609.04 | 441.56 | 497.60 | 18.3% | 441.56 | 497.60 | 18.3% | 464.73 | 488.26 | 19.8%
Geomean | 2323.49 | 1813.76 | 2113.47 | 1442.44 | 20.5% | 2061.37 | 1336.26 | 26.3% | 2087.21 | 1331.07 | 26.6%
Note: We only recorded the improvement in delay, disregarding changes in area.
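The ΔDelay columns of Table 7 report the relative delay reduction of each training strategy over the ABC baseline; the formula below is inferred from the tabulated values and is shown only as a minimal sketch.

```python
def delay_improvement(abc_delay_ps: float, strategy_delay_ps: float) -> float:
    """Relative delay reduction of a training strategy over the ABC baseline,
    matching the ΔDelay columns of Table 7 (formula inferred from the values)."""
    return (abc_delay_ps - strategy_delay_ps) / abc_delay_ps

# bar, Strategy 1: 1936.90 ps (ABC) vs. 934.51 ps -> ~51.8%
print(f"{delay_improvement(1936.90, 934.51):.1%}")
```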
