1. Introduction
The complexity of modern analogue circuits has grown continuously in recent years, and as circuits become ever more complex, the duration of their verification process grows with them.
Given the intense pressure on semiconductor manufacturers to rapidly develop and release novel designs and products, the task of circuit verification assumes an important role in the design process, particularly during the Pre-Si phase, in order to ensure the reliability and functionality of the circuits prior to fabrication [
1]. Normally, the verification process of a circuit is split into Pre-Si verification and Post-Si verification, the focus of our work being the former. While Pre-Si verification focuses on verifying a digital model of a circuit by using a relatively large number of simulations, Post-Si verification deals with assuring that the physical end product performs as intended [
2]. Pre-Si verification is particularly time consuming, with some studies showing that it can take up to 50–70% of the development period [
3,
4]. One reason for this is that failing to check the functionality of a circuit during Pre-Si verification will lead to failure in the Post-Si phase or, even worse, during production, which in turn causes redesign delays and a substantial loss in terms of both costs and time. In this regard, machine learning (ML) approaches have been introduced in circuit verification to reduce the time and costs required. Such ML methods are applied to analogue [
5,
6] and digital [
7,
8] circuits, while some works deal with both Pre-Si and Post-Si verification [
9,
10]. An alternative methodology involves the application of symbolic quick error detection, which leverages formal methods and mathematical reasoning to verify the correctness of a given system [
11]. Despite its importance, Pre-Si analogue circuit verification is rarely reported on in the scientific literature. Moreover, benchmarks for such methods are relatively hard to come by. While our focus is on analogue circuits, efforts have been made to create synthetic benchmarks for digital circuits [
12,
13,
14], while others aim to reduce the simulation time of the verification process through emulating various circuits [
15]. With the motivation of evaluating the ever-growing complexity of algorithms found in CAD tools, [
13] presents a method for obtaining register–transfer-level synthetic benchmark circuits using evolutionary algorithms. The circuits presented in this work are relatively complex, with up to millions of gates. A survey focusing on the generation of synthetic digital circuits can be found in [
16]. Another work [
17], this time dealing with sequential circuits (which are still digital circuits), shows a method for generating synthetic versions of benchmark circuits using an abstract model of the circuit together with netlists. This method proves to be more accurate than a previous one that employed random graphs. Since circuit verification can also be treated as an optimization problem, several surveys regarding test functions for benchmarking optimization methods are also relevant [
18,
19,
20].
In our previous work [
21,
22,
23] on this topic, we presented various components of a complete and state-of-the-art circuit verification algorithm. This algorithm was evaluated on real circuits such as LDOs. However, a full, exhaustive validation of the algorithm was required, and, to this end, we designed and implemented a large benchmark of synthetic functions of various complexities with a variable number of input operating conditions. The main contributions of this work are the following: (i) we present an overview of the full circuit verification algorithm, (ii) we describe the synthetic function benchmark, and (iii) we exhaustively evaluate the performance of the verification algorithm on the synthetic benchmark.
The paper is organized as follows:
Section 2 will present an overview of our circuit verification method, while
Section 3 will detail the synthetic benchmark. Finally, in
Section 4 and
Section 5, we will highlight our evaluation metrics, together with our results, and discuss conclusions.
2. Method Overview
The main objective of our candidate selection algorithm is the verification and validation of circuit designs in the Pre-Si phase and, more specifically, ensuring that a circuit adheres to all of the specification thresholds imposed on its responses. Since the classical approach consists of using a wide range of simulations to guarantee proper functionality, our algorithm aims to achieve at least the same performance while using a smaller number of simulations, thereby reducing the time and costs required for circuit verification.
Our algorithm can be split into two main stages: the fixed planning (FP) stage and the adaptive planning (AP) stage. Both stages will be explained in detail in the following sections. In short, during the FP stage, we create an initial set of candidates. With these, we conduct a preliminary circuit check and then use this set of candidates in the AP stage. The aforementioned set of candidates is represented by circuit input values, or operating conditions (OCs), and the resulting response values (as received from the simulator). In the AP stage, we train surrogate models for the circuit's responses, using the initial data from FP, in order to better pinpoint the potential failures of the circuit. More specifically, the role of the AP phase is to try to find a failure OC for the circuit based on the predictions of the surrogate models. By using the data simulated in FP, the surrogate models achieve good initial coverage of the circuit responses. From this point onward, the algorithm enters an iterative process in which new candidates are proposed by the surrogate models and simulated in order to verify whether any of the circuit's specifications have been violated. At each iteration, the surrogate models are retrained with the simulations from the previous iterations on top of the already existing FP data.
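To make the two-stage flow concrete, the sketch below shows its overall structure in Python. The simulator, the FP sampler and the AP proposal step are reduced to hypothetical stand-ins (simulate, propose_fp_candidates, propose_next_candidate); it is an illustration of the loop described above, not the actual implementation.

```python
# Illustrative skeleton of the two-stage flow (FP + AP). `simulate`,
# `propose_fp_candidates` and `propose_next_candidate` are hypothetical stand-ins
# for the circuit simulator, the FP sampling step and the GP/acquisition step.
import numpy as np

def simulate(candidates):
    """Stand-in for the circuit simulator: one response value per candidate."""
    return np.sum(np.sin(candidates), axis=1)

def propose_fp_candidates(n_samples, n_oc, rng):
    """Fixed planning: initial coverage of the OC hyperspace (random here)."""
    return rng.uniform(-1.0, 1.0, size=(n_samples, n_oc))

def propose_next_candidate(X_train, y_train, rng):
    """Adaptive planning step: would use GP surrogates + acquisition functions."""
    return rng.uniform(-1.0, 1.0, size=(1, X_train.shape[1]))

rng = np.random.default_rng(0)
n_oc, spec_threshold = 6, -4.0        # toy specification (lower limit)

# Fixed planning: simulate the initial candidate set.
X = propose_fp_candidates(n_samples=30, n_oc=n_oc, rng=rng)
y = simulate(X)

# Adaptive planning: iteratively propose, simulate and retrain.
for iteration in range(50):
    x_new = propose_next_candidate(X, y, rng)
    y_new = simulate(x_new)
    X, y = np.vstack([X, x_new]), np.concatenate([y, y_new])
    if y_new.min() < spec_threshold:  # specification violated -> failure OC found
        print(f"Failure OC found at iteration {iteration}")
        break
```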
For the surrogate models, we employed Gaussian processes (GPs), which are an ML technique, or, more precisely, a Bayesian method. GPs have been used in various statistical problems for a long period of time [
24]. The primary practical advantage of GPs is their ability to provide a reliable estimate of their own uncertainty. A GP's uncertainty increases as we sample points further away from the training data, a direct consequence of the probabilistic, Bayesian foundation of GPs. Another advantage is that, depending on the problem, we can model the GP's prior belief through kernels. GPs also have disadvantages: they are computationally expensive because they are non-parametric methods (leaving the kernel hyper-parameters aside). This means that all of the training data are considered each time a prediction is made, and the cost of exact GP inference grows cubically with the number of training samples. We should also note that GPs lose efficiency in high-dimensional spaces, namely when the number of features exceeds a few dozen [
25].
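As a small illustration of the uncertainty property mentioned above (independent of our implementation), the following snippet fits a GP with scikit-learn and shows that the predictive standard deviation is much larger far from the training points; the kernel and data are arbitrary toy choices.

```python
# Toy illustration (scikit-learn, not our implementation): the GP's predictive
# standard deviation grows for query points far from the training data.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

X_train = np.array([[0.0], [0.5], [1.0]])
y_train = np.sin(3.0 * X_train).ravel()

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2), normalize_y=True)
gp.fit(X_train, y_train)

X_test = np.array([[0.25], [3.0]])          # near vs. far from the training data
mean, std = gp.predict(X_test, return_std=True)
print(std)                                  # the far-away point has a much larger std
```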
2.1. Fixed Planning
The FP stage plays the role of creating an initial set of candidates to be used later for training the GPs. The usual approach of classical verification is to perform a full factorial (FF) search by testing all combinations of minimum, nominal and maximum values for the circuit OCs. For example, a 6 OC circuit would require $3^6 = 729$ simulations. In comparison, our FP approach uses a sampling method for proposing the initial candidate set such that the number of simulations used in FP, together with the ones in AP, is much lower than that of an FF approach. Several sampling methods have been considered, with the objective of achieving good initial coverage of the hyperspace. These methods include Monte Carlo, orthogonal arrays (OAs) and Latin hypercube sampling (LHS) [
26].
OA [
27] is a sub-sampling method that strategically selects a set of OCs from the FF pool of OCs. Besides the number of samples, an OA has three other characteristics: factors, level and strength. The factors of an OA represent how many OCs could be used for a given OA; since the OA is a matrix, the number of factors is equivalent to the number of columns. The level represents how many values the OCs could take. For example, an OA containing values of 0 and 1 is at level 2. This means that the OC could only take minimum and maximum values, while level 3 would indicate that OCs take minimum, maximum and middle (or nominal) values. In practice, OAs are generally used with 2 or 3 levels. The strength,
s, is an integer indicating that, in any s columns of the OA, every combination of level values appears the same number of times. LHS [
28] represents another sampling method that takes into account the number of dimensions,
d (number of OCs in our case), and stratifies each dimension into
n equal strata. This sampling method is also known as
n-rooks. If we are to take a simple example of
d = 2 dimensions, the grid would resemble a checkerboard on which we select tiles such that each row and each column contains exactly one sample, i.e., the samples behave like non-attacking rooks.
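For illustration, LHS candidates for the FP stage could be drawn as in the sketch below, which uses SciPy's QMC module; the six OC ranges are hypothetical placeholders rather than values from a real circuit.

```python
# Sketch of drawing LHS candidates for the FP stage with SciPy's QMC module.
# The six OC ranges below are hypothetical placeholders (e.g. supply voltage,
# temperature, load current, ...), not values from a real circuit.
import numpy as np
from scipy.stats import qmc

n_oc = 6
sampler = qmc.LatinHypercube(d=n_oc, seed=0)
unit_samples = sampler.random(n=50)         # 50 candidates in the unit hypercube

lower = np.array([1.6, -40.0, 0.0, 0.9, 0.0, 1.0])
upper = np.array([2.0, 125.0, 0.1, 1.1, 1.0, 3.0])
candidates = qmc.scale(unit_samples, lower, upper)
print(candidates.shape)                     # (50, 6)
```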
Our work [
21], which compared the three sampling methods in the context of analogue circuit verification, showed that OA has the best performance, obtaining results similar to those of the FF approach while using far fewer simulations. The next step consisted of combining OA and LHS in order to further improve upon those results. After testing on various simulation budgets (with a fixed number of simulations), the combined method outperformed the individual sampling methods.
2.2. Adaptive Planning
Moving on to the second stage of the algorithm, in adaptive planning, the OC candidates simulated in the FP stage are used as training data for the GP surrogate models. One GP is trained for each of the circuit responses, so that we have an initial estimate of the response functions. From here, an initial evaluation candidate set is created, from which the GPs select, for each circuit response, the next candidate to be simulated in the attempt to violate the imposed specifications. This evaluation set is created from an FF grid, where each input OC can take its minimum, maximum and middle values; this sums up to $L^{N_{OC}}$ equally spaced candidates in the OC hyperspace, where $L$ is the number of levels (3 in our case) and $N_{OC}$ represents the number of operating conditions. To this grid set, we further add a number of LHS samples, depending on the number of OCs. This combination of grid and LHS candidates forms our initial pool of evaluation candidates. Since this is an iterative process, the GPs would normally have to choose a candidate from the same fixed evaluation set, which could constitute a problem because the candidate that would lead to the true minimum of a response is not guaranteed to be among the ones in the evaluation set. To this end, the evaluation set is altered by modifying the values of the OCs based on the GP estimates, with the help of gradient descent (GD) or evolutionary algorithms, such as GDE3 [
29]. The GD approach was introduced in our previous work [
22], with the aim of minimizing the response functions (more precisely, the GP estimates) so that the GP could have a better candidate pool to choose from. GDE3, on the other hand, is an evolutionary algorithm whose main advantage is that it can handle various types of problems, most importantly multi-objective problems, in contrast to the single-objective GD approach. The GDE3 approach was considered in order to further improve upon the GD approach, due to shortcomings detailed in [
22], such as the possibility of GD becoming “stuck” in a local minimum. Our comparison of the 2 candidate evaluation pool improvement approaches (GD vs GDE3) [
23] has shown that using GDE3 can introduce improvements to the overall algorithm, although these improvements are not significant and are not observed in all scenarios. Initially, the GD approach was evaluated using 3 different acquisition functions: probability of improvement (PI), expected improvement (EI) and the lower confidence bound (LCB). These acquisition functions are extracted from the GP estimates and represent a way of scoring the candidates. PI is a measure that represents the probability of improving upon the best value found so far, while EI indicates how much we can expect to improve upon that value. The LCB, on the other hand, represents the lower envelope of the GP variance or uncertainty and can be defined as $LCB(x) = \mu(x) - \kappa\,\sigma(x)$, where $\mu(x)$ and $\sigma(x)$ are the GP posterior mean and standard deviation and $\kappa$ is a weighting factor.
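A minimal sketch of how the three acquisition scores can be computed from the GP posterior mean and standard deviation (for a minimization problem) is shown below; kappa is an assumed weighting factor and y_best denotes the best simulated value so far.

```python
# Sketch of the three acquisition scores for a minimisation problem, computed from
# the GP posterior mean `mu` and standard deviation `sigma`; `y_best` is the best
# simulated value so far and `kappa` is an assumed weighting factor.
import numpy as np
from scipy.stats import norm

def acquisition_scores(mu, sigma, y_best, kappa=2.0):
    sigma = np.maximum(sigma, 1e-12)                          # avoid division by zero
    z = (y_best - mu) / sigma
    pi = norm.cdf(z)                                          # probability of improvement
    ei = (y_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)    # expected improvement
    lcb = mu - kappa * sigma                                  # lower confidence bound
    return pi, ei, lcb
```

In such a sketch, candidates with a high PI or EI, or a low LCB, are preferred when searching for a specification violation.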
After evaluating the GD approach using these 3 acquisition functions for the GPs, LCB showed the best results [
22]. GDE3, on the other hand, being a multi-objective approach, uses all 3 acquisition functions. While our first, simpler, GD approach proceeded to select the best candidate for each response from the new candidate evaluation pool (after applying GD) based on the LCB score, the GDE3 approach will perform non-dominated selection of candidates based on the Pareto front, selecting the best 100. Only after this step will the GDE3 algorithm be applied, resulting in our new candidate evaluation pool. Similarly to the GD approach, the GDE3 approach will proceed to rank the new candidates based on the hyper-volume determined by the 3 acquisition functions. The ensemble of processes in the GDE3 approach will be further denoted as multi-objective acquisition function ensemble (MACE), described in [
23] and inspired from [
30,
31]. The main steps of the circuit verification algorithm using the MACE approach can be found in Algorithm 1, while an overview of the process can be observed in
Figure 1.
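The non-dominated selection step can be illustrated with the simple NumPy filter below; it is a simplified stand-in for the Pareto-front selection used in MACE, assuming all acquisition scores are oriented so that larger is better (e.g., using −LCB).

```python
# Simplified stand-in for the Pareto (non-dominated) selection used in MACE:
# each row of `scores` holds the acquisition values of one candidate, with all
# objectives oriented so that larger is better (e.g. PI, EI and -LCB).
import numpy as np

def non_dominated_mask(scores):
    """Boolean mask of Pareto-optimal rows (maximisation of every column)."""
    n = scores.shape[0]
    mask = np.ones(n, dtype=bool)
    for i in range(n):
        # Candidate i is dominated if some other row is >= everywhere and > somewhere.
        dominates_i = np.all(scores >= scores[i], axis=1) & np.any(scores > scores[i], axis=1)
        mask[i] = not dominates_i.any()
    return mask
```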
Regarding the implementation of our GP surrogate models, we employed the BoTorch framework [
32], while an ablation study in terms of hyper-parameter tuning is presented in [
22], which highlights comparisons between different learning rates and optimizers.
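A minimal BoTorch sketch for fitting one surrogate per response is shown below; it is a simplified illustration rather than our exact setup, and helper names may differ slightly between BoTorch versions.

```python
# Minimal BoTorch sketch for fitting one surrogate per response (illustrative only;
# exact helper names can differ between BoTorch versions).
import torch
from botorch.models import SingleTaskGP
from botorch.fit import fit_gpytorch_mll
from gpytorch.mlls import ExactMarginalLogLikelihood

train_X = torch.rand(20, 6, dtype=torch.double)           # 20 simulated OC candidates
train_Y = torch.sin(train_X).sum(dim=-1, keepdim=True)    # toy response values

model = SingleTaskGP(train_X, train_Y)
mll = ExactMarginalLogLikelihood(model.likelihood, model)
fit_gpytorch_mll(mll)                                     # tune the kernel hyper-parameters

posterior = model.posterior(torch.rand(5, 6, dtype=torch.double))
mean, std = posterior.mean, posterior.variance.sqrt()     # feed into the acquisition scores
```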
In terms of effectiveness, we can look at the results in
Table 1, which highlights the comparison between the classical approach, the FP-only stage and the AP stage. By the classical approach, we mean the exhaustive search of the input OC hyperspace. As can be seen in the table, the values obtained by FP and AP are compared with the ones obtained by the classical approach on a circuit with 5 responses. Four of the responses require minimization (where values should stay above the imposed specification), while the last one requires maximization (where values should stay below the imposed specification). Red indicates values that are not as good as the ones found via the classical approach, while green signifies equal or better values. While FP alone did not obtain values nearly as good as those obtained with the classical approach, together with the AP stage we find the same worst cases (the values closest to the specifications). The main advantage is that our algorithm needs about 10× fewer iterations to find the same worst case. Nonetheless, our method has some limitations, which mainly revolve around the stochastic nature of the algorithm. Especially in higher-dimensional spaces, it is possible for the classical approach to find better candidates, while the GP-based algorithm gets stuck in local minima. On the other hand, the classical approach is not feasible in higher-dimensional spaces, as it would require a number of simulations that grows exponentially with the number of input dimensions. For example, in a 9-dimensional space, the classical approach would employ $3^9 = 19{,}683$ simulations, whereas our algorithm can usually handle this number of dimensions with under 200 simulations (as shown in
Section 4.3.1).
Algorithm 1: Circuit Verification Algorithm
3. Synthetic Benchmark Creation and Development
After developing the verification algorithm (described in
Section 2), we continued by assuring its robustness and high performance under different scenarios. It is worth mentioning that in previous papers (see [
21,
22,
23]), we presented only preliminary results, focusing on a couple of real and synthetic circuits. In comparison, in this paper, we set out to develop a data-driven validation approach for Pre-Si circuit verification. The results presented in
Section 4 cover the performance of 900 circuits with 2 to 10 input operating conditions.
During synthetic benchmark development, we started with the assumption that every synthetic circuit is mathematically composed of different functions. Each function models how a specific input (or operating condition) affects the output (or response). As the algorithm trains a separate GP model for each circuit response, synthetic circuits with just one response will be sufficient to assess the performance.
Next, we defined a list of one-dimensional functions of varying complexity. This list forms the basis of all proposed synthetic circuits. To compile the list, we combined the functions already used in our previous synthetic circuit experiments [
21,
22,
23] with new ones. To obtain various wave shapes, we started from the Mexican hat function definition and varied its arguments; sometimes, we also multiplied the result by a trigonometric function. The resulting graphs vary greatly, and the mother function is no longer visually distinguishable. Following this methodology, we obtained a list of 30 different one-dimensional functions. The proposed functions were developed by specialized circuit engineers to ensure that they resemble plausible responses of real circuits.
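The following snippet illustrates (with made-up parameter choices) how different wave shapes can be derived from the Mexican hat definition by rescaling its argument and modulating it with trigonometric terms; the exact variants in the benchmark differ.

```python
# Illustration of deriving different 1D wave shapes from the Mexican hat (Ricker)
# definition; the scaling factors, modulations and the [-1, 1] domain are made up
# for this example and differ from the benchmark's actual 30 functions.
import numpy as np

def mexican_hat(x):
    return (1.0 - x**2) * np.exp(-x**2 / 2.0)

x = np.linspace(-1.0, 1.0, 501)
f_simple = mexican_hat(3.0 * x)                    # stretched argument, smooth curve
f_medium = mexican_hat(5.0 * x) * np.cos(4.0 * x)  # a few extra slope changes
f_hard   = mexican_hat(2.0 * x) * np.sin(20.0 * x) # ringing-like waveform
```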
In order to report results in a consistent manner, we needed a method to measure the complexity of the synthetic circuits. In this phase, we considered different approaches for assessing the complexity of the 30 one-dimensional functions. To the best of our knowledge, there is no universal method for obtaining a numerical value that represents the complexity of a given mathematical function. One idea we considered during the early development stages was to measure something tangential to the somewhat vague notion of “complexity”. As there are mathematical tools for assessing whether a function is stationary or normally distributed, we started by employing these tools to obtain an objective measure of each function. The reason for the normality assessment was that, by construction, a GP works better when the data can be modelled by Gaussian distributions. By implication, a function that is not normally distributed would pose more difficulties to the GP training process.
Therefore, we supposed that the less likely a function is to be stationary, the more complex it is. Analogously, the less likely a function is to be normally distributed, the more complex it should be. The idea is to use the numerical outputs of such assessment tools to construct a complexity metric. In this regard, we considered the following statistical tests: the augmented Dickey–Fuller, Kwiatkowski–Phillips–Schmidt–Shin, Shapiro–Wilk, Anderson–Darling, and Lilliefors tests. The first two are stationarity tests, while the remaining three evaluate how likely a function is to be normally distributed. These tools output continuous values that could be interpreted as scalar measures. Since their intended use is not to assess complexity, we could only hope that the results of the statistical tests would correlate with our visual impressions. Unfortunately, the wide output ranges and the significant discrepancies between the subjective complexity assessments and the numerical values discouraged us from further pursuing the idea of mathematical tests.
After experimenting with an objective method to measure complexity, we focused on the alternative approach: subjective assessment. Each function was graded from 1 to 10, where 1 means least complex. In order to mitigate the disadvantages associated with subjectivity, a group of 6 people independently graded the functions. Then, the final complexity score was computed as the mean value.
In order to illustrate the assumptions we adhered to while grading, in
Figure 2, we present some examples of what we considered simple, medium, and difficult functions.
The aforementioned functions have the following formula:
These functions were selected in order to emphasize the variability in complexity across the benchmark. From the start, we envisioned the benchmark to include functions of various complexities. We considered a low-complexity 1D function to be a straight line or a simple curve, a medium-complexity one to contain several slope changes, and a high-complexity one to resemble a ringing waveform. These 1D functions are then combined to form synthetic circuits with various OC numbers (or input features).
The next challenge was deciding how to combine the individual functions in order to obtain the final definition of a synthetic circuit. One idea we considered was to randomize between addition and multiplication. The problem with this approach was that the complexity of the resulting hypersurface would have been very difficult to assess. For example, minima and maxima could multiply in unpredictable ways (in the absence of high-level mathematical analysis, which is neither within the scope of this project nor useful for future research directions with the proposed synthetic benchmark).
Considering the aforementioned disadvantage, we settled on a more intuitive way of combining the functions: adding them. This approach is simple and easy to debug when a synthetic circuit is not learned in a satisfactory manner by the GP. Moreover, it offers more opportunities for measuring the resulting complexity. The first idea we explored was to compute the complexity of the resulting synthetic circuit as the mean of the complexities of the functions that make up that circuit. The problem with this approach is that it does not allow comparing circuits with different input dimensionalities. The capacity of the GP model to learn a hypersurface decreases as the dimensionality of the target function increases. For example, a two-dimensional response with a mean complexity of 5 is clearly easier to learn than a ten-dimensional response with the same mean complexity. In order to compare complexities across different input dimensions, we continued the experiments with multiplicative complexity (MC), which is calculated as the product of the individual complexities. Therefore, circuits with different input dimensions but the same MC are comparable in terms of the difficulty they impose on the GP model.
Therefore, at this point, we could define a response, $R$ (which corresponds to the entire synthetic circuit, as it has a single response), as follows:
$$R(x_1, x_2, \ldots, x_N) = \sum_{i=1}^{N} f_{j_i}(x_i),$$
where $x_i$ is the $i$-th input operating condition, $N$ is the number of circuit/response input conditions, and each index $j_i$ is randomly chosen between 1 and 30 and indicates the index of the function in the function list.
After consulting circuit engineers, we concluded that the MC of a hypothetical real circuit could not be very high. Therefore, we settled on an arbitrary MC maximum threshold in order to pursue the data-driven evaluation method. Another aspect we consulted the circuit engineers on is the maximum number of OCs for a given circuit. Thus, the experimental paradigm is focused on circuits with a maximum MC value of 40, with a maximum of 10 OCs.
The MC for the proposed benchmark can be summarized as follows:
$$MC(R) = \prod_{i=1}^{N} c_{j_i},$$
where the indexes $j_i$ identify the functions that comprise the response $R$ and $c_{j_i}$ is the complexity grade of function $f_{j_i}$; for the proposed benchmark, $MC(R) \leq 40$.
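The construction of a synthetic circuit and its MC can be sketched as below; the two functions and complexity grades are placeholders for the benchmark's list of 30 graded functions, and the input domain is assumed.

```python
# Sketch of assembling an N-OC synthetic circuit: the response is the sum of randomly
# chosen 1D functions (one per OC) and its MC is the product of their complexity grades.
# The two functions, their grades and the [-1, 1] input domain are placeholders.
import numpy as np

function_list = [np.sin, np.cos]       # stand-ins for the benchmark's 30 functions
complexity_grades = [2.0, 3.0]         # stand-ins for the mean subjective grades

def build_circuit(n_oc, rng):
    idx = rng.integers(0, len(function_list), size=n_oc)      # one 1D function per OC
    def response(x):                                          # x: shape (n_samples, n_oc)
        return sum(function_list[j](x[:, i]) for i, j in enumerate(idx))
    mc = float(np.prod([complexity_grades[j] for j in idx]))  # multiplicative complexity
    return response, mc

rng = np.random.default_rng(0)
R, mc = build_circuit(n_oc=4, rng=rng)
print(mc, R(rng.uniform(-1.0, 1.0, size=(3, 4))))
```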
In terms of scale, we decided to keep a constant definition domain for all of the functions. This uniformity will help in debugging the algorithm in the future.
5. Conclusions
In this paper, we present a validation of our circuit verification algorithm through an extensive synthetic benchmark, consisting of multiple evaluation sets, each differentiated by the number of input OCs. As the number of input OCs increases, so does the complexity of the evaluation set, which makes the benchmark a good indicator of the performance of a verification method across different circuit configurations. The performance of our circuit verification method is highlighted on this benchmark, with RVE results under 2% using a relatively low number of simulations and with only a few circuits requiring over 100 iterations. As these latter circuits have eight or more input OCs, it is to be expected that they require more simulations due to their higher complexity. Overall, the evaluation sets represent a good benchmark for any circuit verification method that aims to find the absolute minimum or maximum response values for various circuit configurations and complexities. This, in turn, can indicate whether or not an algorithm is consistent enough to be applied in Pre-Si verification. It is worth emphasizing that a synthetic circuit can be used only as a preliminary validation technique for Pre-Si verification. Our benchmark, although very diverse, cannot fully replace validation on real circuits. Nonetheless, it can still give valuable insights into the capabilities of a verification method. The main advantages of a synthetic benchmark over a real one are complete knowledge of the output hyperspace with respect to the inputs and a decreased time for computing the response (in comparison with PSpice-like models, which require extensive periods of time to simulate a real circuit). On the other hand, synthetic circuits might fail to model the intricacies that naturally arise in real circuits.
For future work, we will consider further updating the validation methodology by testing the verification algorithm on the synthetic benchmark using multiple random seeds in order to mitigate the impact of randomness. As for the verification algorithm, extensive testing on real circuits will further validate its efficiency.