Article

Predicting the Characteristics of High-Speed Serial Links Based on a Deep Neural Network (DNN)—Transformer Cascaded Model

1 School of Communication and Electronic Engineering, Jishou University, Jishou 416000, China
2 Shandong Yunhai Guochuang Cloud Computing Equipment Industry Innovation Co., Ltd., Jinan 250101, China
* Author to whom correspondence should be addressed.
Electronics 2024, 13(15), 3064; https://doi.org/10.3390/electronics13153064
Submission received: 1 July 2024 / Revised: 24 July 2024 / Accepted: 31 July 2024 / Published: 2 August 2024

Abstract: The design level of channel physical characteristics has a crucial influence on the transmission quality of high-speed serial links. However, channel design requires a complex simulation and verification process. In this paper, a cascaded neural network model constructed from a Deep Neural Network (DNN) and a Transformer is proposed. This model takes physical features as inputs and uses the Single-Bit Response (SBR) as the connecting quantity, which is enhanced through the prediction of frequency characteristics and equalizer parameters. At the same time, signal integrity (SI) analysis and link optimization are achieved by predicting eye diagrams and channel operating margins (COMs). Additionally, Bayesian optimization based on the Gaussian process (GP) is employed for hyperparameter optimization (HPO). The results show that the DNN–Transformer cascaded model achieves high-precision predictions of multiple metrics in performance prediction and optimization: under an equalizer architecture comprising a 3-tap TX FFE, an RX CTLE with dual DC gain, and a 12-tap RX DFE, the maximum relative error on the test set is less than 2%, demonstrating stronger prediction ability than other deep learning models.

1. Introduction

As the transmission bandwidth of wireline serial link technology reaches the GHz level, it is no longer possible to ensure efficient signal transmission by simply optimizing the dielectric and layout structure. High-speed serial link systems suffer from serious signal integrity (SI) problems due to the skin effect, dielectric loss, crosstalk, reflections, and jitter; therefore, the SI analysis becomes more and more strict in the design stage of high-speed serial links. The simulation analysis of SI usually consists of two steps: electromagnetic field solvers (EMFSs) and circuit system simulation [1]. Firstly, an EMFS is used to obtain S-parameters to characterize the frequency response of the circuit. Then, these S-parameters are imported into the model circuit system for time-domain simulation to obtain the main SI metrics, including an eye diagram, the impulse response, and transient waveforms.
Although the traditional SI analysis based on physical models of high-speed links can offer high accuracy, it consumes a lot of time and computer resources. The Input/Output Buffer Information Specification Algorithmic Model Interface (IBIS-AMI) is a behavioral model that simulates the input/output behavior and algorithms of end-to-end high-speed serial links, simplifying the internal physical details [2]. Compared to physical models, it has the advantages of speed, simplicity, and low resource consumption, but it has poor accuracy and lacks flexibility. Compliance standard template comparison is another commonly used method [3]. This can swiftly and intuitively estimate channel performance, but it inevitably discards the channel margins due to the need to strictly satisfy discrete metrics [4]. On the contrary, the channel operating margin (COM) technique can effectively overcome this disadvantage by searching the optimal design space of the whole link in the form of a signal-to-noise ratio (SNR). Compared to traditional SI metrics such as eye diagrams and the bit error rate (BER), the COM approach offers simpler operation, faster speed, and more efficient testing. For example, COM-based evaluation is more accurate than the Annex 69B channel evaluation for 10 Gb/s Ethernet (10 GbE) [5]. Although the COM approach can provide an effective way to evaluate and optimize high-speed links, it requires numerous iterations for spatial search and is not flexible enough to analyze high-capacity channels.
Machine learning (ML) has been widely used in the simulation and design of high-speed links in recent years, and shows the ability to improve the efficiency of SI analysis. The authors of [4,6,7,8,9,10] searched for features from simulation data and trained Artificial Neural Network (ANN), Deep Neural Network (DNN), and Least-Squares Support Vector Machine (LS-SVM) models to replace circuit system simulations for the accurate prediction of time-domain (TD) and frequency-domain metrics such as eye height (EH)/eye width (EW) and return loss (RL)/insertion loss (IL); however, the acquisition of simulation data consumes a lot of computational resources and time. Feedforward Neural Networks (FNNs), Random Forest Regression (RFR), and Support Vector Machines (SVMs) are used to achieve simple predictions of simulation data [11,12,13], such as for predicting S-parameters and impulse responses. When dealing with more complex simulation data, traditional ML methods may lose many features. The Recurrent Neural Network (RNN) is an ML model that can effectively capture the features of complex data, especially for sequence data such as S-parameters and impulse responses. The authors of [14,15,16,17,18,19,20] employ RNN and Long Short-Term Memory (LSTM) architecture to create surrogate models for predicting the transient response waveforms of complex high-speed links. These independent surrogate models can perform SI analysis based on simulation data, but cannot deal with the physical parameters of links. SI analysis based on physical parameters requires more complicated ML methods, and the authors of [21,22,23,24] achieved effective predictions of physical parameters used for assessing high-speed link performance by combining multiple deep learning algorithms. Moreover, balanced architecture optimization for high-speed serial links can be achieved based on ML [4,25,26,27]. 
In conclusion, performance prediction and architecture optimization [28] are two important parts of ML applications for high-speed serial links.
In this paper, a DNN–Transformer cascaded model is proposed for the SI analysis and optimization of high-speed serial links. This model can skip the use of EMFSs and circuit system simulation, and directly predicts SI metrics including EH/EW, IL/RL, impulse response, and COM values according to physical parameters. Meanwhile, the optimization of links can be achieved by using this model to predict the corresponding COM values and equalizer parameters for the links with different equalizer configurations. Furthermore, Bayesian optimization based on the Gaussian process (GP) is used to optimize the hyperparameters under the same conditions for different combinations of models. Compared with the prediction of high-speed link performance using traditional ML [6,7,8,9,10], DNN–Transformer can directly use the physical parameters of the link for analysis, and its prediction accuracy is significantly better. For the prediction of simulation data such as impulse responses, DNN–Transformer can capture more features and its prediction ability is more accurate than that achieved when using an RNN [14] or LSTM [20] model alone to create a surrogate model. The results show that the DNN–Transformer model can achieve more effective prediction. The performance and feasibility of the proposed method is illustrated by the prediction data and graphical results given in this paper.
The main contributions of this paper can be summarized as follows:
(1)
Based on the key physical parameters of channels, neural network models are used to directly analyze link performance, and this obviates the time-consuming processes of EMF solving and circuit system simulation.
(2)
Through neural network models, the accurate prediction of multiple SI indicators and equalizer parameters is achieved. Additionally, we show that SI analysis and link optimization can also be rapidly achieved.
(3)
A DNN–Transformer cascaded model is proposed, Bayesian optimization is used to tune the hyperparameters of this model, and its superior performance is demonstrated by comparing it with other models.
The rest of this paper is organized as follows: Section 2 discusses the basic principles of this work, including the principles of the SI and COM methods. Section 3 describes how the simulation dataset was created. In Section 4, the design idea of the DNN–Transformer model and test metrics are presented. Section 5 gives the numerical results to demonstrate the prediction performance of our model. Finally, Section 6 concludes the paper.

2. Fundamental Methods

2.1. Signal Integrity and S-Parameters

As transmission rates increase, the inefficiencies of traditional parallel signal transmission methods are becoming apparent and high-speed serial digital transmission is becoming widely used. Figure 1 shows the widely used SerDes structure in a high-speed serial link system. It comprises a transmitter chip with a serializer and a feedforward equalizer (FFE), a channel, and a receiver chip with a decision feedback equalizer (DFE) and a de-serializer. The FFE at the sending end mainly performs pre-emphasis on the signal to improve its high-frequency component. The composition of a wireline channel is complex, usually including traces, vias and connectors, etc. These physical structures exhibit low-pass filter characteristics and can enlarge the high-frequency loss of a signal, severely worsening the signal transmission quality and leading to SI problems. The continuous-time linear equalizer (CTLE) at the receiving end is mainly used to compensate for the high-frequency loss of the channel, eliminate the pre-cursors, and suppress the trailing of the pulse response. The DFE is mainly used to eliminate the post-cursors of the pulse response. The combination and parameter settings of the FFE, CTLE, and DFE have a significant impact on the SI of ultra-high-speed wireline serial communication links.
The main goal of SI analysis is to detect and reduce the factors that cause losses, such as dithering, reflection, and crosstalk. On the one hand, the SI can be analyzed from the frequency domain, as shown in Figure 2a. This mainly includes IL, RL, insertion loss deviation (ILD), and the insertion loss-to-crosstalk ratio (ICR), which are subsequently placed into a template for evaluation. On the other hand, the SI can be evaluated in the time domain, such as through eye diagrams, transient simulations, bathtub curves, and bit error rates, as shown in Figure 2b.
The critical method of traditional SI analysis involves the acquisition of S-parameters. These parameters contain comprehensive Frequency Domain (FD) characteristics of the transmission channel and offer a large amount of information on aspects such as reflection, crosstalk, and loss. Moreover, S-parameters can be employed in time-domain simulations to generate data such as eye diagrams and bathtub curves.

2.2. Channel Operating Margin

The COM method is a high-speed serial link characterization method recommended by the IEEE 802.3 working group for channel compliance testing. The official definition states that a "COM is a figure of merit (FOM) for channels determined from a minimum reference PHY architecture and channel s-parameters" [29]. This approach provides a relatively accurate and fair environment for the physical design of channels by considering various factors such as loss, reflection, inter-symbol interference (ISI), dispersion ISI, crosstalk, and device specifications, enabling a relatively accurate assessment of channel performance and the impact of aggressor channels on victim channels. The impact of equalizers has been considered in subsequent versions of the COM method, and the value of a COM based on the FOM can be improved by selecting equalizer parameter settings. Consequently, calculating the COM can also determine whether the channel quality meets the transceiver's SI requirements [30]. As shown in Equation (1), the COM can be expressed by the ratio of the available signal amplitude $A_s$ to the statistical noise amplitude $A_n$.
$$\mathrm{COM} = 20 \times \log_{10}\!\left(\frac{A_s}{A_n}\right) \tag{1}$$
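As a quick sanity check, Equation (1) can be evaluated numerically. The helper below is an illustrative sketch, not part of the IEEE COM tool:

```python
import math

def com_db(a_s: float, a_n: float) -> float:
    """COM in dB: 20*log10 of available signal amplitude over noise amplitude (Eq. 1)."""
    if a_s <= 0 or a_n <= 0:
        raise ValueError("amplitudes must be positive")
    return 20.0 * math.log10(a_s / a_n)

# Doubling the signal-to-noise amplitude ratio adds ~6 dB of margin.
print(com_db(0.2, 0.1))  # ~6.02 dB
```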
The process of deriving the COM involves several essential steps, including determining the transfer function, converting the transfer function into the impulse response, applying transmitter and receiver equalizer algorithms, and performing statistical noise calculations. Figure 3 presents a flowchart of COM model derivation and depicts two paths for the victim and aggressor channels. Additionally, the model also considers the transmitter and receiver package S-parameters ($S_{tp}$/$S_{rp}$), the termination resistance $R_d$, equalizers, and filters. To better simulate actual channels, the COM model incorporates Gaussian white noise and jitter at the receiver end. Figure 4 shows the detailed process of COM value calculation.
The equalizer parameters are mainly set based on the channel loss. The FOM serves as a quantitative metric for assessing channel quality and equalizer performance. It takes into account several factors affecting SI, including ISI, jitter, crosstalk, and noise. The FOM calculation formula can be expressed as follows:
$$\mathrm{FOM} = 10 \times \log_{10}\!\left(\frac{A_s^2}{\sigma_{TX}^2 + \sigma_{ISI}^2 + \sigma_J^2 + \sigma_{XTK}^2 + \sigma_N^2}\right) \tag{2}$$
The numerator $A_s$ is derived from the amplitude of the impulse response at $t_s$, which corresponds to the main cursor of the impulse response $h(t)$. The denominator is the sum of the variances of all noise, jitter, and interference components. $\sigma_{TX}^2$ represents the noise variance at the transmitter, $\sigma_{ISI}^2$ denotes the variance of the residual ISI amplitude, and $\sigma_J^2$ indicates the variance of the jitter amplitude. In COM analysis, jitter is accounted for by converting horizontal jitter into vertical noise at the sampling instant $t_s$. Additionally, $\sigma_{XTK}^2$ represents the total crosstalk variance from all interference paths, while $\sigma_N^2$ denotes the Gaussian white noise at the receiver sampling point.
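Under the same notation, Equation (2) can be sketched as a small function; the variance arguments here are placeholders for illustration:

```python
import math

def fom_db(a_s, var_tx, var_isi, var_j, var_xtk, var_n):
    """FOM in dB (Eq. 2): signal power A_s^2 over the sum of the transmitter-noise,
    residual-ISI, jitter, crosstalk, and receiver-noise variances."""
    total_var = var_tx + var_isi + var_j + var_xtk + var_n
    return 10.0 * math.log10(a_s ** 2 / total_var)
```

The COM search sweeps the equalizer design space and keeps the setting that maximizes this quantity.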
The COM approach entails exploring all possible parameter combinations of the TX and RX equalizers within a set range to find the configuration that maximizes the FOM. This process determines the values of A s and A n that result in the optimal FOM.
Traditional methods rely on various indicators such as jitter, eye height, and eye width, but the COM serves as a comprehensive metric for evaluating a serial link. It can significantly reduce the computation time and number of iterations, and provides an accurate evaluation of channel performance both before and after equalization.

3. Dataset Construction

3.1. Channel Design and Dataset Splitting

In this work, the channel datasets were generated using professional PCB design software. As shown in Figure 5, Altium Designer 20.0.13 and Allegro 17.4 (EDA software) were primarily utilized to create the layouts for high-speed differential lines. Subsequently, the PCB layout files were imported into Advanced Design System 2021 (ADS) or HFSS 2023 R1 for electromagnetic (EM) simulation. Our design conforms to the specifications for USB end-to-end differential transmission lines, including parameters such as line width and length, the characteristic impedance of transmission channels, and differential line spacing.
The differentiated characteristics of channel data, as presented in Table 1, primarily include parameters such as material, trace dimensions, and PCB type. The board characteristics encompass both numerical features—such as permittivity (Er), the dissipation factor (Df), and board thickness—and categorical features such as the PCB type. Er consists of the sheet permittivity’s real part and the dielectric loss tangent angle (TanD). Df is derived from the Djordjevic model, which is defined by parameters including the low and high frequencies of TanD and relative high-frequency permittivity. In this study, Df is quantified according to the numerical values of TanD and its corresponding high and low frequencies. The PCB type is categorized into either the stripline type, represented by a numerical value of 1, or the microstrip type, represented by 0.
These trace features are quantified based on specific parameters such as the trace length, width, thickness, conductivity, profile, and differential line spacing. The profile is characterized by the geometric shape of the trace, such as the number of corners. The other characteristics are directly represented by their corresponding numerical values.

3.2. COM Configuration and Equalizer Setting

This work utilizes the COM 4.0 version program provided by IEEE 802.3. The configuration references the enhanced COM (eCOM) simulation setup of the USB4 Gen4 standard and incorporates elements of the 50GBASE-KR standard. The PAM4 configuration settings, as shown in Table 2, employ a combination of a transmitter FFE and receiver-side CTLE and DFE. This configuration includes the tap coefficients of the TX FFE and RX DFE, the adaptive range of the DC gain, and the zero-pole positions of the RX CTLE. Additional parameters include the number of signal levels, denoted by L. For PAM4 signals, L is set to 4. According to the recommended PAM4 setup based on USB4, the symbol rate is set to 20 GBd, and the differential peak voltage output of the transmitter, denoted as Av, is set to 0.4 V. The USB4 Gen4 standard suggests that PAM3 should be used as the modulation method. The COM settings for PAM3 are roughly the same as those for PAM4, except that the symbol rate is set to 25.6 GBd and L is set to 3.
Based on the adopted configuration of TX FFE + RX CTLE + RX DFE, the prediction parameters include tap coefficients of the FFE, DC gain of the CTLE, and tap coefficients of the DFE. The TX FFE and RX CTLE data are obtained by searching for FOM values throughout the design space, while the DFE coefficients are determined according to the pulse response h(t) corresponding to the optimal FOM. The SI is significantly impacted by post-cursors closer to the main cursor, while the more distant post-cursors exert minimal influence, so only the first four post-cursors of the DFE are predicted.

3.3. Dataset Splitting

For this work, a total of 325 channels were collected. In this dataset, 280 channels were used to create an eye diagram dataset, referred to as Dataset A, and 215 channels were used to create a dataset for the COMs and for predicting the parameters of the equalizers, referred to as Dataset B. Due to the impact of the FOM, the COM approach uses different equalizer parameter optimization methods for stripline and microstrip systems, which weakens the fitting ability of the ML model. Consequently, we standardized Dataset B using stripline channels. Among the 215 channels in Dataset B, 170 channels are stripline channels from Dataset A, while 45 channels are newly created stripline channels. According to standard methodology, both datasets are partitioned into training, validation, and test sets. Part of the test set in Dataset B is extracted from the validation set. The split settings are detailed in Table 3.

4. Construction of the Cascaded Model and Training

4.1. DNN

The M-P model, proposed by psychologist McCulloch and logician Pitts, is a mathematical model that was developed through an analysis and synthesis of the basic properties of neurons [31]. This model is crucial for the implementation of neural networks and constitutes the foundational unit of a DNN. The structure of this model is depicted in Figure 6. Here, $\mathbf{x} = (x_1, x_2, \ldots, x_n)^T$ represents the $n$ input features of the neuron, and $\boldsymbol{\omega}_i = (\omega_{i1}, \omega_{i2}, \ldots, \omega_{in})$ denotes the weight vector responsible for the linear weighted connections. The weight vector $\boldsymbol{\omega}$ is optimized within neural networks to enhance the accuracy of predictions. The linear weighted output $Z$ can be expressed as follows:
$$Z_i = \sum_{j=1}^{n} \omega_{ij} x_j \tag{3}$$
Figure 6 depicts the structure of the M-P neuron model. The difference between $Z$ and an offset $\theta$ is used as the input to the function $f(\cdot)$. The function $f(\cdot)$ is referred to as the activation function and is used to derive the output of the neuron. $\theta$ can be considered as the weight of an input $x_0$ with a fixed value of $-1$ and is also an optimization target of the neural network. In this paper, the ReLU function is adopted as the activation function because it can more effectively mitigate the occurrence of overfitting. The formula for this is shown in Equation (4).
$$f_{\mathrm{ReLU}}(x) = \max(0, x) \tag{4}$$
The output of the activation function is typically used as the input for adjacent layers. DNNs can complete tasks such as regression, classification, and recognition. DNNs are widely used for regression tasks in production and performance evaluation [4,6,7,8,9,32]. The DNN used in this paper, as shown in Figure 7, adopts a standard structure with multiple hidden layers, along with an input layer and a linear regression output layer [33], and employs a fully connected network architecture between its neurons.
In order to make the model converge better, the training model usually requires a cost function to evaluate the gap between model-predicted values and actual values. In this paper, Smooth L1 [34] is adopted as the cost function:
$$\mathrm{smooth}_{L1}(y_n, \hat{y}_n) = \begin{cases} 0.5 \times (y_n - \hat{y}_n)^2 & \text{if } |y_n - \hat{y}_n| < 1 \\ |y_n - \hat{y}_n| - 0.5 & \text{otherwise,} \end{cases} \tag{5}$$
where $y_n$ and $\hat{y}_n$ represent the real value and predicted value, respectively.
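Equation (5) is the standard Smooth L1 (Huber-style) loss; a minimal NumPy sketch is shown below. In a PyTorch implementation, this corresponds to `torch.nn.SmoothL1Loss`.

```python
import numpy as np

def smooth_l1(y: np.ndarray, y_hat: np.ndarray) -> np.ndarray:
    """Element-wise Smooth L1 loss (Eq. 5): quadratic for residuals
    below 1, linear (minus 0.5) otherwise."""
    r = np.abs(y - y_hat)
    return np.where(r < 1.0, 0.5 * r ** 2, r - 0.5)
```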
The performance of the model is evaluated using the cost function; then, an optimizer is employed to backpropagate and continuously minimize the cost function. The model parameters and weights are continuously updated until the expected performance is achieved. After comparing several different optimizers, the Adam optimizer was ultimately selected for use in this paper [35].

4.2. Transformer for Regression

Directly predicting equalizer parameters and COM values from channel feature parameters yields relatively low accuracy. In this work, the Single-Bit Response (SBR) before and after equalization is first predicted based on the feature parameters of the channel, and then the equalizer parameters and COM values are determined according to the predicted pulse response. The pulse response needs to be treated as a time series for prediction. Traditionally, RNN or LSTM models are used for time-series prediction, but these networks have certain limitations. The Transformer model can effectively mitigate the gradient explosion problem that occurs in LSTMs by using attention mechanisms, thereby enhancing prediction precision [36].
The formula of an attention mechanism is shown in Equation (6). It mainly consists of three inputs: Q (Query), K (Key), and V (Value), along with a softmax module. When Q, K, and V are identical, it is referred to as a self-attention mechanism.
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V \tag{6}$$
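A minimal NumPy sketch of the scaled dot-product attention in Equation (6); in the actual model, this computation is handled by the framework's Transformer layers:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Eq. (6): softmax(Q K^T / sqrt(d_k)) V. When Q = K = V this is
    self-attention. Q, K, V are (seq_len, d_k) arrays."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    scores -= scores.max(axis=-1, keepdims=True)    # for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V, weights
```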
The complete Transformer architecture primarily includes an embedding layer, a positional encoding layer, an encoder layer, a decoder layer, and an output layer. In this work, since our task is a time-series prediction task rather than an NLP task, the positional encoding layer has been removed. The essence of time-series prediction can be viewed as a regression task; the model only needs to generate a single output value corresponding to each time point and does not need a decoder to generate sequence outputs. Using only the encoder layer is sufficient, which simplifies the model structure and reduces the computational load. As shown in Figure 8, our final design keeps the embedding layer, the encoder layer, and the final output layer. Additionally, in order to achieve faster convergence, the Gaussian Error Linear Unit (GELU) was chosen as the activation function.
$$f_{\mathrm{GELU}}(x) = x\,P(X \le x) = x \times \frac{1}{2}\left[1 + \operatorname{erf}\!\left(x/\sqrt{2}\right)\right] \tag{7}$$
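Equation (7) can be computed directly with the error function; a scalar sketch:

```python
import math

def gelu(x: float) -> float:
    """Exact GELU (Eq. 7): x * P(X <= x) for X ~ N(0, 1), via erf."""
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
```

Deep learning frameworks also offer a tanh-based approximation; the erf form above is the exact definition.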

4.3. Cascaded Model and Training

As illustrated in Figure 9, a cascaded Transformer and DNN model is utilized in this work to accurately predict the COM value and its corresponding equalizer coefficients. A total of 325 channel features have been collected as a dataset, and a DNN model is used to predict the spectral feature for each channel, specifically the IL and RL at the Nyquist frequency points. These predicted spectral data, along with the channel features, are then fed into a Transformer model to derive the pre-equalization SBR. Utilizing the pre-equalization SBR, the corresponding equalizer parameters under the COM configuration are calculated by the DNN model. In order to compute the post-equalization SBR, equalizer parameters combined with the channel features and frequency-domain features are then sent into another Transformer model. Finally, the sampled post-equalization SBR, channel features, and spectral features are used to compute the final corresponding COM value in the DNN model.
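The data flow of the cascade can be summarized schematically as below. All function names, input dimensions, and output sizes are illustrative placeholders, not the authors' implementation; each stage stands in for one trained DNN or Transformer model from Figure 9.

```python
import numpy as np

# Placeholder stages; in the real model each is a trained network.
def predict_spectral(features):              # DNN 1: channel features -> IL/RL
    return np.zeros(2)

def predict_sbr(features, spectral):         # Transformer 1: -> pre-equalization SBR
    return np.zeros(13)                      # 4 pre-cursors + main cursor + 8 post-cursors

def predict_equalizer(sbr):                  # DNN 2: SBR -> equalizer parameters
    return np.zeros(8)                       # e.g., FFE taps + CTLE gains + DFE taps

def predict_sbr_eq(features, spectral, eq):  # Transformer 2: -> post-equalization SBR
    return np.zeros(13)

def predict_com(features, spectral, sbr_eq): # DNN 3: -> COM value
    return 0.0

def cascaded_predict(features):
    """Chain the five stages: spectral features -> pre-eq SBR -> equalizer
    parameters -> post-eq SBR -> COM (cf. Figure 9)."""
    spectral = predict_spectral(features)
    sbr = predict_sbr(features, spectral)
    eq = predict_equalizer(sbr)
    sbr_eq = predict_sbr_eq(features, spectral, eq)
    com = predict_com(features, spectral, sbr_eq)
    return com, eq
```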
The input vector of physical characteristics for each channel can be defined as follows:
$$X = \{x_{ij}\}_{i = 1:N_C,\; j = 1:N_P}, \tag{8}$$
where $N_C$ represents the number of channels in the dataset, $N_P$ is the total number of channel features, and $X$ is an $N_C \times N_P$ matrix. The element $x_{ij}$ denotes the quantified value of a specific physical feature of a channel: for instance, $x_{11}$ corresponds to the length feature value of the first channel.
The DNN used in the cascaded model can be represented by Equation (9):
$$Y = F_{DNN}(Z, \theta) = W^{(l)} f_{\mathrm{ReLU}}\!\left(\cdots W^{(2)} f_{\mathrm{ReLU}}\!\left(W^{(1)} X + \theta^{(1)}\right) + \theta^{(2)} \cdots\right) + \theta^{(l)}, \tag{9}$$
where $Z$ represents the linearly weighted output of the input matrix $X$ and the model weights $W$, as described in Equation (3), and $\theta$ represents the offset. The index $l$ denotes the layer of the DNN model, which includes the input layer, hidden layers, and output layer. Our network employs a total of four different DNN models with varying hyperparameters: $Y_f$ for predicting spectral features, $Y_e$ for predicting equalizer parameters, $Y_E$ for predicting the eye diagram, and $Y_C$ for predicting the COM.
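Equation (9) amounts to alternating affine transforms and ReLU activations with a linear output layer; a NumPy sketch of the forward pass (layer sizes are arbitrary for illustration):

```python
import numpy as np

def dnn_forward(X, weights, biases):
    """Forward pass of Eq. (9): ReLU hidden layers followed by a
    linear regression output layer (no activation on the last layer)."""
    h = X
    for W, theta in zip(weights[:-1], biases[:-1]):
        h = np.maximum(0.0, h @ W + theta)  # f_ReLU(h W + theta)
    return h @ weights[-1] + biases[-1]     # linear output layer
```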
The augmented matrix X f is obtained by horizontally concatenating the input matrix X of the first DNN model with its output Y f :
$$X_f = [X \mid Y_f] = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1N_P} & y_1^{I} & y_1^{R} \\ x_{21} & x_{22} & \cdots & x_{2N_P} & y_2^{I} & y_2^{R} \\ \vdots & \vdots & & \vdots & \vdots & \vdots \\ x_{N_C 1} & x_{N_C 2} & \cdots & x_{N_C N_P} & y_{N_C}^{I} & y_{N_C}^{R} \end{bmatrix}, \tag{10}$$
where the column vector $y^{I} = (y_1^{I}, y_2^{I}, \ldots, y_{N_C}^{I})^T$ represents the predicted channel IL and $y^{R} = (y_1^{R}, y_2^{R}, \ldots, y_{N_C}^{R})^T$ represents the predicted channel RL.
To enhance the accuracy of the predicted COM value, the pulse response is extracted from the pre- and post-equalization SBR in the COM tool. The peak corresponding time point is taken as the main cursor, and the amplitudes of four pre-cursors and eight post-cursors determined by the COM sampling interval are selected as the required SBR. Then, the pulse of each channel is flattened and the corresponding time labels are inserted, forming a sequential feature vector. Equation (11) represents the sequence of amplitude values. After this processing, the Transformer model can be employed to accurately predict the SBR.
$$\begin{cases} SBR_1(t) = F_{Trans}\left[x_{11}(t), x_{12}(t), \ldots, x_{1N_P}(t), y_1^{I}(t), y_1^{R}(t)\right] \\ SBR_1 = \left[SBR_1(t - t_0), \ldots, SBR_1(t), \ldots, SBR_1(t + t_1)\right], \end{cases} \tag{11}$$
where $F_{Trans}$ denotes the Transformer model being utilized, $SBR_1(t)$ represents the SBR corresponding to the input feature vector of the channel at a specific time point, and $SBR_1$ represents the SBR sequence of the channel. $t_0$ and $t_1$ are the left and right boundaries of the time range, respectively.
By feeding the SBR into the subsequent DNN model with distinct parameters, the prediction of equalizer parameters can be achieved via Equation (12):
$$Y_e = F_{DNN}(SBR_1, SBR_2, \ldots, SBR_{N_C}) \tag{12}$$
By integrating the equalizer parameters with the input featuring spectral characteristics, an augmented matrix $SBR_{N_C}^{eq}(t)$ is constructed for the prediction of the post-equalization SBR. This matrix, via Equation (13), is then processed into a time-dependent sequence following the methodology presented in Equation (11).
$$\begin{cases} SBR_{N_C}^{eq}(t) = F_{Trans}\left[x_{N_C 1}(t), \ldots, x_{N_C N_P}(t), y_{N_C}^{I}(t), y_{N_C}^{R}(t), Y_{N_C}^{e}(t)\right] \\ SBR_{N_C}^{eq} = \left[SBR_{N_C}^{eq}(t - t_0), \ldots, SBR_{N_C}^{eq}(t), \ldots, SBR_{N_C}^{eq}(t + t_1)\right] \\ Y_{N_C}^{e}(t) = \left[y_{FFE,N_C}^{Taps}(t),\; y_{CTLE,N_C}^{DCgain}(t),\; y_{DFE,N_C}^{Taps}(t)\right] \end{cases} \tag{13}$$
Finally, the obtained post-equalization SBR is concatenated with X f to form the final input features for predicting the COM value.
Before formally training the cascaded model, we standardized the input data and employed a pre-training strategy. Initially, the DNN models $F_{DNN}^{freq}(X)$ and $F_{DNN}^{eq}(X)$, designed for predicting spectral features and equalizer parameters, respectively, were trained independently, and their corresponding parameters were saved. These pre-trained DNN models were then integrated with the necessary Transformer models to construct the complete cascaded model for predicting the equalizer parameters and COM, as depicted in Figure 9. For training the cascaded model, Smooth L1 was used as the cost function and the Adam algorithm was applied to update the model parameters. With the help of the Optuna toolkit, model structuring and hyperparameter optimization (HPO) were performed via Bayesian optimization using the GP [37,38]. The chosen acquisition function was the Expected Improvement (EI) function, with an initial set of 10 observation points, allowing for effective HPO within a computationally feasible scope.
Compared to the global grid search, random grid search, and halving search methods, Bayesian optimization is more efficient for HPO and requires less optimization time. Additionally, the K-Fold cross-validation method is employed in this paper to enhance the reliability of the model evaluation.
After completing the model's training, its predictive capabilities can be assessed with several different metrics. One of these, shown in Equation (14), is the root-mean-square error (RMSE). The squaring operation in the RMSE makes it particularly sensitive to errors and especially responsive to outliers; however, because of the square-root operation, the RMSE does not intuitively reflect the actual magnitude of the error.
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \tag{14}$$
MAE is calculated according to the average value of absolute differences between the predicted and actual values. Compared to RMSE, MAE is less sensitive to outliers and provides a more intuitive reflection of the actual magnitude of errors. However, this can lead to an underestimation of the impact of the errors.
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right| \tag{15}$$
MAPE is the average value of absolute percentage differences between the predicted and actual values. Compared to RMSE and MAE, MAPE is easier to understand and allows for the comparison of prediction results across different scales. It is more suitable for situations with significant changes in actual values. However, when the actual values are close to zero, the error can become large, so MAPE is unsuitable for datasets containing zero or near-zero values.
$$\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right| \tag{16}$$
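Equations (14)–(16) translate directly into NumPy; a short sketch, purely for illustration:

```python
import numpy as np

def rmse(y, y_hat):
    """Root-mean-square error (Eq. 14)."""
    return np.sqrt(np.mean((y - y_hat) ** 2))

def mae(y, y_hat):
    """Mean absolute error (Eq. 15)."""
    return np.mean(np.abs(y - y_hat))

def mape(y, y_hat):
    """Mean absolute percentage error (Eq. 16); undefined if any true value is zero."""
    return np.mean(np.abs((y - y_hat) / y))
```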

5. Numerical Results

In this work, the neural network models were constructed in Python using PyTorch 2.0.1. The computer used for model computation included a GeForce RTX 4070 GPU and an Intel Core i5-12600KF 3.7 GHz CPU.

5.1. IL and RL

As shown in Figure 10, the IL and RL are the losses at the Nyquist frequency, and the model's predictions of IL are more accurate than its predictions of RL. This is because IL is primarily influenced by the dielectric loss, and the input features include dielectric loss-related parameters, whereas RL is affected by additional factors such as impedance mismatches and complex trace shapes, which complicate its prediction. The relative error of the RL prediction is 17.4%, but this is still sufficient to support the accurate prediction of eye diagrams and COM values.

5.2. EH/EW of the Eye Diagrams

Figure 11 shows the relative errors in eye height and eye width for the 56 test channels with the PAM3 and PAM4 modulation methods. Channels No. 1 to No. 32 are stripline channels, while channels No. 33 to No. 56 are microstrip channels. Channels No. 1 to No. 10 have more complex trace paths, resulting in larger errors in eye height and eye width, whereas channels No. 11 to No. 23, with simpler routing (straighter paths and fewer bends), show better fitting precision. Most channels in the dataset use FR-4 as the dielectric layer; channels No. 24 to No. 32 employ alternative materials, resulting in relatively larger errors. Compared to the stripline channels, the microstrip channels No. 33 to No. 56 show significantly higher relative errors, indicating that microstrip lines have weaker anti-interference capability than striplines. Table 4 lists the RMSE, MAE, MAPE, and MRE for the eye diagram predictions with the PAM3 and PAM4 modulation methods; the prediction results exhibit good convergence and precision. Because the EH unit is chosen as mV, its RMSE and MAE values are larger than those of EW by orders of magnitude. These results, combined with those in Figure 11, demonstrate that the PAM3 and PAM4 models are effective and highly accurate.

5.3. COM

Because many pre-cursor and post-cursor SBR values are close to or equal to zero, MAPE and MRE are unsuitable for evaluating SBR prediction accuracy, so only RMSE and MAE are used. For the SBR before equalization of the 40 test channels, the RMSE is 7.3 × 10−4 and the MAE is 6.3 × 10−4; after equalization, the RMSE is 1.1 × 10−3 and the MAE is 1.6 × 10−4. Below, a randomly selected channel (ID: 3) from the test set is used to illustrate the SBR prediction performance. As depicted in Figure 12, the Transformer model predicts both the pre-equalization and post-equalization SBR with high precision.
For the PAM4 signal format, Figure 13 summarizes the COM prediction results. We selected an equalizer configuration comprising a 3-taps TX FFE, an RX CTLE with dual DC gain, and a 12-taps RX DFE (hereafter referred to as the standard equalizer configuration). As shown in Figure 13, channels No. 7 and No. 30 have large relative errors due to their complex shapes, and some channels with high RL, such as channels No. 7–No. 9, also exhibit significant relative error. Nevertheless, the relative error of the predicted COM value for each test channel is within 2%, demonstrating that the DNN–Transformer model achieves the required prediction accuracy.
Here, the influence of different equalizer combinations on the COM prediction precision is analyzed in depth. As shown in Table 5, we configured five different equalizer combinations. The results indicate that the proposed cascaded model achieves the expected prediction performance across all of them; however, reducing the variety of equalizers weakens the model’s generalization capability and decreases its prediction accuracy. When only the DFE was enabled, the RMSE increased to 1.6 × 10−1 and the MRE reached 17.5%. We attribute this to the optimization method of this equalizer within the COM framework and to the fact that the cascaded model also accounts for predictions of the CTLE and FFE. We therefore adjusted the network structure by removing the unused equalizers from the cascaded model for each equalizer combination. This modification improved the model’s COM prediction accuracy, with the MRE decreasing from 17.5% to 11.5%.
Below, a test channel (ID: 3) is used to illustrate the prediction capability of the model with the first four taps of the DFE and three taps of the FFE. As shown in Figure 14a, the cascaded model realizes highly accurate predictions. Furthermore, the prediction results for the main DC gain of the CTLE for the 40 test channels, as presented in Figure 14b, also indicate a high level of accuracy.
The predictive capabilities of different model combinations were investigated as follows. The equalizer configuration was set to the standard configuration, and the HPO settings were configured uniformly, with the learning-rate search range set from 0.0001 to 0.01 and 20 optimization trials. Given the sequential nature of the SBR, we tested the predictive performance of standalone models, including a DNN, an LSTM network, and a Transformer, and selected the DNN combined with the two sequence prediction models, LSTM and Transformer, as the cascaded schemes. The DNN-LSTM model replaces the Transformer module in Figure 9 with a typical LSTM network. The standalone DNN, LSTM, and Transformer models lack the time-label processing module, so they can only predict eye diagrams and COM values. The best structural parameters of these models (such as the number and size of the LSTM’s hidden layers, and the head number, vector dimension, and feedforward network dimension of the Transformer’s attention mechanism) were all obtained through Bayesian optimization. As presented in Table 6, the cascaded models demonstrate superior predictive capability compared to the standalone models, and among the cascaded models those incorporating a Transformer perform best, indicating that the DNN–Transformer cascaded model is the optimal COM prediction model.
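The GP-based Bayesian optimization used for HPO can be sketched in plain NumPy: fit a GP surrogate to the trials observed so far, then pick the next candidate by expected improvement. Everything below (the kernel settings, the mock validation-loss function, and the 5 random warm-up trials) is illustrative only, not the paper's actual implementation, which uses the Optuna framework [37]:

```python
import math
import numpy as np

rng = np.random.default_rng(0)

def rbf_kernel(a, b, length=0.5, var=1.0):
    d = a[:, None] - b[None, :]
    return var * np.exp(-0.5 * (d / length) ** 2)

def gp_posterior(x_obs, y_obs, x_query, noise=1e-6):
    # Standard GP regression via a Cholesky factorization.
    K = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    Ks = rbf_kernel(x_obs, x_query)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
    mu = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.clip(rbf_kernel(x_query, x_query).diagonal()
                  - np.sum(v * v, axis=0), 1e-12, None)
    return mu, np.sqrt(var)

def expected_improvement(mu, sigma, best):
    # EI for minimization: expected gain over the best loss so far.
    z = (best - mu) / sigma
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2 * math.pi)
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2)))
    return (best - mu) * cdf + sigma * pdf

def validation_loss(log10_lr):
    # Hypothetical stand-in for training the cascaded model and
    # returning its validation error at this learning rate.
    return (log10_lr + 3.0) ** 2 + 0.05

# Search log10(lr) over [-4, -2], i.e. lr in [0.0001, 0.01], 20 trials.
lo, hi = -4.0, -2.0
x = rng.uniform(lo, hi, size=5)                  # random warm-up trials
y = np.array([validation_loss(v) for v in x])
grid = np.linspace(lo, hi, 201)
for _ in range(15):                              # 15 GP-guided trials
    mu, sigma = gp_posterior(x, y, grid)
    x_next = grid[np.argmax(expected_improvement(mu, sigma, y.min()))]
    x = np.append(x, x_next)
    y = np.append(y, validation_loss(x_next))
best_lr = 10 ** x[np.argmin(y)]
```

Searching in log10 space, as above, keeps the acquisition function well-conditioned when the learning-rate range spans two orders of magnitude.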
As shown in Table 7, we compared the DNN–Transformer model with the models presented in the Introduction in terms of functionality. The traditional LSTM [20] structure cannot make predictions from physical parameter variables. Traditional machine learning models such as SVM [13], RFR [12], and DNN [9] have limited sequence-processing capability and thus cannot predict transient waveforms. The GNN-RNN [21] model simplifies the interconnected components and circuits, so some actual link parameters are lost and performance decreases. None of the above models can perform link optimization, and only RFR [12] supports FD prediction. In contrast, the proposed DNN–Transformer is more practical: it can perform all of these behavioral tasks and outperforms the other models in RMSE and MAE for COM value prediction. Its greater complexity increases its inference time, but it still holds a substantial time advantage over traditional EM simulation, whose single-solution time is several hours.

6. Conclusions

In this paper, a DNN–Transformer cascaded neural network is proposed for effectively analyzing the SI of high-speed serial links. During dataset creation, we referenced the USB4 Gen4 and 50GBASE-KR standards for PCB design and electromagnetic simulation, and used the physical design parameters of each channel as the model inputs. The DNN–Transformer model extracts features from these physical design parameters and successfully predicts the eye diagram data and COM values of the test links. In addition, it predicts the SBR before and after equalization, and the main equalizer parameters for different combinations are also accurately predicted. For the model’s training, we employed a GP-based Bayesian optimization method for HPO. Finally, this paper compares the DNN–Transformer with various other models, including DNN, LSTM, Transformer, and DNN-LSTM models. The results show that our DNN–Transformer cascaded model accurately achieves performance prediction and equalization-architecture optimization for high-speed serial links; with an equalizer configuration comprising a 3-taps TX FFE, an RX CTLE with dual DC gain, and a 12-taps RX DFE, the MRE of its COM predictions on the test set is 1.6%.

Author Contributions

Conceptualization, L.W., J.Z. and Y.Z. (Yinhang Zhang); methodology, L.W., J.Z. and H.J.; software, L.W.; validation, L.W., J.Z. and Y.Z. (Yinhang Zhang); formal analysis, L.W., H.J. and Y.Z. (Yinhang Zhang); investigation, L.W.; resources, L.W., Y.Z. (Yinhang Zhang) and Y.Z. (Yongzheng Zhan); data curation, L.W. and Y.Z. (Yinhang Zhang); writing—original draft preparation, L.W.; writing—review and editing, X.Y., Y.Z. (Yinhang Zhang) and Y.Z. (Yongzheng Zhan); visualization, L.W.; supervision, X.Y., Y.Z. (Yinhang Zhang) and Y.Z. (Yongzheng Zhan); project administration, L.W.; funding acquisition, L.W., Y.Z. (Yinhang Zhang) and X.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (Grant No. 61861019, 62161012), the Hunan Provincial Department of Education (Grant No. 22B0525, 21A0335), the China Postdoctoral Science Foundation (Grant No. 2024M751268), and the Postgraduate Research Program of Jishou University (Grant No. Jdy23035).

Data Availability Statement

The data used to support the findings of the study are available within the article.

Conflicts of Interest

Author Yongzheng Zhan was employed by the company Shandong Yunhai Guochuang Cloud Computing Equipment Industry Innovation Company Limited. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
DNN: Deep Neural Network;
LSTM: Long Short-Term Memory Neural Network;
SBR: Single-Bit Response;
SI: signal integrity;
COM: channel operating margin;
eCOM: enhanced channel operating margin;
GP: Gaussian process;
HPO: hyperparameter optimization;
EM: electromagnetic;
EMFS: electromagnetic field solver;
IBIS-AMI: Input/Output Buffer Information Specification Algorithmic Model Interface;
EH: eye height;
EW: eye width;
RL: return loss;
IL: insertion loss;
TD: time domain;
FD: frequency domain;
ML: machine learning;
SVM: Support Vector Machine;
LS-SVM: Least-Squares Support Vector Machine;
FNN: Feedforward Neural Network;
RFR: Random Forest Regression;
RNN: Recurrent Neural Network;
FFE: feedforward equalizer;
DFE: decision feedback equalizer;
CTLE: continuous-time linear equalizer;
ILD: insertion loss deviation;
ICR: insertion loss-to-crosstalk ratio;
ISI: inter-symbol interference;
FOM: figure of merit.

References

  1. Hall, S.H.; Heck, H.L. Advanced Signal Integrity for High-Speed Digital Designs; John Wiley & Sons: Hoboken, NJ, USA, 2009; pp. 201–206. [Google Scholar]
  2. Yan, J.; Zargaran-Yazd, A. IBIS-AMI modelling of high-speed memory interfaces. In Proceedings of the 2015 IEEE 24th Electrical Performance of Electronic Packaging and Systems (EPEPS), San Jose, CA, USA, 25–28 October 2015; pp. 73–76. [Google Scholar]
  3. LAN/MAN Standards Committee of the IEEE Computer Society. IEEE Standard for Ethernet. 2012. Available online: https://ieeexplore.ieee.org/document/6419735 (accessed on 26 June 2024).
  4. Wang, Y.; Hu, Q.S. A COM Based High Speed Serial Link Optimization Using Machine Learning. IEICE Trans. Electron. 2022, 105, 684–691. [Google Scholar] [CrossRef]
  5. Gore, B.; Richard, M. An exercise in applying channel operating margin (COM) for 10GBASE-KR channel design. In Proceedings of the 2014 IEEE International Symposium on Electromagnetic Compatibility (EMC), Raleigh, NC, USA, 4–8 August 2014; pp. 653–658. [Google Scholar]
  6. Ambasana, N.; Gope, B.; Mutnury, B.; Anand, G. Application of artificial neural networks for eye-height/width prediction from S-parameters. In Proceedings of the IEEE 23rd Conference on Electrical Performance of Electronic Packaging and Systems, Portland, OR, USA, 26–29 October 2014; pp. 99–102. [Google Scholar]
  7. Ambasana, N.; Anand, G.; Mutnury, B.; Gope, D. Eye Height/Width Prediction From S-Parameters Using Learning-Based Models. IEEE Trans. Compon. Packag. Manuf. Technol. 2016, 6, 873–885. [Google Scholar] [CrossRef]
  8. Ambasana, N.; Anand, G.; Gope, D.; Mutnury, B. S-Parameter and Frequency Identification Method for ANN-Based EyeHeight/Width Prediction. IEEE Trans. Compon. Packag. Manuf. Technol. 2017, 7, 698–709. [Google Scholar] [CrossRef]
  9. Lu, T.; Sun, J.; Wu, K.; Yang, Z. High-Speed Channel Modeling with Machine Learning Methods for Signal Integrity Analysis. IEEE Trans. Electromagn. Compat. 2018, 60, 1957–1964. [Google Scholar] [CrossRef]
  10. Lho, D.; Park, J.; Park, H.; Kang, H.; Park, S.; Kim, J. Eye-width and Eye-height Estimation Method based on Artificial Neural Network (ANN) for USB 3.0. In Proceedings of the 2018 IEEE 27th Conference on Electrical Performance of Electronic Packaging and Systems (EPEPS), San Jose, CA, USA, 14–17 October 2018; pp. 209–211. [Google Scholar]
  11. Sánchez-Masís, A.; Rimolo-Donadio, R.; Roy, K.; Sulaiman, M.; Schuster, C. FNNs Models for Regression of S-Parameters in Multilayer Interconnects with Different Electrical Lengths. In Proceedings of the 2023 IEEE MTT-S Latin America Microwave Conference (LAMC), San Jose, CA, USA, 6–8 December 2023; pp. 82–85. [Google Scholar]
  12. Li, X.; Hu, Q. A Machine Learning based Channel Modeling for High-speed Serial Link. In Proceedings of the 2020 IEEE 6th International Conference on Computer and Communications (ICCC), Chengdu, China, 11–14 December 2020; pp. 1511–1515. [Google Scholar]
  13. Trinchero, R.; Canavero, F.G. Modeling of eye diagram height in high-speed links via support vector machine. In Proceedings of the 2018 IEEE 22nd Workshop on Signal and Power Integrity (SPI), Brest, France, 22–25 May 2018; pp. 1–4. [Google Scholar]
  14. Cao, Y.; Zhang, Q.J. A New Training Approach for Robust Recurrent Neural-Network Modeling of Nonlinear Circuits. IEEE Trans. Microw. Theory Tech. 2009, 57, 1539–1553. [Google Scholar] [CrossRef]
  15. Mutnury, B.; Swaminathan, M.; Libous, J.P. Macromodeling of Nonlinear Digital I/O Drivers. IEEE Trans. Adv. Packag. 2006, 29, 102–113. [Google Scholar] [CrossRef]
  16. Cao, Y.; Erdin, I.; Zhang, Q.J. Transient Behavioral Modeling of Nonlinear I/O Drivers Combining Neural Networks and Equivalent Circuits. IEEE Microw. Wirel. Compon. Lett. 2010, 20, 645–647. [Google Scholar] [CrossRef]
  17. Yu, H.; Michalka, T.; Larbi, M.; Swaminathan, M. Behavioral Modeling of Tunable I/O Drivers with Preemphasis Including Power Supply Noise. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 2020, 28, 233–242. [Google Scholar] [CrossRef]
  18. Yu, H.; Chalamalasetty, H.; Swaminathan, M. Modeling of Voltage-Controlled Oscillators Including I/O Behavior Using Augmented Neural Networks. IEEE Access 2019, 7, 38973–38982. [Google Scholar] [CrossRef]
  19. Faraji, A.; Noohi, M.; Sadrossadat, S.A.; Mirvakili, A.; Na, W.C.; Feng, F. Batch-Normalized Deep Recurrent Neural Network for High-Speed Nonlinear Circuit Macromodeling. IEEE Trans. Microw. Theory Tech. 2022, 70, 4857–4868. [Google Scholar] [CrossRef]
  20. Moradi, M.; Sadrossadat, A.; Derhami, V. Long Short-Term Memory Neural Networks for Modeling Nonlinear Electronic Components. IEEE Trans. Compon. Packag. Manuf. Technol. 2021, 1, 840–847. [Google Scholar] [CrossRef]
  21. Li, Z.; Li, C.X.; Wu, Z.M.; Zhu, Y.; Mao, J.F. Surrogate Modeling of High-Speed Links Based on GNN and RNN for Signal Integrity Applications. IEEE Trans. Microw. Theory Tech. 2023, 71, 3784–3796. [Google Scholar] [CrossRef]
  22. Lho, D.; Park, H.; Park, S.; Kim, S.; Kang, H.; Sim, B.; Kim, S.; Park, J.; Cho, K.; Song, J.; et al. Channel Characteristic-Based Deep Neural Network Models for Accurate Eye Diagram Estimation in High Bandwidth Memory (HBM) Silicon Interposer. IEEE Trans. Electromagn. Compat. 2021, 64, 196–208. [Google Scholar] [CrossRef]
  23. Li, G.S.; Mao, C.S.; Zhao, W.S. Semi-supervised Regression Model for Eye Diagram Estimation of High Bandwidth Memory (HBM) Silicon Interposer. In Proceedings of the 2023 International Applied Computational Electromagnetics Society Symposium, Hangzhou, China, 15–18 August 2023; pp. 1–3. [Google Scholar]
  24. Goay, C.H.; Ahmad, N.S.; Goh, P. Temporal Convolutional Networks for Transient Simulation of High-Speed Channels. Alex. Eng. J. 2023, 74, 643–663. [Google Scholar] [CrossRef]
  25. Li, B.W.; Jiao, B.; Chou, C.H.; Mayder, R.; Franzon, P. Self-Evolution Cascade Deep Learning Model for High-Speed Receiver Adaptation. IEEE Trans. Compon. Packag. Manuf. Technol. 2020, 10, 1043–1053. [Google Scholar] [CrossRef]
  26. Li, B.W.; Jiao, B.; Chou, C.H.; Mayder, R.; Franzon, P. CTLE Adaptation Using Deep Learning in High-Speed SerDes Link. In Proceedings of the IEEE 70th Electronic Components and Technology Conference (ECTC), Orlando, FL, USA, 3–30 June 2020; pp. 952–955. [Google Scholar]
  27. Zhang, H.H.; Xue, Z.S.; Liu, X.Y.; Li, P.; Jiang, L.J.; Shi, G.M. Optimization of High-Speed Channel for Signal Integrity with Deep Genetic Algorithm. IEEE Trans. Electromagn. Compat. 2022, 64, 1270–1274. [Google Scholar] [CrossRef]
  28. Shan, G.; Li, G.; Wang, Y.; Xing, C.; Zheng, Y.; Yang, Y. Application and Prospect of Artificial Intelligence Methods in Signal Integrity Prediction and Optimization of Microsystems. Micromachines 2023, 14, 344. [Google Scholar] [CrossRef]
  29. Mellitz, R.; Ran, A.; Li, M.P.; Ragavassamy, V. Channel Operating Margin (COM): Evolution of Channel Specifications for 25 Gbps and Beyond. In Proceedings of the DesignCon 2013, Santa Clara, CA, USA, 28–31 January 2013. [Google Scholar]
  30. Pu, B.; He, J.; Harmon, A.; Guo, Y.; Liu, Y.; Cai, Q. Signal Integrity Design Methodology for Package in Co-packaged Optics Based on Figure of Merit as Channel Operating Margin. In Proceedings of the 2021 IEEE International Joint EMC/SI/PI and EMC Europe Symposium, Raleigh, NC, USA, 26 July–13 August 2021; pp. 492–497. [Google Scholar]
  31. McCulloch, W.S.; Pitts, W. A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys. 1943, 5, 115–133. [Google Scholar] [CrossRef]
  32. Bono, F.M.; Radicioni, L.; Cinquemani, S. A Novel Approach for Quality Control of Automated Production Lines Working under Highly Inconsistent Conditions. Eng. Appl. Artif. Intell. 2023, 122, 106149. [Google Scholar] [CrossRef]
  33. Steel, R.G.D.; Torrie, J.H. Principles and Procedures of Statistics; McGraw Hill: New York, NY, USA, 1960. [Google Scholar]
  34. Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar]
  35. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
  36. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is All You Need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 6000–6010. [Google Scholar]
  37. Akiba, T.; Sano, S.; Yanase, T.; Ohta, T.; Koyama, M. Optuna: A Next-generation Hyperparameter Optimization Framework. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Anchorage, AK, USA, 4–8 August 2019. [Google Scholar]
  38. Frazier, P.I. A Tutorial on Bayesian Optimization. arXiv 2018, arXiv:1807.02811. [Google Scholar]
Figure 1. Common structure of a SerDes circuit.
Figure 2. (a) FD. (b) TD. Illustration of two different simulation methods for detecting SI.
Figure 3. A typical COM model.
Figure 4. COM calculation process.
Figure 5. Channel fabrication and simulation process.
Figure 6. The structure of the M-P neuron model.
Figure 7. The structure of this study’s DNN.
Figure 8. The structure of our Transformer for regression tasks.
Figure 9. The structure of the cascaded model.
Figure 10. Comparison of predicted and actual values for (a) IL and (b) RL.
Figure 11. Relative errors of eye diagrams for test set channels in the two modulation formats.
Figure 12. True and predicted results for (a) the SBR before equalization and (b) the SBR after equalization for 3-taps FFE + CTLE + 12-taps DFE test set channels (ID: 3).
Figure 13. Relative error and COM values for PAM4 coding (ID: 1–40) in the standard equalizer configuration.
Figure 14. Comparison of predicted and actual equalizer parameter results.
Table 1. Channel feature parameters and quantitative methods.
Feature Variable | Input
Er | PCB sheet permittivity real part and TanD
Df | PCB sheet dissipation factor
Trace | PCB trace length; PCB trace width; differential line space; PCB trace thickness; PCB trace conductivity; PCB trace structure (profile)
Zd | Differential impedance
PCB Type | Stripline is 1 and microstrip is 0
Output (for all feature variables) | Eye diagrams; COM values; Equalizer parameters
Table 2. Configuration of COM parameters.
Parameter | Symbol | Setting | Unit
Symbol rate | f_b | 20 | GBd
Number of signal levels | L | 4 |
Samples per UI | M | 32 |
Target detector error ratio | DER_0 | 10−8 |
Transmitter output voltage, victim | A_v | 0.4 | V
CTLE DC gain | g_DC | [−12:1:0] | dB
CTLE DC gain2 | g_DC_HP | [−6:1:0] | dB
CTLE HP pole | f_HP_PZ | 0.25 | GHz
CTLE zero | f_z | f_b/2.5 | GHz
CTLE pole1 | f_p1 | f_b/2.5 | GHz
CTLE pole2 | f_p2 | f_b |
FFE main cursor | c(0) | 0.62 |
FFE pre-cursor | c(−1) | [−0.18:0.02:0] |
FFE post-cursor | c(1) | [−0.38:0.02:0] |
DFE length | N_b | 12 | UI
DFE magnitude limit | b_max(1) | 0.75 |
 | b_max(2~N_b) | 0.2 |
COM pass threshold | th | 3 | dB
Table 3. Dataset split settings.
Number of Channels | Dataset A: 280 | Dataset B: 215 (170 from A)
Training Set | 184 | 140
Validation Set | 40 | 40
Testing Set | 56 | 40 (5 from the validation set)
Total | 325
Table 4. Prediction of eye diagrams with the DNN model.
Eye | Modulation | RMSE | MAE | MAPE | MRE (%)
EH (mV) | PAM3 | 4.8 × 10−1 | 3.3 × 10−1 | 5.7 × 10−3 | 2.5
EH (mV) | PAM4 | 4.0 × 10−1 | 2.9 × 10−1 | 8.1 × 10−3 | 2.8
EW (UI) | PAM3 | 4.4 × 10−3 | 3.4 × 10−3 | 9.3 × 10−3 | 2.7
EW (UI) | PAM4 | 4.4 × 10−3 | 3.6 × 10−3 | 1.2 × 10−2 | 3.5
Table 5. Predicted COM error values for the cascaded model.
COM | RMSE | MAE | MAPE | MRE (%)
3-taps FFE + CTLE + 12-taps DFE | 4.5 × 10−2 | 3.5 × 10−2 | 5.1 × 10−3 | 1.6
3-taps FFE + CTLE + 8-taps DFE | 5.0 × 10−2 | 3.7 × 10−2 | 5.6 × 10−3 | 2.0
3-taps FFE + CTLE + 4-taps DFE | 4.2 × 10−2 | 3.2 × 10−2 | 4.1 × 10−3 | 1.7
CTLE + 12-taps DFE | 9.6 × 10−2 | 7.1 × 10−2 | 1.4 × 10−2 | 5.7
12-taps DFE | 1.6 × 10−1 | 1.1 × 10−1 | 3.0 × 10−2 | 17.5
Table 6. Predicted error values of different models.
Model | RMSE | MAE | MAPE | MRE (%)
DNN | 6.9 × 10−2 | 4.3 × 10−2 | 7.8 × 10−3 | 6.5
LSTM | 5.3 × 10−2 | 4.4 × 10−2 | 6.4 × 10−3 | 2.0
DNN-LSTM | 5.0 × 10−2 | 4.1 × 10−2 | 6.0 × 10−3 | 1.8
Transformer | 4.7 × 10−2 | 3.7 × 10−2 | 5.5 × 10−3 | 1.7
DNN–Transformer | 4.6 × 10−2 | 3.5 × 10−2 | 5.1 × 10−3 | 1.6
Table 7. Functional comparison of different models for high-speed links.
Model | Physical Parameters Variable | Transient Waveform Prediction | FD Prediction | Link Optimization | GPU/CPU Time (ms) | RMSE | MAE
SVM [13] | Yes | No | No | No | <5 * | 3.8 × 10−1 | 3.1 × 10−1
RFR [12] | Yes | No | Yes | No | <5 * | 4.8 × 10−1 | 3.8 × 10−1
DNN [9] | Yes | No | No | No | <5 | 6.9 × 10−2 | 4.3 × 10−2
LSTM [20] | No | Yes | No | No | <5 | 5.3 × 10−2 | 4.4 × 10−2
GNN-RNN [21] | Yes | Yes | No | No | 567.1 | N/A | N/A
DNN–Transformer | Yes | Yes | Yes | Yes | 792.8 * | 4.6 × 10−2 | 3.5 × 10−2
* GPU inference time.