Article

Compressed Complex-Valued Least Squares Support Vector Machine Regression for Modeling of the Frequency-Domain Responses of Electromagnetic Structures

by Nastaran Soleimani and Riccardo Trinchero *
Department of Electronics and Telecommunications, Politecnico di Torino, 10129 Torino, Italy
* Author to whom correspondence should be addressed.
Electronics 2022, 11(4), 551; https://doi.org/10.3390/electronics11040551
Submission received: 29 December 2021 / Revised: 4 February 2022 / Accepted: 8 February 2022 / Published: 11 February 2022
(This article belongs to the Special Issue Computational Electromagnetics for Industrial Applications)

Abstract

This paper deals with the development of a Machine Learning (ML)-based regression for the construction of complex-valued surrogate models for the analysis of the frequency-domain responses of electromagnetic (EM) structures. The proposed approach relies on the combination of two techniques: (i) the principal component analysis (PCA) and (ii) an unusual complex-valued formulation of the Least Squares Support Vector Machine (LS-SVM) regression. First, the training and test datasets are obtained from a set of parametric electromagnetic simulations. The spectra collected in the training set are compressed via the PCA by exploiting the correlation among the available data. In the next step, the compressed dataset is used for the training of a compact set of complex-valued surrogate models, and their accuracy is evaluated on the test samples. The effectiveness and the performance of the complex-valued LS-SVM regression with three kernel functions are investigated on two application examples consisting of a serpentine delay structure with three parameters and a high-speed link with four parameters. Moreover, for the last example, the performance of the proposed approach is also compared with that provided by a real-valued multi-output feedforward Neural Network model.

1. Introduction

In recent decades, Machine Learning (ML) methods have been widely applied to construct accurate and fast-to-evaluate surrogate models able to reproduce the input–output behavior of electromagnetic (EM) structures as a function of deterministic and uncertain parameters [1,2,3,4,5,6,7,8,9,10,11,12,13,14]. In the above scenario, advanced data-driven and ML-based regressions, such as Polynomial Chaos Expansion [1,2,3,4], Support Vector Machine (SVM) regression [5,6], Least-Squares Support Vector Machine (LS-SVM) regression [7], Gaussian Process regression (GPR) [8], and feedforward [9,10,11,12], deep [12,13], convolutional [12] and Long Short-Term Memory (LSTM) [14] neural networks (NNs), have been successfully applied to uncertainty quantification (UQ) and optimization in EM applications. The common idea is to adopt the above methods to train a regression model by using a limited number of training samples generated via a set of expensive full-wave or circuital simulations with the so-called computational model. Then, since the resulting surrogate model is known in a closed form, it can be suitably embedded within UQ and optimization schemes as an efficient and effective “surrogate” of the expensive computational model [15].
Despite the high performance shown in realistic electronic and EM applications, most ML techniques have been developed for dealing with real-valued data [16,17,18]. However, complex-valued data are widely used in electronic applications (e.g., for AC simulations and frequency-domain analysis). The simplest strategy for extending the applicability of real-valued ML techniques to the case of complex-valued data is based on the so-called dual-channel formulation [19,20,21]. The underlying idea is to recast the complex-valued problem into two uncorrelated real-valued ones, by stacking the real and imaginary parts of the complex input and output values. The main advantage of the above procedure is that plain real-valued ML techniques can be directly adopted without requiring any generalization or improvement. However, such an approach completely ignores the possible correlation between the real and imaginary parts of the complex-valued output, thus leading to possible issues regarding accuracy and robustness to noise [19,22]. For the above reasons, alternative pure complex-valued formulations have been proposed for several ML techniques, such as NNs [22], SVM regression [23], kernel Least Squares regression [19,20,21,24,25], LS-SVM regression [7] and GPR [26].
This paper investigates the performance of a pure-complex implementation of a kernel-based ML technique, namely the LS-SVM regression. The proposed complex-valued formulation is based on the theoretical framework developed in [19,21,24] for a plain kernel-based regression in which the regularization term is missing. Indeed, like the SVM regression, the LS-SVM regression is a powerful and well-consolidated regression technique combining the beneficial effects of the kernel and of the Tikhonov regularization [17,25,27]. Specifically, thanks to the kernel trick, the above regression allows the construction of a non-parametric model in which the number of unknowns to be estimated by the regression problem is independent of the dimensionality of the input space. Furthermore, the regularization term limits overfitting and improves the model generalization.
The aim of this work is twofold. First, the mathematical formulation of the complex-valued LS-SVM regression is developed, and the mathematical link between the pure complex formulation and the dual-channel one is highlighted, where the latter is a special case of the more general complex-valued formulation. Second, the performance of compressed surrogate models constructed via the dual-channel and the pure complex-valued LS-SVM regression with a complex and a pseudo kernel is investigated by considering the frequency-domain responses of two test cases consisting of a serpentine delay structure and a high-speed link, with three and four parameters, respectively. It is important to stress that, in the above applications, a compressed representation of the frequency-domain responses based on principal component analysis (PCA) [18,28,29,30] has been used to remove the redundant information and to reduce the number of LS-SVM-based models that need to be trained [31]. For the sake of completeness, for the second example, the results of the proposed approaches are compared with those provided by a real-valued multi-output feedforward NN model [11,12].
The paper is organized as follows. Section 2 describes the problem statement addressed in the paper and its challenges. Section 3 provides an overview of the mathematical background of the PCA. Section 4 provides a self-contained formulation of the complex-valued LS-SVM regression with specific emphasis on the differences between the dual-channel and the pure complex implementations. In Section 5, the accuracy and the performance of the proposed complex-valued LS-SVM regression are investigated on two test cases consisting of a serpentine delay line structure and the transfer function of a high-speed interconnect link. Finally, the paper ends with the conclusions in Section 6.

2. Problem Statement and Challenges

Starting from a training set $\mathcal{D} = \{(\mathbf{x}_i, y_i(f_k))\}_{i,k=1}^{L,K}$, collecting the configurations of the input parameters $\mathbf{x}_i = [x_{i,1}, \dots, x_{i,d}]^T \in \mathcal{X}$, with $\mathcal{X} \subseteq \mathbb{R}^d$, and the corresponding outputs $y_i(f_k) \in \mathbb{C}$ computed by a full computational model (i.e., $y_i(f_k) = \mathcal{M}(f_k; \mathbf{x}_i)$) for a set of values of the independent variable $f_k$ (e.g., the frequency), our goal is to build a surrogate model $\tilde{\mathcal{M}}$ that approximates the training data and is able to generalize well on the “unseen” data, usually known as test samples, such that:
$$y(f_k; \mathbf{x}) \approx \tilde{\mathcal{M}}(f_k; \mathbf{x}), \qquad (1)$$
for any $\mathbf{x} \in \mathcal{X}$ and for $k = 1, \dots, K$.
Without loss of generality, we seek a data-driven regression technique able to provide an accurate and efficient approximation of the actual behavior of the computational model $\mathcal{M}(f_k; \mathbf{x})$. This means that the surrogate model $\tilde{\mathcal{M}}$ should be constructed by using a small set of training samples (i.e., $L$ should be as small as possible), since the training samples are usually generated via a set of computationally expensive simulations based on the full computational model $\mathcal{M}$.
The above modeling problem is rather challenging. First, the regression technique should be able to work in the complex domain. Moreover, the resulting surrogate model should be able to mimic the behavior of the multi-output complex-valued function provided by the computational model (i.e., the output values $y_i(f_k)$), whose values unavoidably depend on both the configuration of the input parameters $\mathbf{x}_i$ and the variable $f_k$.
A possible modeling scheme consists of considering the free variable $f_k$ as an extra input parameter. This means that we are seeking a “single” advanced model able to represent in a closed form the impact of both the parameters $\mathbf{x}$ and the frequency $f_k$ on the system output. As an example, such a model can be obtained via a plain or recurrent NN [12,13]. However, due to the complexity of the problem at hand, the resulting neural network structure usually requires a large number of neurons and several hidden layers. Moreover, despite their flexibility, the training of NN-based structures requires the solution of a non-convex optimization problem, leading to training issues (i.e., expensive training time and/or a huge number of training samples [13,32]).
In order to partially overcome the above issues, we can think of using a single-output regression trained via the solution of a convex optimization problem, such as the plain kernel Least Squares regression [21], the LS-SVM regression [33], or the SVM regression [34], and build one model for each frequency point. In this way, the model generation requires fewer training samples; unfortunately, however, the overall model requires training $K$ single-output surrogates, thus making the training process extremely expensive and cumbersome when a large number of frequency points is considered.
A data compression strategy can be seen as a good compromise between the two modeling schemes discussed above. As an example, PCA [18,28,29,30] allows the extraction of the inherent correlation existing among several realizations of the output data samples at different frequency points, thus leading to a compressed representation of the frequency spectra. In such a scenario, the number of actual single-output models required to represent the data can be heavily reduced.

3. PCA Compression

This section briefly presents the mathematical background of the PCA [18,28,29,30]. Let us consider the output dataset $\{y_i(f_k)\}_{i,k=1}^{L,K}$ that collects $L$ spectra computed for different configurations of the input parameters (i.e., the number of training outputs), each having $K$ frequency points. The above dataset can be recast as a $K \times L$ matrix $\mathbf{Y}$, such that the element $[\mathbf{Y}]_{k,i} = y_i(f_k)$. For normalization purposes, a zero-mean matrix is considered:
$$\tilde{\mathbf{Y}} = \mathbf{Y} - \boldsymbol{\mu}, \qquad (2)$$
where $\boldsymbol{\mu}$ is the mean value of $\mathbf{Y}$, calculated row-wise and subtracted column-wise.
The rectangular matrix $\tilde{\mathbf{Y}} \in \mathbb{C}^{K \times L}$ can be represented by means of the compact singular value decomposition (SVD) [28,30]:
$$\tilde{\mathbf{Y}} = \mathbf{U}\mathbf{S}\mathbf{V}^H, \qquad (3)$$
where, assuming that there are $L$ columns of $\mathbf{U}$ and $\mathbf{V}$ associated with non-zero singular values, $\mathbf{U} = [\mathbf{u}_1, \dots, \mathbf{u}_L] \in \mathbb{C}^{K \times L}$ and $\mathbf{V} = [\mathbf{v}_1, \dots, \mathbf{v}_L] \in \mathbb{C}^{L \times L}$ are orthogonal matrices collecting the left and right singular vectors, such that $\mathbf{U}^H\mathbf{U} = \mathbf{V}^H\mathbf{V} = \mathbf{I}_{L \times L}$, and $\mathbf{S} = \mathrm{diag}(\sigma_1, \dots, \sigma_L) \in \mathbb{R}^{L \times L}$ is a diagonal matrix collecting the singular values sorted in decreasing order, $\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_L$. Now, a compressed approximation of the actual matrix $\tilde{\mathbf{Y}}$ can be obtained as follows:
$$\tilde{\mathbf{Y}} \approx \tilde{\mathbf{U}}\tilde{\mathbf{S}}\tilde{\mathbf{V}}^H, \qquad (4)$$
where $\tilde{\mathbf{U}} = [\mathbf{u}_1, \dots, \mathbf{u}_{\tilde{n}}] \in \mathbb{C}^{K \times \tilde{n}}$ with $\mathbf{u}_i \in \mathbb{C}^{K \times 1}$ and $\tilde{\mathbf{V}} = [\mathbf{v}_1, \dots, \mathbf{v}_{\tilde{n}}] \in \mathbb{C}^{L \times \tilde{n}}$ with $\mathbf{v}_i \in \mathbb{C}^{L \times 1}$ are the reduced left and right singular-vector matrices collecting only the first $\tilde{n}$ components (i.e., the first $\tilde{n}$ columns of the original matrices $\mathbf{U}$ and $\mathbf{V}$), and $\tilde{\mathbf{S}} = \mathrm{diag}(\sigma_1, \dots, \sigma_{\tilde{n}})$ is a reduced diagonal matrix containing the first $\tilde{n}$ singular values.
The above relationship can be used to obtain a compressed representation $\mathbf{Z} \in \mathbb{C}^{\tilde{n} \times L}$ of the original matrix $\mathbf{Y}$, such that:
$$\mathbf{Z} = \tilde{\mathbf{U}}^H\tilde{\mathbf{Y}} = \tilde{\mathbf{S}}\tilde{\mathbf{V}}^H. \qquad (5)$$
It is important to note that the resulting compressed matrix $\mathbf{Z} \in \mathbb{C}^{\tilde{n} \times L}$ is smaller than the original matrix $\tilde{\mathbf{Y}}$, since usually $\tilde{n} \ll K$. Moreover, the rows of the compressed matrix $\mathbf{Z}$ can be considered as the realizations of a new set of output variables $\{\mathbf{z}(\mathbf{x}_l)\}_{l=1}^{L}$, with $\mathbf{z}(\mathbf{x}_l) \in \mathbb{C}^{\tilde{n} \times 1}$, i.e., the collection of $L$ samples of a compressed $\tilde{n}$-dimensional output variable. Specifically, the element $(i, j)$ of the matrix $\mathbf{Z}$ corresponds to the $i$-th output of the compressed representation evaluated at the $j$-th configuration of the input parameters, i.e., $[\mathbf{Z}]_{ij} = z_i(\mathbf{x}_j)$. This means that only $\tilde{n}$ single-output models need to be trained to represent the whole dataset provided by the matrix $\mathbf{Y}$, thus leading to a substantial improvement in the training time. Once a surrogate model for each of the $\tilde{n}$ components of the compressed multi-output representation in $\mathbf{Z}$ is available, the overall compressed surrogate can be inexpensively used to predict the system output $\mathbf{y}(\bar{\mathbf{x}}) \in \mathbb{C}^{K \times 1}$ for a generic test sample $\bar{\mathbf{x}} \in \mathcal{X}$ as follows:
$$\mathbf{y}(\bar{\mathbf{x}}) \approx \boldsymbol{\mu} + \tilde{\mathbf{U}}\bar{\mathbf{Z}}, \qquad (6)$$
where $\bar{\mathbf{Z}} = [z_1(\bar{\mathbf{x}}), \dots, z_{\tilde{n}}(\bar{\mathbf{x}})]^T \in \mathbb{C}^{\tilde{n} \times 1}$.
It is important to remark that the compressed representation $\mathbf{z}(\mathbf{x})$ provided by the PCA compression allows approximating the actual data $\mathbf{y}(\mathbf{x})$ with a tunable accuracy depending on the number of PCA components $\tilde{n}$, such that [28]:
$$\left(\frac{\sigma_{\tilde{n}+1}}{\sigma_1}\right)^2 \le \varepsilon^2, \qquad (7)$$
where $\varepsilon$ is a given error threshold tuned by the user.
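To make the compression step concrete, the following minimal Python/NumPy sketch illustrates Equations (2)–(7): it removes the row-wise mean, truncates the SVD according to the singular-value threshold, and maps a compressed output vector back to a full spectrum. The function names and the default threshold are illustrative and are not part of the original implementation.

```python
import numpy as np

def pca_compress(Y, eps=1e-5):
    """Compress a K x L matrix of complex spectra via a truncated SVD, Eqs. (2)-(5).

    Y   : complex array of shape (K, L), one column per training spectrum
    eps : threshold on the normalized singular values, Eq. (7)
    """
    mu = Y.mean(axis=1, keepdims=True)              # row-wise mean, Eq. (2)
    Y_tilde = Y - mu                                # zero-mean data (column-wise subtraction)
    U, s, Vh = np.linalg.svd(Y_tilde, full_matrices=False)
    n_tilde = max(int(np.sum(s / s[0] > eps)), 1)   # components kept according to Eq. (7)
    U_t = U[:, :n_tilde]                            # reduced left singular vectors
    Z = U_t.conj().T @ Y_tilde                      # compressed training outputs, Eq. (5)
    return mu, U_t, Z

def pca_reconstruct(mu, U_t, z_bar):
    """Map a compressed output vector back to a full spectrum, Eq. (6)."""
    return (mu + U_t @ z_bar.reshape(-1, 1)).ravel()
```

In this scheme, only the $\tilde{n}$ rows of Z need to be modeled by the regressions described in the next section.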

4. Complex-Valued Least Squares Support Vector Machine Regression

The LS-SVM regression is a kernel-based regression with an L2 regularizer, which allows constructing, via the solution of a simple linear system (i.e., a convex optimization problem), a non-parametric model whose number of unknowns and complexity are independent of the number of input parameters [33].
However, the plain formulation of such a technique has been developed for real-valued datasets only. The aim of this section is to extend the mathematical formulation of the LS-SVM regression to the more general case of complex-valued data. The proposed formulation is based on the preliminary results presented in [25] and on the results presented in [19,21] for a classical reproducing kernel Hilbert space (RKHS) regression, in which the regularizer term is neglected. For the sake of simplicity, the following complex-valued formulation is developed for the simplified case of single-output regression, but all the calculations can be easily extended to the case of a multi-output problem.
Starting from a set of complex-valued samples $\mathcal{D} = \{(\mathbf{x}_l, y_l)\}_{l=1}^{L}$, where $\mathbf{x}_l \in \mathbb{R}^d$ and $y_l = y(\mathbf{x}_l) \in \mathbb{C}$, the primal space formulation of the LS-SVM regression is written as:
$$y \approx \tilde{\mathcal{M}}(\mathbf{x}) = \sum_{i=1}^{N} w_i \phi_i(\mathbf{x}) + b = \langle \mathbf{w}, \boldsymbol{\Phi}(\mathbf{x}) \rangle + b, \qquad (8)$$
where $\mathbf{w} = [w_1, \dots, w_N]^T = \mathbf{w}_R + j\mathbf{w}_I \in \mathbb{C}^N$ are the complex regression coefficients (i.e., $w_i = w_{i,R} + jw_{i,I}$), $\boldsymbol{\Phi}(\mathbf{x}) = \boldsymbol{\Phi}_R(\mathbf{x}) + j\boldsymbol{\Phi}_I(\mathbf{x}) = [\phi_1(\mathbf{x}), \dots, \phi_N(\mathbf{x})]^T$ is a vector-valued complex function $\boldsymbol{\Phi}: \mathbb{R}^d \to \mathbb{C}^N$ collecting the complex-valued basis functions $\phi_i(\mathbf{x})$ that map the parameter space into the feature space, $b = b_R + jb_I$ is the bias term, and $\langle \mathbf{w}, \boldsymbol{\Phi}(\mathbf{x}) \rangle = \boldsymbol{\Phi}^H(\mathbf{x})\mathbf{w}$ is the inner product in the complex domain.
The primal space formulation of the complex-valued regression in Equation (8) can be written in terms of its real and imaginary components:
$$\tilde{\mathcal{M}}(\mathbf{x}) = \tilde{\mathcal{M}}_R(\mathbf{x}) + j\tilde{\mathcal{M}}_I(\mathbf{x}) = \left(\mathbf{w}_R^T\boldsymbol{\Phi}_R(\mathbf{x}) + \mathbf{w}_I^T\boldsymbol{\Phi}_I(\mathbf{x}) + b_R\right) + j\left(\mathbf{w}_I^T\boldsymbol{\Phi}_R(\mathbf{x}) - \mathbf{w}_R^T\boldsymbol{\Phi}_I(\mathbf{x}) + b_I\right). \qquad (9)$$
In the above primal space formulation, the regression unknowns (i.e., the coefficients in $\mathbf{w}_R$ and $\mathbf{w}_I$, and the bias terms $b_R$ and $b_I$) are estimated by solving the following convex optimization problem:
$$\min_{\mathbf{w}_R, \mathbf{w}_I, b_R, b_I, \mathbf{e}_R, \mathbf{e}_I} \; \frac{1}{2}\mathbf{w}_R^T\mathbf{w}_R + \frac{1}{2}\mathbf{w}_I^T\mathbf{w}_I + \frac{\gamma_R}{2}\sum_{l=1}^{L} e_{R,l}^2 + \frac{\gamma_I}{2}\sum_{l=1}^{L} e_{I,l}^2 \qquad (10)$$
such that
$$e_{R,l} = \mathrm{Re}\{y_l - \tilde{\mathcal{M}}(\mathbf{x}_l)\} = y_{R,l} - \left(\mathbf{w}_R^T\boldsymbol{\Phi}_R(\mathbf{x}_l) + \mathbf{w}_I^T\boldsymbol{\Phi}_I(\mathbf{x}_l) + b_R\right)$$
$$e_{I,l} = \mathrm{Im}\{y_l - \tilde{\mathcal{M}}(\mathbf{x}_l)\} = y_{I,l} - \left(\mathbf{w}_I^T\boldsymbol{\Phi}_R(\mathbf{x}_l) - \mathbf{w}_R^T\boldsymbol{\Phi}_I(\mathbf{x}_l) + b_I\right)$$
for $l = 1, \dots, L$, where $\mathbf{w}_R^T\mathbf{w}_R + \mathbf{w}_I^T\mathbf{w}_I = \|\mathbf{w}\|_2^2$ is the L2 regularizer and the terms $e_{R,l}^2$ and $e_{I,l}^2$ provide the errors of a squared loss function.
The Lagrangian of the above constrained optimization problem is written as:
$$\mathcal{L}(\mathbf{w}_R, \mathbf{w}_I, b_R, b_I, \mathbf{e}_R, \mathbf{e}_I, \boldsymbol{\alpha}_R, \boldsymbol{\alpha}_I) = \frac{1}{2}\mathbf{w}_R^T\mathbf{w}_R + \frac{1}{2}\mathbf{w}_I^T\mathbf{w}_I + \frac{\gamma_R}{2}\sum_{l=1}^{L} e_{R,l}^2 + \frac{\gamma_I}{2}\sum_{l=1}^{L} e_{I,l}^2 - \sum_{l=1}^{L}\alpha_{R,l}\left(e_{R,l} - y_{R,l} + \mathbf{w}_R^T\boldsymbol{\Phi}_R(\mathbf{x}_l) + \mathbf{w}_I^T\boldsymbol{\Phi}_I(\mathbf{x}_l) + b_R\right) - \sum_{l=1}^{L}\alpha_{I,l}\left(e_{I,l} - y_{I,l} + \mathbf{w}_I^T\boldsymbol{\Phi}_R(\mathbf{x}_l) - \mathbf{w}_R^T\boldsymbol{\Phi}_I(\mathbf{x}_l) + b_I\right), \qquad (11)$$
where $\alpha_l = \alpha_{R,l} + j\alpha_{I,l}$, with $\alpha_{R,l}, \alpha_{I,l} \in \mathbb{R}$ for $l = 1, \dots, L$, are the Lagrange multipliers.
By setting to zero the partial derivatives of the Lagrangian $\mathcal{L}(\mathbf{w}_R, \mathbf{w}_I, b_R, b_I, \mathbf{e}_R, \mathbf{e}_I, \boldsymbol{\alpha}_R, \boldsymbol{\alpha}_I)$ with respect to its parameters:
$$\frac{\partial \mathcal{L}}{\partial \mathbf{w}_R} = 0 \;\Rightarrow\; \mathbf{w}_R = \sum_{l=1}^{L}\left(\alpha_{R,l}\boldsymbol{\Phi}_R(\mathbf{x}_l) - \alpha_{I,l}\boldsymbol{\Phi}_I(\mathbf{x}_l)\right), \qquad (12)$$
$$\frac{\partial \mathcal{L}}{\partial \mathbf{w}_I} = 0 \;\Rightarrow\; \mathbf{w}_I = \sum_{l=1}^{L}\left(\alpha_{R,l}\boldsymbol{\Phi}_I(\mathbf{x}_l) + \alpha_{I,l}\boldsymbol{\Phi}_R(\mathbf{x}_l)\right), \qquad (13)$$
$$\frac{\partial \mathcal{L}}{\partial b_R} = 0 \;\Rightarrow\; \sum_{l=1}^{L}\alpha_{R,l} = 0, \qquad (14)$$
$$\frac{\partial \mathcal{L}}{\partial b_I} = 0 \;\Rightarrow\; \sum_{l=1}^{L}\alpha_{I,l} = 0, \qquad (15)$$
$$\frac{\partial \mathcal{L}}{\partial e_{R,l}} = 0 \;\Rightarrow\; \gamma_R e_{R,l} = \alpha_{R,l}, \qquad (16)$$
$$\frac{\partial \mathcal{L}}{\partial e_{I,l}} = 0 \;\Rightarrow\; \gamma_I e_{I,l} = \alpha_{I,l}, \qquad (17)$$
$$\frac{\partial \mathcal{L}}{\partial \alpha_{R,l}} = 0 \;\Rightarrow\; e_{R,l} - y_{R,l} + \mathbf{w}_R^T\boldsymbol{\Phi}_R(\mathbf{x}_l) + \mathbf{w}_I^T\boldsymbol{\Phi}_I(\mathbf{x}_l) + b_R = 0, \qquad (18)$$
$$\frac{\partial \mathcal{L}}{\partial \alpha_{I,l}} = 0 \;\Rightarrow\; e_{I,l} - y_{I,l} + \mathbf{w}_I^T\boldsymbol{\Phi}_R(\mathbf{x}_l) - \mathbf{w}_R^T\boldsymbol{\Phi}_I(\mathbf{x}_l) + b_I = 0, \qquad (19)$$
for $l = 1, \dots, L$.
By substituting (12)–(17) into (18) and (19), we obtain:
$$\begin{cases} \dfrac{\alpha_{R,l}}{\gamma_R} - y_{R,l} + \displaystyle\sum_{i=1}^{L}\left(\alpha_{R,i}\boldsymbol{\Phi}_R(\mathbf{x}_i) - \alpha_{I,i}\boldsymbol{\Phi}_I(\mathbf{x}_i)\right)^T\boldsymbol{\Phi}_R(\mathbf{x}_l) + \displaystyle\sum_{i=1}^{L}\left(\alpha_{R,i}\boldsymbol{\Phi}_I(\mathbf{x}_i) + \alpha_{I,i}\boldsymbol{\Phi}_R(\mathbf{x}_i)\right)^T\boldsymbol{\Phi}_I(\mathbf{x}_l) + b_R = 0 \\[2mm] \dfrac{\alpha_{I,l}}{\gamma_I} - y_{I,l} + \displaystyle\sum_{i=1}^{L}\left(\alpha_{R,i}\boldsymbol{\Phi}_I(\mathbf{x}_i) + \alpha_{I,i}\boldsymbol{\Phi}_R(\mathbf{x}_i)\right)^T\boldsymbol{\Phi}_R(\mathbf{x}_l) - \displaystyle\sum_{i=1}^{L}\left(\alpha_{R,i}\boldsymbol{\Phi}_R(\mathbf{x}_i) - \alpha_{I,i}\boldsymbol{\Phi}_I(\mathbf{x}_i)\right)^T\boldsymbol{\Phi}_I(\mathbf{x}_l) + b_I = 0 \\[2mm] \displaystyle\sum_{l=1}^{L}\alpha_{R,l} = 0 \\[2mm] \displaystyle\sum_{l=1}^{L}\alpha_{I,l} = 0 \end{cases} \qquad (20)$$
for $l = 1, \dots, L$.
The above system of equations is the dual-form representation of the optimization problem in Equation (10), in which the original regression coefficients collected in the vectors $\mathbf{w}_R$ and $\mathbf{w}_I$ are replaced by the Lagrange multipliers $\boldsymbol{\alpha}_R$ and $\boldsymbol{\alpha}_I$. It is important to remark that, although the number of unknowns in the primal space (i.e., the dimensionality $N$ of $\mathbf{w}$) is given by the number of basis functions, in the dual formulation the number of unknowns (i.e., the Lagrange multipliers collected in the vector $\boldsymbol{\alpha} = \boldsymbol{\alpha}_R + j\boldsymbol{\alpha}_I$) is always equal to the number of training samples $L$. This means that the resulting model is non-parametric, i.e., its complexity is independent of the number of both the input parameters and the basis functions.
In order to define the kernel function and the dual formulation of the complex-valued LS-SVM, the first two equations in Equation (20) can be rewritten as follows:
$$\frac{\alpha_{R,l}}{\gamma_R} - y_{R,l} + \sum_{i=1}^{L}\alpha_{R,i}\left(\boldsymbol{\Phi}_R^T(\mathbf{x}_i)\boldsymbol{\Phi}_R(\mathbf{x}_l) + \boldsymbol{\Phi}_I^T(\mathbf{x}_i)\boldsymbol{\Phi}_I(\mathbf{x}_l)\right) + \sum_{i=1}^{L}\alpha_{I,i}\left(\boldsymbol{\Phi}_R^T(\mathbf{x}_i)\boldsymbol{\Phi}_I(\mathbf{x}_l) - \boldsymbol{\Phi}_I^T(\mathbf{x}_i)\boldsymbol{\Phi}_R(\mathbf{x}_l)\right) + b_R = 0, \qquad (21)$$
$$\frac{\alpha_{I,l}}{\gamma_I} - y_{I,l} + \sum_{i=1}^{L}\alpha_{R,i}\left(\boldsymbol{\Phi}_I^T(\mathbf{x}_i)\boldsymbol{\Phi}_R(\mathbf{x}_l) - \boldsymbol{\Phi}_R^T(\mathbf{x}_i)\boldsymbol{\Phi}_I(\mathbf{x}_l)\right) + \sum_{i=1}^{L}\alpha_{I,i}\left(\boldsymbol{\Phi}_R^T(\mathbf{x}_i)\boldsymbol{\Phi}_R(\mathbf{x}_l) + \boldsymbol{\Phi}_I^T(\mathbf{x}_i)\boldsymbol{\Phi}_I(\mathbf{x}_l)\right) + b_I = 0, \qquad (22)$$
for $l = 1, \dots, L$.
Now, let us define a complex-valued kernel $k_C(\mathbf{x}, \mathbf{x}')$:
$$k_C(\mathbf{x}, \mathbf{x}') = \langle \boldsymbol{\Phi}(\mathbf{x}), \boldsymbol{\Phi}(\mathbf{x}') \rangle = \left(\boldsymbol{\Phi}_R(\mathbf{x}) + j\boldsymbol{\Phi}_I(\mathbf{x})\right)^T\left(\boldsymbol{\Phi}_R(\mathbf{x}') - j\boldsymbol{\Phi}_I(\mathbf{x}')\right) = \boldsymbol{\Phi}_R^T(\mathbf{x})\boldsymbol{\Phi}_R(\mathbf{x}') + \boldsymbol{\Phi}_I^T(\mathbf{x})\boldsymbol{\Phi}_I(\mathbf{x}') + j\left(\boldsymbol{\Phi}_I^T(\mathbf{x})\boldsymbol{\Phi}_R(\mathbf{x}') - \boldsymbol{\Phi}_R^T(\mathbf{x})\boldsymbol{\Phi}_I(\mathbf{x}')\right) = k(\mathbf{x}, \mathbf{x}') + j\,k_I(\mathbf{x}, \mathbf{x}'). \qquad (23)$$
Similarly to the real-valued formulation, the kernel function is defined as the inner product in the complex feature space of the basis functions evaluated at $\mathbf{x}$ and $\mathbf{x}'$, where $k(\mathbf{x}, \mathbf{x}') = \boldsymbol{\Phi}_R^T(\mathbf{x})\boldsymbol{\Phi}_R(\mathbf{x}') + \boldsymbol{\Phi}_I^T(\mathbf{x})\boldsymbol{\Phi}_I(\mathbf{x}')$ and $k_I(\mathbf{x}, \mathbf{x}') = \boldsymbol{\Phi}_I^T(\mathbf{x})\boldsymbol{\Phi}_R(\mathbf{x}') - \boldsymbol{\Phi}_R^T(\mathbf{x})\boldsymbol{\Phi}_I(\mathbf{x}')$.
Using the above definition, Equations (21) and (22) can be rewritten as follows:
$$\frac{\alpha_{R,l}}{\gamma_R} - y_{R,l} + \sum_{i=1}^{L}\left(\alpha_{R,i}\,k(\mathbf{x}_i, \mathbf{x}_l) - \alpha_{I,i}\,k_I(\mathbf{x}_i, \mathbf{x}_l)\right) + b_R = 0, \qquad (24)$$
$$\frac{\alpha_{I,l}}{\gamma_I} - y_{I,l} + \sum_{i=1}^{L}\left(\alpha_{I,i}\,k(\mathbf{x}_i, \mathbf{x}_l) + \alpha_{R,i}\,k_I(\mathbf{x}_i, \mathbf{x}_l)\right) + b_I = 0, \qquad (25)$$
for $l = 1, \dots, L$.
In the above equations, the regression unknowns can be computed by solving the following linear system:
$$\begin{bmatrix} \mathbf{K}_{RR} + \frac{\mathbf{I}_L}{\gamma_R} & \mathbf{K}_{RI} & \mathbf{1} & \mathbf{0} \\ \mathbf{K}_{IR} & \mathbf{K}_{II} + \frac{\mathbf{I}_L}{\gamma_I} & \mathbf{0} & \mathbf{1} \\ \mathbf{1}^T & \mathbf{0}^T & 0 & 0 \\ \mathbf{0}^T & \mathbf{1}^T & 0 & 0 \end{bmatrix} \begin{bmatrix} \boldsymbol{\alpha}_R \\ \boldsymbol{\alpha}_I \\ b_R \\ b_I \end{bmatrix} = \begin{bmatrix} \mathbf{y}_R \\ \mathbf{y}_I \\ 0 \\ 0 \end{bmatrix}, \qquad (26)$$
where $\mathbf{I}_L$ is the identity matrix of size $L \times L$, $\mathbf{1} = [1, \dots, 1]^T \in \mathbb{R}^{L \times 1}$, $\mathbf{0} = [0, \dots, 0]^T \in \mathbb{R}^{L \times 1}$, $\boldsymbol{\alpha}_R, \boldsymbol{\alpha}_I \in \mathbb{R}^{L \times 1}$ are vectors collecting the real and imaginary parts of the regression coefficients, $b = b_R + jb_I$ is the bias term, and $\mathbf{K}_{RR}, \mathbf{K}_{RI}, \mathbf{K}_{IR}, \mathbf{K}_{II} \in \mathbb{R}^{L \times L}$ are kernel matrices defined as:
$$[\mathbf{K}_{RR}]_{i,j} = [\mathbf{K}_{II}]_{i,j} = k(\mathbf{x}_i, \mathbf{x}_j), \qquad (27)$$
$$[\mathbf{K}_{IR}]_{i,j} = -[\mathbf{K}_{RI}]_{i,j} = k_I(\mathbf{x}_i, \mathbf{x}_j), \qquad (28)$$
for any $i, j = 1, \dots, L$. The parameters $\gamma_R$ and $\gamma_I$ are the regularization hyperparameters tuned by the user, providing a trade-off between the model flatness and its accuracy [34].
It is important to remark that the above linear system is square and, therefore, if the determinant of the coefficient matrix is different from zero, it always yields a unique solution, which leads to the following dual space formulation of the complex-valued LS-SVM regression:
$$y(\mathbf{x}) = \sum_{l=1}^{L}\alpha_l\,k_C(\mathbf{x}_l, \mathbf{x}) + b. \qquad (29)$$
Substituting Equation (23) into the above equation gives:
$$y(\mathbf{x}) = \sum_{l=1}^{L}\left(\alpha_{R,l}\,k(\mathbf{x}_l, \mathbf{x}) - \alpha_{I,l}\,k_I(\mathbf{x}_l, \mathbf{x})\right) + b_R + j\left(\sum_{l=1}^{L}\left(\alpha_{R,l}\,k_I(\mathbf{x}_l, \mathbf{x}) + \alpha_{I,l}\,k(\mathbf{x}_l, \mathbf{x})\right) + b_I\right). \qquad (30)$$
From the above formulation, it is clear that, by means of the complex kernel $k_C$, the complex-valued LS-SVM regression in the dual space is able to account for the possible correlation between the real and imaginary parts of $y(\mathbf{x})$.
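As an illustration, the following NumPy sketch assembles and solves the dual system of Equation (26) and evaluates the resulting model via Equations (29) and (30). It assumes a user-supplied complex kernel function kC (see Section 4.1) that is symmetric in its arguments, as is the case for the CVCF and PCF kernels adopted below; all function names are illustrative and the hyperparameters are left to the user, while the paper tunes them via cross-validation and Bayesian optimization.

```python
import numpy as np

def train_complex_lssvm(X, y, kC, gamma_R=1.0, gamma_I=1.0):
    """Solve the dual linear system of Eq. (26) for the complex-valued LS-SVM.

    X  : (L, d) real training inputs
    y  : (L,)  complex training outputs
    kC : callable kC(x, xp) returning a complex kernel value, Eq. (23)
    """
    L = X.shape[0]
    Kc = np.array([[kC(X[i], X[j]) for j in range(L)] for i in range(L)])
    K, K_I = Kc.real, Kc.imag                   # k and k_I of Eq. (23)
    one, zero = np.ones((L, 1)), np.zeros((L, 1))
    # block matrix of Eq. (26); K_RI = -K_IR carries the imaginary kernel part
    A = np.block([
        [K + np.eye(L) / gamma_R, -K_I,                     one,  zero],
        [K_I,                      K + np.eye(L) / gamma_I, zero, one ],
        [one.T,                    zero.T,                  np.zeros((1, 1)), np.zeros((1, 1))],
        [zero.T,                   one.T,                   np.zeros((1, 1)), np.zeros((1, 1))],
    ])
    rhs = np.concatenate([y.real, y.imag, [0.0, 0.0]])
    sol = np.linalg.solve(A, rhs)
    alpha = sol[:L] + 1j * sol[L:2 * L]          # complex Lagrange multipliers
    b = sol[2 * L] + 1j * sol[2 * L + 1]         # complex bias
    return alpha, b

def predict_complex_lssvm(X_train, alpha, b, kC, x_new):
    """Evaluate the dual-space model of Eqs. (29)-(30) at a new input x_new."""
    k_vec = np.array([kC(xl, x_new) for xl in X_train])
    return np.sum(alpha * k_vec) + b
```

One such model is trained for each of the $\tilde{n}$ compressed PCA outputs of Section 3.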

4.1. Complex-Valued Kernel

There are several strategies to construct a complex kernel $k_C$. Within this paper, we will investigate two of them: the independent kernel, referred to as the complex-valued complex function (CVCF) [19], and the pseudo kernel, referred to as the pseudo complex-valued function (PCF) [35,36]. A generic CVCF kernel can be constructed starting from a real-valued kernel $k$ as follows:
$$k_C(\mathbf{x}, \mathbf{x}') = k(\mathbf{x}_R, \mathbf{x}'_R) + k(\mathbf{x}_I, \mathbf{x}'_I) + j\left(k(\mathbf{x}_R, \mathbf{x}'_I) + k(\mathbf{x}_I, \mathbf{x}'_R)\right). \qquad (31)$$
The above complex kernel is fully compliant with the definition provided in Equation (23). The real kernel $k$ can be any real kernel function, e.g., the linear kernel, the radial basis function (RBF) kernel or the polynomial kernel. Hereafter, for the CVCF kernel we will use the RBF kernel for $k$, i.e.,
$$k(\mathbf{x}, \mathbf{x}') = \exp\left(-\frac{1}{2\sigma^2}\|\mathbf{x} - \mathbf{x}'\|^2\right), \qquad (32)$$
where $\sigma$ is the kernel hyperparameter, which in this work is tuned, along with the regularization hyperparameters, during the model training by combining cross-validation (CV) with a Bayesian optimizer [37,38].
As an alternative, a family of kernels based on the PCF can be suitably generated from an isotropic complex covariance function, such that (additional mathematical details are provided in [19,36]):
$$k_C(\mathbf{x}, \mathbf{x}') = \cos\left(c\|\mathbf{x} - \mathbf{x}'\|\right)k(\mathbf{x}, \mathbf{x}') + j\,\sin\left(c\|\mathbf{x} - \mathbf{x}'\|\right)k(\mathbf{x}, \mathbf{x}'), \qquad (33)$$
where $k(\mathbf{x}, \mathbf{x}')$ can be selected as any kernel function and $c$ is an additional hyperparameter. In this specific case, a Rational Quadratic kernel is adopted, i.e., $k(\mathbf{x}, \mathbf{x}') = \sigma^2\left(1 + \frac{\|\mathbf{x} - \mathbf{x}'\|^2}{2al^2}\right)^{-a}$, in which $a$, $l$ and $\sigma$ are additional hyperparameters. Also in this case, all the hyperparameters are tuned via CV and a Bayesian optimizer [37,38].
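The two kernel families can be sketched in Python as follows; the default hyperparameter values are placeholders (in the paper they are tuned via CV and Bayesian optimization), and for the real-valued inputs considered here the imaginary parts of the input vectors are simply zero.

```python
import numpy as np

def rbf_kernel(x, xp, sigma=1.0):
    """Real RBF kernel of Eq. (32)."""
    return np.exp(-np.sum((x - xp) ** 2) / (2.0 * sigma ** 2))

def cvcf_kernel(x, xp, sigma=1.0):
    """CVCF complex kernel of Eq. (31) built from the real RBF kernel."""
    x_R, x_I = np.real(x), np.imag(x)
    xp_R, xp_I = np.real(xp), np.imag(xp)
    real_part = rbf_kernel(x_R, xp_R, sigma) + rbf_kernel(x_I, xp_I, sigma)
    imag_part = rbf_kernel(x_R, xp_I, sigma) + rbf_kernel(x_I, xp_R, sigma)
    return real_part + 1j * imag_part

def pcf_kernel(x, xp, c=1.0, sigma=1.0, a=1.0, l=1.0):
    """PCF pseudo kernel of Eq. (33) built on a Rational Quadratic kernel."""
    r = np.linalg.norm(x - xp)
    k_rq = sigma ** 2 * (1.0 + r ** 2 / (2.0 * a * l ** 2)) ** (-a)
    return (np.cos(c * r) + 1j * np.sin(c * r)) * k_rq
```

Either function can be passed as kC to the training sketch given after Equation (30).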

4.2. Dual Channel Kernel (DCK) LS-SVM for Complex-Valued Data

The dual channel kernel (DCK) formulation can be seen as a special case of the general mathematical framework presented in the previous section. The underlying idea is to recast the complex variables in terms of their real and imaginary parts and to work with a standard real kernel, i.e., $k_C = k: \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$.
Under the above assumption, the regression problem in Equation (26) can be simplified as follows:
$$\begin{bmatrix} \mathbf{K}_{RR} + \frac{\mathbf{I}_L}{\gamma_R} & \mathbf{0}_L & \mathbf{1} & \mathbf{0} \\ \mathbf{0}_L & \mathbf{K}_{RR} + \frac{\mathbf{I}_L}{\gamma_I} & \mathbf{0} & \mathbf{1} \\ \mathbf{1}^T & \mathbf{0}^T & 0 & 0 \\ \mathbf{0}^T & \mathbf{1}^T & 0 & 0 \end{bmatrix} \begin{bmatrix} \boldsymbol{\alpha}_R \\ \boldsymbol{\alpha}_I \\ b_R \\ b_I \end{bmatrix} = \begin{bmatrix} \mathbf{y}_R \\ \mathbf{y}_I \\ 0 \\ 0 \end{bmatrix}, \qquad (34)$$
where $\mathbf{0}_L$ is the $L \times L$ null matrix and $\mathbf{K}_{RR} \in \mathbb{R}^{L \times L}$ is such that $[\mathbf{K}_{RR}]_{i,j} = k(\mathbf{x}_i, \mathbf{x}_j)$, whilst the matrices $\mathbf{K}_{IR} = \mathbf{K}_{RI} = \mathbf{0}_L$.
It is important to remark that, in the above formulation, there is no coupling between the real and imaginary coefficients $\boldsymbol{\alpha}_R$ and $\boldsymbol{\alpha}_I$, or between the bias terms $b_R$ and $b_I$. Indeed, the solution of the above linear system is equivalent to solving two decoupled ones, accounting independently for the real and imaginary parts of the regression unknowns:
$$\begin{bmatrix} \mathbf{K}_{RR} + \frac{\mathbf{I}_L}{\gamma_R} & \mathbf{1} \\ \mathbf{1}^T & 0 \end{bmatrix} \begin{bmatrix} \boldsymbol{\alpha}_R \\ b_R \end{bmatrix} = \begin{bmatrix} \mathbf{y}_R \\ 0 \end{bmatrix}, \qquad (35)$$
$$\begin{bmatrix} \mathbf{K}_{RR} + \frac{\mathbf{I}_L}{\gamma_I} & \mathbf{1} \\ \mathbf{1}^T & 0 \end{bmatrix} \begin{bmatrix} \boldsymbol{\alpha}_I \\ b_I \end{bmatrix} = \begin{bmatrix} \mathbf{y}_I \\ 0 \end{bmatrix}. \qquad (36)$$
In this work, a standard RBF kernel is considered as the real kernel $k(\mathbf{x}_i, \mathbf{x}_j)$. In the above scenario, the dual space formulation of the LS-SVM is written as [27]:
$$y_R(\mathbf{x}) = \sum_{l=1}^{L}\alpha_{R,l}\,k(\mathbf{x}_l, \mathbf{x}) + b_R, \qquad (37)$$
$$y_I(\mathbf{x}) = \sum_{l=1}^{L}\alpha_{I,l}\,k(\mathbf{x}_l, \mathbf{x}) + b_I. \qquad (38)$$
It is important to note that, in the above formulation, the models for the real and imaginary parts of $y$ are built separately, thus ignoring any possible correlation between them. Furthermore, the above models can be trained using the LS-SVMlab toolbox available in MATLAB [39].
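A minimal sketch of the DCK training and prediction steps, implementing Equations (35)–(38) with two decoupled real solves, is reported below; the paper itself relies on the LS-SVMlab MATLAB toolbox, so this NumPy version is only illustrative.

```python
import numpy as np

def train_dck_lssvm(X, y, k, gamma_R=1.0, gamma_I=1.0):
    """Dual-channel LS-SVM: two decoupled real solves, Eqs. (35)-(36).

    X : (L, d) real inputs, y : (L,) complex outputs, k : real kernel k(x, xp).
    """
    L = X.shape[0]
    K = np.array([[k(X[i], X[j]) for j in range(L)] for i in range(L)])
    one = np.ones((L, 1))

    def solve_channel(y_ch, gamma):
        A = np.block([[K + np.eye(L) / gamma, one],
                      [one.T, np.zeros((1, 1))]])
        sol = np.linalg.solve(A, np.concatenate([y_ch, [0.0]]))
        return sol[:L], sol[L]                  # (alpha, bias) for one channel

    alpha_R, b_R = solve_channel(y.real, gamma_R)
    alpha_I, b_I = solve_channel(y.imag, gamma_I)
    return alpha_R, b_R, alpha_I, b_I

def predict_dck_lssvm(X_train, model, k, x_new):
    """Evaluate Eqs. (37)-(38) and recombine the two channels."""
    alpha_R, b_R, alpha_I, b_I = model
    k_vec = np.array([k(xl, x_new) for xl in X_train])
    return (alpha_R @ k_vec + b_R) + 1j * (alpha_I @ k_vec + b_I)
```

The comparison in Section 5 quantifies what is lost by this decoupling with respect to the pure complex-valued formulation.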

5. Application Examples

This section compares the accuracy and the robustness against noise of the three implementations of the complex-valued LS-SVM regression provided in Section 4 by considering two different application examples. Specifically, the proposed approaches are applied to predict the scattering parameters of a serpentine structure with three parameters and the transfer function of a high-speed link with four parameters.

5.1. Example I

As a first test case, the complex-valued LS-SVM regression is applied to predict the scattering parameters of a serpentine delay line structure. Serpentine lines are widely used in printed circuit board (PCB) design to compensate for the time delays introduced by the trace routing. However, the frequency-domain behavior of such a structure is heavily affected by its geometrical and electrical parameters and should be carefully assessed during the design phase to avoid signal and power integrity issues and to meet design constraints [40].
The structure of the serpentine delay line considered in this example is shown in Figure 1 (inspired by [40]). The S21 scattering parameter of the above structure is investigated as a function of three parameters (i.e., $\mathbf{x} = [\varepsilon_r, L_L, S_W]^T$) in a frequency bandwidth from 1 MHz to 1 GHz. Table 1 shows the ranges of variability of the serpentine delay line parameters that are simulated to produce the training and test data.
The whole dataset consists of 3000 samples (1000 training data and 2000 test samples). The samples were generated via Latin Hypercube Sampling (LHS) by assuming a uniform variability between their maximum and minimum value. For each configuration of the geometrical parameters, the corresponding scattering parameters were computed for 5000 linearly spaced frequency sample points. The samples were generated via a set of parametric simulations with the full-wave solver available in CST.
The PCA compression is then used to compress the data in the frequency domain and to reduce the computational cost of the model training. Figure 2 shows the behavior of the normalized singular values of the training dataset over the frequency points. The plot shows that $\tilde{n} = 9$ components are enough to represent the whole training set with a 0.001% threshold.
After applying the PCA, the performances of the LS-SVM regression with the PCF and the CVCF complex kernel functions (see Section 4.1), and of the DCK LS-SVM regression (see Section 4.2), were assessed on the test samples for the S21 parameter. To investigate the performance of the above methods, the relative (normalized) Root Mean Square Error (NRMSE) is calculated as:
$$\mathrm{NRMSE}\,[\%] = 100\sqrt{\frac{\frac{1}{T}\sum_{t=1}^{T}\left|X_y^{(t)} - \hat{X}_y^{(t)}\right|^2}{\frac{1}{T}\sum_{t=1}^{T}\left|X_y^{(t)}\right|^2}}, \qquad (39)$$
where $X_y^{(t)}$ denotes either the real or the imaginary part of the actual test samples, $\hat{X}_y^{(t)}$ is the corresponding prediction estimated via the proposed metamodels, and $T$ is the number of test samples.
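A compact implementation of the metric in Equation (39) is sketched below, assuming the square root implied by the root-mean-square definition; the function name is illustrative.

```python
import numpy as np

def nrmse_percent(X_true, X_pred):
    """Relative NRMSE of Eq. (39), evaluated over T test samples."""
    num = np.mean(np.abs(X_true - X_pred) ** 2)
    den = np.mean(np.abs(X_true) ** 2)
    return 100.0 * np.sqrt(num / den)
```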
Figure 3 and Figure 4 show the normalized error for real and imaginary parts of these three regression approaches for an increasing number of training samples (i.e., L = 50, 250, and 500).
The plots highlight the improved accuracy of the CVCF and PCF with respect to the DCK. Indeed, with the DCK-based model, we are implicitly neglecting any kind of correlation between the real and imaginary parts of the S parameters, and this limitation becomes even more evident when a low number of training samples is used to train the model.

5.2. Example II

The second application example is based on the high-speed link, as depicted in Figure 5, representing a signal distribution on a PCB. Similar to the previous example, the frequency response of the link, and thus its performance, can be greatly influenced by possible variations of its internal parameters [5].
Specifically, the proposed modeling approaches are here adopted to build a surrogate model for the frequency-domain behavior of the complex-valued transfer function $y(\mathbf{x}; f) = V_{out}(f; \mathbf{x})/E(f)$, as a function of the values of four lumped components $C_1(x_1)$, $C_2(x_2)$, $L_1(x_3)$, and $L_2(x_4)$, each defined by a uniformly distributed random variable with a variation of ±50% around its central value and collected in $\mathbf{x} = [x_1, x_2, x_3, x_4]^T$ (additional details are provided in Table 2).
Four sets consisting of L = 20, 100, 150, and 500 training input configurations were generated via LHS and used as input for a computational model implemented in MATLAB, which provides as output the corresponding transfer function evaluated at 200 frequency points. The PCA is applied to remove redundant information, leading to a compressed representation of the original dataset with only $\tilde{n} = 10$ components using a threshold of 0.01%.
The compressed training sets are then used to train three different surrogate models based on the DCK, PCF, and CVCF LS-SVM regressions. For the sake of completeness, the predictions of the above methods are compared with those provided by an additional surrogate model built via a multi-output feedforward NN structure [34,35] considering the real and imaginary parts of the considered transfer function (i.e., using the dual-channel implementation). The NN is trained via the Gradient Descent backpropagation method implemented within the Neural Net Fitting Tool available in the MATLAB Deep Learning Toolbox. The network consists of three hidden layers with 50, 20, and 15 neurons, respectively. The activation function is the hyperbolic tangent sigmoid.
Figure 6 and Figure 7 show the performance of each method on a test set consisting of 1000 samples, assessed via the relative NRMSE in Equation (39) for the real and imaginary parts, respectively. The results clearly highlight the improved accuracy achieved via the PCF. Moreover, as expected, due to its simplified formulation, the DCK again provides the lowest accuracy among the LS-SVM-based models. By comparison, the NN shows the lowest overall accuracy when a small number of training samples is used (i.e., up to 150 training samples). This slow convergence with respect to the number of training samples is due to the inherent non-convex nature of the optimization problem solved during the model training [32].
Moreover, in order to stress the reliability of the proposed techniques, the training outputs $y$ were corrupted with Gaussian noise, such that:
$$y_{i,\mathrm{noisy}}(\mathbf{x}_i) = y_i(\mathbf{x}_i)\left(1 + \zeta_n\right), \qquad (40)$$
where $\zeta_n \sim \mathcal{N}(0, \sigma_n^2)$ is a Gaussian random variable with standard deviation $\sigma_n = \{0.01, 0.03\}$.
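A short sketch of the corruption model in Equation (40) is given below; the seeded generator is only for reproducibility of the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def corrupt_with_noise(y, sigma_n=0.01):
    """Multiplicative Gaussian corruption of the training outputs, Eq. (40)."""
    return y * (1.0 + sigma_n * rng.standard_normal(y.shape))
```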
Figure 8 and Figure 9 compare the relative NRMSE computed at a single frequency point, selected as the one providing the maximum error, via the proposed approaches and a multi-output feedforward NN for different values of the noise standard deviation $\sigma_n$ and of the number of training samples L, for the real and imaginary parts, respectively. Among the considered methods, the CVCF shows the best performance and robustness against noise, for both the real and imaginary parts.

6. Conclusions

In this paper, the application of different methods based on the LS-SVM regression for predicting the complex-valued frequency responses of a serpentine delay line and a high-speed link was presented. The data required for training and testing were obtained from CST and MATLAB simulations as functions of physical and geometrical parameters, including the substrate dielectric constant, height, and strip width of the serpentine delay line, and the capacitances and inductances of the high-speed link. The PCA is shown to reduce the computational cost associated with the large number of frequency samples. Then, the training data are used to create the models, and the test data are considered for the assessment of the established models. The performances of the three proposed methods and of a multi-output feedforward NN are compared on the test dataset. It is shown that the CVCF method is suitable for the first complex electromagnetic problem when only a few training data points are available. In addition, the PCF shows an acceptable performance under noise-free conditions at all frequency points for the second example. Finally, the results show that the CVCF is the best method when the level of noise is increased.

Author Contributions

Conceptualization, N.S. and R.T.; methodology, N.S. and R.T.; software, data curation and validation, N.S.; writing original draft preparation, N.S.; writing review and editing, N.S. and R.T.; supervision, R.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Acknowledgments

The authors would like to thank Flavio Canavero, Politecnico di Torino, Italy, for his valuable and constructive suggestions during the planning and development of this manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Manfredi, P.; Ginste, D.V.; Stievano, I.S.; de Zutter, D.; Canavero, F.G. Stochastic transmission line analysis via polynomial chaos methods: An overview. IEEE Electromagn. Compat. Mag. 2017, 6, 77–84.
2. Zhang, Z.; El-Moselhy, T.A.; Elfadel, I.M.; Daniel, L. Stochastic testing method for transistor-level uncertainty quantification based on generalized polynomial chaos. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 2013, 32, 1533–1545.
3. Spina, D.; Ferranti, F.; Dhaene, T.; Knockaert, L.; Antonini, G.; Ginste, D.V. Variability analysis of multiport systems via polynomial-chaos expansion. IEEE Trans. Microw. Theory Tech. 2012, 60, 2329–2338.
4. Ahadi, M.; Roy, S. Sparse Linear Regression (SPLINER) Approach for Efficient Multidimensional Uncertainty Quantification of High-Speed Circuits. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 2016, 35, 1640–1652.
5. Trinchero, R.; Manfredi, P.; Stievano, I.S.; Canavero, F.G. Machine Learning for the Performance Assessment of High-Speed Links. IEEE Trans. Electromagn. Compat. 2018, 60, 1627–1634.
6. Ma, H.; Li, E.; Cangellaris, A.C.; Chen, X. Support Vector Regression-Based Active Subspace (SVR-AS) Modeling of High-Speed Links for Fast and Accurate Sensitivity Analysis. IEEE Access 2020, 8, 74339–74348.
7. Treviso, F.; Trinchero, R.; Canavero, F.G. Multiple delay identification in long interconnects via LS-SVM regression. IEEE Access 2021, 9, 39028–39042.
8. Houret, T.; Besnier, P.; Vauchamp, S.; Pouliguen, P. Controlled Stratification Based on Kriging Surrogate Model: An Algorithm for Determining Extreme Quantiles in Electromagnetic Compatibility Risk Analysis. IEEE Access 2020, 8, 3837–3847.
9. Watson, P.M.; Gupta, K.C.; Mahajan, R.L. Development of knowledge based artificial neural network models for microwave components. IEEE Int. Microw. Symp. Baltim. 1998, 1, 9–12.
10. Veluswami, A.; Nakhla, M.S.; Zhang, Q.-J. The application of neural networks to EM based simulation and optimization of interconnects in high-speed VLSI circuits. IEEE Trans. Microw. Theory Techn. 1997, 45, 712–723.
11. Kumar, R.; Narayan, S.L.; Kumar, S.; Roy, S.; Kaushik, B.K.; Achar, R.; Sharma, R. Knowledge-Based Neural Networks for Fast Design Space Exploration of Hybrid Copper-Graphene On-Chip Interconnect Networks. IEEE Trans. Electromagn. Compat. 2021, 1–14, Early Access Article.
12. Swaminathan, M.; Torun, H.M.; Yu, H.; Hejase, J.A.; Becker, W.D. Demystifying Machine Learning for Signal and Power Integrity Problems in Packaging. IEEE Trans. Compon. Packag. Manuf. Technol. 2020, 10, 1276–1295.
13. Jin, J.; Zhang, C.; Feng, F.; Na, W.; Ma, J.; Zhang, Q. Deep Neural Network Technique for High-Dimensional Microwave Modeling and Applications to Parameter Extraction of Microwave Filters. IEEE Trans. Microw. Theory Tech. 2019, 67, 4140–4155.
14. Moradi, M.; Sadrossadat, A.; Derhami, V. Long Short-Term Memory Neural Networks for Modeling Nonlinear Electronic Components. IEEE Trans. Compon. Packag. Manuf. Technol. 2021, 1, 840–847.
15. Bourinet, J.-M. Reliability Analysis and Optimal Design under Uncertainty—Focus on Adaptive Surrogate-Based Approaches; Computation [stat.CO]; Université Clermont Auvergne: Clermont-Ferrand, France, 2018.
16. Scardapane, S.; van Vaerenbergh, S.; Hussain, A.; Uncini, A. Complex-valued neural networks with nonparametric activation functions. IEEE Trans. Emerg. Top. Comput. Intell. 2018, 4, 140–150.
17. Adali, T.; Schreier, P.J.; Scharf, L.L. Complex-valued signal processing: The proper way to deal with impropriety. IEEE Trans. Signal Processing 2011, 59, 5101–5125.
18. Papaioannou, A.; Zafeiriou, S. Principal Component Analysis with Complex Kernels. IEEE Trans. Neural Netw. Learn. Syst. 2013, 1719–1726. Available online: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.402.3888&rep=rep1&type=pdf (accessed on 28 December 2021).
19. Boloix-Tortosa, R.; Murillo-Fuentes, J.J.; Santos, I.; Pérez-Cruz, F. Widely linear complex-valued kernel methods for regression. IEEE Trans. Signal Processing 2017, 65, 5240–5248.
20. Tobar, F.A.; Kuh, A.; Mandic, D.P. A novel augmented complex valued kernel LMS. In Proceedings of the 2012 IEEE 7th Sensor Array and Multichannel Signal Processing Workshop (SAM), Hoboken, NJ, USA, 17–20 June 2012; pp. 473–476.
21. Boloix-Tortosa, R.; Murillo-Fuentes, J.J.; Tsaftaris, S.A. The Generalized Complex Kernel Least-Mean-Square Algorithm. IEEE Trans. Signal Processing 2019, 67, 5213–5222.
22. Hirose, A. Complex-Valued Neural Networks; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2013.
23. Bouboulis, P.; Theodoridis, S.; Mavroforakis, C.; Evaggelatou-Dalla, L. Complex Support Vector Machines for Regression and Quaternary Classification. IEEE Trans. Neural Netw. Learn. Syst. 2015, 26, 1260–1274.
24. Scardapane, S.; van Vaerenbergh, S.; Comminiello, D.; Uncini, A. Widely Linear Kernels for Complex-Valued Kernel Activation Functions. In Proceedings of the ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 12–17 May 2019; pp. 8528–8532.
25. Ogunfunmi, T.; Paul, T.K. On the complex kernel-based adaptive filter. In Proceedings of the 2011 IEEE International Symposium of Circuits and Systems (ISCAS), Rio de Janeiro, Brazil, 15–18 May 2011; pp. 1263–1266.
26. Boloix-Tortosa, R.; Murillo-Fuentes, J.J.; Payán-Somet, F.J.; Pérez-Cruz, F. Complex Gaussian processes for regression. IEEE Trans. Neural Netw. Learn. Syst. 2018, 29, 5499–5511.
27. Soleimani, N.; Trinchero, R.; Canavero, F. Application of Different Learning Methods for the Modelling of Microstrip Characteristics. In Proceedings of the 2020 IEEE Electrical Design of Advanced Packaging and Systems (EDAPS), Shenzhen, China, 14–16 December 2020; pp. 1–3.
28. Manfredi, P.; Grivet-Talocia, S. Compressed Stochastic Macromodeling of Electrical Systems via Rational Polynomial Chaos and Principal Component Analysis. In Proceedings of the 2021 Asia-Pacific International Symposium on Electromagnetic Compatibility (APEMC), Nusa Dua-Bali, Indonesia, 27–30 September 2021.
29. Ahmadi, M.; Sharifi, A.; Fard, M.J.; Soleimani, N. Detection of brain lesion location in MRI images using convolutional neural network and robust PCA. Int. J. Neurosci. 2021, 131, 1–12.
30. Jolliffe, I.T. Principal Component Analysis; Springer: New York, NY, USA, 2002.
31. Manfredi, P.; Trinchero, R. A data compression strategy for the efficient uncertainty quantification of time-domain circuit responses. IEEE Access 2020, 8, 92019–92027.
32. Kushwaha, S.; Attar, A.; Trinchero, R.; Canavero, F.; Sharma, R.; Roy, S. Fast Extraction of Per-Unit-Length Parameters of Hybrid Copper-Graphene Interconnects via Generalized Knowledge Based Machine Learning. In Proceedings of the IEEE 30th Conference on Electrical Performance of Electronic Packaging and Systems (EPEPS), Austin, TX, USA, 17–20 October 2021.
33. Suykens, J.A.K.; van Gestel, T.; de Brabanter, J.; de Moor, B.; Vandewalle, J. Least Squares Support Vector Machines; World Scientific Publishing Company: Singapore, 2002.
34. Vapnik, V. The Nature of Statistical Learning Theory, 2nd ed.; Springer: New York, NY, USA, 1999.
35. Posa, D. Parametric families for complex valued covariance functions: Some results, an overview and critical aspects. Spat. Stat. 2020, 39, 100473.
36. Iaco, D.S.; Palma, M.; Posa, D. Covariance functions and models for complex-valued random fields. Stoch. Environ. Res. Risk Assess. 2003, 17, 145–156.
37. Snoek, J.; Larochelle, H.; Adams, R.P. Practical bayesian optimization of machine learning algorithms. Adv. Neural Inf. Processing Syst. 2012, 25, 1–10.
38. Geng, J.; Gan, W.; Xu, J.; Yang, R.; Wang, S. Support vector machine regression (SVR)-based nonlinear modeling of radiometric transforming relation for the coarse-resolution data-referenced relative radiometric normalization (RRN). Geo-Spat. Inf. Sci. 2020, 23, 237–247.
39. LS-SVMlab, Version 1.8 ed.; Department of Electrical Engineering (ESAT), Katholieke Universiteit Leuven: Leuven, Belgium, 6 July 2011. Available online: https://www.esat.kuleuven.be/sista/lssvmlab/old/toolbox.html (accessed on 28 December 2021).
40. Soh, W.-S.; See, K.-Y.; Chang, W.-Y.; Oswal, M.; Wang, L.-B. Comprehensive analysis of serpentine line design. In Proceedings of the 2009 Asia Pacific Microwave Conference, Singapore, 7–10 December 2009; pp. 1285–1288.
Figure 1. (a) Design parameters of the serpentine line to be analyzed; (b) cross-sectional view of the serpentine line.
Figure 2. Normalized singular value plot of the serpentine delay line for the considered dataset with 5000 frequency points (blue line). The horizontal line shows the 0.001% threshold for the PCA truncation.
Figure 3. Comparison of the relative NRMSE values computed by the proposed approaches on the test samples by considering the real part of the S21 parameter of the serpentine delay line structure for an increasing number of training samples (i.e., L = 50, 250, 500).
Figure 4. Comparison of the relative NRMSE values computed by the proposed approaches on the test samples by considering the imaginary part of the S21 parameter of the serpentine delay line structure for an increasing number of training samples (i.e., L = 50, 250, 500).
Figure 5. Schematic of the high-speed interconnect link.
Figure 6. Comparison of the relative NRMSE values computed by the proposed approaches and a feedforward multi-output neural network on the test samples by considering the real part of the transfer function of the high-speed link for an increasing number of training samples: (a) L = 20; (b) L = 100; (c) L = 150; (d) L = 500.
Figure 7. Comparison of the relative NRMSE values computed via the proposed approaches and a feedforward multi-output neural network on the test samples by considering the imaginary part of the transfer function of the high-speed link for an increasing number of training samples: (a) L = 20; (b) L = 100; (c) L = 150; (d) L = 500.
Figure 8. Three-dimensional plot of the relative NRMSE computed on the real part of the test samples at a single frequency point selected as the one providing the maximum error via the proposed approaches (see panel (a) for the DCK, panel (b) for CVCF and panel (d) for PCF), and a multi-output feedforward NN (see panel (c)) for different values of the noise standard deviation σn and the number of training samples L.
Figure 9. Three-dimensional plot of the relative NRMSE computed on the imaginary part of the test samples at a single frequency point selected as the one providing the maximum error via the proposed approaches (see panel (a) for the DCK, panel (b) for CVCF and panel (d) for PCF), and a multi-output feedforward NN (see panel (c)) for different values of the noise standard deviation σn and the number of training samples L.
Table 1. Serpentine delay line parameters for the training and test dataset.

Training Ranges | Test Ranges
4.5 mm ≤ LL ≤ 5.1 mm | 4.5 mm ≤ LL ≤ 5.1 mm
3.9 ≤ εr ≤ 4.5 | 3.9 ≤ εr ≤ 4.5
0.13 mm ≤ SW ≤ 0.17 mm | 0.13 mm ≤ SW ≤ 0.17 mm
1 MHz ≤ f ≤ 3 GHz | 1 MHz ≤ f ≤ 3 GHz
1000 samples for each frequency | 2000 samples for each frequency
Table 2. High-speed interconnect link parameters for the training and test datasets.

Parameter | Training and Test Ranges
C1 (x1) | (1 ± 0.5 x1) pF
C2 (x2) | (0.5 ± 0.25 x2) pF
L1 (x3) | (10 ± 5 x3) nH
L2 (x4) | (10 ± 5 x4) nH
xi, i = 1, 2, 3, 4 | Random variable in [−1, 1]
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
