1. Introduction
The base idea of Support Vector Machines is the soft margin classifier, first introduced by Cortes and Vapnik in 1995 [1]; in the same year, the algorithm was extended by Vapnik to deal with regression problems [2]. The first formal statistical analyses of the generalization bounds of hard margin SVMs were given by Bartlett [3] and Shawe-Taylor et al. in 1998 [4]. In 2000, Shawe-Taylor and Cristianini continued the development of the statistical learning theory that supports SVMs and presented the statistical generalization bounds of soft margin algorithms for the regression case [5]. Thus, since their introduction by Vladimir Vapnik and his team, SVMs have become an important and widely used learning algorithm for solving regression and classification problems. The sheer number of published applications of support vector machines (SVMs) for solving problems involving real data is reflected in the high number of citations that the top 100 SVM publications [6] have recorded: more than 46,000 citations, according to Google Scholar.
SVMs use an intelligent and elegant technique to solve non-linear problems: the kernel trick, which has become a seminal idea in the development of several machine learning methods called kernel-based methods. Kernels are symmetric functions that, for two vectors $\mathbf{x}, \mathbf{y}$ of dimension $n$, return a real number $k(\mathbf{x}, \mathbf{y}) \in \mathbb{R}$. The use of kernels in SVMs allows implicitly mapping the original input data into a higher-dimensional Reproducing Kernel Hilbert Space (RKHS) $\mathcal{H}$, in which linear functions that split the mapped data are computed. This is equivalent to solving the non-linear problem in the original input data space.
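As a minimal illustration of the kernel trick (our own sketch, not taken from any of the surveyed papers), the following Python snippet verifies numerically that a homogeneous polynomial kernel of degree 2 equals the ordinary inner product taken after an explicit quadratic feature map:

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map for 2D input: (x1^2, x2^2, sqrt(2)*x1*x2)."""
    return np.array([x[0]**2, x[1]**2, np.sqrt(2) * x[0] * x[1]])

def poly_kernel(x, y):
    """Homogeneous polynomial kernel of degree 2: k(x, y) = (x . y)^2."""
    return np.dot(x, y) ** 2

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

# The kernel computes the feature-space inner product without ever forming phi.
assert np.isclose(poly_kernel(x, y), np.dot(phi(x), phi(y)))
print(poly_kernel(x, y))  # 1.0 on both sides
```

The same mechanism underlies the complex and hypercomplex kernels reviewed below; only the target space of the kernel changes.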
Although RKHSs include complex and hypercomplex spaces into which the original input data can be mapped via kernel functions, the majority of extensions and applications of SVMs deal with real data and therefore use real kernel functions. However, the need for processing complex and hypercomplex data in modern applications has paved the way for some important proposals to process complex and hypercomplex-valued input and output data and kernels. These approaches will be reviewed in more detail in the next section.
Therefore, since the first proposals of SVM extensions for processing complex and hypercomplex data and kernels, their applications have grown in number, including signal processing, pattern recognition, classification of electromagnetic, light, sonic/ultrasonic, and quantum waves, chaos in the complex domain, phase and phase-sensitive signal processing and nonlinear filtering, frequency, time-frequency, and spatiotemporal domain processing, quantum computation, robotics, control, time-series prediction, and visual servoing, among others. Due to the variety and importance of the applications of this type of SVM extension, in this survey we present and discuss the importance, recent progress, prospective applications, and future directions of complex, hypercomplex-valued, and geometric support vector machines.
This survey is organized as follows: in Section 2, the most important extensions of SVMs that deal with complex and hypercomplex-valued input and output data, as well as with kernels that process this type of data, are reviewed. In Section 3, some of the applications of these extensions of SVMs with the greatest impact are introduced, as well as some of their promising developments with future applicability. The last section is devoted to the conclusions.
2. Complex and Hypercomplex Support Vector Machine Extensions
In this section, the most important extensions of SVMs that deal with complex and hypercomplex-valued input and output data, as well as with kernels that process this type of data, are reviewed. Each approach is presented in chronological order of publication. In addition, Table 1 summarizes the information presented in all the subsections of this section, listing the main features, advantages, and disadvantages of each approach.
2.1. Geometric or Clifford Support Vector Machines
In 2000, the first attempt to develop an extension of SVM to deal with complex and hypercomplex input and output data was published [7,8]. The approach was designed using the Geometric Algebra (GA) framework [16].
GAs are a special family of Clifford associative algebras built over a field equipped with a quadratic form; their elements, called multivectors, are constructed using a special product called the geometric product. Embedded in Clifford algebras and GAs we can find the concepts of complex, linear, tensor, quaternion, and octonion algebras (for a detailed introduction to Clifford algebras and GAs, the reader is referred to [16,17]).
Hence, in [7,8], multivector-valued input and output data were used to perform multi-classification and multi-regression, solving one multivector-valued optimization problem. Nevertheless, the approach was not completely developed, because it did not consider the multivector-valued kernel design, nor did it define the Gram matrix (or kernel matrix) as a multivector matrix. Therefore, the design can be considered an application of the idea of Eiichi Goto, presented in 1954 with the proposal of the “Parametron” [18,19], wherein the phase of a high-frequency carrier is used to represent binary or multivalued information. Similarly, in [7,8], the elements of different grades of the multivector are used to represent multivalued information in the input and output multivector-valued data. It was not until 2010 that the extension presented in [7,8] was fully developed, with the introduction of Clifford Support Vector Machines (CSVMs) [9].
The author considers this to be the most complete work generalizing the real-valued SVM to a complex and hypercomplex-valued SVM, as Clifford algebra theory includes the concepts of the most used and important algebras, such as the analytic geometry of Descartes, the complex algebra of Wessel and Gauss, the Hamilton algebra, the Cayley matrix algebra, the exterior algebra of Grassmann, the tensor algebra of Ricci, the algebras of Pauli and Dirac, and Lie algebras, among others. Furthermore, all the division algebras, such as the real, complex, quaternion, and octonion algebras, can be viewed as isomorphic to Clifford algebras.
The CSVM work presented the entire design of the multivector optimization problem, the derivation of the primal and dual multivector-valued problems, the definition of the Gram matrix as a matrix with multivectors as elements, and the introduction of a multivector-valued kernel design. The key feature of the CSVM design is the use of the Clifford product in the multivector-valued kernels to keep the different grade components of each input multivector separated and, in the end, to represent those components as a direct sum of linear spaces in the multivector-valued outputs. All of the above allows the CSVM to solve optimization problems in complex and hypercomplex vector spaces, so that both the original space and the feature (RKHS) space preserve the topology and geometry of the complex and hypercomplex input data. It is demonstrated that when this machine learning algorithm is used to process this type of data, the accuracy and convergence speed can be improved with respect to those obtained with real SVMs.
The CSVM was presented to solve multi-classification and multi-regression and, used in conjunction with a long short-term memory neural network [20], it can deal with dynamic systems or recurrence.
Two multivector-valued kernels were designed: a complex-valued kernel and a quaternion-valued kernel, each embedded in its corresponding GA. The design methodology can be extended to any Clifford algebra (so it can be applied to solve any hypercomplex SVM problem).
This approach was used to solve multi-classification for object recognition, multi-interpolation, time-series forecasting, and path-planning problems.
In [15], a Quaternion Support Vector Machine (QSVM) was presented as a special case of the CSVM of [9]. The quaternion algebra framework was used to design a QSVM that processes multiple multivectors as input data and returns multiple multivectors as output data to achieve multi-class classification. Two quaternion kernels that involve the quaternion product in their definition were designed: a polynomial quaternion-valued kernel and a Gaussian quaternion Gabor kernel. In addition, a quaternion-valued output function is defined to allow the QSVM to classify up to sixteen classes using a single QSVM. Diamond colour classification and object recognition experiments were conducted to demonstrate the efficiency of the algorithm.
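To make the role of the quaternion product concrete, the following Python sketch implements the Hamilton product and a simple polynomial-style quaternion-valued kernel built from it; the kernel form is an illustrative assumption of ours, not the exact kernel of [15]:

```python
import numpy as np

def hamilton(p, q):
    """Hamilton product of quaternions p = (w, x, y, z) and q = (w, x, y, z)."""
    w1, x1, y1, z1 = p
    w2, x2, y2, z2 = q
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def conj(q):
    """Quaternion conjugate: negate the vector part."""
    return np.array([q[0], -q[1], -q[2], -q[3]])

def quat_poly_kernel(p, q, degree=2):
    """Illustrative polynomial-style quaternion kernel: (p * conj(q))^degree.
    The product is non-commutative, so k(p, q) != k(q, p) in general."""
    base = hamilton(p, conj(q))
    out = base
    for _ in range(degree - 1):
        out = hamilton(out, base)
    return out  # a quaternion, not a real number

p = np.array([1.0, 0.5, -0.3, 0.2])
q = np.array([0.7, -0.1, 0.4, 0.9])
print(quat_poly_kernel(p, q))
```

The key point is that the kernel output itself is quaternion-valued, so the four components of the output can carry separate class information, which is what permits encoding up to sixteen classes in one machine.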
In [21], the methodology to implement a parallel CSVM using the Gaussian kernel was presented. As the Gaussian kernel returns a real number even when used with multivectors as input data, and in addition has the commutative property (since the Gaussian kernel does not involve the Clifford product computation), the multivector input data can be separated into its different grade elements, which belong to independent subspaces; therefore, a quadratic optimization problem for each element can be solved in parallel. Classification experiments were conducted using benchmark problems such as the concentric 2D spirals and a five-class problem described with three overlapped circles.
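A minimal sketch of this divide-and-parallelize idea, assuming scikit-learn and joblib are available and representing each multivector simply as a stack of its grade components (the actual implementation in [21] is more involved):

```python
import numpy as np
from joblib import Parallel, delayed
from sklearn.svm import SVC

# Toy data: each "multivector" sample has 4 grade components of 3 features each.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4, 3))   # (samples, grade components, features per component)
y = rng.integers(0, 2, size=200)   # one binary label per sample

def train_component(k):
    """Train an independent Gaussian (RBF) SVM on the k-th grade component."""
    clf = SVC(kernel="rbf", gamma="scale")
    clf.fit(X[:, k, :], y)
    return clf

# One independent quadratic optimization problem per grade component, run in parallel.
models = Parallel(n_jobs=-1)(delayed(train_component)(k) for k in range(X.shape[1]))
print([m.score(X[:, k, :], y) for k, m in enumerate(models)])
```

Because the real-valued Gaussian kernel never mixes grade components through a Clifford product, the per-component problems are genuinely independent, which is what makes this parallel decomposition valid.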
2.2. Division Algebras
Several efforts using division algebras were made to generalize real SVMs to deal with complex and hypercomplex numbers. Division algebras are appealing mathematical frameworks for solving the complex and hypercomplex-valued SVM optimization problem because all their non-zero elements have multiplicative inverses. There are four normed division algebras: the real numbers, the complex numbers, the quaternions, and the octonions.
2.2.1. SVM Multiregression for Nonlinear Channel Estimation in Multiple-Input Multiple-Output Systems
In 2002, an approach to solve multidimensional function approximation and regression was presented [10]; it was fully developed and published in 2004 [11]. The authors addressed the problem of frequency-nonselective channel estimation in multiple-input, multiple-output (MIMO) systems, leveraging the multidimensionality of the MIMO channel by means of a regression SVM-based algorithm. The study proposed using iterative reweighted least squares (IRWLS) instead of quadratic programming to solve the multivariable SVM problem, and training an SVM multi-regressor to model the nonlinearities that affect each transmitter or receiver module of the transmission-reception chain. These cases were modeled separately using two different channel models, with the input information signal represented by quadrature phase shift keying (QPSK), as employed in complex signal processing. On the other hand, the output data are represented as real vectors, and the optimization problem defined uses an $\varepsilon$-insensitive cost function that in turn uses the $L_2$ norm to consider all the multiple dimensions in a unique restriction. When $\varepsilon = 0$, the problem is equivalent to solving one independent, regularized real kernel regression for each dimension, but for $\varepsilon \neq 0$, the problem becomes ill-defined and cannot be solved straightforwardly; an iterative procedure is required to obtain the solution, which is why the authors use an IRWLS algorithm.
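Following the usual M-SVR formulation (reconstructed here from the description above; the exact form in [11] may differ in details), the cost applied to the per-sample error vector $e_i$ is:

```latex
L(u_i) =
\begin{cases}
0, & u_i < \varepsilon,\\[2pt]
u_i^{2} - 2\,u_i\,\varepsilon + \varepsilon^{2}, & u_i \ge \varepsilon,
\end{cases}
\qquad u_i = \lVert e_i \rVert_2 = \sqrt{e_i^{\top} e_i},
```

so all output dimensions are penalized through a single $L_2$-norm restriction rather than one restriction per dimension.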
Simulations were conducted to address the problem of nonlinear channel estimation, and the proposal outperforms the application of one real support vector regressor (SVR) and one radial basis function network (RBFN) per dimension. Meanwhile, for linear channel models, results equivalent to those obtained using the minimum mean square error (MMSE) strategy were achieved.
2.2.2. Quaternionic and Complex-Valued Support Vector Regression for Equalization and Function Approximation
In [12], another SVR strategy was designed by Shilton et al., this time to deal with quaternionic and complex-valued equalization and function approximation. The outputs of the SVR were interpreted as complex/quaternion-valued vectors to tackle the interconnectedness of the outputs, i.e., the fact that these outputs can be coupled, so treating them independently can lead to a decrease in regressor accuracy. The authors proposed the use of a rotationally invariant cost function in order to consider the magnitude, but not the angle, of the error in the output. When a real regressor is applied to each dimension of a complex or quaternion-valued signal, each regressor estimates one function on its own axis, and when these functions are summed to construct a complex or quaternion output, the overall risk function will not be rotationally symmetric, because it will contain the magnitude and the various angles added together (even though the angles were computed using a different axis for each regressor). Therefore, a cost that considers only the magnitude of the error, and not the various angles between the axes, avoids this degradation of the accuracy of the estimated function. This is one of the main reasons why complex and hypercomplex-valued estimators increase the accuracy when dealing with complex and hypercomplex-valued input data, compared with strategies that cast these problems onto the real domain by splitting the data into its different grade parts (two for complex numbers, four for quaternions, and so on) and then working with each dimension independently, instead of solving the optimization problem in complex and hypercomplex input, output, and feature spaces. The authors of [12] derive a well-defined dual optimization problem using a bilevel definition of the primal convex quadratic problem [22]. In the experimental results section, they consider the problem of equalization of a 4-symbol quadrature amplitude modulated (4-QAM) signal over a complex linear communication channel with added Gaussian noise. Again, the comparison results show that their proposal outperforms two independent SVRs with an $\varepsilon$-insensitive loss function, and the decision boundary obtained with their approach compares “favorably” with the optimal decision boundary obtained by a Bayesian equalizer.
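In symbols, a rotationally invariant $\varepsilon$-insensitive cost of this kind depends only on the error magnitude; a generic form consistent with the description above (not necessarily the exact one used in [12]) is:

```latex
c(e) = \max\!\left(0,\; |e| - \varepsilon\right),
\qquad |e| = \sqrt{e\,e^{*}},
```

where $e^{*}$ denotes the complex (or quaternion) conjugate of the error $e$; since only $|e|$ enters the cost, rotating the error leaves the penalty unchanged.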
2.2.3. A Division Algebraic Framework for Multidimensional Support Vector Regression
In 2010, Shilton et al. tackled the multidimensional SVR design using the division algebras framework, presenting an $\varepsilon$-insensitive loss function independent of the coordinate system or basis used [23]. Although the octonions are included in the division algebras, this paper only deals with the real, complex, and quaternion extensions of SVR. Indeed, the authors only present the design of a quaternionic $\varepsilon$-SVR, because the real and complex extensions are included as restricted cases of the quaternionic SVR. The approach proposes the use of a strictly $L_2$-norm-based loss function to prevent the trained machine from being influenced by the choice of axes, since this loss function is coordinate independent and less sensitive to outliers than a quadratic cost function. Additionally, the $L_2$ norm includes an $\varepsilon$-insensitive region that ensures sparseness. The authors present the derivation of a primal quaternionic SVR using this loss function; the Lagrange multiplier optimization method is then used to compute the quadratic dual training problem, which, in the end, is analogous to the standard real SVR dual, but considering the use of a quaternion-valued kernel function. Their proposal is applied to solve problems of complex-valued function approximation, chaotic time-series forecasting, and channel equalization. Experimental results show that the proposal outperforms the Clifford SVR (C-SVR) presented in 2000 and 2001 [7,8], least-squares SVR (LS-SVR) [24], and multidimensional SVR (M-SVR) [11].
Furthermore, a useful comparative analysis against C-SVR [7,8], LS-SVR [24], M-SVR [11], the extreme learning machine (ELM) [25,26], and a kernel method [27] is performed in terms of the definition of the primal loss functions, the dual optimization problem, the coordinate independence of the loss functions, the sparsity of the solution, and the outlier sensitivity. It is worth noting that the methodologies in this comparative analysis were proposed in 2000, 2002, 2004, 2006, and 2002, respectively.
2.2.4. A Note on Octonionic Support Vector Regression
As mentioned in the previous section, Shilton et al., in their 2010 paper [23], did not include the design of an octonion-valued SVR. Instead, they included this design in a 2012 publication [13]. In this extension to an octonion-valued SVR, the authors used the same methodology as in [23] to derive the primal and dual optimization problems. The dual training problem obtained is similar to that presented for the quaternion-valued SVR; however, in the case of octonions, their nonassociativity had to be considered in the derivation. Therefore, the dual training problem includes an associator operator affecting the kernel matrix term, as well as a subtracted quadratic term involving the Lagrange multipliers and an “associated kernel function” that computes a weighted summation of the kernel results and returns a real number. The same occurs with the form of the trained machine: compared with the quaternion-valued SVR, the octonion-valued one involves an additional term that multiplies the value of the associated kernel function. Therefore, although the derivations of the quaternionic and octonionic SVRs are similar, the important difference, the nonassociativity of the octonions with respect to the quaternions, has to be factored in when computing the dual training problem, as well as the form of the trained machine. This differs from the case of the CSVM of Section 2.1, and from the proposal presented in the next subsection, in which generic derivation methodologies for the dual optimization problem are shown that are independent of the algebra with which one works.
The paper [13] also shows three special cases of the octonionic SVR in which the nonassociative octonion-valued loss-function-related terms can be neglected to obtain a “full analogy” (not an identity) between the proposed real, complex, quaternion, and octonion SVRs. These special cases were called pseudoreal, pseudocomplex, and pseudoquaternion regressions. The authors analyzed the features of the nonlinear function that maps data from the input space to the feature space (the feature map associated with the kernel function) to describe the cases in which the associated kernel function terms are equal to zero and the full analogy between the octonion-valued and quaternion-valued SVR proposals is achieved. Nevertheless, in the general octonion-valued SVR case, the curse of dimensionality could lead to computationally intractable problems, depending on the amount of input data, which could render the method impractical in real applications; however, this was not the case in the experimental results section.
Experimental results demonstrate the performance of the octonionic SVR when applied to the biomechanical study of human locomotion, i.e., gait analysis. In [28], a high correlation between inertial measurement unit (IMU: accelerometer and gyroscope) signals during human locomotion is suggested. However, because IMUs suffer from intrinsic drift error, there is a need to correct the sensor measurements or calibrate the systems. One way to achieve this is to use an intelligent regressor to estimate the correct measurements of key events of the endpoint foot trajectory during human treadmill walking. This was performed with the octonionic SVR, C-SVR, LS-SVR, and M-SVR. The regression targets were expressed as pure octonions and, as the data sets contain a significant number of outliers, the octonionic SVR and the C-SVR obtained the best overall errors, with the former showing more consistency in obtaining better results.
2.2.5. Complex Support Vector Machines for Regression and Quaternary Classification
In [14], the derivation and design of a complex SVM for regression and classification are presented. The authors use two main frameworks to develop their proposal: widely linear mean square estimation and Wirtinger's calculus.
In [29], a mean square estimation (MSE) optimization methodology to deal with complex and normal data is presented. The MSE for complex data is not linear as in the real case, but wide-sense linear: i.e., the regression of a scalar random variable $y$ to be estimated is linear both in terms of a random vector $x$ that is an observation of $y$ and in terms of the complex conjugate of $x$, namely $x^{*}$. This optimization method shows advantages over the strictly linear procedure when used with complex data, and it can yield significant improvements in the estimation performance, as shown in the examples presented in [29].
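Concretely, the widely linear estimator takes the standard form (notation ours):

```latex
\hat{y} = \mathbf{w}^{H}\mathbf{x} + \mathbf{v}^{H}\mathbf{x}^{*},
```

where $\mathbf{w}$ and $\mathbf{v}$ are complex weight vectors and $(\cdot)^{H}$ denotes the conjugate transpose; the strictly linear estimator is recovered when $\mathbf{v} = \mathbf{0}$.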
Wirtinger's calculus was introduced in 1927 [30]. The Wirtinger derivatives (or Wirtinger operators) are first-order partial differential operators defined for functions of one or several complex variables. These derivatives behave analogously to the ordinary derivatives of functions of real variables.
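For a function of the complex variable $z = x + iy$, these standard operators are:

```latex
\frac{\partial}{\partial z} = \frac{1}{2}\left(\frac{\partial}{\partial x} - i\,\frac{\partial}{\partial y}\right),
\qquad
\frac{\partial}{\partial z^{*}} = \frac{1}{2}\left(\frac{\partial}{\partial x} + i\,\frac{\partial}{\partial y}\right),
```

which treat $z$ and $z^{*}$ as formally independent variables and thereby simplify gradient computations for non-holomorphic cost functions.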
Thus, in [14], the above frameworks were used to work with SVMs in the complex Reproducing Kernel Hilbert Space (RKHS). Three approaches were presented:
(1) The complexification of the Hilbert space $\mathcal{H}$: the objective is to enrich the doubled space $\mathcal{H}^2 = \mathcal{H} \times \mathcal{H}$ with a complex inner product, which is equivalent to the definition of the Clifford (or geometric) product of two complex numbers in a Clifford algebra isomorphic to the complex space. Hence, this approach is equivalent to working with this type of hypercomplex number in the Clifford algebras or GAs of Section 2.1.
(2) The Dual Real Channel (DRC) approach, in which the training data are split into two sets (the real and imaginary parts of the input data), and a real regression (or classification) is performed on each data set using a real kernel. It has been proven that this approach and complexification are equivalent procedures [31].
(3) The pure complex SVM, which in this paper is derived using Wirtinger's calculus and widely linear estimation in a very elegant and compact manner. This derivation allows the authors to conclude that the proposed pure complex SVM can be solved by splitting the labels (desired outputs) of the data into their real and imaginary parts and solving two real SVM tasks employing any of the standard algorithms (they employed SMO in their experiments). Hence, the difference between the practical implementations of the complexification, DRC, and pure complex SVM approaches is only that the first two use real kernels, whereas the pure complex SVM uses a real kernel induced by the complex one.
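A minimal sketch of this split-and-solve implementation strategy, assuming scikit-learn's SVR as the real solver and using a plain real RBF kernel in place of the induced kernel of [14]:

```python
import numpy as np
from sklearn.svm import SVR

# Toy complex-valued regression data: real features, complex targets.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = X[:, 0] + 1j * X[:, 1] + 0.05 * (rng.normal(size=200) + 1j * rng.normal(size=200))

# Split the complex labels and solve two real SVR tasks
# ([14] would use the kernel induced by the complex one here).
svr_re = SVR(kernel="rbf").fit(X, y.real)
svr_im = SVR(kernel="rbf").fit(X, y.imag)

# Recombine the two real predictions into a complex output.
y_hat = svr_re.predict(X) + 1j * svr_im.predict(X)
print(np.mean(np.abs(y_hat - y)))  # mean complex error magnitude
```

The practical appeal of this result is that existing, highly optimized real SVM solvers can be reused unchanged; only the kernel and the label bookkeeping differ.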
One of the main contributions of the paper is the validation of complex kernels to classify the target space into four different categories. This is a very intuitive result: for real-valued kernels, the classification features two classes (given by the sign of the output), whereas the output of a complex kernel machine features a real and an imaginary part, so there are four possible combinations for the signs of these two parts, naturally yielding four categories.
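This sign-combination decision rule can be sketched in a few lines (an illustrative encoding of ours; the class labels are arbitrary):

```python
def quaternary_class(output: complex) -> int:
    """Map a complex decision value to one of four classes via the signs
    of its real and imaginary parts: (+,+)->0, (-,+)->1, (+,-)->2, (-,-)->3."""
    return int(output.real < 0) + 2 * int(output.imag < 0)

print([quaternary_class(z) for z in [1 + 1j, -1 + 1j, 1 - 1j, -1 - 1j]])  # [0, 1, 2, 3]
```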
The experiments were conducted using the pure complex SVR and a complex-induced Gaussian kernel (which is not equal to the real Gaussian RBF) for the estimation of a sinc function, channel identification, and channel equalization. In addition, a multiclass classification application was demonstrated using the MNIST database of handwritten digits on a 4-class problem, where the problem was solved significantly faster than with the one-versus-three and one-versus-one real SVM strategies: the computational time taken by the pure complex SVM was almost half that of the real SVMs, although the error rate increased.
4. Conclusions and Prospective Work
In this survey, several proposals to extend the Support Vector Machine algorithm to deal with complex and hypercomplex-valued inputs, outputs, and kernels were reviewed. The first attempts consisted of splitting the complex and hypercomplex numbers into their real and imaginary (hypercomplex) components and then solving the optimization problems independently in the real domain. It was later proven that this approach loses the benefits of the complex (hypercomplex) spaces, such as sparsity and the geometric and topological features that the vectors (multivectors) have when embedded in their own spaces. All the proposals that exploit these features by solving the SVM optimization problem in its complex (hypercomplex) space were shown to achieve higher accuracies than those that solve the problem using splitting procedures.
The most complete and general extensions of SVMs that deal with complex/hypercomplex-valued data are those that, besides considering complex/hypercomplex-valued input and output data, define the optimization problem (the primal and dual problems) and the kernel using complex and hypercomplex algebra and calculus frameworks, such as Clifford algebra and GA, widely linear analysis, and Wirtinger's calculus. It was also shown [14] that the design of a complex/hypercomplex-valued kernel that maps the input data onto a complex/hypercomplex RKHS is the key to obtaining the benefits of solving the problem in these spaces, such as the preservation of their special geometric and topological features.
Therefore, for dealing with data that are naturally embedded in complex (hypercomplex) spaces, the approaches presented in Section 2 of this paper are the best choices to obtain higher accuracies and lower run times in classification and regression problems.
Another important conclusion concerns the compatibility of complex numbers and their frameworks with the nature of waves: since waves have intensity and phase, two numbers are needed to describe them.
In the applications section, it is shown that the extensions of SVM designed using complex and hypercomplex mathematical tools can efficiently and effectively solve real-world problems involving the processing and classification, interpolation, or regression of complex signals, frequency-domain data, ECG signals, MRI data, chaotic time series, complex/hypercomplex forecasting, the approximation of complex/multiple functions, complex beamforming and DOA estimation, GPR and GPS data, linear and nonlinear complex filtering, pattern recognition, robotics, and computer vision, among others.
Regarding the prospects of complex/hypercomplex extensions and applications of SVMs, one promising development is wave informatics, because of the natural compatibility, noted above, between complex numbers and waves. The applications are numerous and important, as can be seen in Section 3.
Another area of opportunity for research is the processing, classification, regression, prediction, identification, and simulation of chaotic systems. Any deterministic aperiodic system that is sensitive to its initial conditions is called chaotic. The study of such systems is very important, as examples include the weather, some electrocardiogram and encephalogram time series, stock market time series, and fractals, among others. These systems are also naturally modeled using complex-valued functions; therefore, the extensions of complex-valued SVMs are the best suited to process them.
As mentioned, perhaps the most general extension of complex/hypercomplex-valued SVMs is the one presented in [9]. Here, the use of the GA framework allowed the authors to define an SVM problem that processes complex/hypercomplex-valued inputs, outputs, and kernels. Nevertheless, the design of complex/hypercomplex kernels remains a main open issue for extending SVMs to higher-dimensional hypercomplex spaces (beyond quaternions, dual quaternions, or octonions). Thus, exploring the definition of high-dimensional Clifford algebras and/or GAs to design hypercomplex kernels, and exploiting the sparsity of these vector spaces and their power of geometric expressiveness, is a promising research area. Indeed, in high-dimensional hypercomplex Clifford algebras and/or GAs, the design of kernels may not even be necessary, because the sparsity could make it possible to obtain linear functions that correctly separate the input data, reducing the complexity of the SVM algorithm in both the training and testing stages.
This exploration of higher-dimensional hypercomplex GAs could also benefit the application of hypercomplex SVMs to another main issue of chaos theory: the geometry of fractals, and even the identification, modeling, and control of chaotic dynamic systems. The geometry of fractals has been described as the geometry of nature, chaos, and order. The geometric shapes of fractals describe sinuous curves, spirals, and filaments that twist about themselves, producing elaborate figures whose details are lost in infinity. Therefore, using very high-dimensional GAs, fractals could be described as nonlinear combinations of the highly complex geometric entities that can be defined in those GAs as multivectors, and a hypercomplex-valued SVM could approximate those multivectors. At present, exploring these GAs seems to be a computationally expensive and complex task, but the emergence of another new area of research could make it possible: quantum computing.
Quantum computing stands as another prospective area of research and application for complex and hypercomplex-valued SVMs, and as an opportunity to develop all types of complex/hypercomplex-valued machine learning techniques [62,63,64,65,66,67]. Quantum computing is a new computational paradigm that uses quantum theory to develop computer technology; in it, the qubit is used instead of the classical computational paradigm's bit. Quantum systems are those that exhibit both particle-like and wave-like behavior. Therefore, modeling this dual behavior using complex numbers seems natural: the real part could be used to represent the particle behavior, and the imaginary part to describe the wave phase that gives rise to interference patterns. Quantum computing has made it possible to solve some problems that cannot be solved using classical computing, such as integer factoring and discrete logarithm computation. In this paper, it has been shown that when the data of a problem fit well with the wave nature of complex (hypercomplex) numbers, it is better to use an algorithm that works in the vector space in which the data are defined, in order to take advantage of their geometric relationships and distributions. Hence, the reviewed extensions of SVMs are well suited to deal with the basic unit of quantum information, the qubit.
In Figure 1, several prospective and promising lines of work for the continued development of the theory and applications of complex and hypercomplex SVMs are illustrated.