*6.2. Support Vector Machine—Recursive Feature Elimination*

The support vector machine (SVM) originated in the work of Boser, Guyon, and Vapnik [58] and Cortes and Vapnik [59]. The general idea of an SVM is to create a decision boundary (hyperplane) that maximizes the margin between itself and the closest observations (data points) of each class [54]. The points closest to the boundary, i.e., those lying on the margin, are called "support vectors" [60]. It is noteworthy that the input variables, denoted *x*, are often mapped into a higher-dimensional feature space using a (nonlinear) mapping denoted *φ*(). Following the notation in [59,61], the decision function *f* for an input *x* can be defined as

$$f(\mathbf{x}) = w\phi(\mathbf{x}) + b \tag{1}$$

where *w* are the weights for the optimal hyperplane (decision surface) that separates the classes with the largest margin, *φ*() is a function that transforms the input, and *b* is the bias value. The bias is the average of *y<sub>k</sub>* − *wφ*(*x<sub>k</sub>*) over the marginal support vectors and can, thus, be calculated using the weights *w* [60]. The weights *w* for the optimal hyperplane are calculated as

$$w = \sum_{i} \alpha_i y_i \phi(\mathbf{x}_i) \tag{2}$$

where *x<sub>i</sub>* is a support vector, *α<sub>i</sub>* is the weight for the support vector *x<sub>i</sub>*, and *y<sub>i</sub>* ∈ {−1, 1} is the class label corresponding to the support vector [59,60]. The support vector weights *α* are the parameters of an SVM and are optimized using convex optimization [60]. For details on the optimization problem behind an SVM, please see [56,61].
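To make Equations (1) and (2) concrete, the short sketch below recovers *w* and *b* from a linear SVM fitted with scikit-learn; the synthetic data and variable names are illustrative assumptions and not part of the original study.

```python
# Minimal sketch (not the authors' code): recovering w and b of Eqs. (1)-(2)
# from a linear SVM fitted with scikit-learn on synthetic data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
clf = SVC(kernel="linear").fit(X, y)

# dual_coef_ stores alpha_i * y_i for each support vector; with a linear
# kernel phi(x) = x, so Eq. (2) reduces to a weighted sum of support vectors.
w = clf.dual_coef_ @ clf.support_vectors_   # shape (1, n_features)
b = clf.intercept_

# Decision function f(x) = w phi(x) + b of Eq. (1)
f = X @ w.ravel() + b
assert np.allclose(f, clf.decision_function(X))
```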

The weight vector *w* for the hyperplane will be used in recursive feature elimination to determine the ranking of features. Recursive feature elimination using a support vector machine (SVM-RFE) was introduced by Guyon et al. [60]. It deploys a greedy backward elimination procedure in which an SVM is trained at each step and the variable with the smallest squared weight *w<sub>i</sub>*<sup>2</sup> is removed from the set of remaining variables [48,60,62,63]. Thus, *w<sub>i</sub>*<sup>2</sup> can be regarded as a ranking criterion for the variables [60]. It is noteworthy that one or more variables can be removed in each step [48,60]. Thus, SVM-RFE is inherently different from random forests: the former starts with the complete variable set and iteratively removes one (or several) variable(s), whereas the latter works by iteratively selecting variables. The algorithm for SVM-RFE is depicted in Algorithm 2 (similar to [48,60]).

The logic behind this procedure is that *w<sub>i</sub>*<sup>2</sup> estimates the effect of each variable on the objective function (sensitivity), with larger values indicating more important variables, so that the resulting variable subset leads to the best class separation with the SVM classifier [48,60]. The number of variables to retain can either be user-specified (the number of variables to remove then being the total number of variables minus the number to retain) [62,63], or the algorithm can be run until a single variable is left and the optimal subset selected via cross-validation as the subset yielding the highest validation accuracy. For this study, the variables are standardized using the weighted mean and weighted standard deviation, and the optimal variable subset is determined using cross-validation.
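As a hedged illustration of this cross-validated subset selection, the sketch below combines standardization with scikit-learn's RFECV wrapper around a linear SVM; the unweighted scaler, fold count, and scoring metric are assumptions and do not reproduce the weighted standardization used in this study.

```python
# Illustrative sketch only: cross-validated choice of the variable subset with
# a linear SVM; standardization here is unweighted, unlike in the study.
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           random_state=0)

X_std = StandardScaler().fit_transform(X)           # standardize variables
selector = RFECV(SVC(kernel="linear"),              # eliminate one variable per step,
                 step=1, cv=5, scoring="accuracy")  # keep subset with best CV accuracy
selector.fit(X_std, y)

print("Optimal number of variables:", selector.n_features_)
print("Selected variables:", selector.support_)
```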

**Algorithm 2** Support vector machine—recursive feature elimination (SVM-RFE)

For *m* = 1 to *M* (number of features to remove)

1. Train an SVM using the set of remaining features *s*
2. Compute the ranking criterion:
	- 2.1. Obtain the weights *α* of the support vectors from the trained SVM
	- 2.2. Calculate the weight vector *w* of the optimal hyperplane, *w* = ∑<sub>*i*</sub> *α<sub>i</sub>y<sub>i</sub>φ*(*x<sub>i</sub>*)
3. Remove the variable associated with the smallest *w<sub>i</sub>*<sup>2</sup> from the set of remaining features *s*

End
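A bare-bones Python sketch of Algorithm 2 is given below, assuming a linear kernel so that the squared weights can be read directly from the fitted hyperplane; the function name and the use of scikit-learn are illustrative assumptions, not the original implementation.

```python
# Sketch of Algorithm 2 (one variable removed per iteration), assuming a
# linear kernel; not the authors' code.
import numpy as np
from sklearn.svm import SVC


def svm_rfe(X, y, n_remove):
    """Greedy backward elimination of n_remove variables via smallest w_i^2."""
    remaining = list(range(X.shape[1]))          # set of remaining features s
    for _ in range(n_remove):                    # for m = 1 to M
        clf = SVC(kernel="linear").fit(X[:, remaining], y)   # 1. train the SVM
        # 2. weight vector of the optimal hyperplane (dual_coef_ = alpha_i * y_i)
        w = (clf.dual_coef_ @ clf.support_vectors_).ravel()
        # 3. remove the variable with the smallest squared weight from s
        del remaining[int(np.argmin(w ** 2))]
    return remaining                             # indices of retained variables
```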
