#### 4.6.1. Statistical Features Approach

To perform dimensionality reduction, statistical features are applied to each segment. Such statistical features are commonly used in time series classification and, in particular, in continuous authentication, as described in the related work in References [13,17]. The list of applied statistical features is shown in Table 3.



A brief description of the statistical features is provided here. Variance, skewness and kurtosis are moments of increasing order that quantify the shape of a distribution. Variance (second central moment) is the expectation of the squared deviation of a random variable from its mean. Skewness (third moment) is a measure of the asymmetry of the probability distribution of a real-valued random variable about its mean. Kurtosis (fourth moment) is a measure of the tailedness of the probability distribution of a real-valued random variable. Entropy is the amount of information (in the Shannon sense) needed to specify the full microstate of a system, and it can be used as a statistical measure of randomness. The Wavelet Spectral Shannon Entropy is the Shannon entropy calculated on the spectral representation of the IMU recording obtained with the wavelet transform. The Wavelet Spectral Log Entropy is the entropy calculated using a logarithmic scale on the same wavelet-based spectral representation.

The Permutation Entropy (PE) was initially proposed in Reference [27] and has been applied in many domains, from healthcare (e.g., the analysis of EEG recordings [28]) to the detection of mechanical faults [29]. PE [27] is a complexity measure for a time series *x* of *T* elements and an embedding dimension *D* ≥ 2. The time series is embedded in a *D*-dimensional space *X<sub>t</sub>* = {*x*(*t*), *x*(*t* + 1), ..., *x*(*t* + *D* − 1)}, with *t* ranging from 1 to *T* − *D* + 1. Given an ordinal set *R<sub>D</sub>* = {*r*<sub>1</sub>, *r*<sub>2</sub>, ..., *r<sub>D</sub>*}, where *r*<sub>1</sub> < *r*<sub>2</sub> < ... < *r<sub>D</sub>*, there are *D*! permutations *π<sub>i</sub>*, with *i* ranging from 1 to *D*!. Each *X<sub>t</sub>* is mapped to the permutation *π<sub>i</sub>* that sorts its elements, so that {*x*(*t*), *x*(*t* + 1), ..., *x*(*t* + *D* − 1)} → {*r*<sub>1</sub>, *r*<sub>2</sub>, ..., *r<sub>D</sub>*} with *x*(*t*) ≤ *x*(*t* + 1) ≤ ... ≤ *x*(*t* + *D* − 1) after reordering.
The probability *p<sub>i</sub>* of each *π<sub>i</sub>* is calculated as the number of occurrences of *π<sub>i</sub>* divided by the number of embedded vectors *T* − *D* + 1. The permutation entropy of order *D* is then calculated with the formula *PE* = −∑<sup>*D*!</sup><sub>*i*=1</sub> *p<sub>i</sub>* log *p<sub>i</sub>*, while the normalised permutation entropy is *NPE* = *PE*/log(*D*!). As the embedding dimension *D* ≥ 2 is an important factor, two values of *D* have been used in our evaluation. Higher values are not considered because of the limitation on the size of the segments imposed by the equations described above, while lower values are not significant. Approximate entropy (ApEn) is a statistic quantifying regularity and complexity, with potential application to a wide variety of relatively short (more than 100 points) and noisy time series (as in this case, because IMU recordings can be noisy). Distribution entropy (DistEn) is a more recent entropy measure based on the probability density of vector-to-vector distances in state space. Further details on Distribution Entropy are presented in Reference [30].
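Under the definitions above, permutation entropy can be sketched in a few lines of Python. This is an illustrative helper (the function name is not from the paper), counting ordinal patterns over sliding windows and applying the formulas for *PE* and *NPE*:

```python
import math

import numpy as np


def permutation_entropy(x, D=3):
    """Permutation entropy of order D for a 1-D series x (illustrative sketch)."""
    x = np.asarray(x)
    T = len(x)
    n_windows = T - D + 1  # number of embedded vectors X_t
    counts = {}
    for t in range(n_windows):
        # Ordinal pattern of the D consecutive samples (indices that sort them)
        pattern = tuple(np.argsort(x[t:t + D]))
        counts[pattern] = counts.get(pattern, 0) + 1
    # PE = -sum_i p_i log p_i over observed patterns (absent patterns contribute 0)
    pe = -sum((c / n_windows) * math.log(c / n_windows) for c in counts.values())
    npe = pe / math.log(math.factorial(D))  # NPE = PE / log(D!), in [0, 1]
    return pe, npe
```

A strictly monotonic series produces a single ordinal pattern and therefore zero entropy, while an irregular series yields a normalised entropy between 0 and 1.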

It is not known a priori which features are most relevant for classification, so a feature selection process is adopted. An additional reason to reduce the number of features used for classification is the curse of dimensionality in machine learning. Because a small dataset (20 paths for 12 vehicles) is used in this analysis, a limited number of features should be used for classification to avoid the curse of dimensionality [31]. In this paper, we adopt the rule of thumb described in Reference [32], which states that there should be at least 5 samples per dimension. Accordingly, we adopt a number of features in the range of 4 to 6 (20 samples for each car divided by 5 equals 4). Experimental evaluation also shows that using all the features decreases the accuracy (e.g., 0.818 using all features against 0.832 with the 4 best features with a sample rate of 250 Hz and *Gyro<sub>y</sub>*). Because of its simplicity and time efficiency, this paper adopts the ReliefF algorithm, which belongs to the family of filter feature selection algorithms. The ReliefF algorithm [33] estimates the quality of attributes according to how well their values distinguish between instances that are near to each other. The estimate can be implemented using the K-Nearest Neighbors (KNN) algorithm. Further details are presented in Reference [33].
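To illustrate the principle behind ReliefF (rewarding features whose values differ on nearest instances of *other* classes and agree on nearest instances of the *same* class), a simplified sketch follows. The function name, the min-max scaling and the Manhattan distance are choices of this sketch, not of Reference [33]:

```python
import numpy as np


def relieff_weights(X, y, k=3):
    """Simplified ReliefF attribute weights (illustrative, not the reference code)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, d = X.shape
    # Min-max scale each feature so distances and differences are comparable
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    Xs = (X - X.min(axis=0)) / span
    classes, counts = np.unique(y, return_counts=True)
    priors = dict(zip(classes, counts / n))
    w = np.zeros(d)
    for i in range(n):
        dist = np.abs(Xs - Xs[i]).sum(axis=1)  # Manhattan distance to all points
        dist[i] = np.inf                       # exclude the instance itself
        # Penalty: mean per-feature difference to the k nearest hits (same class)
        hits = np.argsort(np.where(y == y[i], dist, np.inf))[:k]
        w -= np.abs(Xs[hits] - Xs[i]).mean(axis=0)
        # Reward: differences to k nearest misses of each other class, prior-weighted
        for c in classes:
            if c == y[i]:
                continue
            misses = np.argsort(np.where(y == c, dist, np.inf))[:k]
            w += priors[c] / (1 - priors[y[i]]) * np.abs(Xs[misses] - Xs[i]).mean(axis=0)
    return w / n
```

A feature that separates the classes receives a higher weight than a pure-noise feature, which is the ranking used to pick the 4 to 6 best features.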

#### 4.6.2. Spectral Domain Approach

The other technique is based on the transformation of the IMU recording into the frequency domain using the FFT. The frequency domain representation is then divided into a number *N<sub>freq</sub>* of segments (i.e., frequency bands), where *N<sub>freq</sub>* is one of the hyper-parameters identified in Section 5. The Root Mean Square (RMS) of each frequency band is calculated to obtain *N<sub>freq</sub>* features. This approach is similar to the continuous authentication approaches for human beings described in the related work in References [13,17]. The rationale for this approach in vehicle authentication is that the main components of a vehicle (e.g., tyres, shock absorbers and wheels) will have a smoother or harder reaction to road irregularities depending on the vehicle model; a smoother or harder response corresponds, respectively, to a lower or higher occupation of the frequency bands. This can be seen in Figure 8, which shows the amplitude frequency representation with *N<sub>freq</sub>* = 6 of each vehicle for the same segment I, using a sample rate of 500 Hz and *Gyro<sub>y</sub>*. It can be seen from the figure that the first 6 vehicles (all Pandas) have a similar distribution of the amplitude coefficients, while the other vehicles have a different distribution. In more detail, in the first 6 vehicles the amplitude of the second frequency component is only slightly less than that of the first, while there is a wide gap between the amplitudes of the first and second frequency components in vehicle number 12, which is of a completely different brand and model and, more importantly, of a different class (sports car). On the other hand, the frequency response can depend on many different factors, including the speed of the vehicle and how it changes from one lap to another. The goal is then to identify the optimal hyper-parameters, which can optimize the vehicle classification regardless of the vehicle speed.
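The band-RMS feature extraction described above can be sketched as follows; equal-width bands over the magnitude spectrum are an assumption of this sketch (the paper does not specify the band edges):

```python
import numpy as np


def band_rms_features(signal, n_freq=6):
    """RMS of n_freq contiguous bands of the magnitude spectrum (sketch)."""
    # Amplitude (magnitude) component of the FFT of the real-valued recording
    spectrum = np.abs(np.fft.rfft(signal))
    # Split the spectrum into n_freq roughly equal-width frequency bands
    bands = np.array_split(spectrum, n_freq)
    # One RMS feature per band
    return np.array([np.sqrt(np.mean(b ** 2)) for b in bands])
```

For a low-frequency sinusoid the energy concentrates in the first band, matching the intuition that a "smoother" response occupies the lower bands.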
Note that the frequency representation of a time series is complex-valued, and it is not known a priori which components of the representation are useful for classification. As shown in the results in Section 5, the magnitude component of the frequency domain representation is much more relevant for classification than the phase component. For these reasons, the amplitude component of the FFT transform is used. It could be argued that other spectral representations, apart from the FFT-based one, could also provide good classification accuracy. This paper has also evaluated a wavelet-based representation, but its results are slightly worse than those of the frequency domain representation, as shown in Section 5.

### *4.7. Machine Learning*

The authors of this paper used different machine learning algorithms to produce the results shown in Section 5. Considering the small size of the dataset, only shallow machine learning algorithms have been used. A brief description of the algorithms is provided here.


• SVM is a supervised algorithm, which learns to classify data points (e.g., originating from the observables) from the labeled training samples (e.g., the reference fingerprints). SVM separates the labeled set into two areas of a multi-dimensional surface by using a separating function, which can be of different types: linear, Radial Basis Function (RBF), polynomial and sigmoidal are the most common. Because the multi-dimensional surface is divided into two areas, SVM is a binary classifier, and it can be directly used to distinguish between two vehicles or for validation (to validate the claimed identity of a vehicle). The extension of SVM to multi-class identification has been proposed by various authors; in this paper, the OneVsOne approach is used. The RBF kernel is adopted, and the RBF scaling factor *γ* must be optimized together with the C factor of the SVM algorithm.
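A minimal sketch of an RBF-kernel SVM with a OneVsOne decision and a grid search over *C* and *γ*, using scikit-learn; the synthetic feature vectors and the grid values are placeholders for the real fingerprints and search ranges, which the paper does not list:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic per-segment feature vectors: 12 "vehicles", 20 samples each, 4 features
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.5, size=(20, 4)) for c in range(12)])
y = np.repeat(np.arange(12), 20)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# RBF SVM with one-vs-one multi-class handling; C and gamma optimized jointly
clf = GridSearchCV(
    make_pipeline(StandardScaler(),
                  SVC(kernel="rbf", decision_function_shape="ovo")),
    {"svc__C": [1, 10, 100], "svc__gamma": ["scale", 0.1, 1.0]},
    cv=3,
)
clf.fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
```

Note that scikit-learn's `SVC` already trains one-vs-one binary classifiers internally for multi-class problems, so this matches the OneVsOne approach described above.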

Two main classification metrics are used: (a) the confusion matrix, where each row of the matrix represents the instances in a predicted class while each column represents the instances in an actual class, and (b) the identification accuracy, which is the number of correctly classified samples (the diagonal of the confusion matrix) divided by the total number of samples.
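The two metrics can be computed directly. A small sketch following the row/column convention stated above (rows: predicted class, columns: actual class); the function name and the toy labels are for illustration only:

```python
import numpy as np


def confusion_and_accuracy(y_true, y_pred, n_classes):
    """Confusion matrix (rows: predicted class, columns: actual class) and accuracy."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for actual, predicted in zip(y_true, y_pred):
        cm[predicted, actual] += 1
    # Accuracy: correctly classified samples (diagonal) over all samples
    acc = np.trace(cm) / cm.sum()
    return cm, acc
```

For example, six samples of three classes with four correct predictions yield an accuracy of 4/6 ≈ 0.667.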

**Figure 8.** Amplitude of the components (*N<sub>freq</sub>* = 6) in the frequency representation for each of the 12 vehicles using Gyroscope Y and the 500 Hz sample rate.
