#### *4.2. Data Collection*

The first step is to collect the data from each vehicle. As described in Section 3, the same driver was used for all the recordings. The driving was performed under different environmental conditions (e.g., sunny days, rainy days), always on the same loop, which was driven 20 times. The data were recorded from the Xsens IMU at a sample rate of 2000 Hz, which is much higher than the sample rate available in commercial smartphones. For this reason, the data were decimated to a lower sample rate, as described in Section 5. The car was driven at different speeds in the EL because of the varying traffic conditions.

#### *4.3. Synchronization and Laps Extraction*

The second step is to synchronize the data recordings and to extract the individual laps. To facilitate this step, each lap was driven in the following way: the car was stationary on *OP*01 at the beginning of each lap, with the engine running, for 10 s. This initial Background Noise Window (BNW) segment was used for lap synchronization and extraction. To compute the overall motion of the car, a Kalman filter is used to estimate the three-dimensional angular velocity from the gyroscope data down-sampled to 20 Hz. The variance computed from the BNW is used as a filter parameter for noise reduction. The computed angular velocity is then reduced to a single dimension, and a one-dimensional convolution is applied. The convolution kernel is 20 elements long (matching the 20 Hz down-sampled frequency) and normalized. The start and end of each lap are determined from the angular velocity, at the points where it stays below the threshold for the time defined above. The output of this procedure is a timetable with the starting and ending times of each lap. The pseudocode is given in Algorithm 1. The synchronization and lap extraction for the first two laps are visualized in Figure 6.


```
Function veloMean(data):
   delta = ∑_{(a,b) ∈ {(x,y),(x,z),(y,z)}} abs(a − b);
   return mean(norm(data), delta);
Function bound2lap(bound):
   start = detectStarts(bound);
   ends = detectEnds(bound);
   return timetable(start, ends);
Input:
   data = IMU raw data at 2000 Hz
   bnw = BNW part of IMU raw data
Main:
velo = downsample(data, 20Hz);
vari = variation(bnw);
velo = kalmanFilter(velo,vari);
velo = veloMean(velo);
convoMat = [ (1/20)_1, (1/20)_2, ..., (1/20)_20 ];
convo = convolution(velo, convoMat);
thres = mean(velo) + std(velo);
bound = convo < thres;
lapsTimetable = bound2lap(bound);
quater = findQuaternionRot(bnw, vari);
quater = mean(quater);
data = quaternionRot(data, quater);
return data, velo, lapsTimetable;
```
**Figure 6.** Detailed representation of the first two laps after synchronization. This plot shows that the proposed synchronization and laps extraction approach does not lose data between the laps (area highlighted with a green colour).
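
The smoothing and thresholding steps of the lap extraction can be illustrated with a minimal Python sketch. It is not the implementation used for the measurements: the function names, the explicit `thres` argument, and the `min_still_s` duration are hypothetical, and the Kalman filtering and quaternion rotation of Algorithm 1 are omitted. The sketch assumes that a lap begins at the start of a stationary window and ends where the next stationary window begins, so that no samples fall between laps.

```python
def moving_average(signal, width=20):
    """Convolve with a uniform kernel of `width` taps, each equal to 1/width."""
    out, acc = [], 0.0
    for i, value in enumerate(signal):
        acc += value
        if i >= width:
            acc -= signal[i - width]  # drop the sample leaving the window
        out.append(acc / width)
    return out


def still_runs(mask, min_len):
    """Return (start, stop) index pairs of True runs at least `min_len` long."""
    runs, start = [], None
    for i, flag in enumerate(mask):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_len:
                runs.append((start, i))
            start = None
    if start is not None and len(mask) - start >= min_len:
        runs.append((start, len(mask)))
    return runs


def extract_laps(motion, thres, rate_hz=20, min_still_s=5.0):
    """Build a lap timetable (start_s, end_s) from a 1-D motion signal.

    Assumption: each sufficiently long below-threshold run marks the
    stationary window that opens a lap.
    """
    smooth = moving_average(motion, width=rate_hz)
    mask = [v < thres for v in smooth]
    starts = [s for s, _ in still_runs(mask, int(min_still_s * rate_hz))]
    ends = starts[1:] + [len(motion)]
    return [(s / rate_hz, e / rate_hz) for s, e in zip(starts, ends)]
```

Because each lap ends exactly where the next one starts, the resulting timetable is contiguous, which mirrors the property highlighted in Figure 6.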

#### *4.4. Segments Extraction*

In the following step, the different road segments are extracted from each lap. The goal is to divide the entire lap into specific segments on which the classification process for continuous vehicle authentication is performed. The issue is already known in the literature on continuous authentication of persons [13,14]: the speed of the car will never be exactly the same across laps, just as a person walks or moves at different speeds. A possible solution to the problem of different car speeds across laps is to re-sample the lap records so that they have the same number of data points. The result of such a squeezing effect is similar to the case in which every car has the same speed pattern in every lap. It is important to divide the lap into many smaller parts that are similar for every car and to squeeze them separately. To detect those parts, we took advantage of the shape of the EL, which has many curves. Each curve, and also each stretch of readings between two curves, is considered a separate part.

To detect turns, the z-axis of the angular velocity is used. It is negative during a right turn, positive during a left turn, and very close to zero when the car drives straight. A one-dimensional convolution is applied. The turns are detected separately in every lap of the current car to minimize potential speed biases between laps. The threshold is set to the sum of the median and one fourth of the standard deviation of the velocity in the current lap. Every velocity value between the negative threshold and the positive threshold is considered as a turn. This method also detects many smaller turns, which are not useful for the subsequent steps. A second level of detection is therefore applied to erase those small turns. For every turn, the area between its velocity and the threshold is computed. If this area is smaller than the length of the turn multiplied by the threshold, the turn is marked as false and erased. The output of this procedure is a timetable with the starting times of each detected segment for each lap. We were able to detect sixteen segments. For more information, see Figure 7. For the pseudocode, see Algorithm 2.

**Figure 7.** An example of the segments identified in a lap of the first car. The first plot shows the convolved z-axis of the angular velocity with its positive (right turn) and negative (left turn) thresholds. The Operational Point (OP) positions from Figure 1 are marked above in the form of text arrows. The detected turns are shown in the second subplot (1 means turn and 0 means straight drive). The vertical gray lines separate the different segments, which are individually marked by yellow rectangles with alphabetical letters. Those letters represent segment names and stand for: A-Start, B-StartTurn, C-FastFirstBump, D-PreRound, E-RoundOne, F-SecodBump, G-RightCurve, H-WindowOne, I-CrossOne, J-VisitBump, K-CrossTwo, L-WindowTwo, M-LeftCurve, N-WindowThree, O-RoundTwo, P-WindowFour. The orange rectangles identify the seven segments used for the machine learning classification. The last subplot shows the z-axis of the acceleration with the identified EL road landmarks from the Figure.

#### **Algorithm 2:** Detect laps parts for one car.

```
Input:
   data = compen. data from Algorithm 1
   velo = z-axis of velo from Algorithm 1
   laps = lapsTimetable from Algorithm 1
Main:
bound = Null · [size(velo)];
velo = downsample(velo, 20Hz);
convoMat = [ (1/20)_1, (1/20)_2, ..., (1/20)_20 ];
velo = convolution(velo, convoMat);
for lap in laps do
   tmp = velo.lap;
   thres = median(tmp) + (1/4) · std(tmp);
   thresRA = max(tmp) - thres;
   turns = -thres < tmp < thres;
   turns = removeRA(turns, thresRA);
   for turn in turns do
       area = curvesArea(turn, ± thres);
       thresArea = size(turn) · ± thres;
       if area < thresArea then
          turns = removeTurn(turn);
       end
   end
   turns = removeTime(turns, last10s);
   bound.lap = turns;
end
partsTimetable = bound2part(bound, laps);
return partsTimetable;
```
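
The per-lap threshold and the second-level area filter can be sketched in Python as follows. This is an illustration under stated assumptions, not the code used in the study: the function names are hypothetical, the population standard deviation is assumed, and the mask of candidate turn samples is passed in by the caller, so the sketch does not fix how samples are compared against the threshold. The `removeRA` and `removeTime` steps of Algorithm 2 are left out.

```python
from statistics import median, pstdev


def lap_threshold(yaw_rate):
    """Threshold for one lap: median plus one fourth of the standard deviation."""
    return median(yaw_rate) + 0.25 * pstdev(yaw_rate)


def runs_from_mask(mask):
    """Return (start, stop) index pairs for every run of True values."""
    runs, start = [], None
    for i, flag in enumerate(mask):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(mask)))
    return runs


def drop_small_turns(yaw_rate, runs, thres):
    """Keep only runs whose area is at least size(run) * thres.

    The area is measured between the velocity magnitude and the
    threshold, summed over the samples of the run.
    """
    kept = []
    for start, stop in runs:
        area = sum(abs(abs(v) - thres) for v in yaw_rate[start:stop])
        if area >= (stop - start) * thres:
            kept.append((start, stop))
    return kept
```

A short, shallow excursion past the threshold produces a small area relative to its length and is discarded, while a sustained curve is retained.
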
#### *4.5. Segments Records Re-Sampling*

For every segment, its longest occurrence is found among the records of every car and every lap. This longest occurrence is used as a reference and re-sampled directly to the desired frequency. The algorithm then iterates over the remaining occurrences of the segment (for all cars and laps) and re-samples each of them with an individually computed sample rate so that its length matches that of the already re-sampled reference segment. Because the longest occurrence was chosen as the reference, the computed sample rates are never lower than the desired sample rate. For the pseudocode, see Algorithm 3. The sample rate of our measurements (i.e., 2000 Hz) determined the lower and upper frequency boundaries of the re-sampling approach. We found that frequencies below 50 Hz did not preserve enough data quality for classification. Conversely, frequencies above 500 Hz were too high to be reached by down-sampling alone and required up-sampling for some segments. In addition, we note that a sample rate above 500 Hz is unrealistic for most mobile phones available on the market.
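
The relation between the reference occurrence and the individually computed sample rates can be made concrete with a small Python sketch. It is a simplified illustration: the function names are hypothetical, and plain linear interpolation is assumed in place of whatever re-sampling filter was actually used (in particular, no anti-aliasing filter is applied here). If the longest occurrence lasts T_ref seconds and the desired rate is f, every occurrence is mapped to N = round(T_ref · f) points, so an occurrence lasting T seconds is effectively sampled at N / T.

```python
def resample_linear(samples, n_out):
    """Linearly interpolate `samples` onto `n_out` evenly spaced points."""
    n_in = len(samples)
    if n_in == 1 or n_out == 1:
        return [samples[0]] * n_out
    step = (n_in - 1) / (n_out - 1)
    out = []
    for k in range(n_out):
        pos = k * step
        lo = min(int(pos), n_in - 2)  # clamp so lo + 1 stays in range
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[lo + 1] * frac)
    return out


def align_segment(occurrences, raw_rate_hz, target_rate_hz):
    """Re-sample all occurrences of one segment to a common length.

    The longest occurrence is the reference; it is re-sampled at
    `target_rate_hz`, and every other occurrence receives the rate
    that yields the same number of points.
    """
    longest = max(len(o) for o in occurrences)
    n_ref = round(longest / raw_rate_hz * target_rate_hz)
    rates = [n_ref / (len(o) / raw_rate_hz) for o in occurrences]
    resampled = [resample_linear(o, n_ref) for o in occurrences]
    return resampled, rates
```

For example, with recordings at 2000 Hz and a desired rate of 100 Hz, a 2 s reference occurrence yields 200 points, and a 1.5 s occurrence of the same segment is re-sampled at about 133 Hz to reach the same 200 points.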


```
for car in cars do
       for lap in lapsTimetable do
           if minPartSize < size(data.car.lap.part) then
              data.car.lap.part[from minPartSize + 1 to end] = null;
           end
       end
   end
end
return data;
```