**5. Results and Discussion**

Both dataset selection and ECG signal pre-processing have already been described in the previous sections. The final stage is user identification. It can be treated as a classification task, since the identification algorithm must match each ECG record to one of the existing users (classes). In general, the classification is done using machine learning techniques. The machine learning approach requires selecting an algorithm powerful enough to model complex internal data relations, as well as splitting the dataset to correctly estimate classifier performance in real-world applications [23].

Machine learning algorithms have different natures, are based on different ideas and mathematical frameworks, and are typically used in different applications. These factors should be taken into consideration when selecting the most suitable algorithm for ECG identification. The following eight algorithms were chosen as the most promising: Logistic Regression, Support Vector Machine (SVM), Linear Discriminant Analysis (LDA), Naive Bayes, K Nearest Neighbor (KNN), Neural Networks or Multilayer Perceptron (MLP), Extreme Gradient Boosting (xGboost), and Random Forest [24,25].

For the multi-layer perceptron, the following configurations were used: one hidden layer with 50 neurons; two hidden layers with 50 and 30 neurons; and three hidden layers with 70, 50, and 30 neurons, respectively. The rectified linear unit (ReLU) was selected as the activation function for the hidden layers and softmax as the activation function for the output layer. The training algorithm was RMSprop, with 1000 training epochs, a learning rate of 0.0001, a batch size of 100, and categorical cross-entropy as the loss function. For the other algorithms, we used the default configuration recommended by the sklearn framework (for example, in the case of PCA, the number of components was set to 30).
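The three MLP layouts above can be sketched with sklearn's `MLPClassifier`. Note that this is an illustrative approximation: sklearn does not expose an RMSprop solver, so `adam` is used as a stand-in here, and the softmax output layer and cross-entropy loss are applied internally by `MLPClassifier` for multi-class targets. The synthetic data is a hypothetical stand-in for the extracted ECG feature vectors.

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Hypothetical toy data standing in for the ECG feature vectors.
X, y = make_classification(n_samples=300, n_features=30,
                           n_informative=20, n_classes=3,
                           random_state=0)

# The three hidden-layer layouts described in the text.
configs = [(50,), (50, 30), (70, 50, 30)]
for hidden in configs:
    mlp = MLPClassifier(hidden_layer_sizes=hidden,
                        activation="relu",
                        solver="adam",            # stand-in for RMSprop
                        learning_rate_init=0.0001,
                        batch_size=100,
                        max_iter=1000,            # training epochs
                        random_state=0)
    mlp.fit(X, y)
    print(hidden, round(mlp.score(X, y), 3))
```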

Dataset splitting requires dividing the data into two subsets: a training set and a test set. The samples from the training set are used to fit a classification model, while the samples from the test set provide an unbiased evaluation of the model performance. The test set must be carefully prepared, as it should realistically represent the real-world data that the classification model would operate on.

As ECG-ID and LBDS have multiple records per user, we split the training and test sets at the record level: some records are randomly selected for the training set, while the remaining ones form the test set. Experiments were conducted with training and test set ratios of 0.7 and 0.3, respectively. To obtain a more realistic estimate of identification performance, the dataset split was repeated 5 times, after which the mean values over the splits were calculated.

For the MIT-BIH Normal Sinus Rhythm and QT databases, only one record per user is available. However, these records are of substantial length, so the training and test sets were formed by a time-based split of each record. In this case, the training and test ratios were likewise 0.7 and 0.3.
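A time-based split simply takes the first 70% of a record's beats as training data and the final 30% as test data, preserving temporal order. The `beats` array below is a hypothetical stand-in for per-beat feature vectors in time order.

```python
import numpy as np

# 20 beats in temporal order, 3 illustrative features each.
beats = np.arange(20 * 3).reshape(20, 3)

cut = int(0.7 * len(beats))               # 0.7 / 0.3 ratio
train_beats, test_beats = beats[:cut], beats[cut:]
print(len(train_beats), len(test_beats))  # 14 6
```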

Furthermore, as mentioned in Section 3, the classification models were trained for two different scenarios: with and without PCA compression. The only exception is neural networks: as complex non-linear models, they can learn an efficient data compression in the first hidden layer, so applying PCA to their inputs makes little sense. All results of our experiments are gathered in Table 2.
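The two scenarios can be sketched as two sklearn models: one trained on the raw features and one preceded by the 30-component PCA step, here with logistic regression as an example classifier. The synthetic data is an illustrative stand-in for the extracted ECG features.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy data standing in for the ECG feature matrix.
X, y = make_classification(n_samples=200, n_features=60,
                           n_informative=30, n_classes=4,
                           random_state=0)

# Scenario 1: no compression.
plain = LogisticRegression(max_iter=1000).fit(X, y)

# Scenario 2: 30-component PCA compression before the classifier.
compressed = make_pipeline(PCA(n_components=30),
                           LogisticRegression(max_iter=1000)).fit(X, y)

print(round(plain.score(X, y), 3), round(compressed.score(X, y), 3))
```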

As one can see in Table 2, all of the algorithms behave similarly across the datasets. Simple algorithms, such as KNN and the linear models (logistic regression, LDA, SVM), proved to work surprisingly well, whereas Naive Bayes and the ensemble methods (gradient boosting and random forest) performed relatively poorly. Neural networks also guarantee a very high accuracy, which was expected in view of their complex non-linear nature and modeling capacity. PCA compression slightly improves the accuracy for some datasets while decreasing it for others; consequently, there seems to be no need to include PCA in the data preprocessing pipeline.


**Table 2.** ECG identification results.

The best accuracy was achieved by LDA and MLP on all four datasets. KNN shows high results on all datasets except MIT-BIH Normal Sinus Rhythm. Given that this database is much larger than the others, it is unclear whether KNN would scale well to a larger number of users and records. MLP and xGboost were the most time-consuming algorithms to train, whilst logistic regression and LDA were among the fastest.

Another important observation, based on the results in Table 2, is that the hardware parameters (e.g., measurement instrumentation, lead type, and sampling rate) do not significantly affect the identification results. The lowest accuracy was achieved for the ECG-ID database (potentially because of its highly skewed classes and larger number of users) and the MIT-BIH Normal Sinus Rhythm database (where scaling to a much larger number of samples may be difficult).
