4.1. Experimental Setup
The details of 16 ECG beat types are given in
Table 1. It can be seen from this table that the data distribution among the 16 classes is quite imbalanced. Class N is the largest, with more than 75,000 samples, while the smallest class, S, contains only 2; the remaining class sizes vary from tens to thousands. Such imbalance increases the classification difficulty. The beat data are collected from 235-point segments of the ECG recordings, so the original data dimension is 235. We illustrate these data in
Figure 1 by sampling one beat from each class. The waveform display may not be exactly medically standard because we directly use the real data, which suffer from variations and noise. However, these waveforms reflect the challenge we face in this task: ECG beat data are complicated and their classes are easily confused.
Our proposed method is denoted by “MLR + MBD/MPD” for short, where MBD is the first recommended set-based dissimilarity and MPD is the second. First, we demonstrate the effectiveness of the whole scheme in comparison with other representative methods; here we evaluate the overall performance of the methods across all beat classes as well as their performance on each individual beat class. Second, we evaluate the components of the proposed model to justify their roles and test the model's flexibility; here we evaluate several different learned metric spaces for the recommended set-based dissimilarities, compare these dissimilarities with other analogous ones, and also evaluate the role of the MLR regularizer and the minority size of MBD. Third, we discuss the critical parameters involved in the method to check how they influence the results, focusing on the set size of MBD/MPD and the trade-off parameter of MLR. Fourth, we validate the robustness of the proposed method by confirming its stability and reliability across different feature spaces; here we evaluate the method with features based on six wavelet types and two encoding ways. Finally, we compare the proposed approach with other competitive techniques under the AAMI standards to show its compatibility and capability.
The evaluation rule is as follows. We perform 10-fold cross-validation: in each trial, we randomly halve the data into training and testing sets, and we keep the same train-test splits for method comparison. In the training stage, for classes whose sizes are much larger than the others, such as N, A, L, R, P and V, we limit the number of training samples by randomly selecting at most 500 of them to alleviate the class imbalance problem. In the testing stage, we match the testing sets against the training sets to decide each testing set's label. This process rests on the set-integrity assumption: the samples within a set are relevant.
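The class-capped random halving described above can be sketched as follows. This is an illustrative helper, not the authors' code; the function name and the per-class cap of 500 follow the text, while the seeding and data layout are our assumptions.

```python
import random
from collections import defaultdict

def capped_half_split(samples, labels, cap=500, seed=0):
    """Randomly halve the data into train/test sets, capping the number
    of training samples per class (e.g., 500 for large classes such as
    N, A, L, R, P, V) to alleviate class imbalance."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    train, test = [], []
    for y, xs in by_class.items():
        xs = xs[:]
        rng.shuffle(xs)
        half = len(xs) // 2
        train += [(x, y) for x in xs[:min(half, cap)]]  # capped half
        test += [(x, y) for x in xs[half:]]             # remaining half
    return train, test
```

With 1200 samples of a majority class, only 500 reach the training set while all 600 held-out samples remain for testing.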
There are three choices for instantiating the abstract operation of MBD in Equation (
7). According to the preliminary tests on datasets MLII and MLV in
Table 2, we find that the sum-operation is slightly more effective than the other two, so we adopt the sum-operation. Note that the max-operation is also admissible for MBD because its performance is similar to that of the sum-operation.
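A minimal sketch of MBD with a pluggable aggregation operation follows. This is our reading of the construction around Equation (7), not the authors' implementation: for each set, the `alpha` points closest to the other set form the "minority", and their nearest-neighbour distances are reduced by an abstract operation such as `np.sum` (the adopted choice) or `np.max`.

```python
import numpy as np

def mbd(set_a, set_b, alpha=1, op=np.sum):
    """Minority-Based Dissimilarity sketch with an abstract operation
    `op` over the minority distances of both measured sets."""
    a = np.asarray(set_a, dtype=float)
    b = np.asarray(set_b, dtype=float)
    # Pairwise Euclidean distances, shape (len(a), len(b)).
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    near_a = np.sort(d.min(axis=1))[:alpha]  # minority of A w.r.t. B
    near_b = np.sort(d.min(axis=0))[:alpha]  # minority of B w.r.t. A
    return float(op(np.concatenate([near_a, near_b])))
```

Swapping `op` between `np.sum` and `np.max` reproduces the kind of comparison reported in Table 2.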
4.2. Effectiveness Evaluation
We demonstrate the effectiveness of our proposed method in comparison with SVM [
2], Neural Network (NN) [
35], and Linear Discriminant Analysis (LDA) [
4]. These are traditional single-sample methods for ECG beat classification. Hence, we treat single-sample classification in the original feature space with the Euclidean metric as the baseline.
During implementation, for the proposed method, we set the testing set size to 50 samples for each beat class except types J, Q, e and S, which only have 42, 17, eight, and one sample(s) for testing, respectively. For classes J, Q, and e, we fix the set size to five and, for S, we treat the single sample as the whole set. As for MBD, we set the minority size to the sample number of the smaller set by default; if the minority size is smaller than one, we keep it at one. For SVM, we select the Radial Basis Function (RBF) kernel with the default gamma value 2, set the penalty parameter to 10, and set the stopping criterion as . For NN, we adopt a five-layer structure whose sizes, from the bottom layer to the top layer, are 36, 31, 26, 21 and 16, respectively. For the hidden units, we choose the hyperbolic tangent activation function. To train this structure, we use stochastic gradient descent, with the batch size set to half of the training data size and the learning rate set to two. For LDA, we keep the subspace dimension at 26 for dimension reduction.
The results on MLII and MLV are reported in
Table 3 and
Table 4, respectively, in terms of the classification rate at Rank-1 and Rank-5. The Rank-1 classification rate is identical to accuracy, i.e., the ratio of the number of correctly classified patterns to the total number of patterns classified. For SVM and NN, we only record accuracy because they directly produce classification results rather than distance scores for ranking. Note that, for multi-class classification, accuracy is more suitable than evaluation metrics common in binary classification, such as sensitivity and specificity. From the results, we can see that the proposed method significantly outperforms the compared widely used approaches.
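The Rank-k classification rate used above can be computed from a dissimilarity matrix as sketched below. This is an illustrative helper (names and layout are our assumptions): a query counts as correct at Rank-k if its true class is among the k smallest distances, so Rank-1 coincides with accuracy.

```python
import numpy as np

def rank_k_rate(dist, true_idx, k):
    """Rank-k classification rate. dist[i, j] is the dissimilarity of
    query i to candidate class j; true_idx[i] is the correct column."""
    dist = np.asarray(dist, dtype=float)
    order = np.argsort(dist, axis=1)  # candidates sorted by distance
    hits = [t in row[:k] for row, t in zip(order, true_idx)]
    return float(np.mean(hits))
```

For a query whose true class is ranked second, the Rank-1 rate is 0 but the Rank-5 (or any k >= 2) rate is 1.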
As our first recommended set-based dissimilarity, MBD performs better than MPD in the learned metric space. SVM and NN seem to drag down performance because of over-fitting and variation problems, which, by contrast, the proposed method handles well. The advantage of our approach over the compared methods reflects the benefit of suitably exploiting set-based information in the learned metric space. Admittedly, tuning the parameters of the compared methods might bring small performance fluctuations, but such limited fluctuations would not change the comparison outcome, given the significant performance gap.
We also evaluate the method performance for each independent beat class in
Table 5 and
Table 6. It can be observed from the results in
Table 5 that, generally, the proposed method performs better than the other methods. For classes L, a, W, j, E, P and f, MLR + MBD even achieves saturated results. For classes e and S, our method performs unsatisfactorily because the small number of samples in these classes is adverse to exploiting set-based information. However, for these two classes, the performance of the other approaches is also quite low due to the limited training data. In addition, class Q displays an overall inferior performance for all methods, as expected, on account of its unclassifiable nature. In
Table 6, the proposed method outperforms the other compared approaches on the whole. In addition, MLR + MBD attains saturated performance on classes L, a, j, E, P and f, similar to the results on MLII in
Table 5. Moreover, all methods fail on class S. For class e, although the proposed method is inferior to the other methods, even the best performer scores quite low. Class Q is still unclassifiable, as its label name suggests. These low performances mirror the situation on MLII in
Table 5.
4.3. Component Analysis
We also evaluate the two important modeling components in our method: metric learning and set-based dissimilarity.
On the one hand, we instantiate the metric learning component by several capable models other than MLR in the scheme. These models include Large Margin Nearest Neighbor (LMNN) [
45], Information-Theoretic Metric Learning (ITML) [
46], and Local Fisher Discriminant Analysis (LFDA) [
47]. It can be seen from the results in
Table 7 and
Table 8 that, although the effectiveness of the set-based dissimilarities MBD and MPD limits the room for further enhancement, MLR still performs more strongly than the other compared metric learning models in boosting their performance. The performance gap between MLR + MBD/MPD and LMNN + MBD/MPD is not large. This is because MLR and LMNN share a similar inner mechanism despite their different modeling approaches: both pursue an optimized relative comparison between intra-class and inter-class distances. Therefore, LMNN can be used as an alternative metric learning component in place of MLR in the scheme. Such modeling freedom reflects the flexibility of our method, which also ensures its development potential.
On the other hand, we analyze the set-based dissimilarity component by comparing MBD, MPD, APD, and MAD [
42]. They encode point-to-point distances at the set level in different ways. MPD measures the minimum distance between sets, APD measures the average distance between sets, and MAD balances the strategies of MPD and APD by measuring the mean approach distance between sets, while MBD measures the minority distance between sets, which generalizes both MPD and MAD. The results in
Table 7 and
Table 8 show that MBD and MPD are substantially more effective than MAD and APD, which implies that the discriminative information is primarily contained in the outer part of the sets rather than the central part. Furthermore, the advantage of MBD over MPD indicates that, within the outer part of the sets, the inner minority is more discriminative than the outermost periphery.
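The three compared set-level encodings can be sketched as follows. These are our readings of the standard definitions (MAD here averages the two directed nearest-neighbour means, following the common mean-approach formulation), not code from the paper or from [42].

```python
import numpy as np

def pairwise(a, b):
    """All point-to-point Euclidean distances between two sets."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)

def mpd(a, b):
    """Minimum Point-wise Distance: the closest cross-set pair."""
    return float(pairwise(a, b).min())

def apd(a, b):
    """Average Point-wise Distance: the mean over all cross pairs."""
    return float(pairwise(a, b).mean())

def mad(a, b):
    """Mean Approach Distance: average each point's distance to its
    nearest neighbour in the other set, then average both directions."""
    d = pairwise(a, b)
    return float(0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean()))
```

On a toy pair of sets, MPD depends only on the nearest pair, APD on every pair, and MAD sits between the two, which mirrors the "outer part vs. central part" contrast discussed above.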
Moreover, we also analyze the regularizer alternatives of the MLR component in Equation (
5). The results in
Table 9 and
Table 10 show that changing the regularizer has only a minor influence on the whole framework, which again reflects the flexibility of our method.
Furthermore, we evaluate the performance of MBD under different sizes of minorities formulated in Equation (7). The size of the minorities is determined by the set size n and a ratio β. We use the same quantity of minorities for both measured sets: α = ⌊β·n⌋, where α denotes the size of the minorities and β denotes the ratio. If α < 1, then we set α = 1. The results for MBD in both Euclidean and learned metric spaces are reported in
Table 11 and
Table 12. We can see the effectiveness of the minority-based strategy by comparing the results between using a partial subset (β < 1) and using the whole set (β = 1). The performance of both MBD and MLR + MBD presents a rising tendency as β decreases, by and large. In addition, when β takes relatively small values, MBD achieves relatively good performance in both Euclidean and learned metric spaces.
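The rule for choosing the minority size discussed above (a ratio β of the set size, kept at no less than one so the minority is never empty) can be sketched as follows; this is our reading of the text, with floor rounding as an assumption.

```python
import math

def minority_size(n, beta):
    """Minority size alpha: a beta fraction of the set size n,
    floored, and clipped up to 1 so the minority is never empty."""
    return max(1, math.floor(beta * n))
```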
On the whole, MLR is a global metric learning method that optimizes the distance comparison relationship for all samples in the feature space. Both the effectiveness of MLR itself and its improvement of set-based dissimilarity for measurement-based classification directly justify the model's goal of making intra-class distances smaller than inter-class distances. In the learned metric space, local variations of the sample sets become the main obstacle to the performance of set-based dissimilarity; the effectiveness of MBD in this space validates its capability to cope with this problem.
4.4. Parameter Discussion
We further discuss the critical parameters involved in the proposed method: the set size (the number of samples in a set) for MBD and the trade-off parameter for MLR. Because MPD is the second recommended set-based dissimilarity in our model, we include it in the discussion as well.
At first, we evaluate the performance of MBD and MPD in both Euclidean and learned metric spaces, with the set size varying from 10 to 100 in steps of 10. Performance with set sizes larger than 100 is less meaningful because it may be deteriorated or exaggerated by the statistical bias caused by the limited sample numbers of a few beat classes.
By combining the results in
Table 13 and
Table 14, we find that, apart from minor fluctuations, the performance first increases and then decreases on the whole. Hence, both too large and too small set sizes may degrade the capability of MBD and MPD; when the set size falls in an intermediate range, they achieve comparably good performance in both Euclidean and learned metric spaces.
Then, we evaluate the MLR trade-off parameter
C in Equation (
5). This parameter balances the slack variables and the regularizer. Usually, when the distributions of different classes overlap heavily in the feature space, a smaller value is more desirable; conversely, when the distributions are relatively separable, a larger one tends to work better. We test a group of trade-off parameter values to reveal how they influence the performance of MLR + MBD/MPD. We also record the results of MLR under these trade-off parameters directly, without set-based dissimilarity measurement. For convenience, we provide the baseline performance as a reference.
By observing the results in
Table 15 and
Table 16, we find that, for MLR, smaller parameter values yield better performance than larger ones, which hints that sample overlap indeed exists in the feature space. The soft manner afforded by the slack variables of MLR can cope with such a situation to a certain degree. Looking at the results in greater detail: the comparison between the first and second rows shows that the trade-off parameter has no influence on the superiority of MBD over MPD in the learned metric space; the comparison between the second and third rows manifests the effectiveness of the set-based strategy in the learned metric space regardless of the parameter setting; and the comparison between the third and fourth rows displays the positive role of metric learning itself in improving space discriminability during single-sample classification.
4.5. Robustness Validation
The robustness of the proposed method can be demonstrated by its stably effective performance across different feature spaces as well as different datasets.
Firstly, we evaluate the proposed method in different feature spaces based on various wavelet transforms. Each feature is the concatenation of either all decomposition coefficients of a wavelet or their statistics (maximum, minimum, mean, and variance). The wavelets include Bior 6.8, Daubechies 14 (Db 14), Symlets 8, Coiflets 5, Fejer-Korovkin 22 (FK 22), and Reverse Bi-orthogonal 6.8 (RBior 6.8).
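The two encoding ways can be sketched as follows. The level coefficients would come from a wavelet decomposition of the beat segment (e.g., `pywt.wavedec(beat, 'db14')` with PyWavelets); the helper names here are our own, and the per-level statistics follow the list in the text.

```python
import numpy as np

def encode_all(coeff_levels):
    """Feature = concatenation of all decomposition coefficients."""
    return np.concatenate([np.ravel(c) for c in coeff_levels])

def encode_stats(coeff_levels):
    """Feature = per-level statistics (maximum, minimum, mean,
    variance): a much shorter, compressed representation, at the
    cost of some information loss."""
    return np.concatenate([[c.max(), c.min(), c.mean(), c.var()]
                           for c in map(np.asarray, coeff_levels)])
```

For L decomposition levels, the statistics encoding always yields a 4L-dimensional feature, whereas the full encoding grows with the segment length, which reflects the effectiveness-vs-efficiency trade-off discussed for these features.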
We can confirm the robustness of our approach by the results in
Table 17 and
Table 18: the performance presents a stable stair-wise increase from the baseline through MLR + MPD to MLR + MBD, irrespective of features and datasets. Moreover, the performance discrepancy among these features on the same dataset is not large, which also shows some flexibility in feature selection within the scheme. Furthermore, features concatenated from all coefficients seem to perform better than those built from coefficient statistics. This is because the latter are much more compressed than the former, and compressed representation is inevitably accompanied by information loss. However, using all coefficients leads to high-dimensional feature vectors, which adds to the computing and storage burden. In practice, running the methods on features built from coefficient statistics takes much less time and memory than on features composed of all coefficients. In this respect, the statistics-based strategy better balances the effectiveness and efficiency of feature representation.
4.6. Technique Comparison
ECG beat classification is a well-studied problem. In this paper, we deal with this issue from the 16-class classification perspective. This setting differs from much traditional research using the AAMI standards (ANSI/AAMI EC57:1998), which group the ECG beats in the MIT-BIH Arrhythmia Database into five big classes [
4,
7,
11,
24,
27,
28]. To compare our method with these techniques in the literature, we also conduct experiments on dataset MLII using this widely used standard classification scheme. The considered 16 classes contain all 15 beat types addressed by the AAMI standards. According to the standards, the five big classes are Non-Ectopic Beat, Supra-Ventricular Ectopic Beat, Ventricular Ectopic Beat, Fusion Beat, and Unknown Beat. More specifically, Non-Ectopic Beat includes types N, L, R, e and j; Supra-Ventricular Ectopic Beat includes types a, A, S and J; Ventricular Ectopic Beat includes types E and V; Fusion Beat includes type F; and Unknown Beat includes types Q, P and f.
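The AAMI grouping listed above can be written as a simple lookup; the dictionary below transcribes exactly the 15 beat types from the text, and the helper name is our own.

```python
# AAMI EC57 grouping of the MIT-BIH beat types as listed in the text.
AAMI_GROUPS = {
    "Non-Ectopic":               ["N", "L", "R", "e", "j"],
    "Supra-Ventricular Ectopic": ["a", "A", "S", "J"],
    "Ventricular Ectopic":       ["E", "V"],
    "Fusion":                    ["F"],
    "Unknown":                   ["Q", "P", "f"],
}

def to_aami(beat_type):
    """Map one of the 15 MIT-BIH beat symbols to its AAMI class."""
    for group, beats in AAMI_GROUPS.items():
        if beat_type in beats:
            return group
    raise KeyError(beat_type)
```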
In the experiments, we use the feature composed of all decomposition coefficients of wavelet Db 14, owing to the strong performance observed for this feature. We record the performance of the proposed method and recent representative techniques in
Table 19. The table shows that the proposed method performs well in ECG beat classification under the popular AAMI standards, further confirming its compatibility and capability.