1. Introduction
Obstructive sleep apnea (OSA) is characterized by repeated collapse of the upper airway during sleep, which blocks the airway and causes shallow, labored breathing [1]. OSA is very common in patients with cardiovascular disease and is associated with an increased incidence of stroke, heart failure, atrial fibrillation, and coronary heart disease. Severe OSA is further associated with increased all-cause and cardiovascular mortality [2]. OSA affects approximately 9–24% of the general population, but few patients have been diagnosed, and about 90% of sufferers remain undiagnosed [3]. Hence, early diagnosis and treatment of OSA can reduce adverse health outcomes.
The standard approach for the diagnosis of OSA is based on the respiratory signals (including nasal airflow and thoracic and abdominal movements) and the blood oxygen concentration measured by polysomnography. The respiratory irregularities measured during sleep include apneas and hypopneas. An apnea is a complete or almost complete cessation of airflow, lasting ≥10 s, and is usually associated with oxygen desaturation. A hypopnea is a reduction in airflow (<70% of a baseline level) associated with oxygen desaturation. The apnea-hypopnea index (AHI) [4], defined as the sum of apneas and hypopneas per hour of sleep, is widely used to grade the severity of OSA as normal (AHI < 5), mild (5 ≤ AHI < 15), moderate (15 ≤ AHI < 30), or severe (AHI ≥ 30). The most serious limitation of polysomnography is that it is inconvenient, time-consuming, and expensive. It is an overnight test at a sleep center or hospital and requires numerous electrodes and sensors to monitor various physiological signals during sleep.
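The AHI definition and the four severity levels above can be written out as a short Python sketch. The function names are illustrative and not from any cited work; the thresholds are the ones quoted in the text.

```python
def apnea_hypopnea_index(n_apneas: int, n_hypopneas: int, sleep_hours: float) -> float:
    """AHI: total number of apneas and hypopneas per hour of sleep."""
    if sleep_hours <= 0:
        raise ValueError("sleep_hours must be positive")
    return (n_apneas + n_hypopneas) / sleep_hours


def ahi_severity(ahi: float) -> str:
    """Map an AHI value to the severity levels quoted in the text."""
    if ahi < 0:
        raise ValueError("AHI cannot be negative")
    if ahi < 5:
        return "normal"
    if ahi < 15:
        return "mild"
    if ahi < 30:
        return "moderate"
    return "severe"
```

For example, 40 apneas and 20 hypopneas over 8 h of sleep give an AHI of 7.5, which falls in the mild range.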
In recent years, many studies have focused on developing a more convenient and less expensive OSA diagnostic system based on the analysis of single-lead ECG signals. Most of them extract and classify features from the ECG signals, RR intervals, heart rate variability (HRV), or ECG-derived respiration (EDR) signals. It has been shown that the EDR signal can be used to approximate the respiratory rate, and even the respiratory wave morphology [5,6,7]. Hayano et al. [8] further reported that OSA causes cyclic variation in the heart rate. Hassan et al. [9] extracted features of the ECG signals based on the tunable-Q factor wavelet transform and classified the data using a machine learning algorithm, namely random under-sampling boosting (RUSBoost). Rachim et al. [10] decomposed the ECG signals into five levels using wavelet decomposition and then extracted 15 features from the detail coefficients (D3–D5). Principal component analysis and a support vector machine were applied for feature dimension reduction and classification, respectively. Sharma et al. [11] and Sharma et al. [12] extracted features from the ECG signals based on optimal biorthogonal antisymmetric and orthogonal wavelet filter banks, respectively, and introduced least squares and Gaussian support vector machines (SVM), respectively, for classification. Our previous study [13] proposed a one-dimensional (1D) convolutional neural network (CNN) model which can automatically learn the features of the ECG signals and classify normal and apnea events. Wang et al. [14] and Wang et al. [15] proposed a modified LeNet-5 CNN model and a deep residual neural network, respectively, to extract and classify features from RR intervals. Sharma and Sharma [16] decomposed the HRV and EDR signals into different modes using variational mode decomposition and designed a K-nearest neighbor classifier for classification. Pinho et al. [17] extracted features from the HRV and EDR signals based on time-domain and spectral-domain measures and designed artificial neural networks (ANN) and an SVM for classification.
The above-mentioned research has proposed a variety of methods to extract and classify features from the ECG and ECG-derived signals. However, when the 1D CNN model [13] automatically learns the features of RR intervals from ECG signals, the learned features are easily affected by the low-frequency, large-amplitude P and T waves. Hence, this study aims to evaluate whether reducing the low-frequency P and T waves can improve the accuracy of detecting apnea events. This study proposes a filter bank decomposition with Butterworth bandpass filters to decompose the ECG signal into 15 subband signals, with a 1D CNN model applied independently to each subband to extract and classify the features of that subband signal. The original subject-dependent and newly selected subject-independent training and test datasets, drawn from the 70 ECG recordings of the MIT PhysioNet Apnea-ECG database [18,19], were used to evaluate the contribution of different subbands.
The remainder of this paper is organized as follows. Section 2 describes the training and test datasets of one-minute ECG signals from the MIT PhysioNet Apnea-ECG database and presents the proposed apnea detection system based on the filter bank decomposition and 1D CNN model. Results are given in Section 3. A discussion of the study findings is provided in Section 4. Finally, Section 5 concludes this study.
3. Results
The original subject-dependent and the newly selected subject-independent training and test datasets from the 70 ECG recordings of the MIT PhysioNet Apnea-ECG database were used to assess the performance of the proposed system for detecting normal and apnea events. The performance parameters for per-minute apnea detection, including accuracy (Acc), sensitivity (Sen), and specificity (Spec), were calculated as follows [23]:

$$\mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \mathrm{Sen} = \frac{TP}{TP + FN}, \qquad \mathrm{Spec} = \frac{TN}{TN + FP}$$

where TP (true positive) and TN (true negative) are the numbers of events correctly identified as apnea and normal events, respectively, and FP (false positive) and FN (false negative) are the numbers of events incorrectly identified as apnea and normal events, respectively.
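As a minimal sketch, the three performance parameters can be computed from the four counts defined above using their standard definitions. The function name and the dictionary keys are illustrative choices, not part of the original study.

```python
def detection_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Accuracy, sensitivity, and specificity from per-minute confusion counts."""
    total = tp + tn + fp + fn
    if total == 0 or (tp + fn) == 0 or (tn + fp) == 0:
        raise ValueError("need at least one apnea and one normal event")
    return {
        "acc": (tp + tn) / total,   # correctly classified minutes / all minutes
        "sen": tp / (tp + fn),      # apnea minutes detected / all apnea minutes
        "spec": tn / (tn + fp),     # normal minutes detected / all normal minutes
    }
```

With 80 true positives, 90 true negatives, 10 false positives, and 20 false negatives, this yields an accuracy of 0.85, a sensitivity of 0.80, and a specificity of 0.90.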
The proposed 1D CNN model was trained and tested on the preprocessed subband signals of the training and test datasets, respectively, for a given subband. Each experiment comprised 50 epochs, and the training and test accuracies were recorded in each epoch. Because the weights of the CNN and FC layers were initialized with random values, a single experiment may underestimate the accuracy of the network. Hence, we repeated the experiment five times and selected the highest test accuracy as the per-minute accuracy of each subband. Each ECG recording can be further diagnosed as a non-OSA subject or an OSA patient according to the AHI derived from the results of per-minute apnea detection. Here, the AHI is defined as the average number of 1-min signals per hour that are identified as apnea events. If the AHI is greater than or equal to 5, the ECG recording is diagnosed as an OSA patient; otherwise, it is diagnosed as a non-OSA subject [13,14,16,24,25].
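The per-recording rule described above can be sketched as follows. This sketch assumes the AHI estimate is the number of 1-min signals labeled as apnea divided by the recording length in hours. The function names and the 0/1 label encoding are assumptions made for illustration.

```python
def estimate_ahi(minute_labels) -> float:
    """Estimate AHI from per-minute predictions (1 = apnea, 0 = normal).

    Assumes one label per minute, so len(minute_labels) / 60 is the
    recording length in hours.
    """
    if len(minute_labels) == 0:
        raise ValueError("recording has no labeled minutes")
    apnea_minutes = sum(1 for label in minute_labels if label == 1)
    hours = len(minute_labels) / 60.0
    return apnea_minutes / hours


def diagnose_recording(minute_labels, threshold: float = 5.0) -> str:
    """Label a recording as 'OSA' when the estimated AHI reaches the threshold."""
    return "OSA" if estimate_ahi(minute_labels) >= threshold else "non-OSA"
```

For an 8 h recording (480 labeled minutes), 40 apnea minutes give an estimated AHI of exactly 5 and hence an OSA diagnosis, whereas 39 apnea minutes give 4.875 and a non-OSA diagnosis.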
Table 4 lists the summary results of the per-minute and per-recording analysis using the ECG signals in different subbands for the subject-dependent and subject-independent test datasets. The per-minute accuracy using the subband signals with the frequency band from 0.5 Hz to 49.5 Hz without z-score normalization can reach 86.1% in the subject-dependent test dataset but is only 74.4% in the subject-independent test dataset. The use of the z-score normalization slightly increased the per-minute accuracy of the subject-dependent test dataset from 86.1% to 86.7% for the frequency band of 0.5–49.5 Hz, but significantly increased the per-minute accuracy of the subject-independent test dataset from 74.4% to 80.7%.
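For reference, z-score normalization rescales a signal to zero mean and unit standard deviation. The sketch below uses only the Python standard library and assumes that normalization is applied to each 1-min signal separately and that the population standard deviation is used; neither detail is stated in this section.

```python
from statistics import fmean, pstdev


def zscore(signal):
    """Return the signal rescaled to zero mean and unit standard deviation."""
    mu = fmean(signal)
    sigma = pstdev(signal)  # population standard deviation (assumption)
    if sigma == 0:
        raise ValueError("cannot normalize a constant signal")
    return [(x - mu) / sigma for x in signal]
```

Because the rescaling removes differences in offset and amplitude between recordings, it is plausible that it helps most when the training and test signals come from different subjects.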
The per-minute accuracies of the different frequency bands do not differ greatly in the subject-dependent test dataset, but the differences are larger in the subject-independent test dataset. The difference between the per-minute accuracies of 0.5–25 Hz and 25–49.5 Hz using the filter bank with two filters is only 0.2% (87.3% vs. 87.5%) in the subject-dependent test dataset but reaches 6.0% (80.4% vs. 86.4%) in the subject-independent test dataset. The difference between the minimum and maximum per-minute accuracies using the filter bank with four filters is only 2.0% (85.9% for 12.5–25 Hz vs. 87.9% for 25–37.5 Hz) in the subject-dependent test dataset but reaches 4.8% (81.1% for 0.5–12.5 Hz vs. 85.9% for 25–37.5 Hz) in the subject-independent test dataset. The difference between the minimum and maximum per-minute accuracies using the filter bank with eight filters is only 2.7% (85.9% for 6.25–12.5 Hz vs. 88.6% for 18.75–25 Hz) in the subject-dependent test dataset but reaches 6.4% (79.5% for 0.5–6.25 Hz vs. 85.9% for 25–31.25 Hz) in the subject-independent test dataset. The highest per-minute accuracy is 88.6% for 18.75–25 Hz, with a specificity of 91.5% and a sensitivity of 83.8%, in the subject-dependent test dataset, and 86.4% for 25–49.5 Hz, with a specificity of 87.7% and a sensitivity of 84.3%, in the subject-independent test dataset.
A higher per-minute accuracy does not always correspond to a higher per-recording accuracy. The highest per-recording accuracies in the subject-dependent test dataset are 100% of 0.5–12.5 Hz with per-minute accuracy of 87.4%, specificity of 93.1%, and sensitivity of 78.1%, and 100% of 31.25–37.5 Hz with per-minute accuracy of 87.5%, specificity of 90.6%, and sensitivity of 82.4%. The highest per-recording accuracy in the subject-independent test dataset is 100% of 31.25–37.5 Hz with per-minute accuracy of 85.8%, specificity of 89.4%, and sensitivity of 80.1%.
4. Discussion
The most obvious effect of OSA on ECG signals is on the heart rate, or equivalently the RR interval. A previous study reported that OSA causes cyclic variation of heart rate (CVHR), consisting of bradycardia during apnea followed by abrupt tachycardia on its cessation [8]. In other words, the RR intervals increase during apnea events and decrease after them. However, when the 1D CNN model automatically extracts the features of RR intervals from ECG signals, the extracted features are easily affected by the low-frequency, large-amplitude P and T waves. Accordingly, this study assumed that reducing the P and T waves to enhance the high-frequency R wave would highlight the characteristics of the RR interval and thereby improve the accuracy of the proposed apnea detection system.
To evaluate whether reducing the lower-frequency P and T waves can increase the accuracy of apnea detection, this study used filter banks with two, four, and eight Butterworth bandpass filters to decompose the 1-min ECG signal with a bandwidth of 50 Hz into two, four, and eight equal-bandwidth subband signals with bandwidths of 25 Hz, 12.5 Hz, and 6.25 Hz, respectively. Together with the full band of 0.5–49.5 Hz, a total of 15 subbands were included in this study. A 1D CNN model was applied independently to each subband to extract and classify the features of that subband signal and thus evaluate its accuracy of apnea detection. The original subject-dependent and newly selected subject-independent training and test datasets from the 70 ECG recordings of the MIT PhysioNet Apnea-ECG database were used to evaluate the accuracies of detecting apnea events for ECG signals in different frequency subbands.
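The subband layout can be reproduced with a few lines of Python. This is a sketch of the band edges only, not of the filters themselves. It assumes that the lowest and highest edges are clamped to 0.5 Hz and 49.5 Hz, as the band limits quoted in the results suggest, and it treats the full band as a one-filter bank so that the count reaches 15. The function and parameter names are illustrative.

```python
def subband_edges(n_filters, low=0.5, high=49.5, nominal_bandwidth=50.0):
    """Return (lower, upper) edges in Hz for n equal-bandwidth subbands.

    Each subband nominally spans nominal_bandwidth / n_filters Hz; the
    outermost edges are clamped to [low, high] (assumption).
    """
    width = nominal_bandwidth / n_filters
    bands = []
    for k in range(n_filters):
        lower = max(k * width, low)
        upper = min((k + 1) * width, high)
        bands.append((lower, upper))
    return bands


# Filter banks with 1 (full band), 2, 4, and 8 filters: 1 + 2 + 4 + 8 = 15 subbands
all_bands = {n: subband_edges(n) for n in (1, 2, 4, 8)}
```

Each pair of edges could then be passed to a Butterworth bandpass design routine, for example `scipy.signal.butter` with `btype="bandpass"`; the filter order is not specified in this section and would have to be chosen separately.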
The previous studies proposed various apnea detection methods based on features extracted from ECG and ECG-derived signals. The ECG recordings from the MIT PhysioNet Apnea-ECG database were most commonly used to train and test their proposed methods.
Table 5 compares the method and performance of the proposed 1D CNN model with the previous studies for the per-minute apnea detection using subject-dependent datasets from the MIT PhysioNet Apnea-ECG database. The per-minute accuracy of 88.6% using the subband of 18.75–25 Hz proposed by this study outperforms several previous studies [13,14,16,24,25,26] listed in the first part of Table 5 using the same subject-dependent datasets (the original 35 ECG recordings for training and 35 ECG recordings for testing) as this study. This study and the studies of Chang et al. [13], Wang et al. [14], and Li et al. [25] proposed feature-learning-based methods which can automatically learn the features of ECG signals or RR intervals using neural networks. The proposed 1D CNN model only used filtered and normalized 1D ECG signals as input signals and hence did not require additional signal transformation, R-peak detection, or RR interval or EDR calculation. The per-minute accuracy reached 87.9% in the study of Chang et al. [13]. They used Butterworth bandpass filtering with a preselected frequency band from 0.5 Hz to 15 Hz and z-score normalization for the preprocessing of ECG signals, and the 1D CNN model for feature extraction and classification. In comparison with this study, they did not evaluate the contribution of different subbands, and only used the original subject-dependent datasets. Wang et al. [14] reported a per-minute accuracy of 87.6%. They proposed a modified LeNet-5 convolutional neural network to automatically extract and classify the features of the input RR intervals. Li et al. [25] achieved 84.7% accuracy for the per-minute apnea detection. They introduced a sparse auto-encoder to automatically extract features and proposed a decision fusion method to improve the classification accuracy. The studies of Sharma and Sharma [16], Song et al. [24], and Surrel et al. [26] focused on feature-engineering-based methods. Sharma and Sharma [16] achieved an accuracy of 87.5% for per-minute classification. They decomposed the HRV and EDR signals into different modes using the variational mode decomposition and used the K-nearest neighbor classifier. Song et al. [24] reported a per-minute accuracy of 86.2% using a sleep apnea detection approach based on the hidden Markov model. Surrel et al. [26] computed apnea scores for RR intervals and RS amplitudes using time-domain filtering and power estimation, and classified normal and apnea events using an SVM, achieving a per-minute accuracy of 85.7%.
The second part of Table 5 further compares several studies which only used the original 35 ECG recordings of the training dataset from the MIT PhysioNet Apnea-ECG database to train and test their models based on the k-fold cross-validation method. These studies did not state that the ECG signals from the same subject were kept within a single fold. Signals from the same subject would therefore appear in both the training and test datasets, and hence their datasets were also subject-dependent. The accuracy reported by Wang et al. [15] was 94.3%. They proposed a deep residual network to automatically learn the features from the RR intervals and to classify normal and apnea events using the 10-fold cross-validation strategy. The studies of Sharma et al. [11], Sharma et al. [12], and Pinho et al. [17] developed feature-engineering-based methods. Sharma et al. [11] and Sharma et al. [12] reported average classification accuracies of 90.1% and 90.87%, respectively. Both of them extracted features based on the wavelet filter bank and classified normal and OSA groups using an SVM. Pinho et al. [17] obtained an accuracy of 82.12%. They selected 20 features from the RR intervals and EDR signals and used the artificial neural network for classification with the 10-fold cross-validation method. The study of Surrel et al. [26] listed in the third part of Table 5 further grouped the recordings by subject according to the metadata of the recordings, including the reported age, sex, height, and weight. They reported a patient-specific accuracy of 88%, obtained by using the first ECG recording from each patient to train the SVM classifier and the other recordings to test it. Hence, their datasets were subject-dependent. Although our performance cannot be directly compared with those of the previous studies listed in the second and third parts of Table 5 due to the use of different methods and datasets, it is worth noting that most of the previous studies adopted the subject-dependent datasets from the MIT PhysioNet Apnea-ECG database.
The main problem with using subject-dependent datasets is that similar ECG signals from the same subject appear in both the training and test datasets, which may cause accuracy overestimation. Our results using the original subject-dependent datasets in Table 4 show that the per-minute accuracies are uniformly high, ranging from 85.9% to 88.6%, and highly consistent, such that the difference between the minimum and maximum per-minute accuracies is only 2.7%. Hence, the original subject-dependent datasets cannot reveal the difference in the accuracy of different subbands. This result differs from our expectation that the filtered ECG signals with a higher frequency band would better highlight the features of RR intervals and would therefore have a higher accuracy in the detection of apnea events. A possible reason for the highly consistent accuracies is that 23 of the 35 ECG recordings (x01 through x35) in the test dataset correspond to at least one ECG recording in the training set from the same subject. That is, during training the proposed CNN model sees many signals that are similar to those in the test dataset. Hence, the use of the original subject-dependent datasets may overestimate the accuracy of each subband. Most previous studies have not addressed this important issue.
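The kind of overlap described above can be checked mechanically once each recording is mapped to a subject. The sketch below is illustrative only: the function name is hypothetical, and the toy mapping is invented and does not reproduce the actual metadata of the Apnea-ECG database.

```python
def overlapping_test_recordings(train_subjects, test_subjects):
    """Return test recordings whose subject also appears in the training set.

    Both arguments map a recording name to a subject identifier.
    """
    seen_in_training = set(train_subjects.values())
    return sorted(
        rec for rec, subject in test_subjects.items() if subject in seen_in_training
    )


# Hypothetical toy mapping, not the real recording-to-subject metadata
train = {"a01": "s1", "a02": "s2", "b01": "s3"}
test = {"x01": "s1", "x02": "s4", "x03": "s3"}
leaky = overlapping_test_recordings(train, test)  # ['x01', 'x03']
```

A split is subject-independent exactly when this list is empty.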
To ensure that the proposed CNN model uses ECG signals from different subjects during training and testing, this study further selected new subject-independent training and test datasets. The results of the newly selected subject-independent datasets shown in Table 4 meet our expectations and demonstrate the difference in the accuracy of different subbands. Using the filter bank with two filters, the per-minute accuracy of 86.4% for the higher frequency band of 25–49.5 Hz is 6.0% higher than the 80.4% accuracy for the lower frequency band of 0.5–25 Hz. The mid-high frequency band of 25–37.5 Hz has the highest per-minute accuracy of 85.9% among the accuracies using the filter bank with four filters, which is 4.8% higher than the 81.1% accuracy of the lowest frequency band of 0.5–12.5 Hz. The per-minute accuracy of 85.9% for the mid-high frequency band of 25–31.25 Hz is the highest among the accuracies using the filter bank with eight filters, which is 6.4% higher than the 79.5% accuracy of the lowest frequency band of 0.5–6.25 Hz. Furthermore, the mid-high frequency subbands of 25–49.5 Hz, 25–37.5 Hz, and 25–31.25 Hz improve the per-minute accuracies by 5.7% (86.4% vs. 80.7%), 5.2% (85.9% vs. 80.7%), and 5.2% (85.9% vs. 80.7%), respectively, in comparison with the full frequency band of 0.5–49.5 Hz. Hence, a mid-high frequency band that removes the low-frequency, large-amplitude P and T waves does indeed improve the per-minute accuracy of detecting apnea events in comparison with a low frequency band or the full frequency band.
Table 6 compares the method and performance of the proposed 1D CNN model with the study of Surrel et al. [26] for the per-minute apnea detection using the subject-independent datasets from the MIT PhysioNet Apnea-ECG database. Although both studies used subject-independent datasets, their method of selecting datasets was different from ours. To the best of our knowledge, only the study of Surrel et al. [26] among the previous studies reported a subject-independent method to train and test their apnea detection system. Their training and test method was similar to the 28-fold cross-validation method, but the ECG signals from the same study subject were not distributed across different folds. They tested the accuracy of 28 patients one by one. The ECG recordings of one of the 28 patients were used as the test dataset each time, and 35 recordings selected from the other 27 patients were adopted as the training dataset. Their per-minute accuracy reached 84% using the subject-independent method for training and testing, which is slightly lower than the accuracy of 86.4% reported by this study using the frequency band of 25–49.5 Hz.
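A generic leave-one-subject-out scheme of the kind described above can be sketched as follows. This is not the cited authors' implementation: it trains on all recordings of the remaining subjects rather than on a selection of 35, and the function name and toy mapping are hypothetical.

```python
def leave_one_subject_out(recording_subjects):
    """Yield (held_out_subject, train_recordings, test_recordings) per fold.

    recording_subjects maps a recording name to a subject identifier, so that
    all recordings of one subject always land in the same fold.
    """
    subjects = sorted(set(recording_subjects.values()))
    for held_out in subjects:
        test = sorted(r for r, s in recording_subjects.items() if s == held_out)
        train = sorted(r for r, s in recording_subjects.items() if s != held_out)
        yield held_out, train, test


# Hypothetical toy mapping with three subjects
toy = {"r01": "p1", "r02": "p1", "r03": "p2", "r04": "p3"}
folds = list(leave_one_subject_out(toy))
```

With 28 subjects this produces 28 folds, and no subject contributes recordings to both sides of any fold.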
If we further compare the results of the subject-dependent and subject-independent methods in the study of Surrel et al. [26], we find that their subject-independent accuracy of 84% in Table 6 is lower than the subject-dependent accuracies of 85.7% and 88% in Table 5. This result is consistent with ours. Our results in Table 4 show that the per-minute accuracies of the newly selected subject-independent test dataset were lower than those of the original subject-dependent test dataset for all subbands. The differences are more obvious in the low-frequency subbands, for example, 80.4% vs. 87.3% in the subband of 0.5–25 Hz, 81.1% vs. 87.4% in the subband of 0.5–12.5 Hz, and 79.5% vs. 86.4% in the subband of 0.5–6.25 Hz. These results confirm that the use of the original subject-dependent datasets did overestimate the per-minute accuracy. Hence, to avoid accuracy overestimation, we recommend training and testing the apnea detection system with the newly selected subject-independent datasets instead of the original subject-dependent datasets in the MIT PhysioNet Apnea-ECG database.
Although the per-minute accuracy of the subject-independent test dataset in this study can achieve 86.4% using the frequency band of 25–49.5 Hz, the corresponding per-recording accuracy is only 91.4%, with one non-OSA subject and two OSA patients being misdiagnosed. If we consider having better per-minute and per-recording accuracies at the same time, the use of the mid-high frequency band of 31.25–37.5 Hz has a slightly lower per-minute accuracy of 85.8%, but it can reach the per-recording accuracy of 100%.
The main limitation of this study is that the MIT PhysioNet Apnea-ECG database is a relatively small database that only contains 70 ECG recordings. This may affect the generalizability of the study results in clinical applications. Although our results have successfully demonstrated the contribution of different subbands, further investigation with larger clinical populations is required to optimize the proposed apnea detection system.