Coronary Artery Disease Detection Based on a Novel Multi-Modal Deep-Coding Method Using ECG and PCG Signals

Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan 250061, China
* Authors to whom correspondence should be addressed.
Sensors 2024, 24(21), 6939; https://doi.org/10.3390/s24216939
Submission received: 14 September 2024 / Revised: 15 October 2024 / Accepted: 28 October 2024 / Published: 29 October 2024
(This article belongs to the Section Biomedical Sensors)

Abstract

Coronary artery disease (CAD) is an irreversible and fatal disease, necessitating timely and precise diagnosis to slow its progression. Electrocardiogram (ECG) and phonocardiogram (PCG), conveying abundant disease-related information, are prevalent clinical techniques for early CAD diagnosis. Nevertheless, most previous methods have relied on single-modal data, which restricts their diagnostic precision due to information shortages. To address this issue and capture adequate information, the development of a multi-modal method becomes imperative. In this study, a novel multi-modal learning method is proposed to integrate both ECG and PCG for CAD detection. A novel ECG-PCG coupling signal is first derived via a deconvolution operation to enrich the diagnostic information. After constructing a modified recurrence plot, we build a parallel CNN network to encode multi-modal information, involving ECG, PCG and ECG-PCG coupling deep-coding features. To remove irrelevant information while preserving discriminative features, we add an autoencoder network to compress the feature dimension. Final CAD classification is conducted by combining a support vector machine with the optimal multi-modal features. The experiment is validated on simultaneously recorded ECG and PCG signals from 199 non-CAD and CAD subjects, and achieves high performance with accuracy, sensitivity, specificity and F1-score of 98.49%, 98.57%, 98.57% and 98.89%, respectively. The result demonstrates the superiority of the proposed multi-modal method in overcoming the information shortages of single-modal signals and outperforming existing models in CAD detection. This study highlights the potential of multi-modal deep-coding information and offers wider insight into enhancing CAD diagnosis.

1. Introduction

Coronary artery disease (CAD), a prevalent cardiovascular disorder, contributes significantly to global mortality [1]. It arises from atherosclerosis, characterized by the extensive accumulation of cholesterol and fatty plaques along coronary artery walls. These plaques cause various physical ailments, including fatigue, dizziness, myocardial ischemia, and even myocardial infarction in severe cases [2]. Terminal CAD progresses irreversibly and eventually leads to the death of the patient. Therefore, timely and precise detection is essential for slowing CAD progression and improving patient survival rates [3]. Coronary angiography is the gold standard for diagnosing CAD, but its invasive and high-cost nature constrains its widespread use. In contrast, electrocardiogram (ECG) and phonocardiogram (PCG) are noninvasive and cost-friendly tools for early CAD screening. However, due to the complexity of cardiovascular activities, single-modal data contain limited identifying information and present information shortages in CAD analysis. Multi-modal learning techniques, which integrate diverse sources of information, can mitigate this shortcoming and provide more adequate information for revealing cardiovascular conditions. Consequently, multi-modal information can achieve better classification performance and has attracted growing attention in diverse fields.
ECG, the most commonly used clinical tool, records microstructural changes of the cardiac electrical signal during each heartbeat. However, early and moderate CAD patients manifest negligible symptoms in ECG waves, so these patients may be incorrectly diagnosed [4]. Conversely, PCG records the synthesized sounds of heart mechanical vibration arising from the autonomous movement of cardiac tissues, involving myocardial contraction, valve vibration and blood flow striking against artery walls. When one or more coronary arteries are occluded, blood flows through the narrowed vessels and forms turbulence, producing high-frequency murmurs. Nevertheless, in CAD cases with blockages of more than 95%, the weak high-frequency murmurs from turbulence in almost blocked coronary arteries may diminish or disappear; thus, it is difficult to observe weak diastolic murmurs in these patients' PCG signals, which poses a challenge in CAD detection [5,6]. As these studies indicate, single ECG or single PCG alone provides insufficient information for CAD diagnosis. Multi-modal information, combining both ECG and PCG, can help physicians arrive at a more objective and accurate diagnosis. Moreover, ECG and PCG waves exhibit close temporal correlations during each heartbeat: the onset of the first heart sound closely aligns with the R-wave peak of the ECG, and the second heart sound is located close to the termination of the T-wave [7]. This synchronized nature underscores the potential of multi-modal learning methods that integrate both ECG and PCG signals and offer a wider view to better reveal cardiovascular conditions.
Numerous automatic detection techniques based on single ECG [8,9,10] or single PCG [11,12,13,14] for CAD classification have been widely proposed. However, multi-modal methods for analyzing CAD remain relatively scarce. Recently, the correlation between ECG and PCG has begun to be addressed. An improved D-S evidence theory fused ECG and PCG signals for cardiovascular disease identification and, by utilizing the wavelet scattering transform to extract multi-modal time–frequency features, achieved superior accuracy over single ECG or single PCG [15]. Zarrabi et al. [16] proposed a novel decision system based on multi-modal features combining ECG, PCG and clinical data to predict the risk of myocardial infarction, which outperformed single-modal features. Li et al. [17] designed a novel dual-input neural network integrating ECG with PCG for classifying non-CAD and CAD, and the final testing result indicated that dual-input data had superior performance over single-input data. In the latest study, Li et al. [7] further validated the advantage of multi-modal learning by constructing an ECG-net and a PCG-net for encoding deep features from ECG and PCG, confirming the superiority of their multi-modal method. Additionally, electromechanical coupling information between the time intervals of ECG and PCG has been proposed [18]. Dong et al. [19] assessed coronary blockage degree with a novel coupling analysis method based on the time intervals of ECG and PCG signals, in which several types of entropies and cross-entropies were employed; this underscores the potential of multi-modal fusion in providing more accurate information on the cardiovascular condition.
In the process of computer-aided CAD detection, feature extraction is key in characterizing ECG and PCG signals. A variety of linear features, including time-domain [11,20,21,22], frequency-domain [11,20,21,22] and time–frequency features [8,13,20], have been presented to represent disease-related information. Moreover, considering the inherent nonlinear nature of ECG and PCG signals, various nonlinear features, encompassing multiple types of entropies [10,19], recurrence plot [23,24,25] and multi-fractal parameters [26,27], were also developed for anomaly classification. More recently, deep learning methods with complex structural layers were used to encode ECG and PCG deep features for detecting CAD [9,12,14]. Notably, convolutional neural network (CNN) [28], serving as the most common feature extractor, automatically encodes signal-based or image-based deep features by operating various convolutional and pooling layers. Additionally, long short-term memory (LSTM) [29], bidirectional long short-term memory (BiLSTM) [30], transformer [31], autoencoder [32] and their improved models [33,34] were also adopted to process ECG and PCG, enhancing feature extraction and abnormality detection abilities. However, deep learning methods often yield high-dimensional features that inevitably contain some amount of redundant information, adversely influencing classification performance. To solve this problem, Principal Component Analysis (PCA) [20], a classic dimension reduction technique, was employed. Furthermore, the autoencoder network was also utilized for data compression and feature selection [35], offering an alternative approach to refine the feature space.
Apart from effective feature extraction, CAD classification performance also relies on efficient classifiers. A large number of classifiers have been proposed for CAD detection and prediction, involving support vector machine (SVM) [8,10,22,36], Naïve Bayes [36], decision tree (DT) [36,37], K-nearest neighbor (KNN) [38], boosting and bagging model [13], and artificial neural network (ANN) [9,10,11,12,20,21]. In addition, a novel classification decision strategy based on the integration of manual and automated thresholding techniques was employed to identify abnormal ECG [39].
Motivated by these studies, this work proposes a novel multi-modal learning method that considers both ECG and PCG for CAD detection. We initially carry out the deconvolution of ECG and PCG to produce a novel ECG-PCG coupling signal that reveals the inherent relationship between ECG and PCG. Then, the ECG, PCG and ECG-PCG coupling signals are transformed into modified recurrence plots (MRPs) to quantify their respective nonlinear dynamic microstructural information. A parallel CNN model is constructed to encode deep features from each MRP, and these single-modal deep-coding features are fused into multi-modal information. To remove redundant information while preserving discriminative features, an autoencoder network is added behind the parallel CNN network to reduce the feature dimension. A combination of the optimal features and an SVM classifier is used for final classification. The diagram of the proposed method is shown in Figure 1.
The highlights of this study are as follows:
  • A novel multi-modal learning method by considering both ECG and PCG signals is proposed to detect CAD.
  • The multi-modal deep-coding information involves ECG, PCG and ECG-PCG coupling deep-coding MRP features.
  • The proposed method constructs MRPs to quantify the nonlinear dynamic characteristics of ECG, PCG and their deconvolution signal, and builds an integrated deep learning network to encode multi-modal deep-coding features and reduce the feature dimension.
  • A combination of the optimal multi-modal features and an SVM classifier is used for final classification, and the result indicates the superiority of the multi-modal learning method.
The remaining sections are organized as follows. Section 2 describes data preprocessing, ECG-PCG coupling signal evaluation, MRP construction, deep learning network construction and performance evaluation. Section 3 illustrates experimental results, while Section 4 compares and discusses the performance of different models. The final conclusion is in Section 5.

2. Materials and Methods

2.1. Data

A total of 199 subjects, recruited at Qianfoshan Hospital (Shandong First Medical University Affiliated Hospital) in Jinan, Shandong Province, China, participated in this study. The experiment was conducted with the permission of the hospital Ethical Review Committee (ethics approval number: S374) and adhered strictly to the Declaration of Helsinki and its amendments. All subjects had presented symptoms such as chest tightness, chest pain, or palpitations within the preceding week, and signed informed consent before participation. This study excluded individuals who had undergone percutaneous coronary intervention or coronary artery bypass surgery, or who had been diagnosed with valvular disease. Each participant underwent coronary angiography, and their diagnosis was decided by a professional physician based on the angiography results. Those with blockages ≥ 50% in at least one major coronary artery (left anterior descending, left circumflex, or right coronary artery) were diagnosed as CAD (135 positive cases); the others were diagnosed as non-CAD (64 negative cases). The basic information of the subjects, including age, sex, height, weight, heart rate and blood pressure, is recorded in Table 1.
To ensure more precise resting data collection, each subject lay supine for at least 10 min in a quiet, temperature-controlled (25 ± 3 °C) room. A cardiovascular function detector (CVFD-II, Huiyironggong Technology Co., Ltd., Jinan, China) was employed to simultaneously record standard lead-II ECG and PCG signals for 5 min at a sampling rate of 1000 Hz. As CAD mostly affects the left coronary artery, the electronic stethoscope was positioned in the third intercostal space on the left edge of the sternum to record the PCG signal. The subjects remained calm and awake throughout the experiment.

2.2. Data Pre-Processing

2.2.1. Data Denoising

The original collected signals contain various types of noise. To obtain clean ECG and PCG signals, a 0.5–60 Hz Butterworth bandpass filter and a 20 Hz high-pass Butterworth filter were applied to the raw ECG and PCG signals, respectively. An IIR notch filter was then utilized to remove power-frequency interference from both signals. After that, the pre-processed ECG and PCG signals were cropped into 10 s segments, which were regularized using z-score normalization before feature extraction. Figure 2a,b and Figure 3a,b show ECG and PCG segments from a non-CAD subject and a CAD patient, respectively. As shown in Figure 2a and Figure 3a, the ST waveform in the CAD patient's ECG presents a significant elevation compared with that of the non-CAD subject. Additionally, as shown in Figure 2b and Figure 3b, the non-CAD subject's PCG has clearer boundaries around the first and second heart sounds than the CAD patient's PCG.
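For illustration, a minimal sketch of this pre-processing pipeline is given below, assuming SciPy; the filter orders and the 50 Hz notch frequency are our assumptions, as the paper does not state them:

```python
# Sketch of the pre-processing pipeline (Section 2.2.1), assuming SciPy.
import numpy as np
from scipy import signal

FS = 1000  # sampling rate (Hz)

def denoise_ecg(ecg):
    # 0.5-60 Hz Butterworth bandpass for ECG (order 4 is an assumption)
    b, a = signal.butter(4, [0.5, 60], btype="bandpass", fs=FS)
    ecg = signal.filtfilt(b, a, ecg)
    # IIR notch filter at the power-line frequency (50 Hz assumed)
    bn, an = signal.iirnotch(w0=50, Q=30, fs=FS)
    return signal.filtfilt(bn, an, ecg)

def denoise_pcg(pcg):
    # 20 Hz high-pass Butterworth for PCG
    b, a = signal.butter(4, 20, btype="highpass", fs=FS)
    pcg = signal.filtfilt(b, a, pcg)
    bn, an = signal.iirnotch(w0=50, Q=30, fs=FS)
    return signal.filtfilt(bn, an, pcg)

def segment_and_normalize(x, seg_sec=10):
    # Crop into 10 s segments and z-score normalize each segment
    n = seg_sec * FS
    segs = [x[i:i + n] for i in range(0, len(x) - n + 1, n)]
    return [(s - s.mean()) / s.std() for s in segs]
```

Zero-phase filtering (filtfilt) is one reasonable choice here, since it avoids adding phase distortion to the ECG and PCG waveforms.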

2.2.2. ECG-PCG Coupling Signal Evaluation

ECG and PCG signals convey a great deal of information concerning cardiac electrical and mechanical activity, respectively. These two activities are intricately connected through electromechanical coupling and mechanical–electrical feedback mechanisms. Relying on ECG-PCG coupling analysis, we can acquire the intrinsic relationship between ECG and PCG, which reflects complex cardiovascular activities and contains a wealth of information useful for cardiovascular disease identification. Hence, we conduct the deconvolution of ECG and PCG to produce a novel ECG-PCG coupling signal, further enhancing the understanding of cardiac function.
It should be noted that electrical activity of the heart occurs initially, and then it propels mechanical activity. Therefore, a novel electromechanical coupling system model is designed by using ECG as the input and PCG as the output, based on the sequence of cardiac activity, as defined in Equation (1).
$$y(n) = x(n) * h(n) \quad (1)$$
Here, * denotes convolution operation. y(n) denotes the PCG signal with 2N − 1 sample points. x(n) is the ECG signal with N sample points and h(n) is the novel ECG-PCG coupling signal with N sample points. Along with deconvolution calculation, the ECG-PCG coupling signal h(n) is evaluated successfully, and it shows nonlinear behavior due to the nonlinearity of ECG and PCG.
In the process of the deconvolution operation, x(n) and h(n) are padded with N − 1 zeros until the lengths of both signals reach 2N − 1. This facilitates the transformation of the one-dimensional input signal x(n) into the L × L matrix X, thereby converting the convolution operation in Equation (1) into the more efficient matrix–vector form of Equation (2).
$$\mathbf{y} = \mathbf{X}\mathbf{h} \quad (2)$$
where X is the L × L convolution matrix of the form
$$\mathbf{X} = \begin{bmatrix} x(0) & x(L-1) & x(L-2) & \cdots & x(1) \\ x(1) & x(0) & x(L-1) & \cdots & x(2) \\ x(2) & x(1) & x(0) & \cdots & x(3) \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ x(L-1) & x(L-2) & x(L-3) & \cdots & x(0) \end{bmatrix} \quad (3)$$
Here, L is equal to 2N − 1. y is the column vector of output signal y(n) and h is the column vector containing electromechanical system parameters.
$$\mathbf{y} = [\, y(0) \;\; y(1) \;\; y(2) \;\cdots\; y(2N-2) \,]^{T}, \qquad \mathbf{h} = [\, h(0) \;\; h(1) \;\; h(2) \;\cdots\; h(2N-2) \,]^{T} \quad (4)$$
where (·)T represents vector transposition. By carrying out the matrix calculation in Equation (2), h(n) is evaluated successfully. Figure 2c and Figure 3c display the ECG-PCG coupling signals of a non-CAD subject and a CAD patient, respectively. Notably, the coupling signal shows more significant differences between CAD and non-CAD subjects: the non-CAD subject's waveform changes little, whereas the CAD patient's waveform changes much more markedly.
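The convolution matrix in Equation (3) is circulant (each column is a cyclic shift of the zero-padded x(n)), so the deconvolution can be sketched in a few lines of Python; the least-squares solver below is our choice, as the paper does not specify its numerical method:

```python
import numpy as np
from scipy.linalg import circulant

def ecg_pcg_coupling(ecg, pcg):
    """Estimate h(n) with pcg = ecg * h via the matrix form y = Xh (Eq. (2))."""
    ecg = np.asarray(ecg, dtype=float)
    N = len(ecg)                      # ECG has N samples
    L = 2 * N - 1                     # padded length, L = 2N - 1
    x = np.pad(ecg, (0, L - N))       # pad x(n) with N - 1 zeros
    X = circulant(x)                  # L x L convolution matrix of Eq. (3)
    y = np.asarray(pcg[:L], float)    # output vector y of Eq. (4)
    # Least squares is robust when X is ill-conditioned; a direct solve
    # (np.linalg.solve) also works when X is invertible.
    h, *_ = np.linalg.lstsq(X, y, rcond=None)
    return h
```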

2.3. Modified Recurrence Plot

Given the inherently nonlinear and non-stationary nature of physiological signals, nonlinear dynamic analysis can more accurately portray their characteristics. Consequently, we transform each one-dimensional signal into a modified recurrence plot (MRP) based on signal phase space reconstruction to observe hidden patterns and microstructural forms of cardiovascular status, thereby providing valuable insights into the underlying dynamics.

2.3.1. Phase Space Reconstruction

A time series signal can be transformed into vector form by reconstructing a phase space based on the embedding theorem [40]. Specifically, given a time series signal x(t) with N sample points, it can be reconstructed into a new phase space comprising N − (m − 1)τ vectors, expressed as:
$$X(i) = [\, x(i) \;\; x(i+\tau) \;\cdots\; x(i+(m-1)\tau) \,], \quad i = 1, 2, \ldots, N-(m-1)\tau$$
Here, X(i) is a vector in the phase space, and m and τ are the embedding dimension and time delay, respectively. These critical parameters can be approximated using the false nearest neighbor algorithm [41] and the mutual information algorithm [42]. Due to individual specificity, the values of m and τ vary significantly, and inappropriate values can discard a great deal of signal detail. Thus, selecting an appropriate embedding dimension and time delay is essential. In this study, the difference between non-CAD and CAD subjects mainly manifests in detailed changes in the physiological signals. To observe more detail, we reconstruct the phase spaces of the ECG, PCG and ECG-PCG coupling signals with τ = 1 and m = 1 [43], and further construct novel modified recurrence plots on this basis, which provides a more comprehensive understanding of the underlying dynamics and hidden patterns in the signal.
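A delay-embedding sketch of this reconstruction, with the paper's choices τ = 1 and m = 1 as defaults, might look as follows:

```python
import numpy as np

def phase_space(x, m=1, tau=1):
    """Return the N - (m - 1)*tau embedding vectors as rows of a matrix."""
    x = np.asarray(x)
    n_vec = len(x) - (m - 1) * tau
    # Each row is [x(i), x(i + tau), ..., x(i + (m - 1) * tau)]
    return np.array([x[i:i + (m - 1) * tau + 1:tau] for i in range(n_vec)])
```

With m = 1, each "vector" is a single sample, so the MRP construction below reduces to the pairwise distance matrix of the raw samples.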

2.3.2. Modified Recurrence Plot Construction

In accordance with phase space reconstruction, nonlinear dynamic signals show significant recurrence characteristics [44]. Thus, this work further constructs novel recurrence plots to quantify the microstructure of the ECG, PCG and ECG-PCG coupling signals, respectively. The traditional recurrence plot (RP), an effective nonlinear signal processing method, maps the phase space vectors to a two-dimensional image, thereby visualizing the dynamic change information of physiological signals; it is defined in Equation (5).
$$R_{ij} = \Theta\left(\varepsilon - \| X(i) - X(j) \|\right) \quad (5)$$

$$\Theta(x) = \begin{cases} 0, & x < 0 \\ 1, & x \geq 0 \end{cases} \quad (6)$$

where ε is the threshold value, ‖·‖ denotes the Euclidean distance, X(i) and X(j) are phase space vectors, and Θ(·) is the Heaviside function, as expressed in Equation (6).
The threshold ε, a pivotal parameter in traditional RP construction, determines the values of the recurrence points in the resulting binary image. When ε exceeds the distance between two vectors, the corresponding point of the recurrence plot is 1; otherwise, it is 0. A smaller threshold ensures adequate recurrence points, while a larger threshold may be necessary in the presence of noise, as noise can distort the structure of the traditional RP.
Although the traditional RP can offer insights into the nonlinear characteristics of cardiovascular status to a certain degree, much of the detailed physiological change in the signal waves still fails to be observed. To overcome this limitation, a modified recurrence plot (MRP) is introduced, dispensing with the threshold value and instead utilizing color codes to characterize the distances between phase space vectors. It maps the distances between the vector at time i and all other vectors onto a color scale, as defined in Equation (7), enabling a more nuanced visualization of signal dynamics.
$$\upsilon_{i,j} = \vartheta\left(\| X(i) - X(j) \|\right) \quad (7)$$
Here, ‖·‖ represents the Euclidean distance, and υ represents the color code that maps the distance onto the color scale. The color code assigned to the pair of vectors X(i) and X(j) is located at coordinate (i, j) in the novel MRP, which quantifies more nonlinear information of the physiological signal. The MRP employs darker shades to signify closer distances between vectors, while brighter points represent farther distances. Figure 2 and Figure 3 illustrate the MRPs of the ECG, PCG and ECG-PCG coupling signals from a non-CAD subject and a CAD patient, respectively. The MRPs of the CAD patient exhibit more notable alterations than those of the non-CAD subject.
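A minimal MRP sketch based on Equation (7) is shown below; the colormap is our assumption, since the paper only specifies that darker codes mean closer vectors, and in practice the resulting image is resized to 224 × 224 before entering the CNN:

```python
import numpy as np
import matplotlib.pyplot as plt

def modified_recurrence_plot(vectors, out_png="mrp.png"):
    """Render the unthresholded distance matrix as a color-coded image."""
    v = np.asarray(vectors, dtype=float)
    # Pairwise Euclidean distances between all phase-space vectors
    d = np.linalg.norm(v[:, None, :] - v[None, :, :], axis=-1)
    plt.imsave(out_png, d, cmap="viridis")  # color code = distance scale
    return d
```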

2.4. Feature Extraction Based on Integrating Deep Learning Network

Deep learning methods show numerous advantages and prove highly effective for feature extraction and anomaly classification in diverse fields. In particular, CNNs automatically encode the spatial information of an image and confer significant benefits in image recognition tasks. To extract effective disease-related features from the MRPs for identifying CAD, we built an integrated deep learning model combining a parallel CNN and an autoencoder network to encode deep feature representations from each MRP; the framework is shown in Figure 4. In the proposed model, the parallel CNN network encodes deep features reflecting detailed information from the different MRPs and then fuses the three single-modal features into multi-modal information, while the autoencoder network compresses the feature dimensions to acquire more meaningful disease-related features.

2.4.1. The Parallel CNN Network

In this study, a parallel CNN model is built to encode a deep feature representation from the constructed MRPs, leveraging advantages of the deep learning network. The parallel CNN model, comprising three independent CNN branches, processes ECG, PCG and ECG-PCG coupling MRPs for encoding different single-modal features. Then, the outputs of three CNN branches were concatenated to form multi-modal features.
The MRPs of the ECG, PCG and ECG-PCG coupling signals are initially resized to 224 × 224 × 3 as the input of the proposed parallel CNN model to encode deep features of each single-modal signal. Each CNN branch of the proposed parallel model contains 13 convolutional layers with 3 × 3 kernels, as shown in Figure 5. All convolutional layers are organized into 5 sections, which contain 64, 128, 256, 512, and 512 convolutional kernels, respectively. A max pooling layer with a 2 × 2 kernel is located at the end of each section. All parameters of the convolutional and pooling layers are detailed in Table 2. The activation function of all hidden layers is the Rectified Linear Unit (ReLU). In the proposed network, the convolutional and pooling layers are the core modules for feature extraction. The output of each layer is a set of feature maps carrying temporal and spatial information of the image, and the number of feature maps depends on the number of kernels in each layer. The convolution and max pooling operations are defined in Equations (8) and (9).
$$y_j = \sum_i z_{ij} \times x_i + b_i \quad (8)$$

$$y_t = \max_n \left( z_{nt} \times x_n \right) \quad (9)$$
where xi and xn represent the input image features as inputs of convolutional and max pooling layers, respectively. yj and yt denote the outputs of convolutional and max pooling layers, respectively. zij and znt are convolutional and max pooling kernels, respectively. bi is the bias.
After operating the parallel CNN model, the feature maps output by the last pooling layer are flattened into a high-dimensional deep feature vector. Considering the over-fitting risk of such a high-dimensional feature vector, we implemented a dimension reduction technique to enhance the generalization ability of the model.
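A PyTorch sketch of one way to realize this parallel network is given below; the paper does not name its framework, and the exact layer configuration is listed in Table 2, so the VGG16-style stack here (13 convolutional layers in 5 sections of 64/128/256/512/512 channels) is an assumption consistent with the description above:

```python
import torch
import torch.nn as nn

def cnn_branch():
    # "M" marks a 2x2 max pooling layer closing each of the 5 sections;
    # the per-section conv counts (2, 2, 3, 3, 3) total 13 conv layers.
    cfg = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
           512, 512, 512, "M", 512, 512, 512, "M"]
    layers, in_ch = [], 3
    for v in cfg:
        if v == "M":
            layers.append(nn.MaxPool2d(2, 2))
        else:
            layers += [nn.Conv2d(in_ch, v, 3, padding=1), nn.ReLU(inplace=True)]
            in_ch = v
    layers.append(nn.Flatten())  # flatten the last pooling output
    return nn.Sequential(*layers)

class ParallelCNN(nn.Module):
    """Three independent branches for the ECG, PCG and coupling MRPs."""
    def __init__(self):
        super().__init__()
        self.ecg, self.pcg, self.cpl = cnn_branch(), cnn_branch(), cnn_branch()

    def forward(self, mrp_ecg, mrp_pcg, mrp_cpl):
        # Each input is (batch, 3, 224, 224); the branch outputs are
        # concatenated into one multi-modal deep-coding feature vector.
        return torch.cat([self.ecg(mrp_ecg), self.pcg(mrp_pcg),
                          self.cpl(mrp_cpl)], dim=1)
```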

2.4.2. Autoencoder Network

This study utilized the parallel CNN network to encode different single-modal deep features from each signal, and then fused them to create high-dimensional multi-modal features that inevitably include some redundant information. To remove the redundancy of the high-dimensional features while preserving the most discriminative ones, an autoencoder network following the parallel CNN network was introduced to reduce the feature dimension. The proposed autoencoder network includes an input layer, an output layer and five fully connected hidden layers, organized into two parts, an encoder and a decoder, as shown in Figure 6. All parameters of the autoencoder network are listed in Table 3. The encoder maps the high-dimensional input x, which contains both useful and irrelevant information, to a latent representation z characterized by a low-dimensional distribution of effective features, via a nonlinear activation function g, as in Equation (10).
$$\mathbf{z} = g(\mathbf{W}\mathbf{x} + \mathbf{b}) \quad (10)$$
Subsequently, the decoder performs the inverse operation of the encoder. The latent representation z is processed to reproduce the input signal via activation function f, defined as Equation (11).
$$\mathbf{x}' = f(\mathbf{W}'\mathbf{z} + \mathbf{b}') \quad (11)$$
where W, b and W′, b′ denote the weight and bias parameters of the encoder and decoder, respectively. These values are iteratively updated through backpropagation to minimize the loss value between the desired input x and output x′. Mean square error (MSE) is defined as the loss function in Equation (12).
$$\mathrm{loss} = \left\| \mathbf{x} - \mathbf{x}' \right\|^{2} \quad (12)$$
Through the optimization of the network parameters, we obtain the latent representation z by minimizing the MSE. Then z, containing the more meaningful information from the input data x, is used for the subsequent non-CAD and CAD classification.
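A PyTorch sketch of this autoencoder is shown below; the 400-D latent size matches the compression reported in Section 3.1, while the input and intermediate widths are illustrative placeholders for the exact sizes in Table 3:

```python
import torch.nn as nn

class FeatureAutoencoder(nn.Module):
    """Five fully connected hidden layers: encoder (Eq. (10)), decoder (Eq. (11))."""
    def __init__(self, in_dim=25088, latent=400):  # in_dim is an assumption
        super().__init__()
        self.encoder = nn.Sequential(              # z = g(Wx + b)
            nn.Linear(in_dim, 4096), nn.ReLU(),
            nn.Linear(4096, 1024), nn.ReLU(),
            nn.Linear(1024, latent))
        self.decoder = nn.Sequential(              # x' = f(W'z + b')
            nn.Linear(latent, 1024), nn.ReLU(),
            nn.Linear(1024, 4096), nn.ReLU(),
            nn.Linear(4096, in_dim))

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

# Training minimizes the MSE reconstruction loss of Equation (12),
# e.g. loss = nn.MSELoss()(x_recon, x), updated via backpropagation.
```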

2.5. Statistical Analysis

Statistical analysis is crucial for evaluating the effectiveness of the multi-modal features. We verify the normal distribution of the features using the Kolmogorov–Smirnov test; normally distributed features from different groups are analyzed using Student's t-test, while features with non-normal distributions are evaluated using the Mann–Whitney U test. A p-value < 0.05 denotes a statistically significant difference between the features of different groups.
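Assuming SciPy, this feature-wise screening can be sketched as:

```python
import numpy as np
from scipy import stats

def feature_p_value(f_noncad, f_cad, alpha=0.05):
    """Normality check via Kolmogorov-Smirnov, then t-test or Mann-Whitney U."""
    f_noncad, f_cad = np.asarray(f_noncad), np.asarray(f_cad)
    # KS test of each group against a normal fitted to its mean and std
    normal = all(
        stats.kstest(g, "norm", args=(g.mean(), g.std())).pvalue >= alpha
        for g in (f_noncad, f_cad))
    if normal:
        return stats.ttest_ind(f_noncad, f_cad).pvalue
    return stats.mannwhitneyu(f_noncad, f_cad).pvalue
```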

2.6. Classification and Evaluation

After extracting and compressing the deep-coding features, this study employed a recursive feature elimination (RFE) algorithm to evaluate the contribution rate of each feature and rank them for the subsequent classification task [45]. RFE is a simple and robust algorithm for processing small-sample data: it iteratively removes the feature with the lowest importance score until the optimal features are selected.
The selected optimal feature subset was fed into the SVM classifier for identifying CAD. The combination of deep-coding features and a conventional classifier effectively mitigates over-fitting on small-sample datasets, achieving higher classification accuracy with fewer training parameters and faster processing. Consequently, the deep-coding features were sent to SVM classifiers with both linear and radial basis function (RBF) kernels to identify CAD. The linear kernel employs a hyperparameter C, and the RBF kernel utilizes two hyperparameters, C and r. The optimal hyperparameter values were found through grid search over specified ranges: C takes values 2^n with n an integer from −4 to 13, and r takes values 2^m with m an integer from −7 to 6. This addresses potential over-fitting and the classifier's nonlinear behavior. To validate the identification performance of the model, 5-fold cross-validation was performed, ensuring that training and testing samples were sourced from distinct subjects to uphold the reliability of the results.
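The selection and classification stage can be sketched with scikit-learn as below; the grid mirrors the stated search ranges, while the GroupKFold splitter (with subject identifiers as groups) is our way of keeping training and testing samples from distinct subjects, and n_features is a hypothetical tuning parameter:

```python
from sklearn.feature_selection import RFE
from sklearn.model_selection import GridSearchCV, GroupKFold
from sklearn.svm import SVC

def select_and_classify(X, y, groups, n_features=100):
    # RFE with a linear SVM ranks features by contribution and keeps the best
    selector = RFE(SVC(kernel="linear"), n_features_to_select=n_features)
    X_sel = selector.fit_transform(X, y)

    # Grid of C = 2^n, n in [-4, 13]; gamma (the RBF parameter r) = 2^m, m in [-7, 6]
    grid = [
        {"kernel": ["linear"], "C": [2.0 ** n for n in range(-4, 14)]},
        {"kernel": ["rbf"],
         "C": [2.0 ** n for n in range(-4, 14)],
         "gamma": [2.0 ** m for m in range(-7, 7)]},
    ]
    # GroupKFold keeps segments of the same subject out of both train and test
    cv = GroupKFold(n_splits=5)
    search = GridSearchCV(SVC(), grid, cv=cv)
    search.fit(X_sel, y, groups=groups)
    return search
```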
Four widely accepted evaluation metrics for anomaly classification are used to assess classification performance, including accuracy (ACC), sensitivity (SEN), specificity (SPE) and f1-score (F1).
$$ACC = \frac{tp + tn}{tp + fp + tn + fn} \times 100\% \quad (13)$$

$$SEN = \frac{tp}{tp + fn} \times 100\% \quad (14)$$

$$SPE = \frac{tn}{tn + fp} \times 100\% \quad (15)$$

$$F1 = \frac{2\,tp}{2\,tp + fp + fn} \times 100\% \quad (16)$$

where tp and tn are the numbers of positive and negative samples correctly identified, respectively, fp is the number of negative samples incorrectly identified as positive, and fn is the number of positive samples incorrectly identified as negative.
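These four metrics follow directly from the confusion matrix, for example:

```python
from sklearn.metrics import confusion_matrix

def evaluate(y_true, y_pred):
    """Compute ACC, SEN, SPE and F1 (in %), with CAD as the positive class 1."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    acc = (tp + tn) / (tp + fp + tn + fn) * 100
    sen = tp / (tp + fn) * 100
    spe = tn / (tn + fp) * 100
    f1 = 2 * tp / (2 * tp + fp + fn) * 100
    return acc, sen, spe, f1
```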

3. Results

This study executed signal pre-processing, ECG-PCG coupling signal evaluation, MRP construction and statistical analysis in MATLAB R2020b. The deep learning network and classification were implemented in Python 3.9. This section illustrates all results of the proposed method.

3.1. Comparison of Single- and Multi-Modal Data

This study leverages single ECG, PCG and ECG-PCG coupling features to form multi-modal information for better classifying non-CAD and CAD cases, since each of these single-modal signals is closely correlated with CAD. Specifically, ECG and PCG are derived from different cardiovascular activities, while the ECG-PCG coupling information reflects the inherent correlation between these activities. In order to obtain more adequate multi-modal representations, each CNN branch in the parallel CNN network encodes deep MRP features from one single-modal signal. The extracted deep-coding MRP features are then compressed into a 1 × 400 feature space by the autoencoder network. Following data compression, the three single-modal features are integrated into multi-modal information for CAD classification.
Figure 7 depicts the average classification accuracy trend during the five-fold cross-validation using single- and multi-modal data. As the number of meaningful features increases, the average accuracy rises sharply; after reaching its optimal value, it begins to decline with further features, indicating that the high-dimensional features contain redundant information that adversely affects classification. Table 4 details the classification results of the different modal data. Compared with all single-modal signals, the multi-modal information integrating ECG, PCG and ECG-PCG coupling features achieves superior detection performance, with accuracy, sensitivity, specificity, and F1-score of 98.49%, 98.57%, 98.57%, and 98.89%, respectively. Among the single-modal signals, the classification performance of the ECG-PCG coupling features is superior to that of single ECG or single PCG. The reason is that the ECG-PCG coupling signal, derived from the deconvolution of ECG and PCG, carries additional coupling information that transcends the information in ECG and PCG alone.

3.2. Overall Classification Results of Multi-Modal Method

To comprehensively assess the performance of our proposed multi-modal learning method, a five-fold cross-validation strategy is adopted. Table 5 shows the per-fold and average results of the five-fold cross-validation based on the stratified sampling principle. By fusing the different single-modal features into multi-modal information, our proposed method achieves remarkable results, with two of the five folds yielding the highest accuracy of 100%. Across all five folds, the mean accuracy remains consistently high at 98.49%, accompanied by a modest standard deviation of 1.24%, underscoring the robustness and stability of our proposed method.

3.3. Features Analysis of Different Modal Signals

In the classification framework of our multi-modal learning method, RFE embedded in the SVM selects the more salient features based on their classification contribution rates for the different modal signals. Following feature compression by the autoencoder network, each single-modal signal yields 400 features for analyzing non-CAD and CAD cases. As the number of significant features increases, the classification accuracy rises quickly; however, due to inherent information redundancy, the accuracy begins to decrease once the optimal feature count is surpassed, at which point each single-modal signal attains its highest accuracy.
To analyze the importance of each feature in CAD detection, we employ statistical analysis to assess the difference between features of non-CAD and CAD groups. As shown in Figure 8, most ECG deep-coding features with p-values ≥ 0.05 show no statistical difference. In contrast, the ECG-PCG coupling signal has the most features with a significant difference. It indicates that the ECG-PCG coupling signal provides the highest contribution rate and effective information for CAD detection, whereas the ECG signal offers the least information in the classification task. These results are consistent with Figure 7. Due to the existence of numerous features with no statistical difference, we select optimal features by RFE for achieving the best detection result.
As seen from Figure 7 and Figure 8, not all features with significant differences are helpful in detecting CAD. Thus, we also analyze the correlation between all features. Correlation coefficients between pairs of multi-modal features are computed and plotted as heat maps, as shown in Figure 9. Features within a single modality have high correlation coefficients, indicating that only a few features are needed to characterize all the information in that signal. Correlation coefficients between features of different modalities are small, suggesting low correlation between different single-modal features; these features therefore carry complementary information for detecting CAD. Consequently, by combining different single-modal features, a small number of selected optimal features can improve the classification result.
Additionally, the top five optimal features of each single-modal signal, ranked by contribution rate, are further analyzed for their efficiency, as shown in Figure 10. The ECG features comprise f-ecg1~f-ecg5, the PCG features span f-pcg1~f-pcg5, and the ECG-PCG coupling features include f-e-pcg1~f-e-pcg5. By analyzing the statistical significance (p-value < 0.05) between each feature and the class label, we observe the clearest differences between non-CAD and CAD subjects in the ECG-PCG coupling signal. This indicates the superior classification contribution of the ECG-PCG coupling features compared to the ECG or PCG features. Furthermore, the combination of different single-modal information sources, with each single-modal signal carrying complementary information, fosters a notable improvement in classification in the multi-modal learning method.

3.4. Performance Analysis of Different Models

The complexity of the deep learning network and the feature reduction may affect the classification accuracy. Removing the autoencoder network from our model still achieves a reasonable result, with accuracy, sensitivity, specificity, and F1-score of 98.50 ± 1.22%, 100.00 ± 0.00%, 98.67 ± 2.16% and 98.91 ± 0.89%, respectively, but at the cost of considerably more processing time.
To confirm the superiority of the proposed method, we compare it with other advanced models, including ResNet50-based and transformer-based models; the classification results are shown in Table 6. We first use the ResNet50 structure to replace each 2-D CNN branch in our method. ResNet50 contains residual modules, which can enhance effective information and reduce information loss. The ResNet50-based model yields a good accuracy of 90.96%, which is nevertheless lower than our method, indicating that a deeper network does not necessarily encode more meaningful features here. Similarly, we build a transformer-based model to validate its performance: the transformer structure replaces each 2-D CNN branch, and the multi-modal signals are fed into the new model to encode deep features, exploiting the transformer's strengths on 1-D signals. However, the transformer-based model achieves a lower accuracy of 88.46%, possibly because the small-sample dataset limits its performance.
In addition, a strong classifier is key to improving detection accuracy. Our study employs five traditional classifiers to identify non-CAD and CAD subjects; the detection results are shown in Table 7. The SVM classifier achieves the best classification accuracy.

4. Comparison and Discussion

To overcome the information shortages of single-modal data, this study proposes a novel multi-modal learning method that integrates ECG, PCG and ECG-PCG coupling features for CAD detection. The proposed model first conducts the deconvolution of ECG and PCG to produce an ECG-PCG coupling signal, which captures the inherent correlation between cardiovascular activities. We then introduce the MRP to quantify the nonlinear dynamic characteristics of each single signal. An integrated deep learning network is designed to encode the deep spatial features of each MRP and fuse them for multi-modal feature extraction, data compression and classification. By leveraging the complementary strengths of multi-modal data and the advantages of deep learning, we address the limitations of the single-modal method and attain remarkable improvements in CAD detection.
As previously mentioned, the ECG signal undergoes significant alterations, particularly in advanced CAD, manifested through various changes such as ST-segment elevation or depression and T-wave inversion. These changes are markedly distinguishable between non-CAD and CAD subjects [4]. Similarly, when the coronary arteries occlude to a certain degree, blood flowing through the narrowed artery forms turbulence; weak murmurs begin to occur, and PCG waveform changes appear gradually [5,6]. Consequently, more significant differences in PCG between non-CAD and CAD subjects appear. However, when CAD patients show more than 90% blockage, one or more coronary arteries are almost completely occluded, which reduces blood flow and causes the murmurs to disappear. This complexity poses a challenge in accurately distinguishing CAD from non-CAD. Additionally, coronary abnormality disrupts the inherent harmony between cardiac electrical and mechanical activities, prompting us to develop a novel ECG-PCG coupling signal, based on the deconvolution of ECG and PCG, to reflect these changes. To assess the nonlinear characteristic distribution of each single signal between non-CAD and CAD subjects, this study introduces the MRP to visually display their differences, as shown in Figure 2 and Figure 3.
In the context of CAD analysis utilizing single-modal data, ECG carries important information of cardiac electrical activity, which easily distinguishes severe CAD patients from non-CAD subjects due to significant ECG waveform alterations in the terminal stage. Nevertheless, the subtle ECG waveform change poses challenges in classifying moderate CAD patients. Conversely, PCG reflects the mechanical activity of the heart and it shows notable waveform change in moderate CAD patients. However, detecting severe CAD patients with little change in PCG signal becomes complicated due to the disappearance of murmurs. As a result, single ECG or single PCG only provides single-aspect information for CAD detection, and leads to lower classification performance, as shown in Figure 7 and Table 4. To enhance detection ability, we innovatively produce the novel ECG-PCG coupling signal by conducting deconvolution of ECG and PCG. By integrating ECG, PCG and ECG-PCG coupling information, the multi-modal learning method improves classification results via deep feature encoding. This underscores the presence of complementary information within single-modal data and demonstrates that identification ability of multi-modal data outperforms that of single-modal data. Notably, ECG-PCG coupling features yield higher results among all single-modal signals, with accuracy of 84.94%, and multi-modal information obtains the highest accuracy of 98.49%. This indicates that the proposed multi-modal learning method effectively overcomes information shortages in single-modal data.
Nonlinear analysis is more adept at capturing the dynamic characteristics of cardiac signals owing to their inherently nonlinear nature. Thus, the proposed model introduces an MRP to quantify each single-modal signal. To enrich the disease-related information, we leverage the advantages of deep learning by constructing the parallel CNN network, which encodes the ECG, PCG and ECG-PCG coupling MRPs and subsequently concatenates all single-modal deep-coding features. Considering the information redundancy in the extracted high-dimensional features, an autoencoder network following the parallel CNN network is added to facilitate feature compression. Ultimately, each single-modal signal yields 400 features, and all single-modal features are combined to form multi-modal information for CAD detection. Feature analysis indicates that not all features are helpful in identifying CAD, as shown in Figure 8, Figure 9 and Figure 10. To select the most discriminative features, an embedded RFE algorithm is employed to estimate the classification importance score of each feature and rank them to select the optimal subset. ECG-PCG coupling features obtain superior results to ECG or PCG in the single-modal analysis, while the multi-modal data achieve the best classification results.
Table 8 summarizes previous studies on CAD detection based on single ECG, single PCG or both signals, alongside their corresponding testing results on public or self-collected data. Kaveh et al. [22] used ECG multi-domain features for CAD classification and validated their model on the MIT-BIH ECG database. In the works of Li et al. [20], Samanta et al. [21] and Huang et al. [46], CAD was analyzed by extracting different PCG features using machine or deep learning methods. Notably, Li et al. [47] combined ECG and PCG for CAD detection by extracting multi-domain deep features. To ensure a fair comparison with existing studies, we further validated our proposed model on the dataset previously used by Li et al. [47] and achieved comparable classification performance, with accuracy, sensitivity and specificity of 96.37%, 98.26% and 90.22%, respectively. However, considering that imbalanced data can restrict the generalization ability of the model, we addressed this issue by randomly selecting 60 patients from all CAD patients to remove the imbalance, and then retrained the classification model on the new balanced dataset. The new model achieved remarkable detection performance, with accuracy, sensitivity and specificity of 97.08%, 98.12% and 96.22% in classifying 60 non-CAD subjects and 60 CAD patients. Additionally, we augmented the data quantity by segmenting the ECG and PCG signals of the non-CAD subjects into 10 s segments with an overlap of 50%, so that the ratio of non-CAD to CAD sample sizes was approximately balanced (64 × 60:135 × 30). The augmented dataset yielded stable performance, with accuracy, sensitivity and specificity of 98.10%, 98.43% and 97.78%, respectively. Furthermore, each signal segment took an average of 13.15 s to process, which indicates that the proposed method is a feasible, real-time approach for clinical practice.
In the realm of CAD detection based on single ECG or single PCG classification, numerous previous methods have been validated on two widely used open-source public databases, namely the PhysioNet ECG database and the PhysioNet/CinC Challenge 2016 PCG database. To fairly compare with previous methods, the proposed multi-modal learning method was also validated on these two public databases, and all details are listed in Table 9. In CAD detection using a public ECG database, Kumar et al. [8] and Acharya et al. [10] relied on hand-crafted features obtained by a machine learning method, whereas Tan et al. [9] adopted time-domain encoding features derived from 1D-CNN. More recently, the potential of PCG signals for CAD detection has been addressed, with most authors using hand-crafted or deep-coding features [11,13]. Notably, Noman et al. [12] employed MFCC deep features encoded by 2D-CNN, while Humayun et al. [14] constructed 1D-CNN to extract PCG temporal–spatial features for classification. Due to there being only single-modal signals in these two public databases, our model encoded single-modal features from both ECG and PCG databases for identifying CAD, and yielded better results, with accuracy of 99.87% and 97.56%, respectively. This underscores the feasibility and robust generalization capabilities of this proposed model in CAD detection.
It should be noted that this study still has some limitations. During data collection, we only recorded single ECG and single PCG, overlooking the potential benefits of incorporating further modalities. The inclusion of additional data sources, such as imaging techniques, could significantly enrich the identifying information and improve the classification capability. Moreover, the imbalanced distribution of data points across classes poses a challenge in achieving optimal classification performance, and balanced data are crucial for obtaining more reliable results. Consequently, data augmentation strategies and the acquisition of additional data are essential for validating and refining our proposed model. In the future, we will incorporate additional modal data to capture a broader spectrum of information and address these information limitations, enabling a more comprehensive analysis and validation of the proposed method. Furthermore, we plan to expand our data with a wider variety of samples to mitigate class imbalance and further enhance the detection performance of our model. Ultimately, our goal is to develop a robust and comprehensive CAD detection system that can provide more meaningful insights to patients and healthcare professionals.

5. Conclusions

Single-modal data inherently suffer from information shortages, whereas multi-modal information is pivotal in enhancing the performance of CAD detection. This study proposes a multi-modal learning method that integrates ECG and PCG to detect CAD. Besides ECG and PCG, we successfully produce a novel ECG-PCG coupling signal by conducting the deconvolution of ECG and PCG. Considering the inherently nonlinear and non-stationary nature of physiological signals, we construct MRPs to quantify the ECG, PCG and ECG-PCG coupling signals, respectively. An integrated deep learning network incorporating the parallel CNN network and the autoencoder network is designed for multi-modal deep feature encoding, fusion and compression. Subsequently, the optimal features are selected based on the classification contribution rate of each feature and sent to the SVM classifier for the final classification between non-CAD and CAD cases. The classification results conclusively demonstrate that multi-modal information, encompassing all single-modal features, achieves improved detection performance compared with single-modal data. The multi-modal learning method is a feasible and robust technique for CAD detection and diagnosis.

Author Contributions

Conceptualization, C.L., X.W., C.S. and Y.L.; data curation, X.W. and C.S.; funding acquisition, X.W. and C.L.; investigation, Y.L.; methodology, X.W., C.S. and S.Z.; project administration, X.W.; software, C.S. and S.Z.; supervision, C.L.; validation, C.S.; visualization, C.S.; writing—original draft, C.S.; writing—review and editing, C.S. All authors have read and agreed to the published version of the manuscript.

Funding

This study was funded by the National Natural Science Foundation of China (grant numbers: 62071277, 61501280), and Technology-based SMEs Innovation Ability Enhancement Project in Shandong Province (project number: 2022TSGC2105).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The data are available on request.

Acknowledgments

The authors would like to thank Qianfoshan Hospital of Shandong Province for its full support and all the volunteers who participated in this study. Thanks to Xiaolei Liu (Yantai Vocational College) for conceptual contributions and Ming Zhang (Huiyironggong Technology Co., Ltd.) for contributing to project administration.

Conflicts of Interest

The authors declare that there are no conflicts of interest in this work.

References

  1. Pathak, A.; Samanta, P.; Mandana, K.; Saha, G. Detection of coronary artery atherosclerotic disease using novel features from synchrosqueezing transform of phonocardiogram. Biomed. Signal Process. Control 2020, 62, 102055. [Google Scholar] [CrossRef]
  2. Lih, O.S.; Jahmunah, V.; San, T.R.; Ciaccio, E.J.; Yamakawa, T.; Tanabe, M.; Kobayashi, M.; Faust, O.; Acharya, U.R. Comprehensive electrocardiographic diagnosis based on deep learning. Artif. Intell. Med. 2020, 103, 101789. [Google Scholar] [CrossRef] [PubMed]
  3. Cury, R.C.; Abbara, S.; Achenbach, S.; Agatston, A.; Berman, D.S.; Budoff, M.J.; Dill, K.E.; Jacobs, J.E.; Maroules, C.D.; Rubin, G.D.; et al. CAD-RADSTM coronary artery disease—Reporting and data system. An expert consensus document of the society of cardiovascular computed tomography (SCCT), the american college of radiology (ACR) and the north american society for cardiovascular imaging (NASCI). Endorsed by the American college of cardiology. J. Cardiovasc. Comput. Tomogr. 2016, 10, 269–281. [Google Scholar] [PubMed]
  4. Li, H.; Ren, G.; Yu, X.; Wang, D.; Wu, S. Discrimination of the diastolic murmurs in coronary heart disease and in valvular disease. IEEE Access 2020, 8, 160407–160413. [Google Scholar] [CrossRef]
  5. Giddens, D.P.; Mabon, R.F.; Cassanova, R.A. Measurements of disordered flows distal to subtotal vascular stenosis in the thoracic aortas of dogs. Circ. Res. 1976, 39, 112–119. [Google Scholar] [CrossRef]
  6. Akay, Y.M.; Akay, M.; Welkowitz, W.; Semmlow, J.L.; Kostis, J.B. Noninvasive acoustical detection of coronary artery disease: A comparative study of signal processing methods. IEEE Trans. Biomed. Eng. 1993, 40, 571–578. [Google Scholar] [CrossRef]
  7. Li, P.P.; Hu, Y.M.; Liu, Z.P. Prediction of cardiovascular diseases by integrating multi-modal features with machine learning methods. Biomed. Signal Process. Control 2021, 66, 102474. [Google Scholar] [CrossRef]
  8. Kumar, M.; Pachori, R.B.; Acharya, U.R. Characterization of coronary artery disease using flexible analytic wavelet transform applied on ECG signals. Biomed. Signal Process. Control 2017, 31, 301–308. [Google Scholar] [CrossRef]
  9. Tan, J.H.; Hagiwara, Y.; Pang, W.; Lim, I.; Oh, S.L.; Adam, M.; Tan, R.S.; Chen, M.; Acharya, U.R. Application of stacked convolutional and long short-term memory network for accurate identification of CAD ECG signals. Comput. Biol. Med. 2018, 94, 19–26. [Google Scholar] [CrossRef]
  10. Acharya, U.R.; Hagiwara, Y.; Koh, J.E.W.; Oh, S.L.; Tan, J.H.; Adam, M.; Tan, R.S. Entropies for automated detection of coronary artery disease using ECG signals: A review. Biocybern. Biomed. Eng. 2018, 38, 373–384. [Google Scholar] [CrossRef]
  11. Tschannen, M.; Kramer, T.; Marti, G.; Heinzmann, M.; Wiatowski, T. Heart sound classification using deep structured features. In Proceedings of the 2016 Computing in Cardiology Conference, (CinC), Vancouver, BC, Canada, 11–14 September 2016; pp. 565–568. [Google Scholar]
  12. Noman, F.; Ting, C.M.; Salleh, S.H.; Ombao, H. Short-segment heart sound classification using an ensemble of deep convolutional neural networks. In Proceedings of the ICASSP 2019—2019 IEEE International Conference on Acoustics, Speech and Signal Processing, (ICASSP), Brighton, UK, 12–17 May 2019; pp. 1318–1322. [Google Scholar]
  13. Baydoun, M.; Safatly, L.; Ghaziri, H.; Hajj, A.E. Analysis of heart sound anomalies using ensemble learning. Biomed. Signal Process. Control 2020, 62, 102019. [Google Scholar] [CrossRef]
  14. Humayun, A.I.; Ghaffarzadegan, S.; Ansari, M.I.; Feng, Z.; Hasan, T. Towards domain invariant heart sound abnormality detection using learnable filterbanks. IEEE J. Biomed. Health Inform. 2019, 24, 2189–2198. [Google Scholar] [CrossRef] [PubMed]
  15. Li, J.; Ke, L.; Du, Q.; Chen, X.; Ding, X. Multi-modal cardiac function signals classification algorithm based on improved D-S evidence theory. Biomed. Signal Process. Control 2022, 71, 103078. [Google Scholar] [CrossRef]
  16. Zarrabi, M.; Parsaei, H.; Boostani, R.; Zare, A.; Dorfeshan, Z.; Zarrabi, K.; Kojuri, J. A system for accurately predicting the risk of myocardial infarction using PCG, ECG and clinical features. Biomed. Eng. 2017, 29, 1750023. [Google Scholar] [CrossRef]
  17. Li, H.; Wang, X.P.; Liu, C.C.; Wang, Y.; Li, P.; Tang, H.; Yao, L.K.; Zhang, H. Dual-input neural network integrating feature extraction and deep learning for coronary artery disease detection using electrocardiogram and phonocardiogram. IEEE Access 2019, 7, 146457–146469. [Google Scholar] [CrossRef]
  18. Li, P.; Li, K.; Zheng, D.; Li, Z.M.; Liu, C.C. Detection of coupling in short physiological series by a joint distribution entropy method. IEEE Trans. Biomed. Eng. 2016, 63, 2231–2242. [Google Scholar] [CrossRef]
  19. Dong, H.W.; Wang, X.P.; Liu, Y.Y.; Sun, C.F.; Jiao, Y.; Zhao, L.; Zhao, S.; Xing, M.; Zhang, H.; Liu, C. Non-destructive detection of CAD stenosis severity using ECG-PCG coupling analysis. Biomed. Signal Process. Control 2023, 86, 105328. [Google Scholar] [CrossRef]
  20. Li, H.; Wang, X.; Liu, C.; Zeng, Q.; Zheng, Y.; Chu, X.; Yao, L.; Wang, J.; Jiao, Y.; Karmakar, C. A fusion framework based on multi-domain features and deep learning features of phonocardiogram for coronary artery disease detection. Comput. Biol. Med. 2020, 120, 103733. [Google Scholar] [CrossRef]
  21. Samanta, P.; Pathak, A.; Mandana, K.; Saha, G. Classification of coronary artery diseased and normal subjects using multi-channel phonocardiogram signal. Biocybern. Biomed. Eng. 2019, 39, 426–443. [Google Scholar] [CrossRef]
  22. Kaveh, A.; Chung, W. Automated classification of coronary atherosclerosis using single lead ECG. In Proceedings of the 2013 IEEE Conference on Wireless Sensor (ICWISE), Kuching, Malaysia, 2–4 December 2013; pp. 108–113. [Google Scholar]
  23. Eckmann, J.P.; Kamphorst, S.O.; Ruelle, D. Recurrence plots of dynamical systems. Europhys. Lett. 1987, 4, 973–977. [Google Scholar] [CrossRef]
  24. Mathunjwa, B.M.; Lin, Y.T.; Lin, C.H.; Abbod, M.F.; Shieh, J.S. ECG arrhythmia classification by using a recurrence plot and convolutional neural network. Biomed. Signal Process. Control 2021, 64, 102262. [Google Scholar] [CrossRef]
  25. Zhang, H.; Liu, C.; Zhang, Z.; Xing, Y.; Liu, X.; Dong, R.; He, Y.; Xia, L.; Liu, F. Recurrence Plot-Based Approach for Cardiac Arrhythmia Classification Using Inception-ResNet-v2. Front Physiol. 2021, 17, 648950. [Google Scholar] [CrossRef] [PubMed]
  26. Li, J.; Ke, L.; Du, Q. Classification of heart sounds based on the wavelet fractal and twin support vector machine. Entropy 2019, 21, 472. [Google Scholar] [CrossRef] [PubMed]
  27. Fan, X.; Yao, Q.; Cai, Y.; Miao, F.; Sun, F.; Li, Y. Multiscaled fusion of deep convolutional neural networks for screening atrial fibrillation from single lead short ECG recordings. IEEE J. Biomed. Health Inform. 2018, 22, 1744–1753. [Google Scholar] [CrossRef]
  28. Acharya, U.R.; Oh, S.L.; Hagiwara, Y.; Tan, J.H.; Adam, M.; Gertych, A.; Tan, R.S. A deep convolutional neural network model to classify heartbeats. Comput. Biol. Med. 2017, 89, 389–396. [Google Scholar] [CrossRef]
  29. Abrishami, H.; Han, C.; Zhou, X.; Campbell, M.; Czosek, R. Supervised ECG interval segmentation using LSTM neural network. In Proceedings of the International Conference on Bioinformatics & Computational Biology (BIOCOMP), Las Vegas, NV, USA, 9 August 2018; pp. 71–77. [Google Scholar]
  30. Nurmaini, S.; Tondas, A.E.; Darmawahyuni, A.; Rachmatullah, M.N.; Effendi, J.; Firdaus, F.; Tutuko, B. Electrocardiogram signal classification for automated delineation using bidirectional long short-term memory. Inform. Med. Unlocked 2021, 22, 100507–100511. [Google Scholar] [CrossRef]
  31. Guan, J.; Wang, W.; Feng, P.; Wang, X.; Wang, W. Low-Dimensional Denoising Embedding Transformer for ECG Classification. In Proceedings of the ICASSP 2021—2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada, 6–11 June 2021; pp. 1285–1289. [Google Scholar]
  32. Matias, P.; Folgado, D.; Gamboa, H.; Carreiro, A.V. Robust anomaly detection in time series through variational AutoEncoders and a local similarity score. In Proceedings of the 14th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2021), Vienna, Austria, 11–13 February 2021; pp. 91–102. [Google Scholar]
  33. Peimankar, A.; Puthusserypady, S. DENS-ECG: A deep learning approach for ECG signal delineation. Expert Syst. Appl. 2021, 165, 113911. [Google Scholar] [CrossRef]
  34. Liang, X.; Li, L.; Liu, Y.; Chen, D.; Wang, X.; Hu, S.; Wang, J.; Zhang, H.; Sun, C.; Liu, C. ECG_SegNet: An ECG delineation model based on the encoder-decoder structure. Comput. Biol. Med. 2022, 145, 105445. [Google Scholar] [CrossRef]
  35. Maggipinto, M.; Masiero, C.; Beghi, A.; Susto, G.A. A convolutional autoencoder approach for feature extraction in virtual metrology. Procedia Manuf. 2018, 17, 126–133. [Google Scholar] [CrossRef]
  36. Lee, H.G.; Noh, K.Y.; Ryu, K.H. Mining Biosignal Data: Coronary Artery Disease Diagnosis Using Linear and Nonlinear Features of HRV. In Proceedings of the Emerging Technologies in Knowledge Discovery and Data Mining: PAKDD 2007 International Workshops, Nanjing, China, 22–25 May 2007; pp. 218–228. [Google Scholar]
  37. Acharya, U.R.; Sudarshan, V.K.; Koh, J.E.; Martis, R.J.; Tan, J.H.; Oh, S.L.; Muhammad, A.; Hagiwara, Y.; Mookiah, M.R.K.; Chua, K.P.; et al. Application of higher-order spectra for the characterization of coronary artery disease using electrocardiogram signals. Biomed. Signal Process. Control 2017, 31, 31–43. [Google Scholar] [CrossRef]
  38. Bobillo, I.D. A tensor approach to heart sound classification. In Proceedings of the 2016 Computing in Cardiology Conference (CinC), Vancouver, BC, Canada, 11–14 September 2016; pp. 629–632. [Google Scholar]
  39. Roy, M.; Majumder, S.; Halder, A.; Biswas, U. ECG-NET: A deep LSTM autoencoder for detecting anomalous ECG. Eng. Appl. Artif. Intell. 2023, 124, 106484. [Google Scholar] [CrossRef]
  40. Whitney, H. Differentiable manifolds. Ann. Math. 1936, 37, 645–680. [Google Scholar] [CrossRef]
  41. Kennel, M.B.; Brown, R.; Abarbanel, H.D. Determining embedding dimension for phase-space reconstruction using a geometrical construction. Phys. Rev. A. 1992, 45, 3403. [Google Scholar] [CrossRef]
  42. Fraser, A.M.; Swinney, H.L. Independent coordinates for strange attractors from mutual information. Phys. Rev. A. 1986, 33, 1134. [Google Scholar] [CrossRef] [PubMed]
  43. Yang, H. Multiscale Recurrence Quantification Analysis of Spatial Cardiac Vectorcardiogram Signals. IEEE Trans. Biomed. Eng. 2011, 58, 339–347. [Google Scholar] [CrossRef] [PubMed]
  44. Deng, M.; Huang, X.; Liang, Z.; Lin, W.; Mo, B.; Liang, D.; Ruan, S.; Chen, J. Classification of cardiac electrical signals between patients with myocardial infarction and normal subjects by using nonlinear dynamics features and different classification models. Biomed. Signal Process. Control 2023, 79, 860–870. [Google Scholar] [CrossRef]
  45. Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297. [Google Scholar] [CrossRef]
  46. Huang, Y.; Li, H.; Tao, R.; Han, W.; Zhang, P.; Yu, X.; Wu, R. A customized framework for coronary artery disease detection using phonocardiogram signals. Biomed. Signal Process. Control 2022, 78, 103982. [Google Scholar] [CrossRef]
  47. Li, H.; Wang, X.P.; Liu, C.C.; Li, P.; Jiao, Y. Integrating multi-domain deep features of electrocardiogram and phonocardiogram for coronary artery disease detection. Comput. Biol. Med. 2021, 138, 104914. [Google Scholar] [CrossRef]
Figure 1. The block diagram of the proposed method, including the data preprocessing, MRP construction, feature extraction and classification stages.
Figure 2. ECG, PCG and ECG-PCG coupling signals and MRPs of a non-CAD subject. (a1) ECG signal. (a2) MRP of ECG signal. (b1) PCG signal. (b2) MRP of PCG signal. (c1) ECG-PCG coupling signal. (c2) MRP of ECG-PCG coupling signal. Note: the brighter points represent farther distances.
Figure 3. ECG, PCG and ECG-PCG coupling signals and MRPs of a CAD patient. (a1) ECG signal. (a2) MRP of ECG signal. (b1) PCG signal. (b2) MRP of PCG signal. (c1) ECG-PCG coupling signal. (c2) MRP of ECG-PCG coupling signal. Note: the brighter points represent farther distances.
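For readers who wish to reproduce images like those in Figures 2 and 3, the sketch below shows one way to build a distance-based recurrence image, assuming the modified recurrence plot keeps the raw pairwise phase-space distances rather than thresholding them as in the classical recurrence plot [23] (consistent with the captions, where brighter points denote farther distances). The embedding parameters m and tau are illustrative toy values; in practice they would be selected by the false-nearest-neighbours and mutual-information criteria [41,42].

```python
import numpy as np

def delay_embed(x, m, tau):
    # Time-delay embedding: each row is a point in m-dimensional phase space.
    # m and tau are placeholders; refs [41,42] describe how to choose them.
    n = len(x) - (m - 1) * tau
    return np.stack([x[i * tau: i * tau + n] for i in range(m)], axis=1)

def modified_recurrence_plot(x, m=3, tau=4):
    pts = delay_embed(np.asarray(x, dtype=float), m, tau)
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))            # pairwise Euclidean distances
    return (255 * dist / dist.max()).astype(np.uint8)   # brighter = farther

# Example: one 500-sample segment -> one grey-scale MRP image for the CNN.
segment = np.sin(np.linspace(0, 8 * np.pi, 500))        # placeholder signal
mrp = modified_recurrence_plot(segment)                 # shape (492, 492)
```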
Figure 4. Framework of the integrated deep learning network. The parallel CNN model encodes deep features from the input MRPs, and the autoencoder compresses the concatenated multi-modal features.
Figure 5. The structure of each CNN in the parallel CNN network. Each CNN comprises multiple convolutional and max-pooling layers that encode deep features.
Figure 6. Structure of the autoencoder network. x is the input feature vector and x′ is the output reconstructed feature vector. z is the latent representation encoded as the reduced feature vector.
Figure 7. The average accuracy trend of single- and multi-modal data with increasing features.
Figure 8. The p-value of each single-modal feature between non-CAD and CAD. (a) The p-value of ECG features. (b) The p-value of PCG features. (c) The p-value of ECG-PCG coupling features.
Figure 9. Heat maps of correlation coefficients between features of different signals. (a) Correlation coefficients between ECG features. (b) Correlation coefficients between PCG features. (c) Correlation coefficients between ECG-PCG coupling features. (d) Correlation coefficients between ECG and PCG features. (e) Correlation coefficients between ECG and ECG-PCG coupling features. (f) Correlation coefficients between PCG and ECG-PCG coupling features.
Figure 10. Significant ECG, PCG and ECG-PCG coupling features analyzed by the proposed multi-modal learning method. The top 5 most significant features of each single-modal signal are selected and the caption contains their names and p-values. (a) Top 5 features of ECG signal. (b) Top 5 features of PCG signal. (c) Top 5 features of ECG-PCG coupling signal.
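As a hedged illustration of the analyses behind Figures 8–10, the sketch below computes per-feature p-values between the two groups and a cross-feature correlation block. The specific significance test (a two-sample t-test here) and the 10-feature grouping are assumptions for illustration only; they are not restated in the captions.

```python
import numpy as np
from scipy import stats

# Placeholder feature matrices (subjects x features); the real inputs would be
# the deep-coding features of one single-modal signal per group.
feats_non_cad = np.random.randn(64, 40)
feats_cad = np.random.randn(135, 40) + 0.2

# Figures 8 and 10: per-feature p-values between groups, then the top 5.
p_values = stats.ttest_ind(feats_non_cad, feats_cad, axis=0).pvalue
top5 = np.argsort(p_values)[:5]
print("most significant feature indices:", top5)

# Figure 9: correlation-coefficient block between two feature sets,
# computed across all subjects (the 10-feature split is hypothetical).
all_feats = np.vstack([feats_non_cad, feats_cad])
set_a, set_b = all_feats[:, :10].T, all_feats[:, 10:20].T
heat_map = np.corrcoef(set_a, set_b)[:10, 10:]      # 10 x 10 heat-map block
```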
Table 1. Basic information of all subjects (mean ± SD).
Characteristic                     Non-CAD     CAD
Age (years)                        61 ± 10     62 ± 10
Male/female                        30/34       89/46
Height (cm)                        164 ± 7     166 ± 8
Weight (kg)                        69 ± 12     71 ± 11
Heart rate (bpm)                   72 ± 12     75 ± 16
Systolic blood pressure (mmHg)     134 ± 15    133 ± 16
Diastolic blood pressure (mmHg)    80 ± 11     82 ± 12
Table 2. All parameters of each CNN in the parallel CNN model.
Index   Layer           Index   Layer
1       conv3_64        10      max-pooling_2
2       conv3_64        11      conv3_512
3       max-pooling_2   12      conv3_512
4       conv3_128       13      conv3_512
5       conv3_128       14      max-pooling_2
6       max-pooling_2   15      conv3_512
7       conv3_256       16      conv3_512
8       conv3_256       17      conv3_512
9       conv3_256       18      max-pooling_2
Note: “conv (kernel size)_(number of kernels)” represents the convolutional parameters, and “max-pooling_(kernel size)” represents the max-pooling parameters.
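To make Table 2 concrete, a minimal PyTorch sketch of one branch is given below; the layer stack follows the table exactly (a VGG-16-style configuration), while the ReLU activations, same-padding, grey-scale input and the pooling/flatten head are assumptions, since the table lists only the convolution and pooling parameters. How the three branch outputs are reduced to the 2000-dimensional vector of Table 3 is likewise not specified here.

```python
import torch
import torch.nn as nn

def make_branch():
    # Layers 1-18 of Table 2: conv3_N = 3x3 conv with N kernels, "M" = 2x2 max-pool.
    cfg = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
           512, 512, 512, "M", 512, 512, 512, "M"]
    layers, in_ch = [], 1                       # grey-scale MRP input (assumed)
    for v in cfg:
        if v == "M":
            layers.append(nn.MaxPool2d(2))
        else:
            layers += [nn.Conv2d(in_ch, v, 3, padding=1), nn.ReLU(inplace=True)]
            in_ch = v
    return nn.Sequential(*layers, nn.AdaptiveAvgPool2d(1), nn.Flatten())

class ParallelCNN(nn.Module):
    """Three identical branches encode the ECG, PCG and coupling MRPs;
    their deep-coding features are concatenated for the autoencoder."""
    def __init__(self):
        super().__init__()
        self.branches = nn.ModuleList(make_branch() for _ in range(3))

    def forward(self, ecg_mrp, pcg_mrp, coup_mrp):
        feats = [b(x) for b, x in zip(self.branches, (ecg_mrp, pcg_mrp, coup_mrp))]
        return torch.cat(feats, dim=1)          # multi-modal feature vector
```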
Table 3. All parameters of the proposed autoencoder network.
Indicator    Parameter                    Indicator        Parameter
Structure    2000-1000-400-1000-2000      Learning rate    0.001
Optimizer    SGD                          Batch size       32
Loss         MSE                          Epochs           1000
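The Table 3 settings translate directly into a small fully connected autoencoder. Below is a minimal sketch with the 2000-1000-400-1000-2000 structure, SGD at learning rate 0.001, MSE loss and batch size 32; the ReLU activations are an assumption, as the table does not specify them. The 400-dimensional bottleneck z is the reduced feature vector of Figure 6.

```python
import torch
import torch.nn as nn

class FeatureAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(2000, 1000), nn.ReLU(),
                                     nn.Linear(1000, 400))
        self.decoder = nn.Sequential(nn.Linear(400, 1000), nn.ReLU(),
                                     nn.Linear(1000, 2000))

    def forward(self, x):
        z = self.encoder(x)           # 400-D reduced feature vector (Figure 6's z)
        return self.decoder(z), z     # reconstruction x' and latent code

model = FeatureAutoencoder()
opt = torch.optim.SGD(model.parameters(), lr=0.001)
loss_fn = nn.MSELoss()

x = torch.randn(32, 2000)             # one batch of concatenated deep features
for _ in range(5):                    # 1000 epochs in Table 3; 5 here for brevity
    x_hat, z = model(x)
    loss = loss_fn(x_hat, x)
    opt.zero_grad()
    loss.backward()
    opt.step()
```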
Table 4. The detailed classification results of single- and multi-modal data.
Modal Signal        ACC (%)         SEN (%)         SPE (%)         F1 (%)
ECG                 79.38 ± 4.36    92.59 ± 4.68    51.54 ± 5.76    61.75 ± 7.56
PCG                 77.88 ± 1.92    91.85 ± 4.32    48.21 ± 11.63   57.54 ± 7.72
ECG-PCG coupling    84.94 ± 4.97    94.81 ± 6.87    64.10 ± 3.24    73.67 ± 6.12
Multi-modal data    98.49 ± 1.24    98.57 ± 1.75    98.57 ± 2.86    98.89 ± 0.90
Table 5. Each fold and average results of multi-modal method using five-fold cross-validation.
Fold          ACC (%)         SEN (%)         SPE (%)         F1 (%)
1-fold        100.00          100.00          100.00          100.00
2-fold        100.00          100.00          100.00          100.00
3-fold        97.50           96.43           100.00          98.18
4-fold        97.50           100.00          92.86           98.11
5-fold        97.44           96.43           100.00          98.18
Mean ± std    98.49 ± 1.24    98.57 ± 1.75    98.57 ± 2.86    98.89 ± 0.90
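A minimal sketch of the Table 5 evaluation loop is given below: five-fold cross-validation of an SVM on the reduced multi-modal features, with per-fold ACC, SEN, SPE and F1 and their mean ± std. The stratified splitting and the default RBF kernel are assumptions; CAD is taken as the positive class.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix, f1_score

def fold_metrics(y_true, y_pred):
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    acc = (tp + tn) / (tp + tn + fp + fn)
    sen = tp / (tp + fn)                       # sensitivity (CAD recall)
    spe = tn / (tn + fp)                       # specificity
    return acc, sen, spe, f1_score(y_true, y_pred)

X = np.random.randn(199, 400)                  # placeholder reduced features
y = np.r_[np.ones(135), np.zeros(64)]          # 135 CAD / 64 non-CAD labels

scores = []
for tr, te in StratifiedKFold(5, shuffle=True, random_state=0).split(X, y):
    clf = SVC().fit(X[tr], y[tr])
    scores.append(fold_metrics(y[te], clf.predict(X[te])))

mean, std = np.mean(scores, axis=0), np.std(scores, axis=0)
print("ACC/SEN/SPE/F1 mean:", mean, "std:", std)
```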
Table 6. Classification results of different deep learning-based models.
Model                      ACC (%)         SEN (%)         SPE (%)         F1 (%)
ResNet50-based model       90.96 ± 2.89    94.81 ± 1.81    82.82 ± 5.71    85.45 ± 4.80
Transformer-based model    88.46 ± 3.35    93.33 ± 2.77    78.21 ± 8.81    81.21 ± 5.61
Our model                  98.49 ± 1.24    98.57 ± 1.75    98.57 ± 2.86    98.89 ± 0.90
Table 7. Classification results of different classifiers.
Classifier                      ACC (%)
Decision tree                   88.34
Linear discriminant analysis    81.65
Bayes                           81.36
KNN                             90.83
SVM                             98.49
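The Table 7 comparison can be sketched in a few lines of scikit-learn, feeding the same reduced features to each classifier; the hyperparameters (and Gaussian naive Bayes as the "Bayes" classifier) are assumptions, since the table reports only accuracies.

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

classifiers = {
    "Decision tree": DecisionTreeClassifier(),
    "LDA": LinearDiscriminantAnalysis(),
    "Bayes": GaussianNB(),
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(),
}
for name, clf in classifiers.items():   # X, y as in the previous sketch
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name}: {100 * acc:.2f}%")
```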
Table 8. Summary of existing studies on the diagnosis of CAD.
Author                 Data                                       Method                                                          Result (%)
Li et al. [20]         Self-collected, 135 CAD/60 non-CAD         PCG, multi-domain features, deep features, MLP                  ACC: 90.4, SPE: 83.4, SEN: 93.7
Samanta et al. [21]    Self-collected, 29 CAD/37 non-CAD          PCG, time-domain and frequency-domain features, CNN             ACC: 82.6, SPE: 79.6, SEN: 85.6
Kaveh et al. [22]      MIT-BIH, 43 CAD/46 non-CAD                 ECG, time-domain and frequency-domain features, SVM             ACC: 88.0, SPE: 92.6, SEN: 84.2
Huang et al. [46]      Self-collected, 348 normal/206 CAD         PCG, MFCCs, PCG sequence, customized model                      ACC: 96.05, SPE: 96.12, SEN: 96.12
Li et al. [47]         Self-collected, 347 CAD/74 non-CAD         ECG and PCG, sequence, spectrum image, ST image, MFCCs image    ACC: 96.51, SPE: 90.08, SEN: 99.37
This study             Self-collected, 135 CAD/64 non-CAD         ECG and PCG, multi-modal deep-coding features, SVM              ACC: 98.49, SPE: 98.57, SEN: 98.57
This study             Self-collected [39], 135 CAD/60 non-CAD    ECG and PCG, multi-modal deep-coding features, SVM              ACC: 96.37, SPE: 90.22, SEN: 98.26
This study             Self-collected, 60 CAD/60 non-CAD          ECG and PCG, multi-modal deep-coding features, SVM              ACC: 97.08, SPE: 96.12, SEN: 98.22
Table 9. Comparison of existing studies on ECG classification using the PhysioNet dataset and PCG classification using the PhysioNet/CinC Challenge 2016 dataset.
Author                  Classification Method        Input                                           Result (%)
Studies on ECG classification using the PhysioNet dataset
Kumar et al. [8]        SVM                          Time–frequency features                         ACC: 99.60
Tan et al. [9]          1-D CNN                      ECG signal                                      ACC: 99.85
Acharya et al. [10]     1-D CNN                      Entropy features                                ACC: 99.27
This study              SVM                          MRP deep-coding features                        ACC: 99.87
Studies on PCG classification using the PhysioNet/CinC Challenge 2016 dataset
Tschannen et al. [11]   1-D CNN                      Time features, frequency features               ACC: 87.00
Noman et al. [12]       2-D CNN                      MFCCs image                                     ACC: 88.80
Baydoun et al. [13]     Boosting and bagging model   Time–frequency features, statistical features   ACC: 91.50
Humayun et al. [14]     1-D CNN                      PCG signal                                      ACC: 97.50
This study              SVM                          MRP deep-coding features                        ACC: 97.56