Review

Review of Phonocardiogram Signal Analysis: Insights from the PhysioNet/CinC Challenge 2016 Database

by Bing Zhu 1, Zihong Zhou 1, Shaode Yu 1,*, Xiaokun Liang 2, Yaoqin Xie 2 and Qiurui Sun 3,*
1 School of Information and Communication Engineering, Communication University of China, Beijing 100024, China
2 Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
3 Center of Information & Network Technology, Beijing Normal University, Beijing 100875, China
* Authors to whom correspondence should be addressed.
Electronics 2024, 13(16), 3222; https://doi.org/10.3390/electronics13163222
Submission received: 17 July 2024 / Revised: 9 August 2024 / Accepted: 12 August 2024 / Published: 14 August 2024
(This article belongs to the Special Issue Signal, Image and Video Processing: Development and Applications)

Abstract:
The phonocardiogram (PCG) is a crucial tool for the early detection, continuous monitoring, accurate diagnosis, and efficient management of cardiovascular diseases. It has the potential to revolutionize cardiovascular care and improve patient outcomes. The PhysioNet/CinC Challenge 2016 database, a large and influential resource, encourages contributions to accurate heart sound state classification (normal versus abnormal), achieving promising benchmark performance (accuracy: 99.80%; sensitivity: 99.70%; specificity: 99.10%; and score: 99.40%). This study reviews recent advances in analytical techniques applied to this database, and 104 publications on PCG signal analysis are retrieved. These techniques encompass heart sound preprocessing, signal segmentation, feature extraction, and heart sound state classification. Specifically, this study summarizes methods such as signal filtering and denoising; heart sound segmentation using hidden Markov models and machine learning; feature extraction in the time, frequency, and time-frequency domains; and state-of-the-art heart sound state recognition techniques. Additionally, it discusses electrocardiogram (ECG) feature extraction and joint PCG and ECG heart sound state recognition. Despite significant technical progress, challenges remain in large-scale high-quality data collection, model interpretability, and generalizability. Future directions include multi-modal signal fusion, standardization and validation, automated interpretation for decision support, real-time monitoring, and longitudinal data analysis. Continued exploration and innovation in heart sound signal analysis are essential for advancing cardiac care, improving patient outcomes, and enhancing user trust and acceptance.

1. Introduction

Cardiovascular diseases (CVDs) continue to be the leading cause of death worldwide and significantly contribute to health deterioration and increased healthcare costs [1]. These diseases cause around 17.9 million deaths annually and account for 32% of all global deaths. Specifically, more than 75% of CVD deaths occur in low- and middle-income countries, and a major proportion of these deaths (38%) occur in people under 70 years of age [2].
Heart sounds are crucial for the detection, diagnosis, and monitoring of CVDs. These non-stationary, quasi-periodic acoustic sounds are produced by heart valve pulsations and moving blood. A cardiac cycle consists of the first (S1) and second (S2) heart sounds [3]. Specifically, S1, or the “lub” sound, indicates the start of the heart pumping, caused by the closure of the mitral and tricuspid valves at the beginning of ventricular systole. It occurs with a low pitch and loud volume and lasts for about 0.15 seconds (s). S2, or the “dub” sound, marks the end of the heart pumping and the beginning of ventricular diastole due to the closure of the aortic and pulmonary valves at the end of ventricular systole. It appears with a high pitch and weak volume, and its duration is relatively short (≈0.08 s) [4]. In a cardiac cycle, the third (S3) and fourth (S4) heart sounds can be heard because of abnormal heart functionality [5]. S3, associated with rapid ventricular filling, occurs after S2 and may indicate heart failure or dilated cardiomyopathy in older adults. S4, associated with atrial contraction, occurs before S1 and often indicates conditions like hypertrophic cardiomyopathy or aortic regurgitation. Using heart sounds to differentiate abnormal from normal states is clinically significant.
The phonocardiogram (PCG) is routinely used for CVD detection, monitoring, and diagnosis. It is non-invasive, cost-effective, and sensitive to subtle changes that enhance the availability, capability, and sensitivity in abnormality detection. Meanwhile, the PCG allows for the precise analysis of the timing, duration, and intervals of heartbeats. This is critical in diagnosing conditions related to heart functionality, such as heart blocks, arrhythmias, and other electrical disturbances. In addition, it ensures a high-fidelity visual characterization of heart sound frequency and intensity. This is particularly useful for differentiating between innocent (harmless) and pathological (disease-related) murmurs. Importantly, empowered by artificial intelligence, PCG-based signal analysis could facilitate the detection, monitoring, diagnosis, and management of heart diseases [6].
Open-source PCG signal databases are valuable for CVD investigation in the scientific community. The PhysioNet/CinC Challenge 2016 database [7] (hereafter referred to as PCHSD2016) is one of the most extensive and well-known PCG databases for heart sound analysis [8]. It contains a total of 3126 heart sound recordings from 1072 subjects. Other public PCG databases include the Pascal heart sound dataset (656 recordings from 79 patients) [9], the Shiraz University fetal heart sound database (119 recordings from 109 pregnant women) [10], the Yaseen Khan dataset (1000 recordings) [11], the heart sound Shenzhen corpus (845 recordings from 170 patients) [12], the Indian Institute of Science fetal heart sound database (60 recordings from 60 expecting mothers) [13], the fetal PCG database (26 recordings from 26 pregnant women) [14], and the simultaneous electrocardiogram and PCG database (69 recordings from 24 healthy male subjects) [15]. Besides their diverse recording durations, the heart sound recordings in PCHSD2016 were collected from various body locations of children and adult participants in uncontrolled environments [16]. Comparatively, PCHSD2016 remains the highest-quality, most rigorously validated, and most standardized open database of heart sound recordings for algorithm analyses.
Profound reviews of PCG signal analysis have been published. Adithya et al. focused on fetal monitoring and summarized the acquisition standards, commercial products, signal-processing, and state classification algorithms [17]. Nabih-Ali et al. presented a comprehensive review covering signal preprocessing, feature extraction, and state classification. The review indicated that PCG signal analysis remains an open problem, and machine learning (ML) shows potential for accurate state recognition [18]. Ismail et al. conducted a survey on PCG-based heartbeat localization and classification [4]. Dwivedi et al. overviewed the automatic analysis and classification of heart sounds, considering research articles from 1963 to 2018 on signal segmentation, feature extraction and classification, and database summaries [19]. Ghosh et al. introduced PCG data acquisition and preprocessing methods, such as signal denoising and segmentation [20]. Kahankova et al. reviewed recent advances and future directions, such as signal extraction and processing, fetal health state classification, remaining challenges, and practical suggestions [21]. Zhao et al. concentrated on an ML-based heart sound analysis, compiling existing public and private datasets, introducing heart sound analysis algorithms, and summarizing the current applications and limitations of ML methods [8]. Chen et al. presented a thorough overview of various sub-tasks of heart sound analysis and examined the improvements made in each sub-task through both ML techniques and deep learning (DL) algorithms. Their review highlighted the potential of artificial intelligence to revolutionize cardiovascular healthcare through accurate and automated heart sound analysis [22].
This study presents a comprehensive technical review of PCG signal analysis. It differs from the previously mentioned reviews [4,8,17,18,19,20,21,22] in several aspects. Firstly, this review starts with the PCHSD2016 database, which presents diverse challenges, ranging from different data quality, recording devices, and acquisition settings. Secondly, novel proposals for PCG signal analysis that have been evaluated on the database are collected and analyzed. This review presents the relevant techniques in data preprocessing, signal segmentation, feature extraction, and heart sound state classification. Thirdly, these techniques cover ML, DL, and other related signal-processing algorithms, and state-of-the-art works on PCG-based normal and abnormal state classifications are summarized. To the best of our knowledge, several existing algorithms have achieved high accuracy (≥95%) on the PCHSD2016 database. Therefore, it is the right time to present a technical review of PCG signal analysis based on this database, which may shed light on other related tasks and downstream applications.

2. Literature Screening

This section introduces the PCHSD2016 database [7]. Based on this database, related publications are retrieved, and papers that propose novel techniques for PCG signal analysis are selected for follow-up investigation.

2.1. The PCHSD2016 Database

The PCHSD2016 database [7] was collected independently by different research teams around the world. The PCG recordings in the database were acquired from children and adult participants with a variety of health conditions, using different recording equipment in both clinical and non-clinical settings. The devices were placed at different auscultation sites, including the aortic, pulmonic, tricuspid, and mitral areas.
The database contains around 30 h of recordings: 4430 recordings from 1072 subjects, with each subject providing one to six heart sound recordings. In total, the recordings contain 233,512 annotated heart sounds from 116,865 heartbeats [16]. The duration of the signals ranges from 5 s to over 120 s. Since different equipment was used at different sampling frequencies, all the single-lead PCG signals were downsampled to 2 kHz for consistency.
The recordings in the database are divided into two categories: normal and abnormal. The abnormal category includes recordings of patients with various heart diseases, such as heart valve defects and coronary artery disease. In the challenge, the database was divided into a training set (84,425 beats and 3153 recordings from 764 participants) and a testing set (32,440 beats and 1277 recordings from 308 participants).
In addition, each PCG is recorded as a .wav file and a .hea file. The former contains the signal, and the latter specifies its format, sampling rate, and channel information. It should be noted that in the training-a subset, additional electrocardiogram (ECG) synchronized signals are provided, and these signals are saved as .dat files. Therefore, the training-a subset could be used for joint PCG and ECG signal analysis.

2.2. Literature Retrieval

The keyword “PhysioNet/CinC Challenge 2016” was searched using Google Scholar (accessed on 29 June 2024). The database [7] has been cited 234 times. After excluding non-English papers, books, editorials, review articles, dissertations, and other irrelevant publications, 104 technical papers remained. Figure 1 shows the number of technical publications per year since the database was released.

2.3. Organization of This Technical Review

PCG-based heart sound state classification depends on a series of analytical techniques. Section 3 introduces PCG signal preprocessing, including data resampling, duration standardization, amplitude normalization, and signal filtering and denoising. Section 4 covers signal segmentation to identify different stages of heart sounds and describes hidden Markov models (HMMs) and ML-based models. Section 5 provides heart sound feature extraction, highlighting time-domain, frequency-domain, and time-frequency-domain analyses. In addition, feature collection from dual-modal PCG and ECG signals is summarized. Section 6 presents the evaluation metrics, cross-validation strategies, notable classification approaches, and recent database achievements using PCG and joint PCG-ECG signals. In the end, Section 7 examines the technical findings, challenges, future directions, and study limitations, while Section 8 summarizes the current review study.

3. PCG Signal Preprocessing

The PCG signals in PCHSD2016 were acquired using different recording devices with various settings. Therefore, signal preprocessing, such as signal resampling, duration standardization, amplitude normalization, and signal filtering and denoising, becomes important for follow-up data analysis.

3.1. Signal Resampling

The PCG signals in the database were uniformly downsampled to 2 kHz for accessibility. For specific applications, a series of works resampled the signals to 1 kHz [23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48]. According to the Nyquist–Shannon theorem [49], a sampling rate of 1 kHz is sufficient to capture the information of heart sound signals. Downsampling reduces the time resolution, decreases the computational workload, and benefits low-frequency component analysis.
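As a minimal illustration, the following Python sketch downsamples a recording from 2 kHz to 1 kHz with SciPy's polyphase resampler, which applies an anti-aliasing filter internally; the file name is illustrative.

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import resample_poly

fs, x = wavfile.read("a0001.wav")   # PCHSD2016 recordings are stored at 2 kHz
x = x.astype(np.float64)

# Rational-factor resampling from 2 kHz to 1 kHz; resample_poly
# applies an anti-aliasing FIR filter before decimation.
target_fs = 1000
x_1k = resample_poly(x, up=target_fs, down=fs)
```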

3.2. Duration Standardization

Duration standardization aims to adjust the length of PCG signals to a fixed duration. It facilitates batch processing and improves algorithm efficiency. A widely used method is padding or truncation [30,40,43,50,51,52,53,54,55,56]. For example, one can truncate each signal to its first 5 s; if a signal is shorter than 5 s, zeros are appended to reach 5 s.
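A padding-or-truncation step of this kind can be written in a few lines of NumPy; the 5 s target below follows the example above and is illustrative.

```python
import numpy as np

def fix_duration(x, fs=1000, seconds=5):
    """Truncate a signal to the first `seconds`, or zero-pad a shorter one."""
    n = fs * seconds
    if len(x) >= n:
        return x[:n]                     # keep the first 5 s
    return np.pad(x, (0, n - len(x)))    # append zeros up to 5 s
```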

3.3. Amplitude Normalization

Amplitude normalization aims to map signal amplitudes to a uniform range. It mitigates the effect of varying magnitudes on the model’s convergence speed and ensures the stability of numerical calculations. One common method is min-max normalization, which scales numerical values to a specified range [24,52,57,58,59,60,61]. For instance, Equation (1) maps the original signal $x_s$ to the range [0, 1] and generates the normalized signal $x$, where $\min(x_s)$ and $\max(x_s)$ correspond to the minimum and maximum values of the signal $x_s$:

$$x = \frac{x_s - \min(x_s)}{\max(x_s) - \min(x_s)}. \tag{1}$$
Another widely used method is z-score normalization. It is particularly useful in statistics and ML when data samples come from different sources or scales [29,51,55,62,63,64,65,66,67,68,69,70]. It maps the signal to a normal distribution with a mean of zero and a standard deviation of one. Equation (2) formulates the computation procedure, where $\mu$ and $\sigma$ stand for the mean and the standard deviation of the signal $x_s$, respectively:

$$x = \frac{x_s - \mu(x_s)}{\sigma(x_s)}. \tag{2}$$
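Both normalization schemes are one-liners in NumPy; the sketch below mirrors Equations (1) and (2).

```python
import numpy as np

def min_max_normalize(x):
    """Equation (1): scale amplitudes to the range [0, 1]."""
    return (x - x.min()) / (x.max() - x.min())

def z_score_normalize(x):
    """Equation (2): zero mean and unit standard deviation."""
    return (x - x.mean()) / x.std()
```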

3.4. Signal Filtering and Denoising

The purpose of signal filtering and denoising is to reduce the impact of irrelevant signals or noise and to enhance PCG signal quality. During data acquisition, external signals, such as speech and environmental sounds, and internal interference, such as swallowing, coughing, and breathing, are mixed. These factors often severely degrade the signal quality and pose difficulties in signal analysis [48,71].

3.4.1. Signal Filtering

Signal filtering involves low-pass, high-pass, and band-pass (BP) filters, allowing signal components within a specific frequency range to pass through the filter. Refs. [28,30,46,56,72,73] used Butterworth, Chebyshev, and Hamming window-based low-pass filters, while Refs. [41,69,74] used high-pass filters. BP filters are commonly used since the frequency of heart sounds ranges between 25 and 450 Hz. Refs. [25,32,33,35,36,39,40,42,43,47,55,57,65,68,75,76,77,78] utilized Butterworth, infinite impulse response, and finite impulse response BP filters. These filters have lower cut-off frequencies (15 to 50 Hz) and upper cut-off frequencies (200 to 900 Hz).
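As a hedged sketch of a typical BP filtering step, the snippet below applies a zero-phase Butterworth band-pass with SciPy; the 25–400 Hz cut-offs are illustrative values within the ranges reported above, not settings prescribed by any single study.

```python
from scipy.signal import butter, filtfilt

def bandpass_pcg(x, fs=1000, low=25.0, high=400.0, order=4):
    """Zero-phase Butterworth band-pass covering the heart sound frequency range."""
    b, a = butter(order, [low, high], btype="bandpass", fs=fs)
    return filtfilt(b, a, x)   # forward-backward filtering avoids phase distortion
```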
Digital filters, such as the elliptic filter [24,29], Savitzky–Golay filter [79,80], and adaptive Wiener filter [81], have been designed. The elliptic filter provides a sharp transition between the pass-band and stop-band, allowing for steep pass-band edges and cutoff characteristics in the frequency response. Mathematical tools, such as elliptic functions and Jacobian elliptic functions, are used to calculate the coefficients [82]. The Savitzky–Golay filter smooths data by fitting a subset of points onto a lower-order polynomial [83]. It preserves the original features and removes noise and artifacts. The adaptive Wiener filter utilizes statistical properties to estimate the optimal filtering coefficients. It can dynamically adjust the coefficients of the filter and find the optimal filter parameters in terms of mean square error [84].

3.4.2. Heart Sound Denoising

Digital filters are suitable for handling stationary signals, while PCG signals are non-stationary. Alternatives to digital filters have been explored for heart sound denoising, such as wavelet transform (WT) denoising [61,81,85,86,87,88], Schmidt spike removal [26,31,35,89], variational mode decomposition [70,90], moving average [91], windowed outlier filtering [67], homomorphic filtering, and total variation filtering [92].
Among these algorithms, WT denoising, Schmidt spike removal, and variational mode decomposition are relatively widely applied. WT denoising uses wavelet decomposition for simultaneous analysis of heart sounds in both the time and frequency domains [93]. The signal is approximated with its low- and high-frequency components. Through hard, soft, or adaptive thresholding, major coefficients are retained for signal reconstruction and analysis [94,95]. Schmidt spike removal aims to detect and remove spike noise [96]. The spike noise removal framework might involve heart sound filtering, signal segmentation, and spike noise detection in each window to identify the window-level maximum absolute amplitude. If a maximum absolute amplitude is three times larger than the median value, it is considered a spike, and the start and end of the noise spike are replaced with zero values. Iterative spike removal is conducted until all the windows are processed. Variational mode decomposition uses the variational method to decompose a signal into several modes with distinct frequency band characteristics [97]. These modes have the smallest bandwidth in a local frequency range, allowing for a precise description of the intrinsic properties of the signal. Selectively retaining some modes and reconstructing the signal results in a denoised signal.
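A minimal WT denoising sketch in Python (using PyWavelets) is given below; the db4 wavelet, decomposition level, and universal threshold are common illustrative choices rather than the settings of the cited works.

```python
import numpy as np
import pywt

def wavelet_denoise(x, wavelet="db4", level=5):
    """Decompose, soft-threshold the detail coefficients, and reconstruct."""
    coeffs = pywt.wavedec(x, wavelet, level=level)
    # Universal threshold with the noise level estimated from the finest details.
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    thr = sigma * np.sqrt(2 * np.log(len(x)))
    coeffs[1:] = [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[: len(x)]
```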

4. Heart Sound Segmentation

Heart sounds can be divided into segments for multi-time-scale analysis. From a temporal perspective, a heart sound signal can be divided into multiple segments using a sliding window [23,25,98,99,100] or framing methods [56,70]. From a spatial perspective, a heart sound signal within a cardiac cycle can be divided into S1, S2, and other stages using clinical knowledge. In general, heart sound segmentation methods include HMMs [67,92,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117], ML [81,86,118,119,120,121,122,123], signal envelope [47,57,124,125,126,127], feature analysis [29,50,52,128,129], WT [129,130,131], and other methods [46,91,132,133,134].

4.1. HMM-Based Heart Sound Segmentation

As a probabilistic statistical model, an HMM describes a series of observations changing over time. It assumes a system composed of several states that can transition between each other, and each state generates an observation. Both the state transitions and the generation of observations follow specific probability distributions.
Heart sound segmentation algorithms based on HMMs and their variants have been designed. Gamero et al. simulated the duration of systolic and diastolic intervals, and HMM networks were designed with syntactic constraints to parse the sequence of these intervals [135]. Schmidt et al. developed a dependent HMM that identifies the most likely heart sound sequences by considering event duration, signal envelope amplitude, and a predefined model structure [96]. One recommended method is the logistic regression-based hidden semi-Markov model (LR-HSMM) [136], which has been widely used [101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117].
Figure 2 shows the procedure of the LR-HSMM for separating a heart sound into S1 and S2 segments for real-world recordings. The method combines wavelet feature optimization, feature extraction, and a hidden semi-Markov model [137]. It estimates the emission probability by extending logistic regression, decodes the most likely sequence of states using a modified Viterbi algorithm, and achieves promising segmentation accuracy.
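To make the decoding step concrete, the toy sketch below runs a plain Viterbi pass over the four heart-cycle states; the transition matrix and emission likelihoods are illustrative placeholders, not the trained duration-dependent parameters of the LR-HSMM [136].

```python
import numpy as np

# Toy Viterbi decoding over the four heart-cycle states (S1, systole, S2,
# diastole). A and the emission log-likelihoods are illustrative placeholders.
STATES = ["S1", "systole", "S2", "diastole"]
A = np.array([[0.80, 0.20, 0.00, 0.00],   # left-to-right cyclic transitions
              [0.00, 0.90, 0.10, 0.00],
              [0.00, 0.00, 0.80, 0.20],
              [0.05, 0.00, 0.00, 0.95]])

def viterbi(log_emission, log_A, log_pi):
    """Most likely state path; log_emission has shape (T, K)."""
    T, K = log_emission.shape
    delta = np.full((T, K), -np.inf)   # best log-probability ending in state k
    psi = np.zeros((T, K), dtype=int)  # back-pointers
    delta[0] = log_pi + log_emission[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A    # (K, K): predecessor -> state
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_emission[t]
    path = np.zeros(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):     # trace the back-pointers
        path[t] = psi[t + 1, path[t + 1]]
    return path

# Example: decode 200 random frames (stand-ins for per-frame state likelihoods).
rng = np.random.default_rng(0)
log_emission = np.log(rng.dirichlet(np.ones(4), size=200))
with np.errstate(divide="ignore"):     # zeros in A become -inf, as intended
    log_A = np.log(A)
states = [STATES[i] for i in viterbi(log_emission, log_A, np.log(np.full(4, 0.25)))]
```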

4.2. ML-Based Heart Sound Segmentation

ML-based heart sound segmentation aims to learn features from a large-scale database to generalize well across different acquisition environments and databases. Related works include unsupervised clustering and supervised learning methods.

4.2.1. Unsupervised Clustering-Based Heart Sound Segmentation

Unsupervised clustering aims to aggregate similar acoustic patterns and distinguish the acoustic features of different events from each other. It requires no prior knowledge about heart sound states or labeled data samples, while effective features are crucial for clustering analysis. The segmentation can be divided into several steps. Firstly, features such as spectral features [119], power spectral density envelopes [121], and signal energy envelopes [138] are extracted. Secondly, a similarity matrix is computed based on these features, which describes the similarity between different sample points of the signals. Thirdly, clustering algorithms, such as spectral clustering and k-means clustering, are applied to the similarity matrix. Fourthly, a heart sound is segmented into different clusters based on similarity, and the resultant segments correspond to different heart sound events or states. Finally, the center of each cluster is defined, each sample point of heart sounds is assigned to the nearest cluster center, and an initial segmentation result is formed. Notably, the procedure iterates, and the cluster centers are refined until convergence or the maximum number of iterations is reached.
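The sketch below illustrates the general idea with a simple Hilbert energy-envelope feature and k-means; the frame length and feature set are illustrative, not those of the cited works.

```python
import numpy as np
from scipy.signal import hilbert
from sklearn.cluster import KMeans

def cluster_segments(x, fs=1000, win=0.02):
    """Cluster short frames of a heart sound by simple envelope features."""
    n = int(win * fs)                               # 20 ms frames
    env = np.abs(hilbert(x))                        # energy envelope
    frames = env[: len(env) // n * n].reshape(-1, n)
    feats = np.c_[frames.mean(axis=1), frames.std(axis=1)]
    labels = KMeans(n_clusters=2, n_init=10).fit_predict(feats)
    return labels                                   # one cluster label per frame
```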

4.2.2. Supervised Learning-Based Heart Sound Segmentation

Supervised learning uses labeled training samples to learn the implicit mapping between signal samples and heart sound states. Many deep networks and training strategies have been utilized [139,140,141,142]. Renna et al. modified U-Net [143] and explored different temporal modeling schemes in conjunction with HMMs to predict emission distributions [120]. Babu et al. used empirical WT for signal reconstruction and collected envelope features for U-Net-based heart sound state recognition [123]. Enériz et al. implemented a large 1D U-Net-based model for real-time heart sound state recognition and explored different optimization methods for fully characterized performance [144]. Additionally, DenseNet [145] with an attention mechanism [146] and a bidirectional long short-term memory (LSTM) network [147] incorporating attention [148] have been applied to automatically detect the beginning and ending points of heart sound segments.

5. Heart Sound Feature Extraction

Heart sound feature extraction plays an important role in heart sound state prediction. It quantifies heart sound states—whether normal or abnormal—and different phases of the cardiac cycle, which is crucial for clinical CVD diagnosis.

5.1. PCG Heart Sound Feature Extraction

Generally, handcrafted features can be divided into time-domain features, frequency-domain features, and time-frequency-domain features, which may include entropy features and statistical characteristics. Table 1 groups the publications according to the signal features and feature extraction methods.

5.1.1. Time-Domain Feature Extraction

Time-domain features refer to the characteristics of signals in the time domain. Widely used features include statistical features, and signal reconstruction or approximation is also common. The average magnitude difference function (AMDF) analyzes periodic characteristics by forming a difference signal between the original and delayed signals, and the absolute magnitude of this difference is used for heart sound analysis [167]. It is defined in Equation (3):

$$v = f_{AMDF}(\tau) = \frac{1}{N - \tau} \sum_{n=0}^{N - \tau - 1} \left| x[n] - x[n + \tau] \right|, \tag{3}$$

where $x[n]$ denotes the discrete signal, $x[n + \tau]$ is the signal delayed by $\tau$, and $N$ is the length of the signal. The positions of the minimum values correspond to the periodic characteristics of the signal. The AMDF identifies the periodic components of the signal by calculating the amplitude differences at different delays [60].
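Equation (3) maps directly onto a short NumPy routine; the sketch below returns the AMDF for lags 1 through max_lag.

```python
import numpy as np

def amdf(x, max_lag):
    """Equation (3): mean absolute difference between x[n] and x[n + tau]."""
    N = len(x)
    return np.array([np.mean(np.abs(x[: N - tau] - x[tau:]))
                     for tau in range(1, max_lag + 1)])
```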
The autoregressive model describes the self-correlation of signals. It is formulated as shown in Equation (4):

$$x(t) = \sum_{i=1}^{p} \psi_i \, x(t - i) + \epsilon(t), \tag{4}$$

where $\psi_i$ represents the autoregressive coefficients or parameters, $p$ denotes the order of the model, and $\epsilon$ is the white noise [168]. The estimated autoregressive coefficients $\{\psi_i\}_{i=1}^{p}$ serve as the characteristics of the signal.
Polynomial fitting or regression aims to fit discrete data points using polynomial functions, and it captures the trends and characteristics of the signals [41,154]. Assuming the signal $y$ is approximated by $\hat{y}$, as shown in Equation (5),

$$\hat{y} = P_n(x) = \alpha_0 + \alpha_1 x + \alpha_2 x^2 + \cdots + \alpha_n x^n, \tag{5}$$

the estimated polynomial coefficients $\{\alpha_i\}_{i=0}^{n}$ serve as the polynomial features. These features reflect the variation information of the signal at different orders, capturing the trends, fluctuations, and patterns of the heart sound signal.
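A polynomial feature vector of this kind can be obtained with NumPy's least-squares fit; the synthetic segment and polynomial order below are illustrative.

```python
import numpy as np

# x_seg stands in for one normalized heart sound segment (synthetic here).
t = np.linspace(0.0, 1.0, 1000)
x_seg = np.sin(2 * np.pi * 3 * t) * np.exp(-3 * t)

coeffs = np.polyfit(t, x_seg, deg=6)   # estimated {alpha_i} of Equation (5)
x_hat = np.polyval(coeffs, t)          # polynomial approximation of the segment
```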

5.1.2. Frequency-Domain Feature Extraction

The discrete Fourier transform (DFT) and the fast Fourier transform (FFT) remain the most popular frequency-domain feature extraction methods for heart sound analysis [27,28,29,42,45,59,72,90,99,107,109,156,159]. These methods reveal the amplitude and phase cues of the signal components at different frequencies. The DFT and its inverse procedure for heart sound signal recovery are described below.
The DFT transforms a discrete signal from the time domain to the frequency domain, as defined in Equation (6):

$$X[k] = \sum_{n=0}^{N-1} x(n) \, e^{-j \frac{2\pi}{N} k n}, \tag{6}$$

where $X[k]$ ($k = 0, 1, \ldots, N-1$) is the frequency-domain representation of the signal. The inverse procedure converts the signal from the frequency domain back to the time domain, as described in Equation (7):

$$x(n) = \frac{1}{N} \sum_{k=0}^{N-1} X[k] \, e^{j \frac{2\pi}{N} k n}. \tag{7}$$
DFT can extract various frequency-domain features, including the amplitude spectrum, power spectrum, frequency centroid, and bandwidth, which enrich the signal representation and improve heart sound state classification.
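The following sketch computes such features with NumPy's FFT routines; the centroid and bandwidth are the standard amplitude-weighted spectral moments.

```python
import numpy as np

def dft_features(x, fs=1000):
    """Frequency-domain descriptors from the magnitude spectrum (Equation (6))."""
    spectrum = np.abs(np.fft.rfft(x))              # amplitude spectrum via the FFT
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    power = spectrum ** 2                          # power spectrum
    centroid = np.sum(freqs * spectrum) / np.sum(spectrum)   # frequency centroid
    bandwidth = np.sqrt(np.sum(((freqs - centroid) ** 2) * spectrum)
                        / np.sum(spectrum))        # spectral bandwidth
    return spectrum, power, centroid, bandwidth
```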

5.1.3. Time-Frequency-Domain Feature Extraction

Time-frequency-domain features refer to characteristics that combine both the time- and frequency-domain information of the signal. Widely used time-frequency-domain feature extraction methods include spectrograms [25,32,42,44,52,53,54,60,65,70,101,157,169,170], Mel-frequency cepstral coefficients (MFCCs) [32,33,34,35,36,55,77,111,117,165,171,172], short-time FT (STFT) [25,42,52,53,60,65,68,70], and various WTs [27,28,29,42,45,59,72,90,99,107,109,130,156,159].
STFT obtains the distribution of a signal in both the time and frequency domains for local signal analysis. After a heart sound is segmented into overlapping short time windows, FT is applied to each time window. Then, the spectrum information of each time window, along with the time-frequency spectrogram of the whole signal, is computed. By overlapping and summing the time-frequency spectra of each time window, an overall time-frequency representation of the signal is obtained. Given a discrete signal x ( n ) , its time-frequency spectrogram is given by Equation (8):
$$X(m, \omega) = f_{STFT}(x(n)) = \sum_{n=-\infty}^{+\infty} x(n) \, W(n - m) \, e^{-j \omega n}, \tag{8}$$

where $X(m, \omega)$ represents the STFT of the signal at time $m$ and frequency $\omega$, and $W(n - m)$ is a window function [173] used to extract a specific local region.
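In practice, Equation (8) is usually evaluated with a library routine; the sketch below uses SciPy with an illustrative 50 ms Hann window and 50% overlap, on a random stand-in for a preprocessed signal.

```python
import numpy as np
from scipy.signal import stft

fs = 1000
x = np.random.randn(5 * fs)   # stand-in for a preprocessed 5 s PCG signal

# Equation (8) with a 50 ms Hann window and 50% overlap (illustrative settings).
f, t, Zxx = stft(x, fs=fs, window="hann", nperseg=50, noverlap=25)
spectrogram = np.abs(Zxx)     # magnitude spectrogram, often fed to a CNN as an image
```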
Continuous WT (CWT) quantifies the local frequency characteristics of non-stationary signals over time [174]. It achieves multi-resolution representation by changing the scale of the wavelet function, allowing the signal to be observed at different time scales. Equation (9) shows the formulation of the CWT:

$$W_x(a, b) = \frac{1}{\sqrt{|a|}} \int_{-\infty}^{+\infty} x(t) \, \phi^{*}\!\left(\frac{t - b}{a}\right) dt, \tag{9}$$

where $a$ is the scale parameter that controls the expansion and contraction of the wavelet, $b$ is a translation parameter that controls the position of the wavelet in time, $\phi$ represents the mother wavelet, $\phi^{*}$ is its complex conjugate, and $W_x(a, b)$ stands for the CWT coefficients at scale $a$ and position $b$. It should be noted that the mother wavelet is the basis function of the WT, and it can generate a set of daughter wavelets through scaling and time translation to represent the local features of the signal. Common mother wavelets include the Haar wavelet, Daubechies wavelet, and Morlet wavelet. Based on the WT, features such as wavelet coefficients and wavelet gradient maps can be extracted from the time-frequency domain.
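A CWT scalogram over the typical heart sound band can be computed with PyWavelets; the complex Morlet wavelet and the 25–450 Hz frequency grid below are illustrative choices.

```python
import numpy as np
import pywt

fs = 1000
x = np.random.randn(5 * fs)            # stand-in for a preprocessed PCG signal

# Complex Morlet CWT; scales are derived from the target frequency grid.
wavelet = "cmor1.5-1.0"
freqs = np.linspace(25, 450, 64)       # frequency grid covering heart sounds
fc = pywt.central_frequency(wavelet)   # center frequency in cycles per sample
scales = fc * fs / freqs
coeffs, out_freqs = pywt.cwt(x, scales, wavelet, sampling_period=1.0 / fs)
scalogram = np.abs(coeffs)             # |W_x(a, b)| of Equation (9)
```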

5.2. Dual-Modal PCG and ECG Heart Sound Feature Extraction

The ECG signal is another important indicator for screening and diagnosing CVDs. The ECG records electrical activity changes, while the PCG records vibrational waveforms of heart sounds. Dual-modal feature extraction and fusion might offer a comprehensive depiction of the physiological and pathological heart states. Among the screened literature, thirteen papers utilized synchronized ECG and PCG signals. Specifically, seven papers used handcrafted features from the signals, while the others designed deep networks for hierarchical feature extraction. Table 2 describes the works that used manual feature extraction methods.
It has been found that the high-resolution superlet transform (SLT) and adaptive superlet transform (ASLT) [181] are promising for ECG and PCG signal representation [179]. SLT combines multiple continuous wavelet transforms with different center frequencies to improve the signal’s time-frequency resolution [181]. This method is particularly suitable for analyzing nonlinear and non-stationary signals. Its output provides a visual representation of the signal’s frequency components at different time points, so abnormal patterns in cardiac signals can be detected and classified. Further, ASLT starts with a low order to estimate low frequencies and then increases the order as a function of frequency to enhance representation in both time and frequency across the entire frequency domain. Assuming a set of wavelets of multiple scales is denoted as $\psi = \{\psi_{s_1}, \ldots, \psi_{s_O}\}$, where $O$ is the order of the superlet (the number of wavelets in the set) and $s_i$ is the scale parameter of a wavelet, for a signal $x$, the geometric average of all wavelet responses $R_{SLT}(x)$ is given by the complex convolution as follows:

$$R_{SLT}(x) = \left( \prod_{i=1}^{O} \psi_{s_i} * x \right)^{\frac{1}{O}}, \tag{10}$$

where $*$ denotes the convolution operation. ASLT selects the optimal wavelet combination by evaluating the minimum mean squared cross-entropy and ensures the accuracy of the time-frequency representation.

6. Heart Sound State Recognition

Automated state recognition enables the rapid processing of large volumes of heart sound signals, assists in the quick identification and categorization of heart sound states, and improves clinical screening and diagnosis efficiency. The performance of PCG-based binary classification (normal and abnormal states) on the PCHSD2016 database is summarized below.

6.1. Evaluation Metrics

Four metrics—accuracy (ACC), sensitivity (SEN), specificity (SPE), and the PhysioNet/CinC Challenge 2016 official metric (score)—are used for binary classification evaluation. The official evaluation indicator, $score = \frac{SEN + SPE}{2}$, provides a comprehensive evaluation of model performance, especially when the dataset is unbalanced.
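All four metrics follow directly from confusion-matrix counts, as the short sketch below shows (labels are assumed binary, with 1 denoting the abnormal class).

```python
import numpy as np

def challenge_metrics(y_true, y_pred):
    """ACC, SEN, SPE, and the official score for binary labels (1 = abnormal)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    acc = (tp + tn) / len(y_true)
    sen = tp / (tp + fn)                 # sensitivity (abnormal recall)
    spe = tn / (tn + fp)                 # specificity (normal recall)
    return acc, sen, spe, (sen + spe) / 2
```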

6.2. Cross-Validation Strategies

Cross-validation (CV) strategies are important for comparing different algorithms on heart sound state recognition. These strategies should avoid data leakage, ensure that the model generalizes well to unseen cases, and provide reliable performance estimates.
Several strategies are involved. One is the train-test split, where the data samples are randomly divided into a training set and a testing set, with a typical ratio of 80/20 [24,29,165]. Similarly, the train-validate-test split strategy divides the data into three subsets for training, validation, and testing, with ratios such as 75/15/10 or 80/10/10 [23,25,43,61,102]. Additionally, a widely used strategy is to divide a database into k folds (k-fold CV), and the model is trained and validated k times [35,72,90,100,157]. Each time, a different fold is used as the validation set, and the remaining folds serve as the training set.
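As noted above, avoiding data leakage is essential. One common safeguard, sketched below with scikit-learn, is to group the folds by subject so that recordings from one participant never appear in both the training and validation sets; the data and grouping variable are placeholders.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Placeholder data: 100 recordings with 20 features each, from 30 subjects.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
y = rng.integers(0, 2, size=100)
subject_ids = rng.integers(0, 30, size=100)

# Subject-level 5-fold CV: grouping by subject keeps every recording of a
# participant in a single fold, which avoids subject-level data leakage.
gkf = GroupKFold(n_splits=5)
for train_idx, val_idx in gkf.split(X, y, groups=subject_ids):
    X_train, X_val = X[train_idx], X[val_idx]
    y_train, y_val = y[train_idx], y[val_idx]
    # ...fit and evaluate a classifier on this split...
```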

6.3. PCG-Based Heart Sound State Recognition

Numerous heart sound state classification models using PCG signals have been proposed. These models can generally be grouped into ML, DL, and hybrid models, as shown in Table 3, which presents the corresponding publications.
In ML methods, diverse classifiers have been designed, modified, or applied to establish the relationship between handcrafted features and heart sound states. These include unsupervised learning, supervised learning, ensemble learning, artificial neural networks, and other classifiers. Among the classifiers, the support vector machine (SVM) is widely used [31,41,56,69,76,77,89,99,103,104,106,107]. The purpose of SVM is to find a hyperplane or optimal boundary that maximizes the margin between two classes of data samples. Its kernel functions can be linear, polynomial, or radial basis. Another preferred classifier is the random forest. It is a parallel ensemble learning algorithm based on decision tree estimators, with its prediction depending on the majority vote of the base estimators. Specifically, each decision tree is trained on a random subset of the dataset through bootstrap sampling from the original training dataset [185].
In DL-based heart sound state recognition methods, different end-to-end learning architectures have been proposed. These architectures facilitate exploration in various areas, including time-series analysis using recurrent neural networks (RNNs) and long short-term memory (LSTM) networks [39,60,81,86,102,104,165,169], spectrogram image analysis using convolutional neural networks (CNNs) [23,25,29,30,32,39,43,51,52,53,57,58,59,68,70,74,80,99,101,103,108,117,125,128,163,169], time-frequency diagram analysis using attention neural networks [31,65,100,118,157,169], and pre-trained efficient networks [25,70,99,128,157,170].
Among hybrid methods, both ML classifiers and DL networks have been integrated to boost recognition performance, such as voting based on network prediction [26], federated averaging [54], and model combinations [65,80,184].

6.4. State-of-the-Art Works on PCG-Based Heart Sound State Recognition

Table 4 presents state-of-the-art works on the PCHSD2016 database. These works have achieved promising performance (ACC ≥ 95.00%), and most of them employed CNNs [23,25,29,43,61,125,157]. It is not surprising that the deep learning method [24] achieved the best overall performance. This method identifies S1 and S2 segments, which are then converted into images. Finally, a CNN is used for classification, resulting in promising results. The second-best model, CardioXNet, builds three parallel CNN pathways and incorporates two learning phases (representation learning and sequence residual learning) for outstanding prediction performance [186].

6.5. PCG-Based Inter-Database Heart State Classification

Literature screening identified three studies (see Table 5) that conducted inter-database validation of PCG-based heart state classification. The studies first trained and validated the proposed models on the PCHSD2016 database (intra-database validation), and then the models were further validated on external databases (inter-database validation). The comparison between the intra-database and inter-database validation results indicates that the evaluation metrics were well maintained and that the models consistently achieved good generalization [25,39,74].

6.6. Dual-Modal Heart State Classification Using ECG and PCG Signals

Several studies explored both ECG and PCG signals for dual-modal state recognition. Table 6 shows the performance of the studies that used synchronized ECG and PCG signals from the training subset of PCHSD2016. Some studies [175,180,187,188,189,190] additionally used the area under the curve (AUC) for performance evaluation.
Some studies used SVMs [175,176,177,187,188] and pre-trained deep networks [128,179,180,188,189,190,191,192]. Ref. [190] achieved the best performance. The authors designed a collaborative learning-based progressive dense fusion network. The network integrates a three-branch interwoven architecture consisting of modality-specific encoders and a progressive dense fusion encoder. Other studies achieved promising results by proposing novel models [189] and fusing different networks [191]. Additionally, appropriate feature collection enabled SVMs to achieve superior performance [175,176,177,187,188] compared to those deep networks [179,180,192].

7. Discussion

PCG-based heart sound state recognition plays an important role in CVD detection, diagnosis, and monitoring. Based on the PCHSD2016 database, a series of technical proposals (104 publications) have been designed, ranging from data filtering and denoising to signal segmentation, feature extraction, and state recognition of PCG heart sounds.

7.1. Our Findings

Our findings are summarized in two parts. The first part is the classification of heart sound states using PCG signals alone and using both PCG and ECG signals. Cross-database validation of model generalization is also presented. The second is the feature engineering of heart sound signals, encompassing feature extraction, signal segmentation, and preprocessing.

7.1.1. Classification of Normal and Abnormal Heart Sound States

Promising performance has been achieved for PCG-based heart sound normal and abnormal state recognition. The metric values of benchmark works (Table 4) were ACC ≥ 95.50%, SEN ≥ 87.60%, SPE ≥ 93.60%, and score ≥ 92.70%. Specifically, 14 studies designed deep networks, including CNNs [23,25,29,43,61,125,157], encoders [24,100], and RNNs [102], while only Ref. [35] explored traditional machine learning algorithms (KNN and SVM). The comparison indicates that DL is the predominant technique leading to good performance. It was also found that many ML, DL, and hybrid methods have been developed (Table 3), including but not limited to ML classifiers, DL networks, and integrated architectures, with a majority of these models trained in a supervised learning manner. The current high-accuracy results suggest that achieving further improvements in heart sound state classification performance on this database will be challenging.
Little attention has been paid to inter-database validation (Table 5). Inter-database or cross-database validation plays an important role in verifying models’ generalization capacity beyond the training data; however, it is difficult due to data distribution differences between databases, varying data quality, domain shifts, and different data collection methodologies. Among the publications, three studies performed inter-database validation and generalized well on external databases. Ref. [25] leveraged transfer learning on unsegmented phonocardiogram spectrograms, Ref. [39] implemented a lightweight end-to-end PCG classification network for wearable devices, and Ref. [74] proposed two-stage decision support using deeply learned features and decision trees. Although good inter-database validation results have been observed [25,39,74], the generalization capacity of these models should be well investigated in future studies.
Up to 12 studies explored joint ECG and PCG signal analysis on the database for heart state classification (Table 6). Among these studies, five models used SVM as the classifier [175,176,177,187,188], while the other models employed or upgraded CNN architectures [128,179,180,187,189,190,192]. It is known that ECG and PCG provide distinct indicators of heart function. ECG records changes in electrical activity with each heartbeat, while PCG demonstrates vibration waveforms of heart sounds. These joint signal analysis approaches might enhance the accuracy of heart state classification and increase confidence in clinical decisions. Further integration of ECG, PCG, other modalities, and clinical information could provide a more comprehensive understanding of the heart’s physiological and pathological states.

7.1.2. Feature Engineering on Heart Sound Signals

Heart sound feature extraction aims to provide an intrinsic representation of heart sounds in different domains for specific applications (Table 1 and Table 2). This study discussed the AMDF [167], autoregressive model [168], and polynomial fitting [154] in the time domain, DFT in the frequency domain, and STFT [173], CWT [174], SLT, and ASLT [181] in the time-frequency domain. In practice, several toolboxes are available, such as OpenSMILE [150] and Librosa [193], which ease the burden of feature collection from signals. By combining different methods, hundreds of features can be handcrafted. Meanwhile, different subsets of features have been used for heart sound quantification, and some features are shared, such as statistical features, MFCCs, and spectrogram features [32,33,34,35,36,55,77,111,117,165,171,172]. Interestingly, these shared features are also preferred in other related signal analysis tasks, such as speech emotion recognition [194] and cough sound analysis [195]. It was also found that deep networks perform well as high-level feature extractors [187,188]. These learned deep features might enhance classification performance but at the expense of reduced interpretability. Properly integrating both handcrafted features and deeply learned features remains challenging yet helpful for specific heart sound signal analysis.
Heart sound segmentation is an intermediate step in PCG signal analysis. It helps identify and diagnose various cardiovascular conditions, particularly abnormalities, through the heart sound cycle [29,44,45,51,55,60,61,67,72,81,86,90,92,101,117,118,125,157]. This study summarized HMM-based and ML-based techniques and presented the workflow of the recommended LR-HSMM (Figure 2). The LR-HSMM employs wavelet feature optimization and identifies informative features to detect the four events of the heart cycle [136]. Heart sound segmentation facilitates different phase localization, specific feature extraction, diagnostic information collection, and clinical interpretation. In contrast, a dozen studies implemented deep learning networks for unsegmented PCG signal analysis [25,32,42,52,53,55,65,70,98,104,156,157,169,170,188]. In these studies, PCG signals were converted into spectrogram, log-Mel spectrogram, and scalogram images. Therefore, end-to-end deep learning architectures could be deployed for hierarchical feature extraction and heart state classification.
Heart sound signal preprocessing is also important. In terms of literature screening, it was found that signal filtering and denoising techniques are continuously evolving. In signal filtering, BP filters are popular due to the prior knowledge of the frequency ranges of heart sound signals [25,32,33,35,36,39,40,42,43,47,55,57,65,68,75,76,77,78]. It was observed that the frequencies of heart sound signals fall into a restricted range, while the lower and upper cut-off frequencies are not determined since heart sounds are dynamic signals. In contrast, denoising methods are more suitable for handling non-stationary signals. A series of denoising methods have been developed using different transforms [61,81,85,86,87,88], signal decomposition [70,90], and filtering strategies [26,31,35,67,89,91,92]. Therefore, a comprehensive comparison of signal filtering and denoising algorithms becomes helpful for identifying suitable methods for heart sounds.

7.2. Analytical and Data Challenges

Besides physiological challenges, such as patient variability and heart rate variability, and technical challenges, such as noise, artifacts, sensor placement, device variability, and recording settings [21], there are analytical challenges and data challenges.

7.2.1. Analytical Challenges

Heart sound analysis involves a broad range of analytical algorithms, from signal preprocessing, sound segmentation, and feature extraction to state recognition. The first challenge arises from the interpretability of DL models in heart sound analysis. According to Table 3, Table 4 and Table 6, DL models have been predominantly used and have achieved promising results on the PCHSD2016 database. However, few studies have explored the interpretability of these models [59,196], which is important for building user trust and deploying models. Many interpretation approaches have been proposed for biomedical imaging, and the core ideas include visualization, visualization by perturbation, visualization by gradient or backpropagation, and explanation by examples [197]. Therefore, these techniques could generally be adapted to explain spectrum image- and MFCC image-based PCG heart sound analysis.
The second challenge is feature selection for identifying discriminative data representation. It is crucial for both ML-based heart sound segmentation and heart sound state recognition. As shown in Table 1 and Table 2, a large number of features can be collected in the time, frequency, and time-frequency domains, and Table 3 shows different frameworks using different subsets of features, achieving competitive performance. Therefore, finding a subset of discriminative features becomes increasingly important. One widely used approach is performance-oriented model building, which involves feature selection and model optimization [198]. Due to the large number of features and diverse classifiers, this approach is time-consuming and may fall into local optimization. Some other approaches have considered model interpretability [196,197], feature selection stability [199,200], and computational complexity in specific applications.
Moreover, transitioning promising achievements in heart sound segmentation and heart sound state recognition algorithms from laboratory research to clinical applications remains challenging. In addition to complex heart sound signals with various components, the transition procedure lacks high-quality databases, standardized procedures, annotation requirements, and regulatory approval, and new diagnostic and monitoring algorithms should undergo rigorous validation.

7.2.2. Data Challenges

In the era of artificial intelligence, the demand for data by DL models is outpacing the supply. Therefore, the first data challenge is the limited data availability. It should be noted that the number of recordings in the PCHSD2016 database is insufficient for training a general DL-based heart sound analysis system [8]. Today, building a large-scale database with millions of PCG recordings seems feasible. However, it necessitates a large number of manual annotations, specialized recording devices, and many well-trained clinicians. This process is time-consuming, labor-intensive, and costly. At the same time, data privacy and sharing must be fully considered due to concerns about regulations and policies for academic and commercial data usage [196].
Besides data augmentation [52,70,110], synthetic data generation shows promise for increasing data volume and addressing data ethics issues [201]. However, its feasibility in relieving data imbalance and bias is under investigation. For music and speech signal generation, the autoregressive model WaveNet [202] has gained considerable attention. For synthesizing PCG signals, a novel model based on coupled ordinary differential equations has been proposed [203]. Notably, the quality of generated normal and abnormal PCG signals has been verified by cardiologists, suggesting potential in assessing signal-processing techniques and clinical diagnosis. However, further investigation is needed to determine whether simulated data are beneficial for PCG-based heart sound analysis.
Data quality poses an additional challenge. Well-labeled or high-quality databases help avoid potential overfitting and guide the training of intelligent models toward intrinsic feature embedding. Consequently, precise data annotation becomes imperative for decreasing inter- and intra-rater differences. At the same time, data diversity can enhance model generalization. This might involve population diversity, recording conditions, technical equipment, and multi-center studies. In particular, noisy annotations or inter-annotator differences could be used to increase model robustness [204]. However, data bias and shift, noisy annotations, and low-quality samples are unavoidable during data collection. Therefore, data quality, data diversity, and noisy annotations should be well balanced during model development, parameter optimization, and cross-database generalization.

7.3. Potential Future Directions

In addition to PCG data acquisition [17,20], signal representation [4], ML-based heart sound diagnosis [18], sensor placement [21], and applications of smartphones and wearable devices [8], potential directions include but are not limited to multi-modal signal fusion, standardization and validation of heart sound analysis, automated interpretation for decision support, real-time monitoring, and longitudinal data analysis.

7.3.1. Multi-Modal Signal Collection, Feature Fusion, and Decision Making

Incorporating PCG signals with other modalities, such as ECG signals, echocardiogram images, blood pressure, blood biomarkers, and cardiac imaging, could provide a more comprehensive diagnostic approach, capturing diverse aspects of cardiovascular function. Challenges related to data quality and annotation, privacy preservation and security, as well as semantic heterogeneity and complexity, must be carefully considered in large-scale, high-quality multi-modal data collection.
Fusing multi-modal signal input or features has become indispensable. Joint PCG and ECG signal analysis has been explored, and promising results have been achieved (Table 6). There are two kinds of approaches. One represents signals as handcrafted or deeply learned features [175,176,177,187,188], recasting the heart sound analysis problem as an ML task. The other converts signals into gray or multi-channel images, making the problem an end-to-end optimization issue [128,179,180,188,189,190,191,192]. Fortunately, ample prior knowledge and empirical experience can be utilized to assess the feasibility, efficiency, and effectiveness of these approaches, including but not limited to multi-branch feature fusion, classifier ensembling, and federated learning.
Notably, large language models (LLMs) and large vision language models (LVLMs) have the potential to aid in feature fusion and decision-making processes [205,206,207]. These models could combine multi-modal data sources, enabling abnormality detection, state classification, and content understanding. For instance, Gu et al. proposed CheX-GPT, which harnesses LLMs to improve the quality of chest X-ray report labeling. The model excelled in labeling accuracy and showcased superior efficiency, flexibility, and scalability [208]. Han et al. investigated the feasibility of several LLMs for predicting the 10-year cardiovascular risk of a patient and observed performance comparable to the Framingham risk score in cardiovascular risk prediction on the UK Biobank cohort [209]. Gala and Makaryus presented a narrative review on using an LLM in cardiology, indicating that a lot of work remains for improved patient outcomes and physician productivity [210].

7.3.2. Standardization and Validation of Heart Sound Analysis

Standardizing heart sound signal analysis is an ongoing challenge critical for ensuring the consistency, reliability, and comparability of classification results across diverse studies, datasets, and applications. In heart sound analysis, achieving this entails more than just collecting diverse PCG recordings with consistently high-quality annotations. It involves several key stages: signal preprocessing, heart sound segmentation, feature extraction and selection, and heart sound state recognition. Identifying the most effective algorithms for each stage is essential, necessitating extensive experimentation to build a standardized, high-accuracy system.
A fair comparison of heart sound analysis algorithms plays an important role in selecting the most effective ones. Intra- and inter-database validation should be carefully considered. Intra-database validation assesses the performance of algorithms on the testing subset of samples. These testing samples and the samples in the training subset come from the same data distribution. To evaluate well-trained algorithms, inter-database validation becomes increasingly crucial since it assesses the generalizability of the algorithms across different populations and recording conditions.
There is still a significant journey ahead to successfully transition well-trained algorithms, validated through intra- and inter-databases, from laboratory research to clinical applications. Substantial efforts will be required for algorithm deployment and external validation in clinical settings, encompassing robustness testing, real-time capability, prediction reproducibility, and risk assessment.

7.3.3. Automated Interpretation for Decision Support

The ultimate goal of intelligent heart sound signal analysis is to provide interpretable decision support for CVD screening and diagnosis. ML-based algorithms excel in explaining quantitative features and classifiers. However, their classification performance often falls short of expectations. DL-based algorithms offer promising potential for achieving high accuracy, although challenges remain regarding their interpretability.
Recent studies on LLMs and LVLMs shed light on automated interpretation for decision support. Hu et al. built a large-scale comprehensive evaluation benchmark for a medical LVLM. The database contained 12 modalities, 118,010 images, and 127,995 question-answering items [211]. Kwon et al. claimed that LLMs are clinical reasoners and proposed a reasoning-aware diagnosis framework with prompt-generated rationales that provides insight into patient cases and their reasoning path toward accurate diagnosis decisions [212]. These ongoing studies may lay the groundwork for advancing large-scale models to achieve breakthroughs in medical diagnosis and health informatics [213].

7.3.4. Real-Time Monitoring Combined with Longitudinal Data Analysis

Real-time monitoring combined with longitudinal heart sound analysis is essential for proactive healthcare management, early detection of changes, personalized medicine, and advancing research in cardiovascular health.
Real-time monitoring is a dynamic procedure that paves the way for longitudinal heart sound analysis. Current studies concentrate on short-time heart sounds, and longitudinal analysis remains largely unexplored. Fortunately, heart sound patterns identified by short-time analysis [186] might facilitate a systematic study over extended periods to monitor changes and trends. This also highlights the need for more attention to be paid to real-time monitoring and longitudinal heart sound analysis.

7.4. Limitations

There are several limitations to the current review. Firstly, this study focuses on the PCHSD2016 database, and other public and private databases are not considered. It is recommended to combine this review and other review publications [18,20,21,22] to broaden our understanding of recent advances in heart sound analysis. Secondly, this study investigates related analytical publications that propose novel techniques and then validate these techniques on the database. It covers analytical algorithms ranging from signal filtering and denoising to heart sound segmentation, feature extraction, and state recognition. Some algorithms and novel workflows are introduced. For information on specific algorithms, we refer the reader to related technical papers and review publications on topics such as heart sound data acquisition and preprocessing techniques [20], as well as artificial intelligence for heart sound classification [22]. Thirdly, this study focuses on PCG-based heart sound analysis. Analytical techniques commonly used for other signals, such as electrocardiograms [214], electroencephalograms [215], and speech signals [139], can be adapted or applied to this analysis.

8. Conclusions

The phonocardiogram is a non-invasive, cost-effective, and easily accessible tool that contributes to the early detection and continuous monitoring of cardiovascular diseases. This review of phonocardiogram signal analysis draws insights from the PhysioNet/CinC Challenge 2016 database, a comprehensive benchmarking dataset, and retrieves 104 publications that propose and validate novel techniques based on the database. The analytical techniques cover signal preprocessing, sound segmentation, feature extraction, and state classification. Among the techniques, machine learning and deep learning have been widely used, and promising results have been achieved by deep learning architectures.
In conclusion, this review highlights the technical advancements in the PhysioNet/CinC Challenge 2016 database for PCG-based heart sound analysis. Despite the promising progress, analytical and data challenges remain. Future studies could focus on models’ interpretability and generalizability to improve user trust and model acceptance. Continued exploration and innovation in phonocardiogram signal analysis are important for advancing cardiac care and improving patient outcomes.

Author Contributions

Conceptualization, B.Z., S.Y. and Q.S.; Data curation, B.Z., Z.Z. and S.Y.; Formal analysis, X.L., Y.X. and Q.S.; Funding acquisition, X.L., Y.X. and Q.S.; Investigation, S.Y., Y.X. and Q.S.; Methodology, B.Z., Z.Z., S.Y. and X.L.; Project administration, Q.S.; Software, B.Z., Z.Z. and S.Y.; Supervision, Q.S.; Validation, X.L. and Y.X.; Visualization, Z.Z., S.Y. and X.L.; Writing—original draft, Z.Z. and S.Y.; Writing—review and editing, B.Z., X.L., Y.X. and Q.S. All authors have read and agreed to the published version of the manuscript.

Funding

The work was supported in part by the National Key Research and Development Program of China (Grant Nos. 2022ZD0115901 and 2022YFC2409000), the National Natural Science Foundation of China (Grant Nos. 62177007, U20A20373, and 82202954), the China–Central and Eastern European Countries Higher Education Joint Education Project (Grant No. 202012), the Shenzhen Science and Technology Program (Grant No. KQTD20180411185028798), and the Medium- and Long-term Technology Plan for Radio, Television, and Online Audiovisual (Grant No. ZG23011). The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The dataset supporting the current study is available online (the PhysioNet/CinC Challenge 2016 database: https://archive.physionet.org/pn3/challenge/2016/, accessed on 12 August 2024).

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:
CVD: Cardiovascular disease
PCG: Phonocardiogram
PCHSD2016: PhysioNet/CinC Challenge 2016 database
ML: Machine learning
DL: Deep learning
ECG: Electrocardiogram
HMM: Hidden Markov model
BP: Band pass
WT: Wavelet transform
CWT: Continuous wavelet transform
LR-HSMM: Logistic regression-based hidden semi-Markov model
FT: Fourier transform
FFT: Fast Fourier transform
DFT: Discrete Fourier transform
STFT: Short-time Fourier transform
AMDF: Average magnitude difference function
MFCC: Mel-frequency cepstral coefficient
SLT: Superlet transform
ASLT: Adaptive SLT
ACC: Accuracy
SEN: Sensitivity
SPE: Specificity
CV: Cross-validation
SVM: Support vector machine
NN: Neural network
RNN: Recurrent neural network
LSTM: Long short-term memory
CNN: Convolutional neural network
AE: Autoencoder
LLM: Large language model
LVLM: Large vision language model

References

  1. Mensah, G.A.; Fuster, V.; Murray, C.J.L.; Roth, G.A. Global Burden of Cardiovascular Diseases and Risks Collaborators. Global burden of cardiovascular diseases and risks, 1990–2022. J. Am. Coll. Cardiol. 2023, 82, 2350–2473. [Google Scholar] [CrossRef] [PubMed]
  2. García, M.C.; Rossen, U.M.; Matthews, K.; Guy, G.; Trivers, K.F.; Thomas, C.C.; Schieb, L.; Iademarco, M.F. Preventable premature deaths from the five leading causes of death in nonmetropolitan and metropolitan counties, United States, 2010–2022. MMWR Surveill. Summ. 2024, 73, 1–11. [Google Scholar] [CrossRef] [PubMed]
  3. Jia, W.; Wang, Y.; Ye, J.; Li, D.; Yin, F.; Yu, J.; Chen, J.; Shu, Q.; Xu, W. ZCHSound: Open-source ZJU paediatric heart sound database with congenital heart disease. IEEE Trans. Biomed. Eng. 2024, 71, 2278–2286. [Google Scholar] [CrossRef]
  4. Ismail, S.; Siddiqi, I.; Akram, U. Localization and classification of heart beats in phonocardiography signals: A comprehensive review. EURASIP J. Adv. Signal Process. 2018, 1, 1–27. [Google Scholar] [CrossRef]
  5. Reyna, M.A.; Kiarashi, Y.; Elola, A.; Oliveira, J.; Renna, F.; Gu, A.; Alday, E.A.P.; Sadr, N.; Sharma, A.; Kpodonu, J.; et al. Heart murmur detection from phonocardiogram recordings: The george b. moody physionet challenge 2022. PLoS Digit. Health 2023, 2, e0000324. [Google Scholar] [CrossRef] [PubMed]
  6. Burns, J.; Ganigara, M.; Dhar, A. Application of intelligent phonocardiography in the detection of congenital heart disease in pediatric patients: A narrative review. Prog. Pediatr. Cardiol. 2022, 64, 101455. [Google Scholar] [CrossRef]
  7. Clifford, G.D.; Liu, C.; Moody, B.; Springer, D.; Silva, I.; Li, Q.; Mark, R.G. Classification of normal/abnormal heart sound recordings: The PhysioNet/Computing in Cardiology Challenge 2016. In Proceedings of the Computing in Cardiology Conference, Vancouver, BC, Canada, 11–14 September 2016; pp. 609–612. [Google Scholar]
  8. Zhao, Q.; Geng, S.; Wang, B.; Sun, Y.; Nie, Y.; Bai, B.; Yu, C.; Zhang, F.; Tang, G.; Zhang, D.; et al. Deep learning for heart sound analysis: A literature review. medRxiv 2017. [Google Scholar] [CrossRef]
  9. Jiang, Z.; Choi, S. A cardiac sound characteristic waveform method for in-home heart disorder monitoring with electric stethoscope. Expert Syst. Appl. 2006, 31, 286–298. [Google Scholar] [CrossRef]
  10. Samieinasab, M.; Sameni, R. Fetal phonocardiogram extraction using single channel blind source separation. In Proceedings of the IEEE Iranian Conference on Electrical Engineering, Tehran, Iran, 10–14 May 2015; pp. 78–83. [Google Scholar]
  11. Yaseen; Son, G.-Y.; Kwon, S. Classification of heart sound signal using multiple features. Appl. Sci. 2018, 8, 2344. [Google Scholar] [CrossRef]
  12. Dong, F.; Qian, K.; Ren, Z.; Baird, A.; Li, X.; Dai, Z.; Dong, B.; Metze, F.; Yamamoto, Y.; Schuller, B. Machine listening for heart status monitoring: Introducing and benchmarking hss—The heart sounds shenzhen corpus. IEEE J. Biomed. Health Inform. 2019, 24, 2082–2092. [Google Scholar] [CrossRef]
  13. Bhaskaran, A.; Kumar, S.; George, S.; Arora, M. Heart rate estimation and validation algorithm for fetal phonocardiography. Physiol. Meas. 2022, 43, 075008. [Google Scholar] [CrossRef] [PubMed]
  14. Cesarelli, M.; Ruffo, M.; Romano, M.; Bifulco, P. Simulation of foetal phonocardiographic recordings for testing of FHR extraction algorithms. Comput. Methods Programs Biomed. 2012, 107, 513–523. [Google Scholar] [CrossRef] [PubMed]
  15. Kazemnejad, A.; Gordany, P.; Sameni, R. EPHNOGRAM: A simultaneous electrocardiogram and phonocardiogram database. PhysioNet 2021. [Google Scholar] [CrossRef]
  16. Liu, C.; Springer, D.; Li, Q.; Moody, B.; Juan, R.A.; Chorro, F.J.; Castells, F.; Roig, J.M.; Silva, I.; Johnson, A. An open access database for the evaluation of heart sound algorithms. Physiol. Meas. 2016, 37, 2181. [Google Scholar] [CrossRef]
  17. Adithya, P.C.; Sankar, R.; Moreno, W.A.; Hart, S. Trends in fetal monitoring through phonocardiography: Challenges and future directions. Biomed. Signal Process. Control 2017, 33, 289–305. [Google Scholar] [CrossRef]
  18. Nabih-Ali, M.; El-Dahshan, E.S.A.; Yahia, A.S. A review of intelligent systems for heart sound signal analysis. J. Med. Eng. Technol. 2017, 41, 553–563. [Google Scholar] [CrossRef] [PubMed]
  19. Dwivedi, A.K.; Imtiaz, S.A.; Rodriguez-Villegas, E. Algorithms for automatic analysis and classification of heart sounds–a systematic review. IEEE Access 2018, 7, 8316–8345. [Google Scholar] [CrossRef]
  20. Ghosh, S.K.; Nagarajan, P.R.; Tripathy, R.K. Heart sound data acquisition and preprocessing techniques: A review. In Handbook of Research on Advancements of Artificial Intelligence in Healthcare Engineering; IGI Global: Hershey, PA, USA, 2020; pp. 244–264. [Google Scholar]
  21. Kahankova, R.; Mikolasova, M.; Jaros, R.; Barnova, K.; Ladrova, M.; Martinek, R. A review of recent advances and future developments in fetal phonocardiography. IEEE Rev. Biomed. Eng. 2022, 16, 653–671. [Google Scholar] [CrossRef] [PubMed]
  22. Chen, J.; Guo, Z.; Xu, X.; Jeon, G.; Camacho, D. Artificial intelligence for heart sound classification: A review. Expert Syst. 2024, 41, e13535. [Google Scholar] [CrossRef]
  23. Dominguez-Morales, J.P.; Jimenez-Fernandez, A.F.; Dominguez-Morales, M.J.; Jimenez-Moreno, G. Deep neural networks for the recognition and classification of heart murmurs using neuromorphic auditory sensors. IEEE Trans. Biomed. Circuits Syst. 2017, 12, 24–34. [Google Scholar] [CrossRef]
  24. Deperlioglu, O.; Kose, U.; Gupta, D.; Khanna, A.; Sangaiah, A.K. Diagnosis of heart diseases by a secure internet of health things system based on autoencoder deep neural network. Comput. Commun. 2020, 162, 31–50. [Google Scholar] [CrossRef] [PubMed]
  25. Khan, K.N.; Khan, F.A.; Abid, A.; Olmez, T.; Dokur, Z.; Khandakar, A.; Chowdhury, M.E.H.; Khan, M.S. Deep learning based classification of unsegmented phonocardiogram spectrograms leveraging transfer learning. Physiol. Meas. 2021, 42, 095003. [Google Scholar] [CrossRef] [PubMed]
  26. Abdollahpur, M.; Ghaffari, A.; Ghiasi, S.; Mollakazemi, M.J. Detection of pathological heart sounds. Physiol. Meas. 2017, 38, 1616. [Google Scholar] [CrossRef] [PubMed]
  27. Homsi, M.N.; Warrick, P. Ensemble methods with outliers for phonocardiogram classification. Physiol. Meas. 2017, 38, 1631. [Google Scholar] [CrossRef] [PubMed]
  28. Baydoun, M.; Safatly, L.; Ghaziri, H.; El Hajj, A. Analysis of heart sound anomalies using ensemble learning. Biomed. Signal Process. Control 2020, 62, 102019. [Google Scholar] [CrossRef]
  29. Deperlioglu, O. Classification of segmented phonocardiograms by convolutional neural networks. Broad Res. Artif. Intell. Neurosci. 2019, 10, 5–13. [Google Scholar]
  30. Banerjee, R.; Ghose, A. A semi-supervised approach for identifying abnormal heart sounds using variational autoencoder. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Barcelona, Spain, 4–8 May 2020; pp. 1249–1253. [Google Scholar]
  31. Hazeri, H.; Zarjam, P.; Azemi, G. Classification of normal/abnormal PCG recordings using a time–frequency approach. Analog Integr. Circuits Signal Process. 2021, 109, 459–465. [Google Scholar] [CrossRef]
  32. Azam, F.B.; Ansari, M.I.; Mclane, I.; Hasan, T. Heart sound classification considering additive noise and convolutional distortion. arXiv 2021, arXiv:2106.01865. [Google Scholar]
  33. Prince, J.; Maidens, J.; Kieu, S.; Currie, C.; Barbosa, D.; Hitchcock, C.; Saltman, A.; Norozi, K.; Wiesner, P.; Slamon, N.; et al. Deep learning algorithms to detect murmurs associated with structural heart disease. J. Am. Heart Assoc. 2023, 12, e030377. [Google Scholar] [CrossRef]
  34. Banerjee, R.; Choudhury, A.D.; Deshpande, P.; Bhattacharya, S.; Pal, A.; Mandana, K.M. A robust dataset-agnostic heart disease classifier from phonocardiogram. In Proceedings of the 2017 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Jeju, Republic of Korea, 11–15 July 2017; pp. 4582–4585. [Google Scholar]
  35. Saeedi, A.; Moridani, M.K.; Azizi, A. An innovative method for cardiovascular disease detection based on nonlinear geometric features and feature reduction combination. Intell. Decis. Technol. 2021, 15, 45–57. [Google Scholar] [CrossRef]
  36. Bopaiah, J.; Kavuluru, R. Precision/recall trade-off analysis in abnormal/normal heart sound classification. In Big Data Analytics; Springer: Cham, Switzerland, 2017; pp. 179–194. [Google Scholar]
  37. van der Westhuizen, J.; Lasenby, J. Bayesian LSTMs in medicine. arXiv 2017, arXiv:1706.01242. [Google Scholar]
  38. Li, L.; Wang, X.; Du, X.; Liu, Y.; Liu, C.; Qin, C.; Li, Y. Classification of heart sound signals with BP neural network and logistic regression. In Proceedings of the 2017 Chinese Automation Congress, Jinan, China, 20–22 October 2017; pp. 7380–7383. [Google Scholar]
  39. Zhu, L.; Qiu, W.; Ma, Y.; Tian, F.; Sun, M.; Wang, Z.; Qian, K.; Hu, B.; Yamamoto, Y.; Schuller, B.W. LEPCNet: A lightweight end-to-end PCG classification neural network model for wearable devices. IEEE Trans. Instrum. Meas. 2023, 73, 2511111. [Google Scholar] [CrossRef]
  40. Poirè, A.M.; Simonetta, F.; Ntalampiras, S. Deep feature learning for medical acoustics. In International Conference on Artificial Neural Networks; Springer: Cham, Switzerland, 2022; pp. 39–50. [Google Scholar]
  41. Bourouhou, A.; Jilbab, A.; Nacir, C.; Hammouch, A. Heart sounds classification for a medical diagnostic assistance. Int. Assoc. Online Eng. 2019, 15, 88–103. [Google Scholar] [CrossRef]
  42. Bai, Z.; Yan, B.; Chen, X.; Wu, Y.; Wang, P. Murmur detection and clinical outcome classification using a VGG-like network and combined time-frequency representations of PCG signals. In Proceedings of the 2022 Computing in Cardiology (CinC), Tampere, Finland, 4–7 September 2022; Volume 498, pp. 1–4. [Google Scholar]
  43. Hussain, S.S.; Ashfaq, M.; Khan, M.S.; Anwar, S. Deep learning based phonocardiogram signals analysis for cardiovascular abnormalities detection. In Proceedings of the International Conference on Robotics and Automation in Industry, Peshawar, Pakistan, 3–5 March 2023; pp. 1–6. [Google Scholar]
  44. Antink, C.H.; Becker, J.; Leonhardt, S.; Walter, M. Nonnegative matrix factorization and random forest for classification of heart sound recordings in the spectral domain. In Proceedings of the 2016 Computing in Cardiology Conference (CinC), Vancouver, BC, Canada, 11–14 September 2016; pp. 809–812. [Google Scholar]
  45. Wang, X.; Li, Y. Improving classification accuracy of heart sound recordings by wavelet filter and multiple features. In Proceedings of the 2016 Computing in Cardiology Conference (CinC), Vancouver, BC, Canada, 11–14 September 2016; pp. 1149–1152. [Google Scholar]
  46. Aslan, S.; Arica, S. Categorization of Normal and Abnormal Heart Rhythms from Phonocardiogram Signals. In Proceedings of the 2019 Medical Technologies Congress (TIPTEKNO), Izmir, Turkey, 3–5 October 2019; pp. 1–3. [Google Scholar]
  47. Bracke, J. Classifying Recorded Heart Sounds: A Data Mining Case Study. Master’s Thesis, Universiteit Gent, Gent, Belgium, 2019. [Google Scholar]
  48. Chen, P.; Zhang, Q. Classification of heart sounds using discrete time-frequency energy feature based on S transform and the wavelet threshold denoising. Biomed. Signal Process. Control 2020, 57, 101684. [Google Scholar] [CrossRef]
  49. Jerri, A.J. The Shannon sampling theorem—Its various extensions and applications: A tutorial review. Proc. IEEE 1977, 65, 1565–1596. [Google Scholar] [CrossRef]
  50. Milani, M.; Abas, P.E.; De Silva, L.C.; Nanayakkara, N.D. Abnormal heart sound classification using phonocardiography signals. Smart Health 2021, 21, 100194. [Google Scholar] [CrossRef]
  51. Riccio, D.; Brancati, N.; Sannino, G.; Verde, L.; Frucci, M. CNN-based classification of phonocardiograms using fractal techniques. Biomed. Signal Process. Control 2023, 86, 105186. [Google Scholar] [CrossRef]
  52. Jeong, Y.; Kim, J.; Kim, D.; Kim, J.; Lee, K. Methods for improving deep learning-based cardiac auscultation accuracy: Data augmentation and data generalization. Appl. Sci. 2021, 11, 4544. [Google Scholar] [CrossRef]
  53. Singh, K.K.; Singh, S.S. An Artificial Intelligence based mobile solution for early detection of valvular heart diseases. In Proceedings of the 2019 IEEE International Conference on Electronics, Computing and Communication Technologies, Bangalore, India, 26–27 July 2019; pp. 1–5. [Google Scholar]
  54. Qiu, W.; Qian, K.; Wang, Z.; Chang, Y.; Bao, Z.; Hu, B.; Schuller, B.W.; Yamamoto, Y. A federated learning paradigm for heart sound classification. In Proceedings of the 2022 Annual International Conference of the IEEE Engineering in Medicine & Biology Society, Glasgow, UK, 11–15 July 2022; pp. 1045–1048. [Google Scholar]
  55. Singh, S.A.; Majumder, S. Short unsegmented PCG classification based on ensemble classifier. Turk. J. Electr. Eng. Comput. Sci. 2020, 28, 875–889. [Google Scholar] [CrossRef]
  56. Zhu, C.; Zhao, Z.; Tan, Y.; Sun, M.; Qian, K.; Jiang, T.; Hu, B.; Schuller, B.W.; Yamamoto, Y. Less is More: A Novel Feature Extraction Method for Heart Sound Classification via Fractal Transformation. In Proceedings of the 2023 Annual International Conference of the IEEE Engineering in Medicine & Biology Society, Sydney, Australia, 24–27 July 2023; pp. 1–4. [Google Scholar]
  57. Karhade, J.; Dash, S.; Ghosh, S.K.; Dash, D.K.; Tripathy, R.K. Time–frequency-domain deep learning framework for the automated detection of heart valve disorders using PCG signals. IEEE Trans. Instrum. Meas. 2022, 71, 1–11. [Google Scholar] [CrossRef]
  58. Maknickas, V.; Maknickas, A. Recognition of normal–abnormal phonocardiographic signals using deep convolutional neural networks and mel-frequency spectral coefficients. Physiol. Meas. 2017, 38, 1671. [Google Scholar] [CrossRef] [PubMed]
  59. Bhardwaj, A.; Singh, S.; Joshi, D. Explainable deep convolutional neural network for valvular heart diseases classification using pcg signals. IEEE Trans. Instrum. Meas. 2023, 72, 1–15. [Google Scholar] [CrossRef]
  60. Zhang, W.; Han, J.; Deng, S. Abnormal heart sound detection using temporal quasi-periodic features and long short-term memory without segmentation. Biomed. Signal Process. Control 2019, 53, 101560. [Google Scholar] [CrossRef]
  61. Hu, J.; Lv, S.; Jie, R.; Ouyang, Y.; He, J. A Cardiac Audio Classification Method Based on Multidimensional Feature Expression. 2024. Available online: https://www.researchsquare.com/article/rs-3958573/v1 (accessed on 16 July 2024).
  62. Potes, C.; Parvaneh, S.; Rahman, A.; Conroy, B. Ensemble of feature-based and deep learning-based classifiers for detection of abnormal heart sounds. In Proceedings of the 2016 computing in cardiology conference (CinC), Vancouver, BC, Canada, 11–14 September 2016; pp. 621–624. [Google Scholar]
  63. Li, S.; Li, F.; Tang, S.; Luo, F. Heart sounds classification based on feature fusion using lightweight neural networks. IEEE Trans. Instrum. Meas. 2021, 70, 1–9. [Google Scholar] [CrossRef]
  64. Wang, J.; Zang, J.; An, Q.; Wang, H.; Zhang, Z. A pooling convolution model for multi-classification of ECG and PCG signals. Comput. Methods Biomech. Biomed. Eng. 2023, 1–14. [Google Scholar] [CrossRef]
  65. Chen, J.; Guo, Z.; Xu, X.; Zhang, L.B.; Teng, Y.; Chen, Y.; Woźniak, M.; Wang, W. A robust deep learning framework based on spectrograms for heart sound classification. IEEE/ACM Trans. Comput. Biol. Bioinform. 2023, 21, 936–947. [Google Scholar] [CrossRef] [PubMed]
  66. Soares, E.; Angelov, P.; Gu, X. Autonomous learning multiple-model zero-order classifier for heart sound classification. Appl. Soft Comput. 2020, 94, 106449. [Google Scholar] [CrossRef]
  67. Noman, F.M.; Salleh, S.-H.; Ting, C.-M.; Samdin, S.B.; Ombao, H.; Hussain, H. A Markov-switching model approach to heart sound segmentation and classification. IEEE J. Biomed. Health Inform. 2019, 24, 705–716. [Google Scholar] [CrossRef]
  68. Guo, Z.; Chen, J.; He, T.; Wang, W.; Abbas, H.; Lv, Z. DS-CNN: Dual-stream convolutional neural networks based heart sound classification for wearable devices. IEEE Trans. Consum. Electron. 2023, 69, 1186–1194. [Google Scholar] [CrossRef]
  69. Alshamma, O.; Awad, F.H.; Alzubaidi, L.; Fadhel, M.A.; Arkah, Z.M.; Farhan, L. Employment of multi-classifier and multi-domain features for PCG recognition. In Proceedings of the 2019 12th International Conference on Developments in eSystems Engineering, Kazan, Russia, 7–10 October 2019; pp. 321–325. [Google Scholar]
  70. Singh, S.A.; Singh, S.A.; Singh, A.D. Enhancing Imbalanced Heart Sound Classification through Transfer Learning and Gammatonegram Image Analysis. 2023. Available online: https://www.researchsquare.com/article/rs-3530451/v1 (accessed on 16 July 2024). [CrossRef]
  71. Humayun, A.I.; Ghaffarzadegan, S.; Ansari, I.; Feng, Z.; Hasan, T. Towards domain invariant heart sound abnormality detection using learnable filterbanks. IEEE J. Biomed. Health Inform. 2020, 24, 2189–2198. [Google Scholar] [CrossRef]
  72. Zeng, W.; Yuan, J.; Yuan, C.; Wang, Q.; Liu, F.; Wang, Y. A new approach for the detection of abnormal heart sound signals using TQWT, VMD and neural networks. Artif. Intell. Rev. 2021, 54, 1613–1647. [Google Scholar] [CrossRef]
  73. Fakhry, M.; Brery, A.F.; Gallardo-Antolin, A. Analysis of Heart Sound Signals using Sparse Modeling with Gabor Dictionary. In Proceedings of the IEEE International Symposium on Multimedia, Naples, Italy, 5–7 December 2022; pp. 92–96. [Google Scholar]
  74. Pasha, S.; Lundgren, J.; Carratù, M.; Wreeby, P.; Liguori, C. Two-stage artificial intelligence clinical decision support system for cardiovascular assessment using convolutional neural networks and decision trees. In Proceedings of the 13th International Conference on Bio-Inspired Systems and Signal Processing, BIOSIGNALS 2020-Part of 13th International Joint Conference on Biomedical Engineering Systems and Technologies, BIOSTEC 2020, Valletta, Malta, 24–26 February 2020; pp. 199–205. [Google Scholar]
  75. Chowdhury, M.E.; Khandakar, A.; Alzoubi, K.; Mansoor, S.; Tahir, A.M.; Reaz, M.B.I.; Al-Emadi, N. Real-time smart-digital stethoscope system for heart diseases monitoring. Sensors 2019, 19, 2781. [Google Scholar] [CrossRef] [PubMed]
  76. Abduh, Z.; Nehary, E.A.; Wahed, M.A.; Kadah, Y.M. Classification of heart sounds using fractional fourier transform based mel-frequency spectral coefficients and traditional classifiers. Biomed. Signal Process. Control 2020, 57, 101788. [Google Scholar] [CrossRef]
  77. Khade, P.J.; Mane, P.; Mahore, S.; Bhole, K. Machine learning approach for prediction of aortic and mitral regurgitation based on phonocardiogram signal. In Proceedings of the International Conference on Computing Communication and Networking Technologies, Kharagpur, India, 6–8 July 2021; pp. 1–5. [Google Scholar]
  78. Abdollahpur, M.; Ghiasi, S.; Mollakazemi, M.J.; Ghaffari, A. Cycle selection and neuro-voting system for classifying heart sound recordings. In Proceedings of the 2016 Computing in Cardiology Conference (CinC), Vancouver, BC, Canada, 11–14 September 2016; pp. 1–4. [Google Scholar]
  79. Gündüz, A.F.; Karci, A. Heart sound classification for murmur abnormality detection using an ensemble approach based on traditional classifiers and feature sets. Comput. Sci. 2022, 5, 1–13. [Google Scholar]
  80. Wu, J.M.-T.; Tsai, M.-H.; Huang, Y.Z.; Islam, S.H.; Hassan, M.M.; Alelaiwi, A.; Fortino, G. Applying an ensemble convolutional neural network with Savitzky–Golay filter to construct a phonocardiogram prediction model. Appl. Soft Comput. 2019, 78, 29–40. [Google Scholar] [CrossRef]
  81. Wang, X.; Liu, C.; Li, Y.; Cheng, X.; Li, J.; Clifford, G.D. Temporal-framing adaptive network for heart sound segmentation without prior knowledge of state duration. IEEE Trans. Biomed. Eng. 2020, 68, 650–663. [Google Scholar] [CrossRef] [PubMed]
  82. Ansari, R. Elliptic filter design for a class of generalized halfband filter. IEEE Trans. Acoust. Speech Signal Process. 1985, 33, 1146–1150. [Google Scholar] [CrossRef]
  83. Savitzky, A.; Golay, M.J.E. Smoothing and differentiation of data by simplified least squares procedures. Anal. Chem. 1964, 36, 1627–1639. [Google Scholar] [CrossRef]
  84. Abd El-Fattah, M.; Dessouky, M.I.; Diab, S.; Abd El-Samie, F. Speech enhancement using an adaptive wiener filtering approach. Prog. Electromagn. Res. M 2008, 4, 167–184. [Google Scholar] [CrossRef]
  85. Alkhodari, M.; Fraiwan, L. Convolutional and recurrent neural networks for the detection of valvular heart diseases in phonocardiogram recordings. Comput. Methods Programs Biomed. 2021, 200, 105940. [Google Scholar] [CrossRef]
  86. Xu, C.; Zhou, J.; Li, L.; Wang, J.; Ying, D.; Li, Q. Heart sound segmentation based on SMGU-RNN. In Proceedings of the BIBE 2019; The Third International Conference on Biological Information and Biomedical Engineering, Hangzhou, China, 20–22 June 2019; pp. 1–7. [Google Scholar]
  87. He, R.; Zhang, H.; Wang, K.; Li, Q.; Sheng, Z.; Zhao, N. Classification of heart sound signals based on AR model. In Proceedings of the 2016 Computing in Cardiology Conference (CinC), Vancouver, BC, Canada, 11–14 September 2016; pp. 605–608. [Google Scholar]
  88. Kumar, S.S.; Vijayalakshmi, K. Coronary artery disease detection from pcg signals using time domain based automutual information and spectral features. In Proceedings of the International Conference on Computing, Electronics & Communications Engineering, Southend, UK, 17–18 August 2020; pp. 69–74. [Google Scholar]
  89. Bhupalam, M.; Manthoor, H.C.R.; Thalengala, A. Classification of Cardiovascular Diseases using PCG. In Proceedings of the International Conference on Modeling, Simulation & Intelligent Computing, Dubai, United Arab Emirates, 7–9 December 2023; pp. 129–133. [Google Scholar]
  90. Zeng, W.; Lin, Z.; Yuan, C.; Wang, Q.; Liu, F.; Wang, Y. A new learning and classification framework for the detection of abnormal heart sound signals using hybrid signal processing and neural networks. In Proceedings of the 2020 39th Chinese Control Conference, Shenyang, China, 27–29 July 2020; pp. 6363–6368. [Google Scholar]
  91. Stainton, S.; Tsimenidis, C.; Murray, A. Characteristics of phonocardiography waveforms that influence automatic feature recognition. In Proceedings of the 2016 Computing in Cardiology Conference (CinC), Vancouver, BC, Canada, 11–14 September 2016; pp. 1173–1176. [Google Scholar]
  92. Kamson, A.P.; Sharma, L.N.; Dandapat, S. Multi-centroid diastolic duration distribution based HSMM for heart sound segmentation. Biomed. Signal Process. Control 2019, 48, 265–272. [Google Scholar] [CrossRef]
  93. Lonare, T.A.; Bachute, M.R. Speech denoising using wavelet transform. IOSR J. VLSI Signal Process. 2016, 6, 36–41. [Google Scholar]
  94. Johnson, M.T.; Yuan, X.; Ren, Y. Speech signal enhancement through adaptive wavelet thresholding. Speech Commun. 2007, 49, 123–133. [Google Scholar] [CrossRef]
  95. Al-Naami, B.; Fraihat, H.; Al-Nabulsi, J.; Gharaibeh, N.Y.; Visconti, P.; Al-Hinnawi, A.R. Assessment of dual-tree complex wavelet transform to improve SNR in collaboration with neuro-fuzzy system for heart-sound identification. Electronics 2022, 11, 938. [Google Scholar] [CrossRef]
  96. Schmidt, S.E.; Holst-Hansen, C.; Graff, C.; Toft, E.; Struijk, J.J. Segmentation of heart sound recordings by a duration-dependent hidden Markov model. Physiol. Meas. 2010, 31, 513. [Google Scholar] [CrossRef] [PubMed]
  97. Dragomiretskiy, K.; Zosso, D. Variational mode decomposition. IEEE Trans. Signal Process. 2013, 62, 531–544. [Google Scholar] [CrossRef]
  98. Gjoreski, M.; Gradisek, A.; Budna, B.; Gams, M.; Poglajen, G. Machine learning and end-to-end deep learning for the detection of chronic heart failure from heart sounds. IEEE Access 2020, 8, 20313–20324. [Google Scholar] [CrossRef]
  99. Ren, Z.; Cummins, N.; Pandit, V.; Han, J.; Qian, K.; Schuller, B. Learning image-based representations for heart sound classification. In Proceedings of the 2018 International Conference on Digital Health, Lyon, France, 23–26 April 2018; pp. 143–147. [Google Scholar]
  100. Qiao, L.; Gao, Y.; Xiao, B.; Bi, X.; Li, W.; Gao, X. HS-Vectors: Heart sound embeddings for abnormal heart sound detection based on time-compressed and frequency-expanded TDNN with dynamic mask encoder. IEEE J. Biomed. Health Inform. 2022, 27, 1364–1374. [Google Scholar] [CrossRef] [PubMed]
  101. Bozkurt, B.; Germanakis, I.; Stylianou, Y. A study of time-frequency features for CNN-based automatic heart sound classification for pathology detection. Comput. Biol. Med. 2018, 100, 132–143. [Google Scholar] [CrossRef]
  102. Latif, S.; Usman, M.; Rana, R.; Qadir, J. Phonocardiographic sensing using deep learning for abnormal heartbeat detection. IEEE Sens. J. 2018, 18, 9393–9400. [Google Scholar] [CrossRef]
  103. Nogueira, D.M.; Ferreira, C.A.; Gomes, E.F.; Jorge, A.M. Classifying heart sounds using images of motifs, MFCC and temporal features. J. Med. Syst. 2019, 43, 168. [Google Scholar] [CrossRef] [PubMed]
  104. Khan, F.A.; Abid, A.; Khan, M.S. Automatic heart sound classification from segmented/unsegmented phonocardiogram signals using time and frequency features. Physiol. Meas. 2020, 41, 055006. [Google Scholar] [CrossRef] [PubMed]
  105. Maddikunta, L.; Menta, M.; Pathri, D.; Swapna, C. Detection of chronic heart failure using ML & DL. Mater. Sci. 2019, 18, 95–108. [Google Scholar]
  106. Whitaker, B.M.; Suresha, P.B.; Liu, C.; Clifford, G.D.; Anderson, D.V. Combining sparse coding and time-domain features for heart sound classification. Physiol. Meas. 2017, 38, 1701. [Google Scholar] [CrossRef] [PubMed]
  107. Nogueira, D.M.; Zarmehri, M.N.; Ferreira, C.A.; Jorge, A.M.; Antunes, L. Heart sounds classification using images from wavelet transformation. In Progress in Artificial Intelligence: 19th EPIA Conference on Artificial Intelligence, EPIA 2019, Vila Real, Portugal, 3–6 September 2019, Proceedings, Part I 19; Springer: Berlin/Heidelberg, Germany, 2019; pp. 311–322. [Google Scholar]
  108. Rubin, J.; Abreu, R.; Ganguli, A.; Nelaturi, S.; Matei, I.; Sricharan, K. Recognizing abnormal heart sounds using deep learning. arXiv 2017, arXiv:1707.04642. [Google Scholar]
  109. Kay, E.; Agarwal, A. DropConnected neural networks trained on time-frequency and inter-beat features for classifying heart sounds. Physiol. Meas. 2017, 38, 1645. [Google Scholar] [CrossRef] [PubMed]
  110. Han, W.; Yang, Z.; Lu, J.; Xie, S. Supervised threshold-based heart sound classification algorithm. Physiol. Meas. 2018, 39, 115011. [Google Scholar] [CrossRef] [PubMed]
  111. Han, W.; Xie, S.; Yang, Z.; Zhou, S.; Huang, H. Heart sound classification using the SNMFNet classifier. Physiol. Meas. 2019, 40, 105003. [Google Scholar] [CrossRef]
  112. Bobillo, I.J.D. A tensor approach to heart sound classification. In Proceedings of the 2016 Computing in Cardiology Conference (CinC), Vancouver, BC, Canada, 11–14 September 2016; pp. 629–632. [Google Scholar]
  113. Her, H.-L.; Chiu, H.-W. Using time-frequency features to recognize abnormal heart sounds. In Proceedings of the 2016 Computing in Cardiology Conference (CinC), Vancouver, BC, Canada, 11–14 September 2016; pp. 1145–1147. [Google Scholar]
  114. Li, F.; Tang, H.; Shang, S.; Mathiak, K.; Cong, F. Classification of heart sounds using convolutional neural network. Appl. Sci. 2020, 10, 3956. [Google Scholar] [CrossRef]
  115. Ukil, A.; Roy, U.K. Smart cardiac health management in IoT through heart sound signal analytics and robust noise filtering. In Proceedings of the 2017 IEEE 28th Annual International Symposium on Personal, Indoor, and Mobile Radio Communications (PIMRC), Montreal, QC, Canada, 8–13 October 2017; pp. 1–5. [Google Scholar]
  116. Banerjee, R.; Biswas, S.; Banerjee, S.; Choudhury, A.D.; Chattopadhyay, T.; Pal, A.; Mandana, K.M.; Deshpande, P.; Mandana, K.M. Time-frequency analysis of phonocardiogram for classifying heart disease. In Proceedings of the 2016 Computing in Cardiology Conference (CinC), Vancouver, BC, Canada, 11–14 September 2016; pp. 573–576. [Google Scholar]
  117. Duggento, A.; Conti, A.; Guerrisi, M.; Toschi, N. A novel multi-branch architecture for state of the art robust detection of pathological phonocardiograms. Philos. Trans. R. Soc. A 2021, 379, 20200264. [Google Scholar] [CrossRef]
  118. Guo, Y.; Yang, H.; Guo, T.; Pan, J.; Wang, W. A novel heart sound segmentation algorithm via multi-feature input and neural network with attention mechanism. Biomed. Phys. Eng. Express 2022, 9, 015012. [Google Scholar] [CrossRef]
  119. Das, S.; Pal, S.; Mitra, M. Acoustic feature based unsupervised approach of heart sound event detection. Comput. Biol. Med. 2020, 126, 103990. [Google Scholar] [CrossRef]
  120. Renna, F.; Oliveira, J.H.; Coimbra, M.T. Deep convolutional neural networks for heart sound segmentation. IEEE J. Biomed. Health Inform. 2019, 23, 2435–2445. [Google Scholar] [CrossRef]
  121. Das, S.; Pal, S.; Mitra, M. Automated fundamental heart sound detection using spectral clustering technique. In Proceedings of the 2017 IEEE Calcutta Conference, Kolkata, India, 2–3 December 2017; pp. 264–267. [Google Scholar]
  122. Messner, E.; Zohrer, M.; Pernkopf, F. Heart sound segmentation—An event detection approach using deep recurrent neural networks. IEEE Trans. Biomed. Eng. 2018, 65, 1964–1974. [Google Scholar] [CrossRef] [PubMed]
  123. Babu, K.A.; Ramkumar, B. Automatic recognition of fundamental heart sound segments from PCG corrupted with lung sounds and speech. IEEE Access 2020, 8, 179983–179994. [Google Scholar] [CrossRef]
  124. Chowdhury, T.H.; Poudel, K.N.; Hu, Y. Time-frequency analysis, denoising, compression, segmentation, and classification of PCG signals. IEEE Access 2020, 8, 160882–160890. [Google Scholar] [CrossRef]
  125. Boulares, M.; Alotaibi, R.; AlMansour, A.; Barnawi, A. Cardiovascular disease recognition based on heartbeat segmentation and selection process. Int. J. Environ. Res. Public Health 2021, 18, 10952. [Google Scholar] [CrossRef] [PubMed]
  126. Plesinger, F.; Viscor, I.; Halamek, J.; Jurco, J.; Jurak, P. Heart sounds analysis using probability assessment. Physiol. Meas. 2017, 38, 1685. [Google Scholar] [CrossRef]
  127. Choi, S.; Jiang, Z. Comparison of envelope extraction algorithms for cardiac sound signal segmentation. Expert Syst. Appl. 2008, 34, 1056–1069. [Google Scholar] [CrossRef]
  128. Hettiarachchi, R.; Haputhanthri, U.; Herath, K.; Kariyawasam, H.; Munasinghe, S.; Wickramasinghe, K.; Samarasinghe, D.; De Silva, A.; Edussooriya, C.U. A novel transfer learning-based approach for screening pre-existing heart diseases using synchronized ecg signals and heart sounds. In Proceedings of the 2021 IEEE International Symposium on Circuits and Systems (ISCAS), Daegu, Republic of Korea, 22–28 May 2021; pp. 1–5. [Google Scholar]
  129. Varghees, V.N.; Ramachandran, K.I. Effective heart sound segmentation and murmur classification using empirical wavelet transform and instantaneous phase for electronic stethoscope. IEEE Sens. J. 2017, 17, 3861–3872. [Google Scholar] [CrossRef]
  130. Sawant, N.K.; Patidar, S.; Nesaragi, N.; Acharya, U.R. Automated detection of abnormal heart sound signals using Fano-factor constrained tunable quality wavelet transform. Biocybern. Biomed. Eng. 2021, 41, 111–126. [Google Scholar] [CrossRef]
  131. Narváez, P.; Gutierrez, S.; Percybrooks, W.S. Automatic segmentation and classification of heart sounds using modified empirical wavelet transform and power features. Appl. Sci. 2020, 10, 4791. [Google Scholar] [CrossRef]
  132. Prasad, R.; Yilmaz, G.; Chetelat, O.; Doss, M.M. Detection of S1 and S2 locations in phonocardiogram signals using zero frequency filter. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Barcelona, Spain, 4–8 May 2020; pp. 1254–1258. [Google Scholar]
  133. Babu, K.A.; Ramkumar, B.; Manikandan, M.S. S1 and S2 heart sound segmentation using variational mode decomposition. In Proceedings of the TENCON 2017-2017 IEEE Region 10 Conference, Penang, Malaysia, 5–8 November 2017; pp. 1629–1634. [Google Scholar]
  134. Talal, M.; Aziz, S.; Khan, M.U.; Ghadi, Y.; Naqvi, S.Z.H.; Faraz, M. Machine learning-based classification of multiple heart disorders from PCG signals. Expert Syst. 2023, 40, e13411. [Google Scholar] [CrossRef]
  135. Gamero, L.G.; Watrous, R. Detection of the first and second heart sound using probabilistic models. In Proceedings of the 25th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Cancun, Mexico, 17–21 September 2003; Volume 3, pp. 2877–2880. [Google Scholar]
  136. Springer, D.B.; Tarassenko, L.; Clifford, G.D. Logistic regression-HSMM-based heart sound segmentation. IEEE Trans. Biomed. Eng. 2015, 63, 822–832. [Google Scholar] [CrossRef] [PubMed]
  137. Yu, S.Z. Hidden semi-Markov models. Artif. Intell. 2010, 174, 215–243. [Google Scholar] [CrossRef]
  138. Xu, X.; Geng, X.; Gao, Z.; Yang, H.; Dai, Z.; Zhang, H. Optimal heart sound segmentation algorithm based on k-mean clustering and wavelet transform. Appl. Sci. 2023, 13, 1170. [Google Scholar] [CrossRef]
  139. Mehrish, A.; Majumder, N.; Bharadwaj, R.; Mihalcea, R.; Poria, S. A review of deep learning techniques for speech processing. Inf. Fusion 2023, 99, 101869. [Google Scholar] [CrossRef]
  140. Zou, L.; Yu, S.; Meng, T.; Zhang, Z.; Liang, X.; Xie, Y. A technical review of convolutional neural network-based mammographic breast cancer diagnosis. Comput. Math. Methods Med. 2019, 1, 6509357. [Google Scholar] [CrossRef] [PubMed]
  141. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  142. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
  143. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015, Proceedings, Part III 18; Springer: Berlin/Heidelberg, Germany, 2015; pp. 234–241. [Google Scholar]
  144. Enériz, D.; Rodriguez-Almeida, A.J.; Fabelo, H.; Ortega, S.; Balea-Fernandez, F.J.; Callico, G.M.; Medrano, N.; Calvo, B. Low-cost FPGA implementation of deep learning-based heart sound segmentation for real-time CVDs screening. IEEE Trans. Instrum. Meas. 2024, 73, 1–16. [Google Scholar] [CrossRef]
  145. Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708. [Google Scholar]
  146. Guan, T.; Xu, D.; Cai, S.; Hu, N. A Deep-Learning-based Cardiac Sound Segmentation Method for Smart Auscultation Applications. In Proceedings of the International Conference on Big Data, Artificial Intelligence and Internet of Things Engineering, Hangzhou, China, 25–27 August 2023; pp. 171–175. [Google Scholar]
  147. Zhou, P.; Shi, W.; Tian, J.; Qi, Z.; Li, B.; Hao, H.; Xu, B. Attention-based bidirectional long short-term memory networks for relation classification. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, Berlin, Germany, 7–12 August 2016; Volume 2, pp. 207–212. [Google Scholar]
  148. Fernando, T.; Ghaemmaghami, H.; Denman, S.; Sridharan, S.; Hussain, N.; Fookes, C. Heart sound segmentation using bidirectional LSTMs with attention. IEEE J. Biomed. Health Inform. 2019, 24, 1601–1609. [Google Scholar] [CrossRef] [PubMed]
  149. Chua, K.C.; Chandran, V.; Acharya, U.R.; Lim, C.M. Application of higher order statistics/spectra in biomedical signals—A review. Med. Eng. Phys. 2010, 32, 679–689. [Google Scholar] [CrossRef] [PubMed]
  150. Eyben, F.; Wöllmer, M.; Schuller, B. Opensmile: The munich versatile and fast open-source audio feature extractor. In Proceedings of the 18th ACM International Conference on Multimedia, Firenze, Italy, 25–29 October 2010; pp. 1459–1462. [Google Scholar]
  151. Guven, M.; Uysal, F. A new method for heart disease detection: Long short-term feature extraction from heart sound data. Sensors 2023, 23, 5835. [Google Scholar] [CrossRef] [PubMed]
  152. Nassralla, M.; El Zein, Z.; Hajj, H. Classification of normal and abnormal heart sounds. In Proceedings of the 2017 Fourth International Conference on Advances in Biomedical Engineering, Beirut, Lebanon, 19–21 October 2017; pp. 1–4. [Google Scholar]
  153. Taipalmaa, J.; Zabihi, M.; Kiranyaz, S.; Gabbouj, M. Feature-Based Cardiac Cycle Segmentation in Phonocardiogram Recordings. In Proceedings of the 2018 Computing in Cardiology Conference (CinC), Maastricht, The Netherlands, 23–26 September 2018; Volume 45, pp. 1–4. [Google Scholar]
  154. Tong, Y.; Yu, L.; Li, S.; Liu, J.; Qin, H.; Li, W. Polynomial fitting algorithm based on neural network. ASP Trans. Pattern Recognit. Intell. Syst. 2021, 1, 32–39. [Google Scholar] [CrossRef]
  155. Tang, H.; Dai, Z.; Jiang, Y.; Li, T.; Liu, C. PCG classification using multidomain features and SVM classifier. BioMed Res. Int. 2018, 2018, 4205027. [Google Scholar] [CrossRef] [PubMed]
  156. Langley, P.; Murray, A. Heart sound classification from unsegmented phonocardiograms. Physiol. Meas. 2017, 38, 1658. [Google Scholar] [CrossRef] [PubMed]
  157. Barnawi, A.; Boulares, M.; Somai, R. Simple and powerful PCG classification method based on selection and transfer learning for precision medicine application. Bioengineering 2023, 10, 294. [Google Scholar] [CrossRef] [PubMed]
  158. Morshed, M.; Fattah, S.A.; Saquib, M. Automated heart valve disorder detection based on PDF modeling of formant variation pattern in PCG signal. IEEE Access 2022, 10, 27330–27342. [Google Scholar] [CrossRef]
  159. Rath, A.; Mishra, D.; Panda, G.; Pal, M. Development and assessment of machine learning based heart disease detection using imbalanced heart sound signal. Biomed. Signal Process. Control 2022, 76, 103730. [Google Scholar] [CrossRef]
  160. Lee, J.A.; Kwak, K.C. Heart sound classification using wavelet analysis approaches and ensemble of deep learning models. Appl. Sci. 2023, 13, 11942. [Google Scholar] [CrossRef]
  161. Ibarra-Hernández, R.F.; Alonso-Arévalo, M.Á; García-Canseco, E.D.C. Comparison of spectral and sparse feature extraction methods for heart sounds classification. Rev. Mex. Ing. Bioméd. 2023, 44, 6–22. [Google Scholar] [CrossRef]
  162. Dhar, P.; Dutta, S.; Mukherjee, V. Cross-wavelet assisted convolution neural network (AlexNet) approach for phonocardiogram signals classification. Biomed. Signal Process. Control 2021, 63, 102142. [Google Scholar] [CrossRef]
  163. Satyasai, B.; Sharma, R.; Bansal, M. A Gammatonegram based Abnormality Detection in PCG Signals using CNN. In Proceedings of the International conference on Artificial Intelligence and Signal Processing, Vijayawada, India, 18–20 March 2023; pp. 1–5. [Google Scholar]
  164. Qiao, L.; Li, Z.; Xiao, B.; Shu, Y.; Wang, L.; Shi, Y.; Li, W.; Gao, X. QDRJL: Quaternion dynamic representation with joint learning neural network for heart sound signal abnormal detection. Neurocomputing 2023, 652, 126889. [Google Scholar] [CrossRef]
  165. Khaled, S.; Fakhry, M.; Mubarak, A.S. Classification of pcg signals using a nonlinear autoregressive network with exogenous inputs (narx). In Proceedings of the International Conference on Innovative Trends in Communication and Computer Engineering, Aswan, Egypt, 8–9 February 2020; pp. 98–102. [Google Scholar]
  166. Deng, M.; Meng, T.; Cao, J.; Wang, S.; Zhang, J.; Fan, H. Heart sound classification based on improved MFCC features and convolutional recurrent neural networks. Neural Netw. 2020, 130, 22–32. [Google Scholar] [CrossRef] [PubMed]
  167. Ross, M.; Shaffer, H.; Cohen, A.; Freudberg, R.; Manley, H. Average magnitude difference function pitch extractor. IEEE Trans. Acoust. Speech Signal Process. 1974, 22, 353–362. [Google Scholar] [CrossRef]
  168. Arnold, M.; Milner, X.; Witte, H.; Bauer, R.; Braun, C. Adaptive AR modeling of nonstationary time series by means of Kalman filtering. IEEE Trans. Biomed. Eng. 1998, 45, 553–562. [Google Scholar] [CrossRef] [PubMed]
  169. Liu, A.; Zhang, S.; Wang, Z.; Tang, Y.; Zhang, X.; Wang, Y. A learnable front-end based efficient channel attention network for heart sound classification. Physiol. Meas. 2023, 44, 095003. [Google Scholar] [CrossRef] [PubMed]
  170. Maity, A.; Pathak, A.; Saha, G. Transfer learning based heart valve disease classification from Phonocardiogram signal. Biomed. Signal Process. Control 2023, 85, 104805. [Google Scholar] [CrossRef]
  171. Kurada, S. A Customized Machine Learning Pipeline to Build State-of-the-Art Audio Classifiers. 2019. Available online: https://cjsjournal.org/2019-cjsj (accessed on 16 July 2024).
  172. Qiu, W.; Quan, C.; Zhu, L.; Yu, Y.; Wang, Z.; Ma, Y.; Sun, M.; Chang, Y.; Qian, K.; Hu, B.; et al. Heart sound abnormality detection from multi-institutional collaboration: Introducing a federated learning framework. IEEE Trans. Biomed. Eng. 2024. [Google Scholar] [CrossRef]
  173. Huang, J.; Chen, B.; Yao, B.; He, W. ECG arrhythmia classification using STFT-based spectrogram and convolutional neural network. IEEE Access 2019, 7, 92871–92880. [Google Scholar] [CrossRef]
  174. Aguiar-Conraria, L.; Soares, M.J. The continuous wavelet transform: Moving beyond uni- and bivariate analysis. J. Econ. Surv. 2014, 28, 344–375. [Google Scholar] [CrossRef]
  175. Chakir, F.; Jilbab, A.; Nacir, C.; Hammouch, A. Recognition of cardiac abnormalities from synchronized ECG and PCG signals. Phys. Eng. Sci. Med. 2020, 43, 673–677. [Google Scholar] [CrossRef] [PubMed]
  176. Ajitkumar Singh, S.; Ashinikumar Singh, S.; Dinita Devi, N.; Majumder, S. Heart abnormality classification using PCG and ECG recordings. Comput. Sist. 2021, 25, 381–391. [Google Scholar]
  177. Li, J.; Ke, L.; Du, Q.; Chen, X.; Ding, X. Multi-modal cardiac function signals classification algorithm based on improved DS evidence theory. Biomed. Signal Process. Control 2022, 71, 103078. [Google Scholar] [CrossRef]
  178. Huang, Q.; Yang, H.; Zeng, E.; Chen, Y. A deep-learning-based multi-modal ECG and PCG processing framework for cardiac analysis. TechRxiv 2023. [Google Scholar] [CrossRef]
  179. Singhal, S.; Kumar, M. Cardiovascular diseases classification using high-resolution superlet transform on ECG and PCG signals. In Proceedings of the International Conference on Computing Communication and Networking Technologies, Delhi, India, 6–8 July 2023; pp. 1–5. [Google Scholar]
  180. Vieira, H. Multimodal deep learning for heart sound and electrocardiogram classification. Master’s Thesis, Universidade do Porto, Porto, Portugal, 2023. [Google Scholar]
  181. Moca, V.V.; Bârzan, H.; Nagy-Dăbâcan, A.; Mureșan, R.C. Time-frequency super-resolution with superlets. Nat. Commun. 2021, 12, 337. [Google Scholar] [CrossRef] [PubMed]
  182. Khaled, S.; Fakhry, M.; Esmail, H.; Ezzat, A.; Hamad, E. Analysis of training optimization algorithms in the NARX neural network for classification of heart sound signals. Int. J. Sci. Eng. Res. 2022, 13, 382–390. [Google Scholar]
  183. Yoon, H.-j. Classification of normal and abnormal heart sounds using neural network. J. Converg. Inf. Technol. 2018, 8, 131–135. [Google Scholar]
  184. Adepu, A.; Jain, A.; Soundarya, B.N.K.L.S.; Naidu, G.; Rishitha, G. Heart failure detection using sound signals. NeuroQuantology 2022, 20, 870–878. [Google Scholar]
  185. Parmar, A.; Katariya, R.; Patel, V. A review on random forest: An ensemble classifier. In International Conference on Intelligent Data Communication Technologies and Internet of Things (ICICI) 2018; Springer: Cham, Switzerland, 2019; pp. 758–763. [Google Scholar]
  186. Shuvo, S.B.; Ali, S.N.; Swapnil, S.I.; Al-Rakhami, M.S.; Gumaei, A. CardioXNet: A novel lightweight deep learning framework for cardiovascular disease classification using heart sound recordings. IEEE Access 2021, 9, 36955–36967. [Google Scholar] [CrossRef]
  187. Li, P.; Hu, Y.; Liu, Z.-P. Prediction of cardiovascular diseases by integrating multi-modal features with machine learning methods. Biomed. Signal Process. Control 2021, 66, 102474. [Google Scholar] [CrossRef]
  188. Zhu, J.Y.; Liu, H.; Liu, X.W. Cardiovascular disease detection based on multi-modal data fusion and multi-branch residual network. In Proceedings of the 2023 International Conference on Frontiers of Artificial Intelligence and Machine Learning, Online, 28–30 April 2023; pp. 25–28. [Google Scholar]
  189. Morshed, M.; Fattah, S.A. A deep neural network for heart valve defect classification from synchronously recorded ECG and PCG. IEEE Sens. Lett. 2023, 7, 1–4. [Google Scholar] [CrossRef]
  190. Zhang, H.; Zhang, P.; Lin, F.; Chao, L.; Wang, Z.; Ma, F.; Li, Q. Co-learning-assisted progressive dense fusion network for cardiovascular disease detection using ECG and PCG signals. Expert Syst. Appl. 2024, 238, 122144. [Google Scholar] [CrossRef]
  191. Li, J.; Ke, L.; Du, Q.; Ding, X.; Chen, X. Research on the classification of ECG and PCG signals based on BiLSTM-GoogLeNet-DS. Appl. Sci. 2022, 12, 11762. [Google Scholar] [CrossRef]
  192. Rong, Y.; Fynn, M.; Nordholm, S.; Siaw, S.; Dwivedi, G. Wearable electro-phonocardiography device for cardiovascular disease monitoring. In Proceedings of the 2023 IEEE Statistical Signal Processing Workshop, Hanoi, Vietnam, 2–5 July 2023; pp. 413–417. [Google Scholar]
  193. McFee, B.; Raffel, C.; Liang, D.; Ellis, D.P.; McVicar, M.; Battenberg, E.; Nieto, O. Librosa: Audio and music signal analysis in python. In Proceedings of the 14th Python in Science Conference 2015, Austin, TX, USA, 6–12 July 2015; pp. 18–24. [Google Scholar]
  194. Yu, S.; Meng, J.; Fan, W.; Chen, Y.; Zhu, B.; Yu, H.; Xie, Y.; Sun, Q. Speech emotion recognition using dual-stream representation and cross-attention fusion. Electronics 2024, 13, 2191. [Google Scholar] [CrossRef]
  195. Zhu, B.; Li, X.; Feng, J.; Yu, S. VGGish-BiLSTM-attention for COVID-19 identification using cough sound analysis. In Proceedings of the International Conference on Signal and Image Processing, Wuxi, China, 8–10 July 2023; pp. 49–53. [Google Scholar]
  196. Fernandez-Quilez, A. Deep learning in radiology: Ethics of data and on the value of algorithm transparency, interpretability and explainability. AI Ethics 2023, 3, 257–265. [Google Scholar] [CrossRef]
  197. Nazir, S.; Dickson, D.M.; Akram, M.U. Survey of explainable artificial intelligence techniques for biomedical imaging with deep neural networks. Comput. Biol. Med. 2023, 156, 106668. [Google Scholar] [CrossRef] [PubMed]
  198. Yu, S.; Zhang, Z.; Liang, X.; Wu, J.; Zhang, E.; Qin, W.; Xie, Y. A Matlab toolbox for feature importance ranking. In Proceedings of the International Conference on Medical Imaging Physics and Engineering, Shenzhen, China, 22–24 November 2019; Volume 9, pp. 1–6. [Google Scholar]
  199. Yu, S.; Jin, M.; Wen, T.; Zhao, L.; Zou, X.; Liang, X.; Xie, Y.; Pan, W.; Piao, C. Accurate breast cancer diagnosis using a stable feature ranking algorithm. BMC Med. Inform. Decis. Mak. 2023, 23, 64. [Google Scholar] [CrossRef] [PubMed]
  200. Khaire, U.M.; Dhanalakshmi, R. Stability of feature selection algorithm: A review. J. King Saud Univ. Comput. Inf. Sci. 2022, 34, 1060–1073. [Google Scholar] [CrossRef]
  201. Eigenschink, P.; Reutterer, T.; Vamosi, S.; Vamosi, R.; Sun, C.; Kalcher, K. Deep generative models for synthetic data: A survey. IEEE Access 2023, 11, 47304–47320. [Google Scholar] [CrossRef]
  202. Van Den Oord, A.; Dieleman, S.; Zen, H.; Simonyan, K.; Vinyals, O.; Graves, A.; Kalchbrenner, N.; Senior, A.; Kavukcuoglu, K. Wavenet: A generative model for raw audio. arXiv 2016, arXiv:1609.03499. [Google Scholar]
  203. Jabloun, M.; Ravier, P.; Buttelli, O.; Lédée, R.; Harba, R.; Nguyen, L.-D. A generating model of realistic synthetic heart sounds for performance assessment of phonocardiogram processing algorithms. Biomed. Signal Process. Control 2013, 8, 455–465. [Google Scholar] [CrossRef]
  204. Yu, S.; Chen, M.; Zhang, E.; Wu, J.; Yu, H.; Yang, Z.; Ma, L.; Gu, X.; Lu, W. Robustness study of noisy annotation in deep learning based medical image segmentation. Phys. Med. Biol. 2020, 65, 175007. [Google Scholar] [CrossRef] [PubMed]
  205. Chang, Y.; Wang, X.; Wang, J.; Wu, Y.; Yang, L.; Zhu, K.; Chen, H.; Yi, X.; Wang, C.; Wang, Y.; et al. A survey on evaluation of large language models. ACM Trans. Intell. Syst. Technol. 2024, 15, 1–45. [Google Scholar] [CrossRef]
  206. Zhang, J.; Huang, J.; Jin, S.; Lu, S. Vision-language models for vision tasks: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 5625–5644. [Google Scholar] [CrossRef]
  207. Chen, F.-L.; Zhang, D.-Z.; Han, M.-L.; Chen, X.-Y.; Shi, J.; Xu, S.; Xu, B. VLP: A survey on vision-language pre-training. Mach. Intell. Res. 2023, 20, 38–56. [Google Scholar] [CrossRef]
  208. Gu, J.; Cho, H.C.; Kim, J.; You, K.; Hong, E.K.; Roh, B. CheX-GPT: Harnessing large language models for enhanced chest X-ray report labeling. arXiv 2024, arXiv:2401.11505. [Google Scholar]
  209. Han, C.; Kim, D.W.; Kim, S.; You, S.C.; Bae, S.; Yoon, D. Large-language-model-based 10-year risk prediction of cardiovascular disease: Insight from the UK biobank data. medRxiv 2023. [Google Scholar] [CrossRef]
  210. Gala, D.; Makaryus, A.N. The utility of language models in cardiology: A narrative review of the benefits and concerns of ChatGPT-4. Int. J. Environ. Res. Public Health 2023, 20, 6438. [Google Scholar] [CrossRef]
  211. Hu, Y.; Li, T.; Lu, Q.; Shao, W.; He, J.; Qiao, Y.; Luo, P. Omnimedvqa: A new large-scale comprehensive evaluation benchmark for medical LVLM. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 17–21 June 2024; pp. 22170–22183. [Google Scholar]
  212. Kwon, T.; Ong, K.T.-I.; Kang, D.; Moon, S.; Lee, J.R.; Hwang, D.; Sohn, B.; Sim, Y.; Lee, D.; Yeo, J. Large language models are clinical reasoners: Reasoning-aware diagnosis framework with prompt-generated rationales. Proc. AAAI Conf. Artif. Intell. 2024, 38, 18417–18425. [Google Scholar] [CrossRef]
  213. Qiu, J.; Li, L.; Sun, J.; Peng, J.; Shi, P.; Zhang, R.; Dong, Y.; Lam, K.; Lo, F.P.-W.; Xiao, B.; et al. Large AI models in health informatics: Applications, challenges, and the future. IEEE J. Biomed. Health Inform. 2023, 27, 6074–6087. [Google Scholar] [CrossRef] [PubMed]
  214. Hong, S.; Zhou, Y.; Shang, J.; Xiao, C.; Sun, J. Opportunities and challenges of deep learning methods for electrocardiogram data: A systematic review. Comput. Biol. Med. 2020, 122, 103801. [Google Scholar] [CrossRef] [PubMed]
  215. Pahuja, S.K.; Veer, K. Recent approaches on classification and feature extraction of EEG signal: A review. Robotica 2022, 40, 77–101. [Google Scholar]
Figure 1. The number of technical publications per year since the database was released.
Figure 2. LR-HSMM, a recommended heart sound segmentation algorithm.
Table 1. PCG signal features and extraction methods.

Time domain
  Features: statistical features; waveform amplitudes; higher-order cumulants; absolute sum of first differences; aggregated features of auto-correlation coefficients; symmetry of S1 (S2) time segments; signal envelope; polynomial fitting coefficients
  Methods: peak detection [77]; waveform duration [66]; Hilbert transform [101]; zero-crossing rate [130]; zero-frequency filtering [132]; homomorphic filtering [55,92]; statistical analysis [27,61,66,75,104,109,149]; feature pooling [150,151]; average magnitude difference [60]; autoregression [152]; polynomial fitting [153,154]; fuzzy set theory [155]

Frequency domain
  Features: formant frequency, amplitude, and width; spectral density, spectral centroid, spectral roll-off point, and spectral flux; harmonic coefficients; power spectral density; frequency sub-bands; maximum frequency; signal amplitude; energy ratio
  Methods: discrete Fourier transform (FT) [66,110]; fast FT [75,156,157]; power truncation [104]; tunable Q-factor WT [72,90]; non-negative matrix factorization [44]; averaged periodogram [35]; Burg autoregression [158]

Time-frequency domain
  Features: WT coefficients and gradient map; MFCCs; spectrogram features; scalogram; mean and deviation of instantaneous frequency; mean and variance of eigenvalues; time-varying spectral flatness; statistical and geometric features from time-frequency images
  Methods: Chirplet transform [57]; fractional FT [76]; STFT [25,65,68]; WT [48,54,59,99,159,160]; Gabor filtering [73,161]; sparse representation [106]; wavelet decomposition [26,28,156]; cross WT [162]; Gammatone filtering [70,163]; Mel-filter banks and discrete Cosine transform [109,111,117,125,164,165,166]
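For illustration, the following is a minimal sketch that computes one representative feature from each domain in Table 1: a Hilbert-transform envelope (time), a Welch power spectrum with its spectral centroid (frequency), and MFCCs (time-frequency). The 2 kHz sampling rate, 13 MFCCs, and Welch defaults are assumptions, and librosa [193] is assumed to be installed.

```python
# A minimal sketch: one feature per domain of Table 1; parameter
# choices are illustrative assumptions.
import numpy as np
import librosa
from scipy.signal import hilbert, welch

def pcg_features(x: np.ndarray, fs: int = 2000) -> dict:
    envelope = np.abs(hilbert(x))                  # time-domain envelope
    f, pxx = welch(x, fs=fs)                       # power spectral density
    centroid = float(np.sum(f * pxx) / np.sum(pxx))
    mfcc = librosa.feature.mfcc(y=x.astype(np.float32), sr=fs, n_mfcc=13)
    return {
        "envelope_mean": float(envelope.mean()),
        "spectral_centroid": centroid,
        "mfcc_mean": mfcc.mean(axis=1),            # one value per coefficient
    }
```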
Table 2. Dual-modal (ECG and PCG) feature extraction methods.

| Ref. | Year | Features | Feature Extraction |
|---|---|---|---|
| [175] | 2020 | time-domain features, including the average of S1/S2 amplitudes, the zero-crossing rate, and the average of R-wave amplitudes | R-peak detection and statistical analysis |
| [128] | 2021 | scalograms | Morlet wavelets (PCG) and complex Morlet wavelets (ECG) |
| [176] | 2021 | Q-, R-, and S-wave composite (ECG) and features including signal envelope and power spectral density (PCG) | statistical analysis, peak detection, Hilbert transform, CWT, DWT, and STFT |
| [177] | 2022 | time-frequency-domain features | wavelet scattering transform |
| [178] | 2023 | time- and frequency-domain features | statistical analysis, DFT, and R-wave detection |
| [179] | 2023 | time-frequency spectrogram | adaptive super-lattice technique |
| [180] | 2023 | scalograms | CWT |
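As an illustration of the scalogram rows in Table 2 (e.g., [128,180]), the following sketch produces per-channel continuous wavelet transform scalograms with PyWavelets, pairing a real Morlet wavelet with the PCG channel and a complex Morlet with the ECG channel. The signals, scale range, and sampling rate are assumptions made for illustration and do not reproduce the cited configurations.

```python
import numpy as np
import pywt

fs = 2000                              # assumed shared sampling rate (Hz)
pcg = np.random.randn(4 * fs)          # placeholder PCG segment
ecg = np.random.randn(4 * fs)          # placeholder, time-aligned ECG segment

scales = np.arange(1, 129)             # illustrative scale range

# Real Morlet scalogram for the PCG channel
pcg_coefs, _ = pywt.cwt(pcg, scales, 'morl', sampling_period=1.0 / fs)
# Complex Morlet scalogram for the ECG channel (magnitude kept)
ecg_coefs, _ = pywt.cwt(ecg, scales, 'cmor1.5-1.0', sampling_period=1.0 / fs)

# Stack into a two-channel "image" suitable for a downstream CNN
scalograms = np.stack([np.abs(pcg_coefs), np.abs(ecg_coefs)])
```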
Table 3. Heart sound state classification models using PCG signals.

| Category | Type | Models | References |
|---|---|---|---|
| ML | time series | Gaussian-mixture HMM; nonlinear autoregressive networks | [67,117,182] |
| ML | unsupervised learning | autoencoder (AE); k-means | [24,30,76] |
| ML | supervised learning | SVM; decision tree; k-nearest neighbor; logistic regression; Bayes; linear discriminant analysis | [24,31,35,36,38,41,45,46,47,56,69,75,76,77,79,89,103,104,105,106,107,156,159,171] |
| ML | ensemble classifier | random forest; nested ensemble algorithms; bagging; boosting; AdaBoost | [24,27,28,35,36,44,47,55,76,79,89,98,103,104,105,130,159,172] |
| ML | NN | artificial NN; radial basis function NN; multi-branch artificial NNs; dynamic recursive NN | [38,50,72,90,117] |
| ML | others | zero-order autonomous learning neuro-fuzzy approach; semi-supervised learning with variational AE | [30,66] |
| DL | CNN | various architectures of CNN models | [24,25,29,32,33,40,42,43,48,51,53,57,58,59,61,70,74,86,99,101,103,105,108,110,125,157,163] |
| DL | RNN | RNN; LSTM | [39,60,81,86,102,104,165,169] |
| DL | attention | time-delay network; dynamic masked-attention module; efficient channel attention network | [31,65,100,118,157,169] |
| DL | others | pre-trained network; joint learning network; double-stream network | [39,68,111,164,170,183] |
| Hybrid | - | network voting; federated averaging network; Bayesian LSTMs; CNN-SVM | [26,37,54,65,80,81,172,184] |
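A minimal instance of the supervised-learning branch of Table 3 follows: an RBF-kernel SVM evaluated with 10-fold cross-validation, a protocol used by several of the cited studies. The random feature matrix stands in for hand-crafted PCG features, and the hyperparameters are illustrative defaults rather than values from any particular reference.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder design matrix: one row of hand-crafted PCG features per
# recording, with binary labels (0 = normal, 1 = abnormal)
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 40))
y = rng.integers(0, 2, size=200)

clf = make_pipeline(StandardScaler(), SVC(kernel='rbf', class_weight='balanced'))
scores = cross_val_score(clf, X, y, cv=10)   # 10-fold CV, as in several cited studies
print(f'mean accuracy: {scores.mean():.3f}')
```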
Table 4. Current achievements on the PCHSD2016 database using PCG signals.

| Ref. | Year | Train/Valid./Test | Model | ACC | SEN | SPE | Score |
|---|---|---|---|---|---|---|---|
| [23] | 2017 | 75/15/10 | modified AlexNet | 97.10% | 93.20% | 95.10% | 94.20% |
| [102] | 2018 | 75/15/10 | RNN | - | 98.90% | 98.40% | 98.60% |
| [29] | 2019 | 80/0/20 | CNN | 97.20% | 94.80% | 99.70% | 97.20% |
| [76] | 2020 | - | AE | 95.50% | 89.30% | 97.00% | 93.20% |
| [90] | 2020 | 10-fold CV | dynamic NN | 97.60% | 97.50% | 97.70% | 97.60% |
| [24] | 2020 | 80/0/20 | AE | 99.80% | 99.70% | 99.10% | 99.40% |
| [165] | 2020 | 80/0/20 | autoregressive net | 99.00% | 100% | 98.00% | 99.00% |
| [25] | 2021 | 75/15/10 | CNN | 95.80% | 96.30% | 94.10% | 95.20% |
| [125] | 2021 | - | VGG19 | 97.00% | 94.60% | 94.60% | 94.60% |
| [186] | 2021 | - | CardioXNet | 99.60% | 99.52% | - | - |
| [72] | 2021 | 10-fold CV | RBF NN | 97.90% | 97.70% | 98.10% | 97.90% |
| [35] | 2021 | 10-fold CV | KNN + SVM | 98.20% | 99.90% | 96.40% | 98.20% |
| [100] | 2022 | 10-fold CV | mask encoder | 95.60% | 87.60% | 97.70% | 92.70% |
| [43] | 2023 | 70/15/15 | 1D CNN | 95.50% | 97.40% | 93.60% | 95.50% |
| [157] | 2023 | 3-fold CV | VGG19 | 97.00% | 94.60% | 94.60% | 94.60% |
| [61] | 2024 | 80/10/10 | ResNet50 | 95.70% | - | - | - |
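The metrics in Tables 4–6 follow from the binary confusion matrix, and the Score column in Table 4 is consistent with the mean of sensitivity and specificity (e.g., [24]: (99.70% + 99.10%)/2 = 99.40%). A small helper, assuming labels coded as 0 = normal and 1 = abnormal:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def heart_sound_metrics(y_true, y_pred):
    """Return ACC, SEN, SPE, and Score = (SEN + SPE) / 2 for binary labels."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    acc = (tp + tn) / (tp + tn + fp + fn)
    sen = tp / (tp + fn)          # recall on abnormal (label 1) recordings
    spe = tn / (tn + fp)          # recall on normal (label 0) recordings
    return acc, sen, spe, (sen + spe) / 2
```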
Table 5. Inter-database heart state classification using PCG signals.

| Ref. | ACC (intra) | SEN (intra) | SPE (intra) | Testing on | ACC (inter) | SEN (inter) | SPE (inter) |
|---|---|---|---|---|---|---|---|
| [25] | 95.40% | 96.30% | 92.40% | PASCAL | 96.80% | 95.80% | 98.00% |
| [39] | 93.10% | - | - | [11] | 99.40% | - | - |
| [74] | 74.19% | - | - | UCI MLR | 83.00% | - | - |
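Inter-database validation, as summarized in Table 5, trains on one database and evaluates on recordings from an entirely different source, exposing the generalization gap that intra-database splits can hide. A schematic sketch with placeholder features (the random-forest choice is arbitrary):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
# Placeholder features: a source database for training and an
# independently collected external database for testing
X_src, y_src = rng.standard_normal((400, 30)), rng.integers(0, 2, 400)
X_ext, y_ext = rng.standard_normal((100, 30)), rng.integers(0, 2, 100)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_src, y_src)
print(f'inter-database ACC: {accuracy_score(y_ext, clf.predict(X_ext)):.3f}')
```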
Table 6. Current achievements on the PCHSD2016 database using both ECG and PCG signals.

| Ref. | Year | Train/Valid./Test | Model | ACC | SEN | SPE | AUC |
|---|---|---|---|---|---|---|---|
| [175] | 2020 | 60/0/40 | SVM | 92.50% | 92.30% | 92.90% | 95.10% |
| [176] | 2021 | 70/0/30 | SVM | 93.10% | 94.40% | 90.00% | - |
| [177] | 2022 | 10-fold CV | SVM | 86.40% | 85.00% | 93.10% | - |
| [187] | 2021 | 5-fold CV | LSTM+SVM | 87.30% | 90.30% | 84.50% | 93.60% |
| [188] | 2023 | - | ResNet+SVM | 93.10% | - | - | 96.70% |
| [128] | 2021 | 70/10/20 | dual-CNN | 87.70% | 87.70% | 87.50% | - |
| [191] | 2022 | 70/0/30 | BiLSTM-GoogLeNet | 96.10% | 98.50% | 90.80% | - |
| [192] | 2023 | 80/0/20 | CNN | 81.50% | 94.90% | 45.50% | - |
| [189] | 2023 | 90/0/10 | CNN | 96.10% | 90.90% | - | 99.00% |
| [180] | 2023 | 70/0/30 | VGG16 | 82.80% | - | - | 88.50% |
| [179] | 2023 | - | VGG19 | 82.10% | 80.20% | 83.00% | - |
| [190] | 2024 | - | CPDNet | - | 99.90% | 98.90% | 99.90% |
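Many of the Table 6 systems fuse the two modalities either before or after classification. The sketch below shows the simplest early-fusion variant: concatenating per-recording ECG and PCG feature vectors, training a linear classifier, and reporting AUC on a held-out split. All data here are placeholders, and the cited works rely on considerably more elaborate deep models.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Placeholder per-recording feature vectors from each modality
X_pcg = rng.standard_normal((300, 20))
X_ecg = rng.standard_normal((300, 20))
y = rng.integers(0, 2, size=300)

# Early fusion: concatenate modality features before classification
X = np.hstack([X_pcg, X_ecg])
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(f'AUC: {roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]):.3f}')
```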