Article

A Non-Invasive Fetal QRS Complex Detection Method Based on a Multi-Feature Fusion Neural Network

1 School of Electronic Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China
2 Technical Department, Beijing Health State Monitoring & Consulting Co., Ltd., Beijing 100080, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(19), 8987; https://doi.org/10.3390/app14198987
Submission received: 9 September 2024 / Revised: 27 September 2024 / Accepted: 28 September 2024 / Published: 5 October 2024
(This article belongs to the Section Biomedical Engineering)

Abstract

Fetal heart monitoring, as a crucial part of fetal monitoring, can accurately reflect the fetus's health status in a timely manner. To address the high computational cost, the inability to observe fetal heart morphology, and the insufficient accuracy of the traditional approach of calculating the fetal heart rate from a four-channel maternal electrocardiogram (ECG), a method for extracting fetal QRS complexes from a single-channel non-invasive fetal ECG based on a multi-feature fusion neural network is proposed. First, a signal entropy data quality detection algorithm based on blind source separation is designed to select, from all channels, the maternal ECG signals that meet the quality requirements; the selected signals then undergo preprocessing operations such as denoising and normalization. After segmentation by a sliding window, the maternal ECG signals are computed in four modalities: time domain, frequency domain, time–frequency domain, and data eigenvalues. Finally, deep neural networks using three multi-feature fusion strategies (feature-level fusion, decision-level fusion, and model-level fusion) rapidly identify fetal QRS complexes. Among the proposed networks, the best-performing one achieves an accuracy of 95.85% and a sensitivity of 97%.

1. Introduction

Monitoring the fetal heart rate during pregnancy is a crucial diagnostic tool for assessing the in utero condition of the fetus, enabling the timely detection of fetal distress, arrhythmias, acidosis, and certain pathological conditions related to fetal hypoxia [1]. Fetal heart rate evaluation during the perinatal period is one of the primary measures to reduce pregnancy-related risks and enhance the quality of fetal outcomes [2].
Currently, mainstream fetal heart rate monitors are categorized into three types based on their operational principles: fetal phonocardiography, fetal cardiotocography, and fetal electrocardiography [3]. The fetal QRS complex, observed in fetal electrocardiograms, represents the depolarization of the fetal ventricles and consists of Q, R, and S waves. Fetal electrocardiography involves placing electrodes on the fetus's head or the maternal abdomen to obtain the QRS complex that characterizes fetal cardiac activity, and is divided into invasive and non-invasive detection methods. The non-invasive approach, which directly uses the maternal abdominal electrical signals as the measurement signal, is more readily accepted by pregnant women and can be used for long-term monitoring. However, separating the fetal ECG signal can be challenging due to interference from other physiological signals and noise [4], such as maternal ECG, baseline drift, and electromyographic interference [5]. The amplitude of the maternal ECG signal is significantly larger than that of the fetal ECG signal, and the two also overlap spectrally [6]. Therefore, to obtain a reliable fetal ECG signal, it is essential to eliminate the interference of the maternal ECG.
To address the challenges of fetal heart rate extraction, several mathematical methods have been proposed, including blind source separation algorithms based on Independent Component Analysis (ICA) and Singular Value Decomposition (SVD) [7], algorithms combining power spectral density and matched filtering techniques [8], adaptive filtering algorithms combining SVD and smooth window methods [9], wavelet separation methods combining clustering algorithms with extrema pairs and ICA [10], and a negative entropy-based blind source separation method combined with a template subtraction method [11]. Kahankova et al. proposed a method combining ICA and LMS, achieving Se, PPV, and F1 scores of 89.41%, 90.42%, and 89.19%, respectively [12]. Mansourian et al. proposed a method for extracting fetal QRS complexes from single-channel abdominal ECG signals based on an Adaptive Improved Permutation Entropy (AIPE) algorithm, achieving an F1 score and Se of 90.95% and 90.77%, respectively [13]. Other researchers have used methods such as Principal Component Analysis [14,15], Recursive Least Squares [16], Tensor Decomposition [17], and Linear Combination Algorithms [18] for fetal heart rate detection research, all achieving satisfactory experimental outcomes.
With significant advancements in deep learning technologies, numerous studies have used neural networks to extract fetal ECG signals without separating the maternal ECG [19,20,21]. Nguyen et al. proposed a novel method based on Conditional Generative Adversarial Networks (cGANs) for extracting fetal electrocardiogram (ECG) signals in the time–frequency domain using the Short-Time Fourier Transform (STFT), demonstrating higher robustness and resistance to interference than traditional approaches [22]. Zhong et al. employed a one-dimensional convolutional neural network (CNN) to detect fetal QRS complexes without removing the maternal ECG signal, pursuing a goal similar to that of this study, and achieved an accuracy of 75.33%, a recall of 80.54%, and an F1 score of 77.85% [23]. Lee et al. proposed a deeper CNN architecture that utilizes four-channel original ECG signals to detect fetal QRS complexes, resulting in a positive predictive value (PPV) of 92.77% and an average sensitivity of 89.06% [24]. Vo et al. used the STFT method to prepare training features, employing ResNet and a one-dimensional octave convolution network to design a four-class deep learning model, which achieved an accuracy of 91.1% [25].
This study introduces a non-invasive fetal QRS complex recognition method based on a multi-feature fusion neural network. Initially, we designed a signal entropy data quality detection algorithm based on blind source separation to ensure the selected maternal ECG signals meet quality requirements. Subsequently, the signals were subjected to data preprocessing operations such as denoising and normalization to prepare for further analysis. Following this, the maternal ECG signals were segmented using a sliding window technique and computed across four modalities: time domain, frequency domain, time–frequency domain, and nonlinear characteristic values. Ultimately, we proposed three types of deep neural networks that integrate multiple features to identify fetal QRS complexes rapidly. The overall algorithmic workflow of this study is illustrated in Figure 1.
The method proposed in this paper has four main advantages. First, traditional methods require at least four channels to separate fetal ECG signals from maternal ECG signals, which can be uncomfortable for pregnant women in clinical applications and has significant limitations. The method used in this paper does not require the separation of maternal and fetal ECG signals: it can directly detect fetal QRS complexes from a single-channel ECG signal obtained from the abdomen of a pregnant woman. Second, fetal ECG waveforms can intuitively display the fetal cardiac activity status and contain valuable information for diagnosis and treatment. Traditional methods can only calculate the fetal heart rate and its variability, whereas the technique proposed in this paper can extract complete fetal ECG waveforms with clear features. This allows for precise prenatal monitoring of fetal health based on morphological information from the fetal ECG. Third, this paper explores lightweight networks with lower storage and computation requirements. By using a moving-window method, the position and shape of fetal QRS complexes can be labeled quickly, enabling real-time monitoring of the fetal ECG. Fourth, incorporating multimodal data provides the deep learning model with richer and more complete information. The model's overall performance is improved by leveraging the complementarity of information across multiple modalities.

2. Signal Collection and Preprocessing

2.1. Signal Sources

The PhysioNet/CinC dataset, widely utilized due to its substantial volume of data, limited number of channels, and available reference annotations, was employed in this paper's experiments. Specifically, the PhysioNet/CinC Challenge 2013 dataset, Set A, was used. This subset offers 75 multi-channel non-invasive fetal ECG signals along with reference annotations marking the locations of each fetal QRS complex. Each record includes four channels and lasts 1 min, with a sampling frequency of 1000 Hz and a 16-bit resolution. Following the challenge organizers' advice, seven recordings with inaccurate annotations (a33, a38, a47, a52, a54, a71, and a74) were discarded [26,27,28]. The abdominal ECG signals of pregnant women primarily consist of maternal ECG signals, fetal ECG signals, and various noise components. As illustrated in Figure 2, the signal source comprises the four abdominal ECG signals from different electrode placements on a pregnant woman.
The Abdominal and Direct Fetal ECG Database (ADFECGDB) from PhysioNet comprises ECG recordings from five pregnant women [26,29]. Each record includes four abdominal channels and one direct fetal channel, with each segment lasting 5 min. The signals were sampled at 1 kHz with a 16-bit resolution. We utilize the four abdominal channels of this database as our test set to validate the reliability and accuracy of the proposed algorithm in this paper.

2.2. Data Quality Assessment Algorithm Based on Blind Source Separation and Sample Entropy

The fundamental concept of ICA technology is that under the assumption of independence, it can decompose multi-channel mixed signals into several independent components. Since the heart of the mother and the fetus are two independent bioelectrical signal sources, it is assumed that maternal and fetal ECG signals are statistically independent. The data quality assessment algorithm proposed in this paper aims to select pregnant women’s ECG signals where the fetal heart is more apparent. Therefore, it is not sufficient to perform a data quality assessment on maternal ECG signals; instead, the aim is to assess the quality of the amplified fetal signals. A novel unsupervised algorithm for detecting fetal QRS complexes, which integrates ICA and maternal ECG signal cancellation, was introduced in the literature [5]. The process begins with signal preprocessing to remove baseline drift and power line interference, followed by the steps below: ICA to extract maternal ECG signals; maternal QRS complex detection; maternal ECG signal elimination through weighted SVD; and the enhancement of fetal ECG signals through secondary ICA. This paper adopts the algorithm by Varanini et al. to amplify fetal ECG signals from four channels of raw abdominal signals.
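As a minimal illustration of the blind source separation step, the sketch below applies FastICA to a four-channel abdominal recording. scikit-learn's FastICA is used here as a stand-in for the specific ICA variant in Varanini et al.'s pipeline, and the array name `abdominal` is a placeholder, not the authors' code.

```python
import numpy as np
from sklearn.decomposition import FastICA

# `abdominal` is a hypothetical (4, n_samples) array holding the four
# preprocessed abdominal channels (baseline drift and power line
# interference already removed).
abdominal = np.random.randn(4, 60000)

ica = FastICA(n_components=4, random_state=0)
sources = ica.fit_transform(abdominal.T)  # (n_samples, 4) independent components
# In the pipeline described above, the maternal ECG component is detected
# and cancelled, and a second ICA pass then enhances the fetal ECG.
```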
Sample entropy is a metric that quantifies the complexity of a time series by measuring the likelihood of new pattern generation within a signal [30]. This study calculates sample entropy values based on ten randomly selected segments, each comprising 500 points from the data, to better represent the overall condition of the data through averaging [31]. The formula for calculating sample entropy is as follows:
$$B^m(r) = \frac{1}{N - m + 1} \sum_{i=1}^{N-m+1} B_i^m(r), \qquad SampEn(m, r) = \lim_{N \to \infty} \left\{ -\ln\!\left[ \frac{B^{m+1}(r)}{B^m(r)} \right] \right\}$$
In this context, $SampEn(m, r)$ denotes sample entropy, where $m$ is the embedding dimension, $r$ is the similarity tolerance threshold, and $N$ is the length of the signal. $B_i^m(r)$ denotes the ratio of the number of sample pairs at a distance less than or equal to $r$ to the total number of sample pairs $N - m$. For this study, the embedding dimension was set to 2, and the similarity tolerance threshold was established at 5.2 times the standard deviation (std) of the time series. Through extensive comparison between mean sample entropy values and data quality, signals with a sample entropy greater than 0.0055 were considered high quality and suitable for further analysis and research.
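The computation can be sketched in a few lines of NumPy. The function below follows the formulas above; the Chebyshev distance between templates is an assumed (standard) choice, and this is an illustrative sketch rather than the authors' exact implementation.

```python
import numpy as np

def sample_entropy(x, m=2, r=None):
    """SampEn(m, r) by template matching, following the formulas above."""
    x = np.asarray(x, dtype=float)
    if r is None:
        r = 5.2 * np.std(x)  # the paper's similarity tolerance

    def matches(m):
        # All overlapping templates of length m.
        T = np.array([x[i:i + m] for i in range(len(x) - m + 1)])
        count = 0
        for i in range(len(T) - 1):
            # Chebyshev distance from template i to all later templates.
            d = np.max(np.abs(T[i + 1:] - T[i]), axis=1)
            count += int(np.sum(d <= r))
        return count

    B, A = matches(m), matches(m + 1)  # m- and (m+1)-length matches
    return -np.log(A / B) if A > 0 and B > 0 else np.inf

# Quality check as described above: average over ten random 500-point
# segments and keep the channel if the mean exceeds 0.0055.
signal = np.random.randn(60000)  # stand-in for an enhanced fetal channel
rng = np.random.default_rng(0)
starts = rng.integers(0, len(signal) - 500, size=10)
keep = np.mean([sample_entropy(signal[s:s + 500]) for s in starts]) > 0.0055
```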

2.3. Signal Denoising and Normalization

It is essential to denoise the abdominal electrical signals of pregnant women before use [32,33,34]. ECG signals extracted from maternal data are often characterized by a low signal-to-noise ratio (SNR) and stochastic non-stationarity. The wavelet soft-threshold denoising algorithm is particularly well suited to these challenges, as it effectively removes high-frequency noise while preserving the critical features of the original signal. This adaptability makes it especially beneficial for processing maternal ECG signals, where maintaining signal integrity is crucial despite the presence of noise. Through experiments, we found that a 3-level decomposition with the db4 wavelet basis yielded better denoising results than other settings, effectively preserving the critical features of the signal [35].
Signal normalization eliminates unit differences within the signal, benefiting the stability and convergence of algorithms, accelerating convergence in neural networks, and enhancing the model’s generalizability. Additionally, normalization ensures that the network can more effectively process and classify the denoised signals, improving accuracy and consistency across varying datasets. Therefore, this paper normalizes the denoised data.
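A minimal PyWavelets sketch of this preprocessing is shown below. The universal-threshold noise estimate and the min-max normalization are our assumptions; the paper specifies only 3-level db4 soft thresholding followed by normalization.

```python
import numpy as np
import pywt

def denoise_and_normalize(sig, wavelet="db4", level=3):
    # 3-level db4 decomposition, as chosen in the paper.
    coeffs = pywt.wavedec(sig, wavelet, level=level)
    # Noise estimate from the finest detail band and the universal
    # threshold are our assumptions; the paper states only
    # "soft threshold" for the detail coefficients.
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    thr = sigma * np.sqrt(2 * np.log(len(sig)))
    coeffs[1:] = [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
    den = pywt.waverec(coeffs, wavelet)[: len(sig)]
    # Min-max normalization to [0, 1]; one common choice, as the paper
    # does not specify which normalization it uses.
    return (den - den.min()) / (den.max() - den.min() + 1e-12)
```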

2.4. Sliding Window Segmentation of the Signal

This paper segments the signal using a sliding window approach, aiming to quickly output the shape and precise location of the fetal QRS complexes as the window slides during application. Hence, two types of data were used to train the neural network: signals containing fetal QRS complexes and signals without fetal QRS complexes. A window in the positive class must include a complete fetal QRS complex while remaining short enough to ensure the timeliness of the output. A fetal QRS complex is approximately 20 to 50 points long. Based on the characteristics of fetal ECG signals and the application requirements, the window length chosen for this study is 60 points.
As illustrated in Figure 3, the first image represents the raw data. The second image shows the signal after the amplification of the fetal ECG signals. For the signal in the second image, a sample entropy quality detection algorithm is applied to filter out maternal ECG signals that meet the quality criteria. The third image demonstrates (a) a signal of length 60 that includes fetal ECG components and (b) a signal that does not contain fetal ECG components. Through the signal segmentation operation, 46,620 segments containing fetal ECG signals and 46,821 segments without fetal ECG signals were identified across all channels of the maternal ECG.
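The segmentation and labeling step can be sketched as follows. The stride and the edge-margin rule for deciding whether a QRS complex lies fully inside a window are illustrative assumptions, as the paper fixes only the 60-point window length.

```python
import numpy as np

def segment_windows(sig, qrs_locs, win=60, step=10, margin=10):
    """Split a signal into 60-point windows; label a window positive when
    an annotated fetal R-peak lies inside it, at least `margin` points
    from either edge so the complex is (mostly) complete.
    The stride and margin values are illustrative assumptions."""
    pos, neg = [], []
    qrs_locs = np.asarray(qrs_locs)
    for start in range(0, len(sig) - win + 1, step):
        inside = (qrs_locs >= start + margin) & (qrs_locs <= start + win - margin)
        (pos if inside.any() else neg).append(sig[start:start + win])
    return np.array(pos), np.array(neg)

# Example: a synthetic 1 kHz channel with hypothetical R-peak positions.
sig = np.random.randn(60000)
qrs = np.arange(400, 60000, 430)  # roughly 140 bpm fetal rate
pos, neg = segment_windows(sig, qrs)
```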

2.5. Signal Multi-Feature Extraction and Combination

When processing ECG signals from pregnant women, scholars typically employ various signal processing techniques to extract and analyze the signal features, primarily including the time domain, frequency domain, time–frequency domain, and nonlinear feature values [36,37,38]. Time domain analysis, one of the most direct signal analysis methods, uses the ECG signal as a model input to preserve complete feature information. This paper extracts 60-point signals as time domain features directly. Frequency domain analysis reveals periodic and frequency information contained within the signal. This study applies the Fourier transform method to convert the signal from the time domain to the frequency domain, obtaining 60-point frequency domain information. Time–frequency domain analysis provides joint distribution information of time and frequency domains, enhancing morphological features of the data and expanding feature diversity [21,39]. This study calculates time–frequency domain signals using the wavelet transform method. Given the varying sizes of the time–frequency plots after wavelet transformation, to facilitate processing by CNNs, the dimensions of the time–frequency domain data are uniformly adjusted to 24 × 60.
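A sketch of the per-window transformations is given below. The Morlet mother wavelet and the choice of 24 CWT scales are assumptions made so that the continuous wavelet transform directly yields the 24 × 60 time–frequency matrix described above.

```python
import numpy as np
import pywt

def domain_features(seg):
    """Time-, frequency-, and time-frequency-domain representations of
    one 60-point window. Wavelet and scale choices are assumptions."""
    time_feat = seg                                       # 1 x 60 raw window
    freq_feat = np.abs(np.fft.fft(seg))                   # 1 x 60 magnitude spectrum
    tf_feat, _ = pywt.cwt(seg, np.arange(1, 25), "morl")  # 24 x 60 scalogram
    return time_feat, freq_feat, tf_feat
```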
The fetal heart rate is a high-dimensional chaotic signal that follows specific nonlinear rules; thus, in the analysis of fetal heart rate signals, research methods from nonlinear dynamics, such as approximate entropy and Lyapunov exponents, are often considered [40]. Nonlinear analysis can help us understand ECG signals’ complex structures and dynamic behaviors. This paper primarily extracts seven types of feature values from the ECG signal.

2.5.1. ECG Signal Feature Values

Calculating specific data feature values of ECG signals, such as extremal values, area, and extremal points, represents the signal’s energy and time domain distribution characteristics.

2.5.2. Lyapunov Exponent

The Lyapunov exponent can measure the chaotic state of a system. A higher Lyapunov exponent suggests the system may exhibit chaotic behavior, indicating poorer local stability [41]. Since the ECG signal is stochastic, this paper employs the trajectory-based Wolf algorithm to solve for the Lyapunov exponent, with seven chosen as the embedding dimension and three as the time delay. The following formula defines the maximum Lyapunov exponent:
$$\lambda_{\max} = \frac{1}{t_M - t_0} \sum_{k=1}^{M} \ln \frac{L(t_k)}{L(t_{k-1})}$$
where $L(t_k)$ represents the distance between the two closest points at time $t_k$, and $M$ is the total number of steps calculated.
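A simplified, self-contained estimate of the largest Lyapunov exponent is sketched below, using the paper's embedding dimension of 7 and time delay of 3. It tracks nearest-neighbor divergence in the reconstructed phase space; this is a stand-in in the spirit of the trajectory-based Wolf algorithm, not a faithful reimplementation of it.

```python
import numpy as np

def largest_lyapunov(x, emb_dim=7, tau=3, steps=5):
    """Rough largest-Lyapunov estimate via nearest-neighbor divergence."""
    n = len(x) - (emb_dim - 1) * tau
    # Delay embedding: each row is one reconstructed phase-space point.
    Y = np.array([x[i:i + emb_dim * tau:tau] for i in range(n)])
    rates = []
    for i in range(n - steps):
        d = np.linalg.norm(Y - Y[i], axis=1)
        d[max(0, i - tau):i + tau + 1] = np.inf  # exclude temporal neighbors
        j = int(np.argmin(d[: n - steps]))       # nearest usable neighbor
        d0, dk = d[j], np.linalg.norm(Y[i + steps] - Y[j + steps])
        if np.isfinite(d0) and d0 > 0 and dk > 0:
            rates.append(np.log(dk / d0) / steps)  # per-step divergence rate
    return float(np.mean(rates))
```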

2.5.3. Higher-Order Statistics

Higher-order statistics describe the higher-order numerical characteristics of a signal. Higher-order moments can be represented as follows:
$$\mu_k = E\left[\left(X - E(X)\right)^k\right], \quad \text{for } k \ge 2$$
where $k$ denotes the order of the moment and $\mu_k$ the value of the $k$-th order central moment. The first-order moment $\mu_1 = E(X)$ is the mean; the second-order central moment is the variance, describing the degree of dispersion; the third-order moment gives the skewness, which measures the symmetry of the distribution; and the fourth-order moment gives the kurtosis, which describes the peakedness of the distribution.
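These four statistics can be computed directly with NumPy and SciPy, as in the short sketch below.

```python
import numpy as np
from scipy.stats import skew, kurtosis

def moment_features(seg):
    # Mean, variance, skewness, and kurtosis: the first- to fourth-order
    # moment features described above.
    return np.array([np.mean(seg), np.var(seg), skew(seg), kurtosis(seg)])
```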

2.5.4. Approximate Entropy and Sample Entropy

Approximate entropy is an indicator used to measure the complexity and irregularity of a signal with a specific resistance to noise. Higher values of approximate entropy usually reflect the signal’s irregularity and complexity. This paper extracts approximate entropy information from the ECG signal. The formula for the signal’s approximate entropy is as follows:
$$ApEn(m, r) = \phi^m(r) - \phi^{m+1}(r)$$
where $ApEn(m, r)$ represents approximate entropy, $m$ the length of the subsequence, $r$ the matching threshold, and $\phi^m(r)$ the conditional probability given template length $m$ and matching threshold $r$, computed as follows:
$$\phi^m(r) = \frac{1}{N - m + 1} \sum_{i=1}^{N-m+1} \ln C_i^m(r)$$
where $N$ is the length of the time series and $C_i^m(r)$ represents the ratio of the number of approximate matches to the total number of templates. To achieve better statistical properties and smaller errors, the embedding dimension $m$ is set to 2 and the similarity tolerance $r$ to 0.2 × std.
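The following NumPy sketch implements these formulas with the stated parameters (m = 2, r = 0.2 × std); as is standard for approximate entropy, self-matches are included in the counts.

```python
import numpy as np

def approx_entropy(x, m=2, r=None):
    """ApEn(m, r) following the formulas above."""
    x = np.asarray(x, dtype=float)
    if r is None:
        r = 0.2 * np.std(x)

    def phi(m):
        n = len(x) - m + 1
        templates = np.array([x[i:i + m] for i in range(n)])
        # C_i^m(r): fraction of templates within distance r of template i
        # (self-matches included, unlike sample entropy).
        C = [(np.max(np.abs(templates - t), axis=1) <= r).mean()
             for t in templates]
        return np.mean(np.log(C))

    return phi(m) - phi(m + 1)
```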

2.5.5. Fractal Dimension

The fractal dimension is used to describe a signal’s self-similarity and fractal structure, reflecting the signal’s complexity and regularity [42].

2.5.6. SVD

SVD provides an effective way to extract the main features from data. By retaining the most significant singular values, the data’s most important structures and information can be preserved, thus achieving the dimensionality reduction of the data [43]. The decomposition formula for SVD is as follows:
$$A = U \Sigma V^H$$
Let $A \in \mathbb{C}^{m \times n}$. There exist unitary matrices $U \in \mathbb{C}^{m \times m}$ and $V \in \mathbb{C}^{n \times n}$ such that $A = U \Sigma V^H$, where $H$ denotes the conjugate transpose. It follows that $U^H U = U U^H = I$, illustrating the preservation of orthonormality under these transformations.
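As a brief illustration, the truncated SVD below keeps the k largest singular values of a feature matrix to obtain its best rank-k approximation; the matrix contents are placeholders.

```python
import numpy as np

# Truncated SVD: keep the k largest singular values of A for
# dimensionality reduction / feature extraction.
A = np.random.randn(24, 60)  # stand-in for a time-frequency block
U, s, Vh = np.linalg.svd(A, full_matrices=False)
k = 3
A_k = U[:, :k] @ np.diag(s[:k]) @ Vh[:k, :]  # best rank-k approximation
```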

2.5.7. Hilbert–Huang Transform

The Hilbert–Huang Transform (HHT) method is an adaptive method for analyzing nonlinear and nonstationary signals. It combines the Hilbert Transform method with local feature extraction techniques, enabling the local analysis of signals. Thus, it offers an effective way to extract features in signal processing [44].
$$H[x(t)] = \frac{1}{\pi} \int_{-\infty}^{+\infty} \frac{x(\tau)}{t - \tau}\, d\tau$$
Herein, $H[x(t)]$ denotes the signal after the Hilbert Transform, where $x(t)$ is the original signal and $\tau$ is the integration variable.
$$x(t) = \sum_{i=1}^{n} c_i(t) + r_n(t)$$
wherein $c_i(t)$ represents the $i$-th Intrinsic Mode Function (IMF) and $r_n(t)$ denotes the residue. Synthesizing the steps above, the overall HHT procedure first applies Empirical Mode Decomposition to the signal to yield a series of IMFs, and then applies the Hilbert Transform to each IMF for local time–frequency analysis.
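A compact sketch of this two-step procedure is shown below; it relies on the PyEMD (EMD-signal) package for the decomposition, which is an assumed dependency, with the Hilbert step taken from SciPy.

```python
import numpy as np
from scipy.signal import hilbert
from PyEMD import EMD  # the EMD-signal package; an assumed dependency

def hht_features(seg, fs=1000):
    # 1) Empirical Mode Decomposition into IMFs.
    imfs = EMD().emd(np.asarray(seg, dtype=float))
    # 2) Hilbert transform of each IMF: instantaneous amplitude and
    #    instantaneous frequency.
    analytic = hilbert(imfs, axis=1)
    amp = np.abs(analytic)
    phase = np.unwrap(np.angle(analytic), axis=1)
    inst_freq = np.diff(phase, axis=1) * fs / (2 * np.pi)
    return amp, inst_freq
```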
Upon calculation, the time domain, frequency domain, and eigenvalue signals are all matrices of size 1 × 60, while the time–frequency domain signal is size 24 × 60. Through amalgamation, a matrix of size 27 × 60 is obtained.
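Assembling the fused input is then a simple stacking operation, sketched below with placeholder arrays standing in for the per-window features computed above.

```python
import numpy as np

# Illustrative stand-ins for the per-window features computed above.
time_feat = np.random.randn(60)      # time-domain window
freq_feat = np.random.randn(60)      # 60-point magnitude spectrum
eig_feat = np.random.randn(60)       # nonlinear eigenvalue vector
tf_feat = np.random.randn(24, 60)    # time-frequency representation

# Stack into the fused 27 x 60 matrix described above.
fused = np.vstack([time_feat, freq_feat, eig_feat, tf_feat])
assert fused.shape == (27, 60)
```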

3. Multi-Feature Fusion Neural Networks

Numerous scholars have utilized CNNs to extract fetal ECG signals from pregnant women [33,34,35,45]. However, these algorithms exhibit certain limitations. Firstly, although CNNs can autonomously learn effective features from raw ECG data, these features often lack interpretability. Moreover, some essential features that are easily extracted manually are difficult for CNNs to learn. Secondly, scholars tend to employ deep stacks of complex neural network layers to enhance recognition accuracy, resulting in slower computation and higher demands on hardware resources, thereby limiting the practical application of these advancements.
Therefore, developing multi-feature fusion neural networks that are both highly accurate and cost-efficient in computation holds significant importance. Such networks are better equipped to adapt to data distributions and task requirements. Common multimodal fusion strategies are primarily categorized into early fusion, late fusion, and multi-stage fusion. Early fusion refers to integrating data from different modalities into a single feature representation during the initial feature extraction phase. Late fusion involves the integration of results after features from different modalities have been extracted and processed independently. Multi-stage fusion strategies integrate information from multiple modalities at various stages. This paper designs three distinct model fusion networks and compares the accuracy and merits of three fetal ECG classification algorithms that fuse features across multiple domains.

3.1. Feature-Level Fusion

Feature-level or early fusion is the most commonly used strategy in multimodal recognition systems. It involves the direct combination of the time-domain, frequency-domain, time–frequency domain, and nonlinear features obtained after preprocessing into a 27 × 60 matrix. This matrix is then fed into a neural network for learning and classification. The network diagram for feature-level fusion is shown in Figure 4. This study employs ResNet34, trained for 50 epochs, to learn from the early-fused multimodal data [46]. ResNet34 efficiently captures complex patterns from multimodal data, achieving high classification accuracy through the early fusion of diverse feature types while maintaining computational efficiency.
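A minimal PyTorch sketch of this setup is given below. Adapting ResNet34 to accept the one-channel 27 × 60 matrix by swapping the first convolution and the output head is our assumption; the paper specifies only the architecture and the 50-epoch training.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet34

# Feature-level fusion: treat the stacked 27 x 60 multimodal matrix as a
# one-channel image and classify it with ResNet34.
model = resnet34(weights=None)
model.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
model.fc = nn.Linear(model.fc.in_features, 2)  # QRS present / absent

x = torch.randn(8, 1, 27, 60)  # a batch of fused feature matrices
logits = model(x)              # shape (8, 2)
```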

3.2. Late Fusion

The cross-modal attention mechanism dynamically allocates different weights to each modality by learning the correlations between modalities, thus better integrating information for specific tasks. This study utilizes a multimodal attention model that accepts four types of input data: time-domain data processed by a Long Short-Term Memory (LSTM) model, frequency-domain data handled by a 1D-CNN model, nonlinear feature data analyzed by an FCN model, and all feature data processed by a ShuffleNet model. Initially, each model processes its input data, and the outputs are then concatenated along the feature dimension. Through a multi-head attention layer, features are integrated and attention-weighted, facilitating the interaction and integration of information between different features [47,48]. The final classification result is output through a fully connected (FC) layer with an output dimension of two.
  • Using LSTM for Time-Domain Signals: ECG signals are time-domain signals, making LSTM networks suitable for processing them [49]. The LSTM model used in this study consists of an LSTM layer and an FC layer, designed to process sequential signals with an input dimension of 1 × 60. The output of this network is four feature values.
  • Using 1D-CNN for Frequency-Domain Signals: The 1D-CNN network used here includes two convolutional layers, ReLU activation functions, a max pooling layer, and two FC layers, effectively extracting and classifying local features from frequency-domain data [50]. The input to this network is sixty values, and the output is four values.
  • Using DNN for Feature Value Signals: Given the small amount of feature value data extracted from each segment of the ECG signal, this study uses an FC neural network model to analyze feature value signals. The network consists of four FC layers, each followed by a ReLU activation function, automatically extracting valuable features for classification tasks. The input to this network is sixty values, and the output is four values.
  • Using ShuffleNet for All Modal Data: ShuffleNet is a lightweight deep neural network architecture designed for efficient computation and parameter usage [51]. Given that the time–frequency domain signals are 24 × 60, significantly larger than the signals from the first three modalities, ShuffleNet was chosen for its ability to quickly analyze such data. The output of this network is four feature values.
The network diagram for late fusion based on the cross-modal attention algorithm is shown in Figure 5. Since the LSTM network is suitable for processing time-domain signals, it can capture the temporal dependencies within ECG signals. The 1D-CNN can efficiently extract local features from frequency-domain signals. The DNN model can automatically extract valuable information from a small amount of feature value data. ShuffleNet, due to its lightweight design, can quickly process larger size time–frequency domain signals. Finally, the multimodal attention model can comprehensively utilize the characteristics of various types of data, achieving the efficient and accurate classification of ECG signals. The combination of these specialized networks, alongside the cross-modal attention mechanism, allows the model to dynamically allocate computational resources based on the importance of different features. This design ensures that high classification accuracy is achieved without excessive computational overhead, striking a balance between accuracy and efficiency.
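A condensed PyTorch sketch of this architecture is shown below. The branch internals are simplified stand-ins for the paper's LSTM, 1D-CNN, FCN, and ShuffleNet sub-networks (the fourth branch here receives only the 24 × 60 time–frequency matrix), so the sketch conveys the fusion pattern rather than the exact models.

```python
import torch
import torch.nn as nn

class LateFusionNet(nn.Module):
    """Cross-modal attention late fusion: each branch maps its modality
    to 4 feature values; the four 4-d outputs are stacked as tokens,
    mixed by multi-head attention, and classified by an FC layer."""
    def __init__(self):
        super().__init__()
        self.time_branch = nn.LSTM(input_size=1, hidden_size=4, batch_first=True)
        self.freq_branch = nn.Sequential(
            nn.Conv1d(1, 8, 5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(8, 4))
        self.feat_branch = nn.Sequential(nn.Linear(60, 16), nn.ReLU(),
                                         nn.Linear(16, 4))
        self.tf_branch = nn.Sequential(nn.Flatten(), nn.Linear(24 * 60, 4))
        self.attn = nn.MultiheadAttention(embed_dim=4, num_heads=2,
                                          batch_first=True)
        self.fc = nn.Linear(4 * 4, 2)

    def forward(self, t, f, v, tf):
        ht, _ = self.time_branch(t.unsqueeze(-1))           # (B, 60, 4)
        tok = torch.stack([ht[:, -1],
                           self.freq_branch(f.unsqueeze(1)),
                           self.feat_branch(v),
                           self.tf_branch(tf)], dim=1)      # (B, 4, 4)
        mixed, _ = self.attn(tok, tok, tok)                 # attention mixing
        return self.fc(mixed.flatten(1))                    # (B, 2)

# Usage with dummy batches of the four modalities:
net = LateFusionNet()
out = net(torch.randn(8, 60), torch.randn(8, 60),
          torch.randn(8, 60), torch.randn(8, 24, 60))
```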

3.3. Model-Level Fusion Algorithm

The Multi-Layer LSTM (ML-LSTM) method is a model-level fusion method that integrates different modal data at various stages of the network, enabling the acquisition of a joint feature representation of four types of characteristics from the pregnant woman’s ECG signal for the recognition of fetal ECG QRS complexes [49].
First, the time-domain features are fed into the first layer of the LSTM network, with an input dimension of 60 and a hidden unit size of 64. Then, the frequency-domain features are concatenated with the hidden state outputs of the first-layer LSTM (64 dimensions) and input into the second-layer LSTM, with an input dimension of 65 (64 + 1) and a hidden unit size of 64. Next, nonlinear features are concatenated with the hidden state outputs of the second-layer LSTM (64 dimensions) and introduced into the third-layer LSTM, with an input dimension of 65 (64 + 1) and a hidden unit size of 64. Finally, the hidden state of the third-layer LSTM (64 dimensions) is concatenated with the time–frequency domain features (24 dimensions) and fed into the fourth-layer LSTM, with an input dimension of 88 (64 + 24) and a hidden unit size of 64. In this way, the fourth-layer LSTM can simultaneously process information from the time domain, frequency domain, time–frequency domain, and nonlinear features. These features are then fed into a fully connected (FC) layer for the final classification task, generating the ultimate prediction results. The network architecture of the model-level fusion algorithm based on ML-LSTM is illustrated in Figure 6.
By progressively combining different types of features at each LSTM layer, the ML-LSTM model efficiently captures the complex relationships between various signal characteristics. This layered fusion approach not only enhances classification accuracy but also optimizes resource allocation, ensuring that the computational load remains manageable.
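The layered fusion can be sketched as follows. Feeding one sample of each modality per LSTM time step, so that the stated input dimensions of 65 and 88 arise from per-step concatenation with the previous layer's 64-dimensional hidden states, is our reading of the description above.

```python
import torch
import torch.nn as nn

class MLLSTM(nn.Module):
    """Model-level fusion sketch: each LSTM layer receives the previous
    layer's hidden states concatenated, per time step, with the next
    modality (frequency, eigenvalues, then time-frequency features)."""
    def __init__(self, hidden=64):
        super().__init__()
        self.l1 = nn.LSTM(1, hidden, batch_first=True)            # time
        self.l2 = nn.LSTM(hidden + 1, hidden, batch_first=True)   # + frequency
        self.l3 = nn.LSTM(hidden + 1, hidden, batch_first=True)   # + eigenvalues
        self.l4 = nn.LSTM(hidden + 24, hidden, batch_first=True)  # + time-freq
        self.fc = nn.Linear(hidden, 2)

    def forward(self, t, f, v, tf):
        # t, f, v: (B, 60); tf: (B, 24, 60)
        h, _ = self.l1(t.unsqueeze(-1))                      # (B, 60, 64)
        h, _ = self.l2(torch.cat([h, f.unsqueeze(-1)], -1))  # (B, 60, 65) in
        h, _ = self.l3(torch.cat([h, v.unsqueeze(-1)], -1))
        h, _ = self.l4(torch.cat([h, tf.transpose(1, 2)], -1))
        return self.fc(h[:, -1])                             # last step -> 2 classes
```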

4. Results

4.1. Model Training and Classification

The Cosine Annealing Learning Rate Scheduler is a commonly employed strategy for adjusting the learning rate in the training process of neural networks. In this study, the Cosine Annealing Learning Rate Scheduler was applied, and the learning rate was decreased from an initial value of 0.001 to a final value of 0.0001. The dataset was divided into non-overlapping training and test sets, with a ratio of 7:3, resulting in a training set of 65,408 and a test set of 28,033 entries.
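The schedule can be reproduced in PyTorch as below; the Adam optimizer and the 50-epoch horizon for T_max are assumptions, while the 0.001 to 0.0001 learning-rate range follows the text.

```python
import torch
import torch.nn as nn

model = nn.Linear(60, 2)  # stand-in for any of the fusion networks
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=50, eta_min=1e-4)  # cosine decay from 1e-3 to 1e-4

for epoch in range(50):
    # ... one training pass over the training loader ...
    scheduler.step()
```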
This study utilized the entire ADFECGDB dataset as the test set to validate the accuracy and reliability of the proposed algorithm. Each data entry consisted of 1 min recordings, with a total of 11,492 entries in the dataset.

4.2. Evaluation Metrics

This study employs accuracy, sensitivity, specificity, PPV, F1 score, and the ROC curve to measure the performance of the classification model. Accuracy is the ratio of correctly predicted samples to the total number of samples; sensitivity, also known as recall, indicates the model's ability to correctly identify positive cases; specificity refers to the model's ability to correctly identify negative cases; PPV represents the proportion of samples predicted as positive that are truly positive; the F1 score is the harmonic mean of precision and recall. The ROC curve is a graphical tool used to evaluate the performance of binary classification models. The area under the ROC curve (AUC) is often used to quantify the model's overall performance: an AUC value closer to 1 indicates better model performance and a stronger ability to distinguish between positive and negative samples.
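All of these metrics are available in scikit-learn; a small helper like the one below (our naming) computes them for binary labels and scores.

```python
from sklearn.metrics import (accuracy_score, recall_score, precision_score,
                             f1_score, roc_auc_score)

def evaluate(y_true, y_pred, y_score):
    # Sensitivity = recall on the positive class; specificity = recall
    # on the negative class; PPV = precision.
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "sensitivity": recall_score(y_true, y_pred),
        "specificity": recall_score(y_true, y_pred, pos_label=0),
        "ppv": precision_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
        "auc": roc_auc_score(y_true, y_score),
    }
```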

4.3. Results and Discussion

Table 1 presents a comparative analysis of experimental results from various research methods, comparing the multimodal and unimodal methods proposed in this paper, ablation studies, and other research. Figure 7a illustrates the accuracy trend of different models during training. It can be clearly observed that the feature-level fusion method based on ResNet34 converges faster and achieves a higher final accuracy compared to other methods. Meanwhile, Figure 7b presents the ROC curve for the ResNet34 feature-level fusion model, with an area under the curve (AUC) of 0.99, indicating excellent classification performance. These results further validate the robustness of the proposed method in handling multimodal data.
From Table 1 and the figures, it is evident that multimodal methods, especially the ResNet34-feature level fusion, significantly outperform unimodal methods in all evaluation metrics. The ResNet34-feature level fusion achieved a 95.85% accuracy, 97% sensitivity, 95% specificity, and an F1 score of 91%, demonstrating its strong overall performance. In comparison, the unimodal methods, such as LSTM in the time domain and CNN in the frequency domain, show noticeably lower results, with accuracy values of 85.42% and 88.41%, respectively. The fusion approach effectively leverages the complementary information among various features, thereby enhancing classification performance.
Moreover, the results of testing the ResNet34-feature level fusion algorithm on the ADFECGDB dataset are also impressive, achieving a high accuracy of 97.89%, with a sensitivity of 97% and a specificity of 98%. This indicates that the proposed method not only demonstrates high reliability but also possesses transferability, as it performs effectively across different datasets, validating its potential for broader applications in fetal heart monitoring.
The ablation study further validates the importance of each feature to the model’s performance. Removing the time–frequency domain features has the most significant impact on performance, indicating their crucial role in fetal heart signal classification. In contrast, removing time-domain, frequency-domain, and feature values has a relatively minor impact. A comparison with other studies shows that the methods proposed in this paper generally outperform others in various performance metrics, with all compared studies utilizing the same database, the PhysioNet/CinC Challenge 2013 dataset.
Overall, the multimodal methods proposed in this paper demonstrate strong performance and wide applicability in the classification of fetal heart ECG signals. The comprehensive analysis across multiple databases and the ablation study further validate the robustness of the proposed methods, providing important references for their potential application in clinical and wearable health monitoring systems.

5. Conclusions

High-quality fetal ECG signals are of significant clinical importance for diagnosing the health status of fetuses. Our method leverages a sliding window approach to identify fetal QRS complexes from maternal ECG signals sequentially, while employing a lightweight deep neural network for the rapid and accurate output of fetal QRS complex waveforms, paving the way for the intelligent morphological analysis of fetal ECG signals. This approach overcomes the limitation of traditional fetal heart rate monitoring, which requires the separation of maternal and fetal ECG signals. Compared to conventional methods that require four-channel detection, our method achieves reliable detection of fetal QRS complexes with data from only a single channel, provided that data quality is ensured. This means that a pregnant woman, using a single electrode patch attached to the abdomen, could independently monitor the fetal heartbeat at home with a smartphone, offering broad application prospects.
However, the current dataset does not include cases of high-risk pregnancies or multiple pregnancies, which are important scenarios for clinical practice. Future research should focus on adapting fetal heart rate detection methods to effectively function in such challenging conditions. Looking forward, the further training of the network with more extensive and diverse datasets is necessary to enhance analysis accuracy and robustness. Future improvements could also involve integrating advanced noise reduction techniques and adaptive algorithms that account for different maternal conditions and environments, thereby increasing the method’s applicability and reliability in clinical practice.

Author Contributions

Conceptualization, Z.H. and J.Y.; methodology, Z.H.; software, Y.S.; validation, Z.H., J.Y. and X.W.; formal analysis, Z.H.; investigation, Y.S.; resources, J.Y.; data curation, Z.H.; writing—original draft preparation, Z.H.; writing—review and editing, J.Y. and X.W.; visualization, Y.S.; supervision, J.Y.; project administration, J.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

The publicly available datasets used in this study have received ethical approval from the relevant institutions, and no new ethical review was required.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The datasets analyzed during the current study are available in the PhysioNet repository, https://physionet.org/content/challenge-2013/1.0.0/ (accessed on 18 December 2023) and https://physionet.org/content/adfecgdb/1.0.0/ (accessed on 20 July 2024), respectively.

Conflicts of Interest

Author Junsheng Yu was employed by the company Beijing Health State Monitoring & Consulting Co., Ltd., Beijing, China. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Alfirevic, Z.; Gyte, G.M.; Cuthbert, A.; Devane, D. Continuous Cardiotocography (CTG) as a Form of Electronic Fetal Monitoring (EFM) for Fetal Assessment during Labour. Cochrane Database Syst. Rev. 2017, 2019, CD006066. [Google Scholar] [CrossRef] [PubMed]
  2. Anisha, M.; Kumar, S.S.; Nithila, E.E.; Benisha, M. Detection of Fetal Cardiac Anomaly from Composite Abdominal Electrocardiogram. Biomed. Signal Process. Control 2021, 65, 102308. [Google Scholar] [CrossRef]
  3. Hamelmann, P.; Vullings, R.; Kolen, A.F.; Bergmans, J.W.M.; Van Laar, J.O.E.H.; Tortoli, P.; Mischi, M. Doppler Ultrasound Technology for Fetal Heart Rate Monitoring: A Review. IEEE Trans. Ultrason. Ferroelect. Freq. Contr. 2020, 67, 226–238. [Google Scholar] [CrossRef] [PubMed]
  4. Verkruysse, W.; Svaasand, L.O.; Nelson, J.S. Remote Plethysmographic Imaging Using Ambient Light. Opt. Express 2008, 16, 21434. [Google Scholar] [CrossRef] [PubMed]
  5. Varanini, M.; Tartarisco, G.; Billeci, L.; Macerata, A.; Pioggia, G.; Balocchi, R. A Multi-Step Approach for Non-Invasive Fetal ECG Analysis. In Proceedings of the Computing in Cardiology 2013, Zaragoza, Spain, 22–25 September 2013; pp. 281–284. [Google Scholar]
  6. Ghodsi, M.; Hassani, H.; Sanei, S. Extracting Fetal Heart Signal from Noisy Maternal ECG by Multivariate Singular Spectrum Analysis. Stat. Its Interface 2010, 3, 399–411. [Google Scholar] [CrossRef]
  7. Varanini, M.; Tartarisco, G.; Billeci, L.; Macerata, A.; Pioggia, G.; Balocchi, R. An Efficient Unsupervised Fetal QRS Complex Detection from Abdominal Maternal ECG. Physiol. Meas. 2014, 35, 1607. [Google Scholar] [CrossRef] [PubMed]
  8. Jaeger, K.M.; Nissen, M.; Rahm, S.; Titzmann, A.; Fasching, P.A.; Beilner, J.; Eskofier, B.M.; Leutheuser, H. Power-MF: Robust Fetal QRS Detection from Non-Invasive Fetal Electrocardiogram Recordings. Physiol. Meas. 2024, 45, 055009. [Google Scholar] [CrossRef] [PubMed]
  9. Basak, P.; Nazmus Sakib, A.H.M.; Chowdhury, M.E.H.; Al-Emadi, N.; Cagatay Yalcin, H.; Pedersen, S.; Mahmud, S.; Kiranyaz, S.; Al-Maadeed, S. A Novel Deep Learning Technique for Morphology Preserved Fetal ECG Extraction from Mother ECG Using 1D-CycleGAN. Expert Syst. Appl. 2024, 235, 121196. [Google Scholar] [CrossRef]
  10. Zhang, Y.; Yu, S. Single-Lead Noninvasive Fetal ECG Extraction by Means of Combining Clustering and Principal Components Analysis. Med. Biol. Eng. Comput. 2020, 58, 419–432. [Google Scholar] [CrossRef]
  11. Zhang, Y.; Gu, A.; Xiao, Z.; Dong, K.; Cai, Z.; Zhao, L.; Yang, C.; Li, J.; Zhang, H.; Liu, C. An Effective Integrated Framework for Fetal QRS Complex Detection Based on Abdominal ECG Signal. J. Med. Biol. Eng. 2024, 44, 99–113. [Google Scholar] [CrossRef]
  12. Kahankova, R.; Mikolasova, M.; Martinek, R. Optimization of Adaptive Filter Control Parameters for Non-Invasive Fetal Electrocardiogram Extraction. PLoS ONE 2022, 17, e0266807. [Google Scholar] [CrossRef] [PubMed]
  13. Mansourian, N.; Sarafan, S.; Torkamani-Azar, F.; Ghirmai, T.; Cao, H. Fetal QRS Extraction from Single-Channel Abdominal ECG Using Adaptive Improved Permutation Entropy. Phys. Eng. Sci. Med. 2024, 47, 563–573. [Google Scholar] [CrossRef] [PubMed]
  14. Petrolis, R.; Krisciukaitis, A. Multi Stage Principal Component Analysis Based Method for Detection of Fetal Heart Beats in Abdominal ECGs. In Proceedings of the Computing in Cardiology 2013, Zaragoza, Spain, 22–25 September 2013; pp. 301–304. [Google Scholar]
  15. Deogire, A.D. Multi Lead Fetal QRS Detection with Principal Component Analysis. In Proceedings of the 2018 3rd International Conference for Convergence in Technology (I2CT), Pune, India, 6–8 April 2018; pp. 1–5. [Google Scholar]
  16. Algunaidi, M.M.S.; Ali, M.A.M.; Islam, M.F. Evaluation of an Improved Algorithm for Fetal QRS Detection. Int. J. Phys. Sci. 2011, 6, 213–220. [Google Scholar]
  17. Niknazar, M.; Rivet, B.; Jutten, C. Fetal QRS Complex Detection Based on Three-Way Tensor Decomposition. In Proceedings of the Computing in Cardiology 2013, Zaragoza, Spain, 22–25 September 2013; pp. 185–188. [Google Scholar]
  18. Perlman, O.; Katz, A.; Zigel, Y. Noninvasive Fetal QRS Detection Using a Linear Combination of Abdomen ECG Signals. In Proceedings of the Computing in Cardiology 2013, Zaragoza, Spain, 22–25 September 2013; pp. 169–172. [Google Scholar]
  19. Ghonchi, H.; Abolghasemi, V. A Dual Attention-Based Autoencoder Model for Fetal ECG Extraction From Abdominal Signals. IEEE Sens. J. 2022, 22, 22908–22918. [Google Scholar] [CrossRef]
  20. Sharma, K.; Masood, S. Deep Learning-Based Non-Invasive Fetal Cardiac Arrhythmia Detection. In Applications of Artificial Intelligence and Machine Learning; Springer: Singapore, 2021; pp. 511–523. ISBN 9789811630668. [Google Scholar]
  21. Vadivu, M.S.; Kavithaa, G. Fetal QRS Complexes Detection Using Deep Learning Technique. J. Electr. Eng. Technol. 2023, 19, 1909–1918. [Google Scholar] [CrossRef]
  22. Nguyen, V.D. Fetal ECG Extraction on Time-Frequency Domain Using Conditional GAN. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 17–21 June 2024. [Google Scholar]
  23. Zhong, W.; Liao, L.; Guo, X.; Wang, G. A Deep Learning Approach for Fetal QRS Complex Detection. Physiol. Meas. 2018, 39, 045004. [Google Scholar] [CrossRef]
  24. Lee, J.S.; Seo, M.; Kim, S.W.; Choi, M. Fetal QRS Detection Based on Convolutional Neural Networks in Noninvasive Fetal Electrocardiogram. In Proceedings of the 2018 4th International Conference on Frontiers of Signal Processing (ICFSP), Poitiers, France, 24–27 September 2018; pp. 75–78. [Google Scholar]
  25. Vo, K.; Le, T.; Rahmani, A.M.; Dutt, N.; Cao, H. An Efficient and Robust Deep Learning Method with 1-D Octave Convolution to Extract Fetal Electrocardiogram. Sensors 2020, 20, 3757. [Google Scholar] [CrossRef] [PubMed]
  26. Goldberger, A.L.; Amaral, L.A.N.; Glass, L.; Hausdorff, J.M.; Ivanov, P.C.; Mark, R.G.; Mietus, J.E.; Moody, G.B.; Peng, C.-K.; Stanley, H.E. PhysioBank, PhysioToolkit, and PhysioNet: Components of a New Research Resource for Complex Physiologic Signals. Circulation 2000, 101, e215–e220. [Google Scholar] [CrossRef] [PubMed]
  27. Clifford, G.D.; Silva, I.; Behar, J.; Moody, G.B. Non-Invasive Fetal ECG Analysis. Physiol. Meas. 2014, 35, 1521. [Google Scholar] [CrossRef]
  28. Behar, J.; Andreotti, F.; Zaunseder, S.; Oster, J.; Clifford, G.D. A Practical Guide to Non-Invasive Foetal Electrocardiogram Extraction and Analysis. Physiol. Meas. 2016, 37, R1–R35. [Google Scholar] [CrossRef]
  29. Jezewski, J.; Matonia, A.; Kupka, T.; Roj, D.; Czabanski, R. Determination of Fetal Heart Rate from Abdominal Signals: Evaluation of Beat-to-Beat Accuracy in Relation to the Direct Fetal Electrocardiogram. Biomed. Tech./Biomed. Eng. 2012, 57, 383–394. [Google Scholar] [CrossRef] [PubMed]
  30. Richman, J.S.; Moorman, J.R. Physiological Time-Series Analysis Using Approximate Entropy and Sample Entropy. Am. J. Physiol.-Heart Circ. Physiol. 2000, 278, H2039–H2049. [Google Scholar] [CrossRef] [PubMed]
  31. Liu, C.; Li, P.; Maria, C.D.; Zhao, L.; Zhang, H.; Chen, Z. A Multi-Step Method with Signal Quality Assessment and Fine-Tuning Procedure to Locate Maternal and Fetal QRS Complexes from Abdominal ECG Recordings. Physiol. Meas. 2014, 35, 1665. [Google Scholar] [CrossRef] [PubMed]
  32. Khadiri, K.E.; Elouaham, S.; Nassiri, B.; Melhoaui, O.E.; Said, S.; Kamoun, N.E.; Zougagh, H. A Comparison of the Denoising Performance Using Capon Time-Frequency and Empirical Wavelet Transform Applied on Biomedical Signal. Int. J. Eng. Appl. (IREA) 2023, 11, 358–365. [Google Scholar] [CrossRef]
  33. Mohebbian, M.R.; Vedaei, S.S.; Wahid, K.A.; Dinh, A.; Marateb, H.R.; Tavakolian, K. Fetal ECG Extraction From Maternal ECG Using Attention-Based CycleGAN. IEEE J. Biomed. Health Inform. 2022, 26, 515–526. [Google Scholar] [CrossRef] [PubMed]
  34. Fotiadou, E.; Van Sloun, R.J.G.; Van Laar, J.O.E.H.; Vullings, R. A Dilated Inception CNN-LSTM Network for Fetal Heart Rate Estimation. Physiol. Meas. 2021, 42, 045007. [Google Scholar] [CrossRef]
  35. Darmawahyuni, A.; Tutuko, B.; Nurmaini, S.; Rachmatullah, M.N.; Ardiansyah, M.; Firdaus, F.; Sapitri, A.I.; Islami, A. Accurate Fetal QRS-Complex Classification from Abdominal Electrocardiogram Using Deep Learning. Int. J. Comput. Intell. Syst. 2023, 16, 158. [Google Scholar] [CrossRef]
  36. Karvounis, E.C.; Tsipouras, M.G.; Fotiadis, D.I.; Naka, K.K. An Automated Methodology for Fetal Heart Rate Extraction From the Abdominal Electrocardiogram. IEEE Trans. Inform. Technol. Biomed. 2007, 11, 628–638. [Google Scholar] [CrossRef] [PubMed]
  37. Krishna, B.T. Fetal ECG Extraction Using Time-Frequency Analysis Techniques. In Proceedings of the 2017 International Conference on Robotics and Automation Sciences (ICRAS), Hong Kong, China, 26–29 August 2017; pp. 167–171. [Google Scholar]
  38. Ting, Y.-C.; Lo, F.-W.; Tsai, P.-Y. Implementation for Fetal ECG Detection from Multi-Channel Abdominal Recordings with 2D Convolutional Neural Network. J. Sign. Process. Syst. 2021, 93, 1101–1113. [Google Scholar] [CrossRef]
  39. Germán-Salló, Z.; Germán-Salló, M. Non-Linear Methods in HRV Analysis. Procedia Technol. 2016, 22, 645–651. [Google Scholar] [CrossRef]
  40. Papadimitriou, S.; Bezerianos, A. Nonlinear Analysis of the Performance and Reliability of Wavelet Singularity Detection Based Denoising for Doppler Ultrasound Fetal Heart Rate Signals. Int. J. Med. Inform. 1999, 53, 43–60. [Google Scholar] [CrossRef] [PubMed]
  41. Yan, J.; Xia, C.; Wang, H.; Wang, Y.; Guo, R.; Li, F.; Yan, H. Nonlinear Dynamic Analysis of Wrist Pulse with Lyapunov Exponents. In Proceedings of the 2008 2nd International Conference on Bioinformatics and Biomedical Engineering, Shanghai, China, 16–18 May 2008; pp. 2177–2180. [Google Scholar]
  42. Theiler, J. Estimating Fractal Dimension. J. Opt. Soc. Am. A 1990, 7, 1055. [Google Scholar] [CrossRef]
  43. Baker, K. Singular value decomposition tutorial. Ohio State Univ. 2005, 24, 22. [Google Scholar]
  44. Huang, N.E. (Ed.) Hilbert-Huang Transform and Its Applications; Interdisciplinary Mathematical Sciences; Repr.; World Scientific: Hackensack, NJ, USA, 2008; ISBN 978-981-256-376-7. [Google Scholar]
  45. Lee, K.J.; Lee, B. End-to-End Deep Learning Architecture for Separating Maternal and Fetal ECGs Using W-Net. IEEE Access 2022, 10, 39782–39788. [Google Scholar] [CrossRef]
  46. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  47. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. In Proceedings of the Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2017; Volume 30. [Google Scholar]
  48. Hori, C.; Hori, T.; Lee, T.-Y.; Zhang, Z.; Harsham, B.; Hershey, J.R.; Marks, T.K.; Sumi, K. Attention-Based Multimodal Fusion for Video Description. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 4203–4212. [Google Scholar]
  49. Salman, A.G.; Heryadi, Y.; Abdurahman, E.; Suparta, W. Single Layer & Multi-Layer Long Short-Term Memory (LSTM) Model with Intermediate Variables for Weather Forecasting. Procedia Comput. Sci. 2018, 135, 89–98. [Google Scholar] [CrossRef]
  50. Zhao, J.; Mao, X.; Chen, L. Speech Emotion Recognition Using Deep 1D & 2D CNN LSTM Networks. Biomed. Signal Process. Control 2019, 47, 312–323. [Google Scholar] [CrossRef]
  51. Zhang, X.; Zhou, X.; Lin, M.; Sun, J. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 6848–6856. [Google Scholar]
Figure 1. The overall algorithmic workflow.
Figure 2. Four-channel signal sources.
Figure 3. Amplification of fetal ECG and sliding window segmentation.
Figure 4. The network diagram for feature-level fusion.
Figure 5. The network diagram for late fusion based on the cross-modal attention algorithm.
Figure 6. The network diagram for the model-level fusion algorithm based on ML-LSTM.
Figure 7. The comparison of accuracy across different methods (a) and the ROC curve for the feature-level fusion method based on ResNet (b).
Table 1. Comparative analysis of experimental results.
| Literature | Research Methods | Acc (%) | Se (%) | Sp (%) | PPV (%) | F1 (%) |
|---|---|---|---|---|---|---|
| This paper (multimodal) | ResNet34-Feature Level Fusion | 95.85 | 97 | 95 | 95 | 91 |
| | Cross-Modal Attention-Post Fusion | 94.48 | 95 | 93 | 94 | 90 |
| | ML-LSTM-Model Level Fusion | 92.51 | 92 | 93 | 93 | 88 |
| | ADFECGDB (ResNet34) | 97.89 | 97 | 98 | 98 | 93 |
| Comparative experiment of this paper (unimodal) | LSTM-Time Domain | 85.42 | 81 | 90 | 89 | 80 |
| | CNN-Frequency Domain | 88.41 | 91 | 86 | 87 | 84 |
| | FCN-Feature Value | 85.09 | 87 | 83 | 84 | 81 |
| | ResNet-Time–Frequency Domain | 94.05 | 94 | 94 | 94 | 89 |
| Ablation study | Remove time domain | 92.84 | 93 | 92 | 92 | 88 |
| | Remove frequency domain | 94.36 | 94 | 95 | 95 | 90 |
| | Remove feature values | 94.45 | 94 | 94 | 95 | 90 |
| | Remove time–frequency domain | 84.56 | 83 | 86 | 85 | 80 |
| Kahankova et al. [12] | ICA + LMS | – | 89.41 | – | 90.42 | 89.19 |
| Mansourian et al. [13] | AIPE | – | 90.77 | – | 91.32 | 90.95 |
| Zhong et al. [23] | QRStree | – | 61.5 | – | 61.7 | 61.6 |
| Lee et al. [24] | CNN + Post-processing | – | 89.1 | – | 92.8 | – |
| Vo et al. [25] | OctConv + ResNet | – | 91.8 | – | 90.3 | 91.1 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Huang, Z.; Yu, J.; Shan, Y.; Wang, X. A Non-Invasive Fetal QRS Complex Detection Method Based on a Multi-Feature Fusion Neural Network. Appl. Sci. 2024, 14, 8987. https://doi.org/10.3390/app14198987
