Article

Improved Remote Photoplethysmography Using Machine Learning-Based Filter Bank

1 Department of Biomedical Engineering, University of Ulsan, Ulsan 44610, Republic of Korea
2 Department of Electrical, Electronic and Computer Engineering, University of Ulsan, Ulsan 44610, Republic of Korea
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(23), 11107; https://doi.org/10.3390/app142311107
Submission received: 25 September 2024 / Revised: 20 November 2024 / Accepted: 27 November 2024 / Published: 28 November 2024
(This article belongs to the Special Issue Monitoring of Human Physiological Signals)

Abstract

Remote photoplethysmography (rPPG) is a non-contact technology that monitors heart activity by detecting subtle color changes within the facial blood vessels. It provides an unconstrained, unobtrusive approach that can be widely applied to health monitoring systems. In recent years, research has been actively conducted to improve rPPG signals and to extract significant information from facial videos. However, rPPG is vulnerable to degradation caused by changes in illumination and subject motion, and overcoming these challenges remains difficult. In this study, we propose a machine learning-based filter bank (MLFB) noise reduction algorithm to improve the quality of rPPG signals. The MLFB algorithm uses a support vector machine to determine the optimal spectral bands for extracting information on cardiovascular activity and reconstructing the rPPG signal. The proposed approach was validated on an open dataset, achieving 35.5% higher accuracy than conventional methods (a mean absolute error of 2.5 beats per minute). The proposed algorithm can be integrated into various rPPG algorithms for the pre-processing of RGB signals. Moreover, its computational efficiency should enable straightforward implementation in system development, making it broadly applicable across the healthcare field.

1. Introduction

Heart rate (HR) is a primary biomarker of human physical and mental health and provides information on cardiovascular activity [1]. Heart rate is typically estimated using electrocardiography (ECG) or contact photoplethysmography (cPPG) [2,3]. ECG records the electrical activity of the heart, whereas cPPG employs a wearable optical sensor, commonly placed on the fingertip or earlobe, to measure changes in blood volume and oxygen saturation within the microvascular bed, from which heart rate is computed. Remote photoplethysmography (rPPG) is an unconstrained, non-contact heart rate monitoring approach that detects subtle changes in facial color caused by variations in blood volume within the facial blood vessels. From facial videos recorded without contact, rPPG can estimate heart rate and other related cardiovascular parameters [4]. This approach has been used to monitor heart rate during sleeping, driving, and exercising, where a physical contact sensor is not appropriate; it has also been employed to determine the authenticity of deepfakes [5,6,7,8].
In recent years, rPPG research has focused on extracting pulse signals from facial videos [9,10]. De Haan and Jeanne (2013) developed CHROM (a chrominance-based method) to enhance motion robustness [12], and De Haan and Van Leest (2014) proposed PBV (a blood volume pulse signature method) for further motion robustness [11]. Wang et al. (2017) introduced the plane-orthogonal-to-skin (POS) approach, which isolates pulse signals by establishing a projection plane orthogonal to the skin tone within the RGB color space [13]. Poh et al. (2011) separated the RGB color signals to estimate cardiac activity using independent component analysis (ICA), and Lewandowska et al. (2011) employed principal component analysis (PCA), demonstrating reduced computational complexity with accuracy comparable to the ICA approach [14,15]. Data-driven and deep learning methods have also been used to extract pulse signals effectively. Wang et al. (2016) proposed 2SR (spatial subspace rotation), a data-driven algorithm that estimates the spatial subspace of skin pixels and analyzes its temporal rotation to determine the HR [16]. Chen and McDuff (2018) introduced DeepPhys, which uses end-to-end convolutional attention networks to learn facial color changes as time-series data and to estimate pulse signals [17].
In rPPG recordings, movements of the head, face, or other body parts can cause unwanted variation or distortion in the rPPG signal. Such motion artifacts reduce the accuracy of heart rate measurement, and much recent research has focused on effectively rejecting this motion noise and accurately extracting pulse signals. Abdulrahaman (2024) showed that spectral filtering techniques, which divide and analyze signals in the frequency domain, and wavelet transforms can reduce noise and reconstruct pulse signals [18]. Wang et al. (2017) developed amplitude-selective filtering (ASF) to detect signs of cardiac activity [19]. This method exploits the R-channel, which is most susceptible to high-amplitude motion artifacts, and regards amplitudes that fall outside a specified threshold range as noise. They also proposed a color distortion filter that distinguishes between color changes induced by pulses and those induced by noise [20]. The relative pulsatile amplitude, which characterizes the rPPG signal in the red (R), green (G), and blue (B) channels, has been employed in previous studies. However, spatial redundancy, which characterizes the rPPG signal across the entire face area, has not yet been fully exploited [21].
In this study, a machine learning approach was developed to refine the rPPG signal by effectively isolating signal components with prominent cardiac activity. The key to this advancement is the use of spatial redundancy features (including phase coherence and frequency variability), which reflect subtle color changes across the face, together with other features suggested in previous works. A predefined filter bank decomposes the pulse signal into frequency bands, and machine learning is used to select the signals from bands that reflect dominant heart activity. The approach was validated on an open-access dataset, demonstrating its potential for improved pulse signal detection. We also assessed the relative importance of these features.
The main contributions of this work are as follows:
  • Proposed a novel machine learning approach to isolate pulse signal components with dominant cardiac activity to refine pulse signal;
  • Evaluated the importance of features demonstrating their effectiveness in improving pulse signal detection accuracy.

2. Materials and Methods

2.1. Facial Video Dataset

This study employed the Vicar-2 dataset, which consists of 40 facial videos from ten subjects (seven men and three women) [22]. The facial videos were recorded in H.264 format using an RGB camera (Brio webcam, Logitech, Lausanne, Switzerland) in four different situations: BASE (the resting state), HRV (a naturalistic situation with conversation and movement during a Stroop task), RUN (the post-workout state), and MOV (a situation with various movements). Each facial video was recorded for 5 min at a frame rate of 30 frames/s and a resolution of 720 × 1280. The dataset included both ECG (AD8232 device, Analog Devices, Wilmington, MA, USA) and cPPG (CMS50E device, CONTEC, Shanghai, China) data, recorded simultaneously with the facial video at sampling rates of 60 and 250 Hz, respectively.

2.2. Extraction of RGB Signal on Facial Regions

To extract RGB signals from facial video, the desired areas must first be detected in the video data. Kim et al. (2021) reported that the accuracy of rPPG signal extraction depends on the facial region and provided a ranked list of regions [23]. By optimizing the number of regions, nine regions of interest (ROIs) were selected in this study: central forehead, left forehead, right forehead, upper-left cheek, upper-right cheek, lower-left cheek, lower-right cheek, nose, and glabella (see the white-lined areas in Figure 1). Each ROI was manually delineated from the 468 landmarks (green dots in Figure 1) computed by the MediaPipe face mesh library [24]. The R, G, and B channel values were averaged across each segmented ROI as follows:
$C_i = \frac{1}{N} \sum_{n \in \mathrm{ROI}} P_n$
where C_i represents the averaged value for color channel i (i.e., C_R, C_G, and C_B for red, green, and blue, respectively), P_n is the color intensity of pixel n, and N is the total number of pixels in the segmented ROI.
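A minimal sketch of this ROI averaging, assuming the ROI is supplied as a boolean pixel mask (the function name `roi_channel_means` is illustrative, not from the paper):

```python
import numpy as np

def roi_channel_means(frame, mask):
    """Average each RGB channel over the pixels inside a segmented ROI.

    frame: (H, W, 3) array of R, G, B intensities P_n.
    mask:  (H, W) boolean array, True for the N pixels inside the ROI.
    Returns (C_R, C_G, C_B), the per-channel means over the ROI pixels.
    """
    pixels = frame[mask]              # (N, 3) array of ROI pixel values
    return tuple(pixels.mean(axis=0))
```

In practice, the mask for each ROI would be rasterized from the MediaPipe landmark polygons; here it is taken as given.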

2.3. Machine Learning-Based Filter Bank (MLFB) Algorithm

Figure 1 shows the overall scheme of the MLFB algorithm proposed in this study. First, the pulse signal rPPGPOS(t) was computed from the RGB components using the POS algorithm, which eliminates intensity variations by projecting the RGB components onto a plane orthogonal to a normalized skin-tone vector [13]. The MLFB algorithm then operates in four major steps: (1) decomposition using 16 sub-band-pass filters, (2) feature extraction, (3) computation of the most dominant spectral sub-band over time, and (4) generation of the refined pulse signal. In the first step, the pulse signal was decomposed into 16 sub-band signals Yi by 16 narrow band-pass filters, each with a bandwidth of 0.4 Hz and an overlap of 0.2 Hz, covering the range from 0.8 to 4.2 Hz. The filter bank was specifically designed to cover the range of human heart rates from 48 to 252 bpm [22].
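As an illustrative sketch (not the authors' implementation), such a filter bank might be built from Butterworth band-pass filters; the band edges follow directly from the stated bandwidth (0.4 Hz) and step (0.2 Hz), so band i spans 0.8 + 0.2i to 1.2 + 0.2i Hz and the last band ends at 4.2 Hz:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def make_filter_bank(fs, n_bands=16, f_start=0.8, bw=0.4, step=0.2, order=4):
    """Design 16 overlapping sub-band filters covering 0.8-4.2 Hz.

    Adjacent bands overlap by bw - step = 0.2 Hz, matching the paper's
    stated filter-bank layout; the filter type and order are assumptions.
    """
    return [
        butter(order, [f_start + i * step, f_start + i * step + bw],
               btype="bandpass", fs=fs, output="sos")
        for i in range(n_bands)
    ]

def decompose(pulse, sos_list):
    """Zero-phase filter the pulse signal into the sub-band signals Y_i."""
    return np.stack([sosfiltfilt(sos, pulse) for sos in sos_list])
```

For a pure 1.5 Hz tone, the energy after decomposition concentrates in the sub-bands whose passbands contain 1.5 Hz, as expected.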
Eleven features were extracted from each filtered signal, as listed in Table 1, and denoted Fi,j, where i indexes a sub-band (i.e., one of the bands produced by the filter bank) from 1 to 16, and j denotes a feature class from 1 to 11. A support vector machine (SVM) was trained using 10-fold cross-validation to identify the sub-band in which the pulse signal is most dominant at each time point. The model produced a binary output, with 1 indicating a dominant sub-band and 0 a less-dominant one. The series of binary outputs is named the dominant sub-band temporal matrix (DSTM). Finally, the refined pulse signal rPPGMLFB was computed by multiplying the filtered signals Yi by the DSTM.
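A hedged sketch of this classification step using scikit-learn, with a synthetic stand-in for the real feature matrix (in the paper, the rows would be the Table 1 features per sub-band and time point, and the labels would come from the cPPG reference):

```python
import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVC

# Synthetic stand-in: one row per (sub-band, time) sample with 11 features
# F_ij, and a binary label that is 1 when that sub-band is dominant.
rng = np.random.default_rng(0)
X = rng.normal(size=(320, 11))
y = (X[:, 0] > 0).astype(int)   # toy labels for illustration only

# 10-fold cross-validated binary predictions, one per sample, analogous
# to the entries of the dominant sub-band temporal matrix (DSTM).
clf = SVC(kernel="rbf")
dstm_entries = cross_val_predict(clf, X, y, cv=10)
```

The SVM kernel is an assumption; the paper does not state it here.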

2.3.1. Feature Extraction from Sub-Band Pulse Signal

The 11 features were computed from each decomposed pulse signal, as listed in Table 1. Phase coherence is crucial for assessing the consistency of the rPPG signals; the Hilbert transform was used to compute the coherence of HR signals across facial regions [26]. To extract robust phase coherence, a moving average with a 20 s window was applied to identify frequency information less susceptible to noise. Frequency variability was computed by identifying the frequency peaks within each sub-band and calculating the standard deviation of the intervals between these peaks; the smallest value of this standard deviation was then used as a feature. Because the POS amplitude and the mean and standard deviation of the RGB amplitudes indicate the strength of the rPPG signal, they were also used as features. Together, these features formed a 16 × 11 × T (time) matrix, which was used as the input of the SVM to distinguish between blood flow signals and noise.
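The two sub-band features discussed above might be implemented as follows; these are illustrative definitions under stated assumptions (the paper's exact windowing and normalization may differ):

```python
import numpy as np
from scipy.signal import find_peaks, hilbert

def frequency_variability(y, fs):
    """Std of inter-peak intervals (s) of one sub-band signal.

    A clean pulse band shows nearly periodic peaks, so a low standard
    deviation of peak-to-peak intervals suggests dominant cardiac activity.
    """
    peaks, _ = find_peaks(y)
    if len(peaks) < 2:
        return np.inf
    return np.std(np.diff(peaks) / fs)

def phase_coherence(y_roi_a, y_roi_b):
    """Phase coherence between the same sub-band from two facial ROIs.

    Instantaneous phases come from the Hilbert transform; the magnitude of
    the mean phase-difference vector is near 1 for phase-locked signals
    and near 0 for unrelated ones.
    """
    pa = np.angle(hilbert(y_roi_a))
    pb = np.angle(hilbert(y_roi_b))
    return np.abs(np.mean(np.exp(1j * (pa - pb))))
```

Two sinusoids at the same frequency yield coherence near 1 regardless of a fixed phase offset, while differing frequencies drive it toward 0.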

2.3.2. Generation of Refined Pulse Signal

As described in the previous section, the SVM was trained using a 10-fold cross-validation approach with data from the 10 subjects [27]. The SVM classified each decomposed pulse signal into a binary value of 0 or 1 depending on its informativeness. The accuracy of the binary series output by the SVM was evaluated by comparing it with a reference binary series, constructed by assigning 1 to a sub-band if the cPPG heart rate frequency fell within it and 0 otherwise. To evaluate the performance of the SVM, binary values of 1 and 0 were treated as positive and negative predictions, respectively. The true positive (TP), true negative (TN), false positive (FP), and false negative (FN) counts were calculated between the predicted and reference binary series. The performance of the SVM model was then quantified using the following metrics:
$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$
$\mathrm{F1\ score} = \frac{2\,TP}{2\,TP + FP + FN}$
$\mathrm{Sensitivity} = \frac{TP}{TP + FN}$
$\mathrm{Specificity} = \frac{TN}{TN + FP}$
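These four metrics can be computed directly from the predicted and reference binary series; a straightforward sketch (the function name `svm_metrics` is illustrative):

```python
import numpy as np

def svm_metrics(pred, ref):
    """Accuracy, F1, sensitivity, and specificity of a predicted binary
    sub-band series against the reference series (1 = dominant band)."""
    pred, ref = np.asarray(pred), np.asarray(ref)
    tp = np.sum((pred == 1) & (ref == 1))
    tn = np.sum((pred == 0) & (ref == 0))
    fp = np.sum((pred == 1) & (ref == 0))
    fn = np.sum((pred == 0) & (ref == 1))
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "f1": 2 * tp / (2 * tp + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }
```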
Finally, the resulting binary series, DSTM(t), was multiplied with the decomposed pulse signals to produce the refined pulse signal, rPPGMLFB(t), which retains only the frequency bands with dominant cardiac activity.
$\mathrm{rPPG}_{\mathrm{MLFB}}(t) = Y(t) \cdot \mathrm{DSTM}(t)$
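A sketch of this reconstruction, assuming the DSTM is stored as a (bands × time) binary matrix and that the retained bands are summed into a single trace (the summation over the band axis is our reading of how a single refined signal is obtained from 16 bands):

```python
import numpy as np

def apply_dstm(Y, dstm):
    """Combine sub-band signals into the refined pulse signal.

    Y:    (n_bands, T) filtered sub-band signals Y_i(t).
    dstm: (n_bands, T) binary matrix; entry (i, t) is 1 when sub-band i
          is dominant at time t.
    The element-wise product zeroes out non-dominant bands, and summing
    over bands yields rPPG_MLFB(t).
    """
    return (np.asarray(Y) * np.asarray(dstm)).sum(axis=0)
```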

2.3.3. Evaluation of the Performance of MLFB Algorithm

The reconstructed rPPGMLFB signal was then band-pass filtered (BPF) from 0.8 to 4.2 Hz, the band containing most of the rPPG signal power, to remove frequency noise outside the plausible heart-rate range. The HR over each interval was computed from the filtered signal using a fast Fourier transform by converting the frequency of the most significant energy peak to beats per minute (i.e., multiplying by 60). The mean absolute error (MAE) relative to the HR obtained from cPPG (used as a reference) was used for validation. The HR was computed over concatenated intervals of 20, 60, and 100 s. After applying a 0.8 to 4.2 Hz BPF to the cPPG signal, the frequency with the dominant power was taken as the reference HR in each interval. Here, MAE denotes the difference between the predicted HRrPPG and the reference HRcPPG and was computed as
$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| HR_{rPPG}(i) - HR_{cPPG}(i) \right|$
where N denotes the number of concatenated intervals. To evaluate the level of signal quality, the SNR was evaluated using the average power density approach, as follows:
$SNR_{density} = 10 \log_{10} \left( \frac{\sum_{f=0.8}^{4.2} U(f)\,\hat{S}(f)}{\sum_{f=0.8}^{4.2} \bigl(1 - U(f)\bigr)\,\hat{S}(f)} \times \frac{\sum_{f=0.8}^{4.2} \bigl(1 - U(f)\bigr)}{\sum_{f=0.8}^{4.2} U(f)} \right)$
where f is the frequency ranging from 0.8 to 4.2 Hz, Ŝ(f) is the spectral power of the rPPG signal, U(f) is a binary window that contains the signal frequency component, and 1 − U(f) is a binary window that contains the noise frequency components [12]. The signal window was centered on the frequency determined from the frequency analysis of cPPG, with a bandwidth of 0.08 Hz.
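A minimal sketch of this evaluation pipeline (FFT-based HR estimation, MAE, and the density-normalized SNR), assuming a rectangular signal window U(f) and a single spectral peak per interval; all function names are illustrative:

```python
import numpy as np

def estimate_hr_bpm(signal, fs):
    """HR in bpm from the dominant spectral peak in the 0.8-4.2 Hz band."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1 / fs)
    band = (freqs >= 0.8) & (freqs <= 4.2)
    return freqs[band][np.argmax(spectrum[band])] * 60  # Hz -> bpm

def mae_bpm(hr_rppg, hr_cppg):
    """Mean absolute error between predicted and reference HR series."""
    return np.mean(np.abs(np.asarray(hr_rppg) - np.asarray(hr_cppg)))

def snr_density(rppg, fs, hr_freq, bw=0.08):
    """Average-power-density SNR (dB): the signal window U(f) is centred
    on the reference cPPG heart-rate frequency with a 0.08 Hz bandwidth,
    and the band-power ratio is normalised by the window widths."""
    power = np.abs(np.fft.rfft(rppg)) ** 2
    freqs = np.fft.rfftfreq(len(rppg), d=1 / fs)
    band = (freqs >= 0.8) & (freqs <= 4.2)
    s, f = power[band], freqs[band]
    u = np.abs(f - hr_freq) <= bw / 2       # binary signal window U(f)
    ratio = s[u].sum() / s[~u].sum()        # signal vs. noise band power
    return 10 * np.log10(ratio * ((~u).sum() / u.sum()))
```

For a 1.5 Hz tone sampled at 30 Hz, `estimate_hr_bpm` returns 90 bpm, and adding mild white noise still yields a strongly positive density SNR.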
The performance of our approach was compared with four different noise reduction algorithms: no pre-processing (none), BPF, ASF, and the MLFB algorithm. “None” refers to the application of the POS algorithm without noise reduction preprocessing. We applied these noise-reduction techniques to six different pulse extraction algorithms for rPPG signals: G [4], G-R [28], PCA [15], ICA [14], CHROM [12], and POS [13].

3. Results

3.1. Evaluating Pulse Signal Extraction

Figure 2 shows an example of rPPG signals computed from facial video data under three conditions (Resting, Stroop task, and Movement) and cPPG (as a reference for heart activity). The upper panel shows the rPPGPOS obtained in a previous study using only the POS algorithm, and the lower panel shows the rPPGMLFB obtained using the proposed MLFB algorithm. Under resting conditions, both rPPGMLFB and rPPGPOS exhibited morphological patterns similar to those of the reference signal; however, rPPGPOS demonstrated increased noise levels. The extent of signal degradation in rPPGPOS became more pronounced during movement conditions. Nevertheless, rPPGMLFB maintained a strong correspondence with the reference signal, even under movement.

3.2. Evaluating the Results of Applying MLFB Algorithm

Figure 3A illustrates the correlation between HRrPPG derived from rPPGMLFB and HRcPPG obtained via cPPG across four conditions (Resting, Stroop task, Post-workout, and Movement) from 10 subjects. The results demonstrated a strong correlation between the two approaches, with a correlation coefficient of 0.99, except for the condition involving movement, where the correlation coefficient was 0.95.
Figure 3B shows the Bland–Altman plot depicting the difference between the HRs measured by the two methods as a function of the average of HRrPPG and HRcPPG. The mean differences and 95% limits of agreement were as follows: Resting (−0.39 bpm, −4.39 to 4.09 bpm), Stroop task (−0.42 bpm, −3.55 to 3.09 bpm), Post-workout (−0.83 bpm, −3.12 to 1.72 bpm), and Movement (2.25 bpm, −7.03 to 10.53 bpm). All conditions except movement showed differences of <5 bpm. Furthermore, the results of applying various noise reduction algorithms to pulse signals extracted using the same method indicate that, in most cases, the highest accuracy was achieved with the proposed algorithm (compare the columns in Table 2).

3.3. Evaluating MAE by Window Size

The MAE of HR estimated using the previously reported no pre-processing (none), BPF, and ASF algorithms, as well as the proposed MLFB algorithm, is shown in Figure 4. Window sizes of 20, 60, and 100 s, spaced 40 s apart starting from the shortest window containing the blood flow signal, were used to derive the HR estimate from each rPPG. The MLFB algorithm resulted in significantly (*: p < 0.05, **: p < 0.01, and ***: p < 0.001) lower errors than the other approaches. As the window size increased, the errors decreased; however, a longer window requires more stacked rPPG data, and the HR is updated less frequently.
Table 2 summarizes the MAE of HR according to the combined approach of pulse extraction algorithm (G [4], G-R [28], PCA [15], ICA [14], CHROM [12], PBV [11], POS [13]), and noise reduction algorithm (none, BPF, ASF, and MLFB). Because the evaluation results showed significant differences with a window size of 60 s, the data were evaluated using a 60 s window. When the video data were processed using a combination of the POS and MLFB algorithms, the calculated HR exhibited the lowest MAE of 2.53 bpm. The signal-to-noise ratios for each case in Table 2 are summarized in Table 3. The combination of POS and the proposed MLFB algorithms resulted in the highest value of −1.02 dB. Among the four noise reduction algorithms, the MLFB effectively processed noise in most cases, thereby improving signal quality.

3.4. Machine Learning Evaluation Result

The importance of the 11 features used in the proposed MLFB algorithm model was investigated using the SHapley Additive exPlanations (SHAP) method [29]. The SHAP values for each feature are shown in Figure 5. As seen, phase coherence was the most important feature, with a value of 95%. The next most important features were the standard deviation and amplitude average of the POS signal, with values of 40% and 25%, respectively.
The SVM used the features representing various signal characteristics to label the sub-bands with binary values of 1 and 0. As explained in Section 2.3.2, these labels were then compared to the reference binary series derived from the cPPG. The accuracy, F1-score, sensitivity, and specificity across different combinations of model features are summarized in Table 4. Most models achieved an accuracy of over 97% and a specificity of over 98%, indicating the effectiveness of the SVM in identifying sub-bands with prominent cardiac activity. Interestingly, there were no significant differences in performance between models using different numbers of features.
Table 5 lists the performance and computational time for combinations of the three most important features. When all 11 features were used, the computational time was significantly (p < 0.001) longer than when three or fewer features were used. However, no significant differences in accuracy were observed between the models.

4. Discussion

In this study, we developed a noise reduction algorithm, referred to as the MLFB algorithm, aimed at enhancing pulse signal retrieval from RGB face video datasets. The results demonstrated that the MLFB algorithm outperformed previous methodologies in effectively denoising the pulse signals. The robustness of rPPG against variations in movement is critical because this method relies on video data for signal extraction [30]. Our study further demonstrated the reliability of the proposed algorithm by calculating rPPG from video data collected under four distinct measurement conditions: Resting, Stroop task, Post-workout, and Movement.
The improved performance of the MLFB algorithm highlights its potential applicability in various contexts in which accurate pulse signal extraction is crucial, including daily activities, sports, and driving. By enhancing the signal quality, this algorithm can facilitate more reliable cardiovascular assessments based on facial video data. HR variability analysis can also be performed, offering a deeper understanding of the autonomic nervous system that controls homeostatic mechanisms in response to internal and external stimuli [31]. Moreover, the proposed algorithm allows sequential integration with existing pulse extraction methodologies, thereby expanding its applicability across various applications.
Although the results of this study are promising, further refinement of the algorithm and exploration of its integration with other techniques are necessary to optimize performance across diverse environments and conditions. The number of features significantly influences both computational speed and accuracy: although more features may enhance accuracy, they concurrently slow down the calculation, as summarized in Table 5. This necessitates a feature selection strategy that balances computational efficiency with signal accuracy when applying the proposed algorithm. Furthermore, the retrieval of rPPG signals is significantly affected by subject movement and the illumination conditions of the measurement environment. Low-light conditions challenge reliable video data acquisition, highlighting the need for alternative methodologies to effectively assess facial color changes under such circumstances. Recent advancements have explored the combination of infrared and RGB camera data to address these limitations [32]. Strategies to enhance signal reliability across varying environmental conditions will be investigated in the near future.

5. Conclusions

This study presents a novel denoising algorithm, MLFB, to enhance the quality and improve the accuracy of rPPG signals. The MLFB algorithm proceeds through the following steps: (1) decomposing using 16 sub-band pass filters, (2) feature extraction, (3) computation of the most informative spectral sub-band over time, and (4) generation of the refined pulse signal. Validation on a face video dataset demonstrated that the MLFB algorithm achieved lower MAE in HR estimation and improved SNR compared to the previous methods. The proposed approach offers a robust and reliable method for capturing heart activity. It is suitable for diverse applications such as fitness monitoring, driver alertness assessment, sleep analysis, and various areas requiring non-contact and continuous HR monitoring.

Author Contributions

Conceptualization, J.L., H.J. and J.W.; methodology, J.L. and H.J.; validation, J.L. and J.W.; formal analysis, J.L. and H.J.; writing and editing, J.L. and J.W.; visualization, J.L. and J.W.; project administration, J.W. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by the 2024 Research Fund of the University of Ulsan.

Data Availability Statement

The video data used in this study are from an open-access dataset by Gudi et al. (2020) [22].

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. McDuff, D.; Gontarek, S.; Picard, R. Remote measurement of cognitive stress via heart rate variability. In Proceedings of the 36th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Chicago, IL, USA, 26–30 August 2014; pp. 2957–2960. [Google Scholar] [CrossRef]
  2. Fye, W.B. A history of the origin, evolution, and impact of electrocardiography. Am. J. Cardiol. 1994, 73, 937–949. [Google Scholar] [CrossRef] [PubMed]
  3. Allen, J. Photoplethysmography and its application in clinical physiological measurement. Physiol. Meas. 2007, 28, R1–R39. [Google Scholar] [CrossRef] [PubMed]
  4. Verkruysse, W.; Svaasand, L.O.; Nelson, J.S. Remote plethysmographic imaging using ambient light. Opt. Express 2008, 16, 21434–21445. [Google Scholar] [CrossRef] [PubMed]
  5. Cobos-Torres, J.-C.; Abderrahim, M.; Martínez-Orgado, J. Non-contact, simple neonatal monitoring by photoplethysmography. Sensors 2018, 18, 4362. [Google Scholar] [CrossRef] [PubMed]
  6. Wu, B.-F.; Chu, Y.-W.; Huang, P.-W.; Chung, M.-L.; Lin, T.-M. A motion robust remote-PPG approach to driver’s health state monitoring. In Computer Vision–ACCV Workshops: ACCV International Workshops; Taipei, Taiwan, Revised Selected Papers; Springer: Cham, Switzerland, 2017; Part I13; pp. 463–476. [Google Scholar] [CrossRef]
  7. Wu, J.; Zhu, Y.; Jiang, X.; Liu, Y.; Lin, J. Local attention and long-distance interaction of rPPG for deepfake detection. Vis. Comput. 2024, 40, 1083–1094. [Google Scholar] [CrossRef]
  8. Yu, Z.; Li, X.; Zhao, G. Facial-video-based physiological signal measurement: Recent advances and affective applications. IEEE Signal Process. Mag. 2021, 38, 50–58. [Google Scholar] [CrossRef]
  9. Xiao, H.; Liu, T.; Sun, Y.; Li, Y.; Zhao, S.; Avolio, A. Remote photoplethysmography for heart rate measurement: A review. Biomed. Signal Process. Control 2024, 88, 105608. [Google Scholar] [CrossRef]
  10. Wedekind, D.; Malberg, H.; Zaunseder, S.; Gaetjen, F.; Matschke, K.; Rasche, S. Automated identification of cardiac signals after blind source separation for camera-based photoplethysmography. In Proceedings of the 35th International Conference on Electronics and Nanotechnology (ELNANO), Kyiv, Ukraine, 21–24 April 2015. [Google Scholar] [CrossRef]
  11. De Haan, G.; Van Leest, A. Improved motion robustness of remote-PPG by using the blood volume pulse signature. Physiol. Meas. 2014, 35, 1913–1926. [Google Scholar] [CrossRef]
  12. De Haan, G.; Jeanne, V. Robust pulse rate from chrominance-based rPPG. IEEE Trans. Biomed. Eng. 2013, 60, 2878–2886. [Google Scholar] [CrossRef]
  13. Wang, W.; den Brinker, A.C.; Stuijk, S.; de Haan, G. Algorithmic principles of remote ppg. IEEE Trans. Biomed. Eng. 2017, 64, 1479–1491. [Google Scholar] [CrossRef]
  14. Poh, M.-Z.; McDuff, D.J.; Picard, R.W. Advancements in noncontact, multiparameter physiological measurements using a webcam. IEEE Trans. Biomed. Eng. 2011, 58, 7–11. [Google Scholar] [CrossRef] [PubMed]
  15. Lewandowska, M.; Rumiński, J.; Kocejko, T.; Nowak, J. Measuring Pulse Rate With a Webcam—A Noncontact Method for Evaluating Cardiac Activity. In Proceedings of the Federated Conference on Computer Science and Information Systems, Szczecin, Poland, 18–21 September 2011. [Google Scholar]
  16. Wang, W.; Stuijk, S.; De Haan, G. A novel algorithm for remote photoplethysmography: Spatial subspace rotation. IEEE Trans. Biomed. Eng. 2016, 63, 1974–1984. [Google Scholar] [CrossRef] [PubMed]
  17. Chen, W.; McDuff, D. Deepphys: Video-based physiological measurement using convolutional attention networks. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018. [Google Scholar]
  18. Abdulrahaman, L.Q. Two-stage motion artifact reduction algorithm for rPPG signals obtained from facial video recordings. Arab. J. Sci. Eng. 2024, 49, 2925–2933. [Google Scholar] [CrossRef]
  19. Wang, W.; den Brinker, A.C.; Stuijk, S.; de Haan, G. Amplitude-selective filtering for remote-ppg. Biomed. Opt. Express 2017, 8, 1965–1980. [Google Scholar] [CrossRef] [PubMed]
  20. Wang, W.; den Brinker, A.C.; Stuijk, S.; de Haan, G. Color-distortion filtering for remote photoplethysmography. In Proceedings of the 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017), Washington, DC, USA, 30 May–3 June 2017; Volume 2017. [Google Scholar] [CrossRef]
  21. Wang, W.; Stuijk, S.; De Haan, G. Exploiting spatial redundancy of image sensor for motion robust rPPG. IEEE Trans. Biomed. Eng. 2014, 62, 415–425. [Google Scholar] [CrossRef]
  22. Gudi, A.; Bittner, M.; Van Gemert, J. Real-time webcam heart-rate and variability estimation with clean ground truth for evaluation. Appl. Sci. 2020, 10, 8630. [Google Scholar] [CrossRef]
  23. Kim, D.-Y.; Lee, K.; Sohn, C.-B. Assessment of ROI selection for facial video-based rPPG. Sensors 2021, 21, 7923. [Google Scholar] [CrossRef] [PubMed]
  24. Kartynnik, Y.; Ablavatski, A.; Grishchenko, I.; Grundmann, M. Real-time facial surface geometry from monocular video on mobile GPUs. arXiv 2019, arXiv:1907.06724. [Google Scholar]
  25. Dimmock, S.; O’donnell, C.; Houghton, C. Bayesian analysis of phase data in EEG and MEG. eLife 2023, 12, e84602. [Google Scholar] [CrossRef] [PubMed]
  26. Johansson, M. The Hilbert Transform. Master’s Thesis, Växjö University, Växjö, Sweden, 1999. [Google Scholar]
  27. Kohavi, R. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. In Proceedings of the 14th International Joint Conference on Artificial Intelligence, Montreal, QC, Canada, 20–25 August 1995; Morgan Kaufman Publishing: San Francisco, CA, USA, 1995. [Google Scholar]
  28. Hülsbusch, M.; Rembold, B. Ein Bildgestütztes, Funktionelles Verfahren zur Optoelektronischen Erfassung der Hautperfusion. Ph.D. Thesis, Lehrstuhl und Institut für Hochfrequenztechnik, Aachen, Germany, 2008. [Google Scholar]
  29. Lundberg, S. A unified approach to interpreting model predictions. arXiv 2017, arXiv:1705.07874. [Google Scholar] [CrossRef]
  30. Li, J.; Yu, Z.; Shi, J. Learning motion-robust remote photoplethysmography through arbitrary resolution videos. In Proceedings of the 37th AAAI Conference on Artificial Intelligence, Washington DC, USA, 7–14 February 2023; Volume 37, pp. 1334–1342. [Google Scholar] [CrossRef]
  31. Nayak, S.K.; Pradhan, B.; Mohanty, B.; Sivaraman, J.; Ray, S.S.; Wawrzyniak, J.; Jarzębski, M.; Pal, K. A review of methods and applications for a heart rate variability analysis. Algorithms 2023, 16, 433. [Google Scholar] [CrossRef]
  32. Lie, W.-N.; Le, D.-Q.; Lai, C.-Y.; Fang, Y.-S. Heart rate estimation from facial image sequences of a dual-modality RGB-NIR camera. Sensors 2023, 23, 6079. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Schematic of the proposed machine learning-based filter bank (MLFB) algorithm. The pulse signal of remote photoplethysmography (rPPGPOS) is derived using the plane-orthogonal-to-skin (POS) algorithm from the red [R(t)], green [G(t)], and blue [B(t)] components of the video signals. The signal is then processed through a 16-sub-band filter bank, and the feature matrix (F) is computed. A support vector machine (SVM) is employed to generate a dominant sub-band temporal matrix (DSTM), which is used to output the refined pulse signal of rPPGMLFB.
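The filter-bank stage in Figure 1 can be sketched as follows. This is a minimal illustration, not the paper's implementation: the Butterworth filter design, the 0.7–4.0 Hz pulse range, and the 30 fps frame rate are all assumptions made for the example.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def filter_bank(signal, fs, n_bands=16, f_lo=0.7, f_hi=4.0):
    """Decompose a pulse signal into n_bands contiguous sub-bands of [f_lo, f_hi] Hz."""
    edges = np.linspace(f_lo, f_hi, n_bands + 1)
    sub_bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Zero-phase band-pass filtering keeps the pulse peaks aligned in time.
        b, a = butter(2, [lo / (fs / 2), hi / (fs / 2)], btype="band")
        sub_bands.append(filtfilt(b, a, signal))
    return np.array(sub_bands)  # shape: (n_bands, n_samples)

fs = 30.0                              # assumed camera frame rate
t = np.arange(0, 10, 1 / fs)
pulse = np.sin(2 * np.pi * 1.2 * t)    # surrogate 72 bpm pulse signal
bands = filter_bank(pulse, fs)
dominant = int(np.argmax((bands ** 2).sum(axis=1)))  # sub-band carrying the pulse
```

The sub-band with the largest energy is the one whose pass-band contains the pulse frequency; in the full MLFB pipeline that selection is instead made by the SVM from the feature matrix.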
Figure 2. Example of computed remote photoplethysmography (rPPG) signals using the plane-to-orthogonal-skin (POS) algorithm in blue (rPPGPOS) and the proposed machine learning-based filter bank algorithm in pink (rPPGMLFB). Resting, Stroop task, and Movement denote the subject's activity during video recording. The gray line shows the contact photoplethysmography (cPPG) signal used as a reference.
Figure 3. (A) Correlation between heart rate calculated using contact photoplethysmography (HRcPPG) and heart rate extracted from video data using the machine learning-based filter bank algorithm (HRrPPG). (B) Bland–Altman plot comparing the two methods.
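The Bland–Altman comparison in panel (B) rests on two summary statistics: the mean difference (bias) between the paired measurements and the 95% limits of agreement. A minimal sketch (the paired readings below are hypothetical, not the study's data):

```python
import numpy as np

def bland_altman(hr_ref, hr_est):
    """Bias and 95% limits of agreement between paired HR measurements (bpm)."""
    diff = np.asarray(hr_ref, float) - np.asarray(hr_est, float)
    bias = diff.mean()
    half_width = 1.96 * diff.std(ddof=1)  # assumes approximately normal differences
    return bias, bias - half_width, bias + half_width

# Hypothetical paired readings: cPPG reference vs. rPPG estimate
bias, lower, upper = bland_altman([72, 80, 65, 90], [70, 83, 66, 88])
```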
Figure 4. Comparison of the mean absolute error of heart rate retrieved by four different algorithms with three window sizes: no pre-processing (none) in blue, band-pass filter (BPF) in white, amplitude-selective filter (ASF) in gray, and machine learning-based filter bank (MLFB) in pink. Statistically significant differences are denoted by *: p < 0.05, **: p < 0.01, and ***: p < 0.001.
Figure 5. Importance of the 11 features for refining the rPPG signal, based on SHapley Additive exPlanations (SHAP) values. The features on the ordinate, in order: phase coherence, POS standard deviation, POS amplitude, red standard deviation, frequency variability, green standard deviation, blue amplitude, green amplitude, red amplitude, POS, and blue standard deviation.
Table 1. Description of rPPG signal features used in machine learning.
| Feature Type | Feature | Description |
|---|---|---|
| Spatial redundancy | Phase coherence [25] | Coherence in the phase of a narrow-banded pulse signal across 9 ROIs |
| | Frequency variability | Variability in the number of peaks within a specific frequency band across 9 ROIs |
| Relative pulsatile amplitude | R, G, B standard deviation | Standard deviation of amplitude in each red (R), green (G), and blue (B) signal across 9 ROIs |
| | POS standard deviation | Standard deviation of amplitude of extracted pulse signals across 9 ROIs |
| | R, G, B amplitude | Averaged amplitude in each red (R), green (G), and blue (B) signal |
| | POS amplitude | Averaged amplitude of extracted pulse signals across 9 ROIs |
| | POS [13] | Amplitude of the pulse signal extracted from 9 ROIs |
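The two amplitude-spread features in Table 1 can be illustrated with a short sketch. The peak-to-peak amplitude definition used here is an assumption for the example, not the paper's exact formula:

```python
import numpy as np

def amplitude_features(roi_signals):
    """Amplitude features across ROIs: (standard deviation, average) of the
    per-ROI peak-to-peak pulse amplitude. roi_signals: (n_rois, n_samples)."""
    p2p = roi_signals.max(axis=1) - roi_signals.min(axis=1)
    return p2p.std(), p2p.mean()

t = np.linspace(0, 10, 300)
# Nine synthetic ROI pulse traces with slightly different amplitudes
rois = np.stack([(1 + 0.1 * k) * np.sin(2 * np.pi * 1.2 * t) for k in range(9)])
amp_std, amp_mean = amplitude_features(rois)
```

A clean pulse shows similar amplitudes across the nine ROIs (low standard deviation), whereas localized noise inflates the spread; that spatial redundancy is what the features exploit.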
Table 2. Mean absolute error of heart rate computed for combinations of six pulse extraction and four noise reduction algorithms (see the text for details of each approach). Statistical differences between the machine learning-based filter bank (MLFB) and the other noise reduction algorithms within the same pulse extraction algorithm were evaluated using the Wilcoxon non-parametric test (denoted by *: p < 0.05, **: p < 0.01, and ***: p < 0.001).
Mean Absolute Error of Heart Rate (bpm)

| Noise Reduction Algorithm | G | G-R | PCA | ICA | CHROM | POS |
|---|---|---|---|---|---|---|
| None | 14.80 *** | 10.15 * | 15.12 *** | 10.86 ** | 4.61 ** | 3.92 ** |
| BPF | 10.60 * | 8.14 | 11.62 | 7.82 | 4.43 ** | 3.53 * |
| ASF | 14.73 *** | 10.13 * | 7.97 | 8.48 | 4.27 * | 3.75 |
| MLFB | 8.32 | 9.83 | 9.41 | 7.19 | 2.68 | 2.53 |

Note: G, green; G-R, green–red; PCA, principal component analysis; ICA, independent component analysis; CHROM, chrominance-based method; POS, plane-to-orthogonal-skin; BPF, band-pass filtering; ASF, amplitude selective filtering; MLFB, machine learning-based filter bank.
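The error metric reported in Table 2 is the mean absolute error between the video-derived and reference heart rates. As a quick sketch (the example values are made up):

```python
import numpy as np

def hr_mae(hr_ref, hr_est):
    """Mean absolute error (bpm) between reference and estimated heart rates."""
    return float(np.mean(np.abs(np.asarray(hr_ref, float) - np.asarray(hr_est, float))))

mae = hr_mae([72, 80, 65], [70, 83, 66])  # -> 2.0
```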
Table 3. Average signal-to-noise ratio of the pulse signal computed for combinations of six pulse extraction and four noise reduction algorithms. Statistical differences between the machine learning-based filter bank (MLFB) and the other noise reduction algorithms within the same pulse extraction algorithm were evaluated using the Wilcoxon non-parametric test (denoted by *: p < 0.05, **: p < 0.01, and ***: p < 0.001).
Signal-to-Noise Ratio

| Noise Reduction Algorithm | G | G-R | PCA | ICA | CHROM | POS |
|---|---|---|---|---|---|---|
| None | 1.70 *** | 3.44 *** | 2.24 | 3.06 * | 6.00 *** | 6.31 *** |
| BPF | 2.33 ** | 4.08 ** | 2.79 | 3.60 | 6.30 *** | 6.61 *** |
| ASF | 1.81 *** | 3.65 ** | 4.63 * | 3.97 | 6.03 *** | 6.35 *** |
| MLFB | 4.42 | 5.98 | 2.52 | 4.60 | 9.85 | 10.00 |
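A common way to score pulse-signal quality, sketched below, is the spectral SNR: power near the heart-rate frequency (and its first harmonic) relative to the rest of the physiological band. The band limits and the 0.1 Hz tolerance are assumptions for the example; the paper's exact SNR definition may differ.

```python
import numpy as np

def pulse_snr_db(signal, fs, hr_hz, half_band=0.1):
    """Spectral SNR of a pulse signal: power within half_band Hz of the heart
    rate (and its first harmonic) versus the remaining 0.5-4.0 Hz power, in dB."""
    freqs = np.fft.rfftfreq(len(signal), 1 / fs)
    power = np.abs(np.fft.rfft(signal)) ** 2
    in_range = (freqs >= 0.5) & (freqs <= 4.0)
    near_hr = (np.abs(freqs - hr_hz) <= half_band) | (np.abs(freqs - 2 * hr_hz) <= half_band)
    return 10 * np.log10(power[in_range & near_hr].sum() / power[in_range & ~near_hr].sum())

fs = 30.0
t = np.arange(0, 20, 1 / fs)
clean = np.sin(2 * np.pi * 1.2 * t)                       # 72 bpm surrogate pulse
noisy = clean + 0.5 * np.random.default_rng(0).normal(size=t.size)
```

By this measure, adding broadband noise lowers the score, which is the behavior the noise reduction algorithms in Table 3 are competing on.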
Table 4. Performance of machine-learning models with different feature combinations: phase coherence (F1), POS standard deviation (F2), POS amplitude (F3), and all 11 features (Fall). The values represent averages across all 10 subjects.
| Metric (%) | F1 | F1 + F2 | F1 + F2 + F3 | Fall |
|---|---|---|---|---|
| Accuracy | 97.97 | 97.84 | 97.81 | 97.91 |
| F1-score | 91.67 | 91.40 | 91.26 | 91.57 |
| Sensitivity | 89.94 | 91.20 | 90.95 | 91.19 |
| Specificity | 99.11 | 98.79 | 98.79 | 98.86 |
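Table 4's feature-subset comparison can be mimicked with a toy experiment. The minimal hinge-loss linear SVM below is an illustrative stand-in (the paper's SVM kernel and training setup are not given in this excerpt), trained on synthetic data in which the first feature dominates the label, echoing the feature ranking in Figure 5:

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=500, lr=0.05):
    """Minimal linear SVM via sub-gradient descent on the hinge loss; y in {-1, +1}."""
    w, b = np.zeros(X.shape[1]), 0.0
    n = len(y)
    for _ in range(epochs):
        viol = y * (X @ w + b) < 1              # samples violating the margin
        grad_w = lam * w - (y[viol, None] * X[viol]).sum(axis=0) / n
        grad_b = -y[viol].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def accuracy(X, y, w, b):
    return float(np.mean(np.sign(X @ w + b) == y))

rng = np.random.default_rng(0)
# Synthetic stand-in for the 11-feature matrix; the label follows the first
# feature, analogous to phase coherence (F1) dominating the SHAP ranking.
X = rng.normal(size=(300, 11))
y = np.where(X[:, 0] > 0, 1, -1)

w1, b1 = train_linear_svm(X[:, :1], y)   # F1 only
acc_f1 = accuracy(X[:, :1], y, w1, b1)
wa, ba = train_linear_svm(X, y)          # all 11 features
acc_all = accuracy(X, y, wa, ba)
```

On such data the single dominant feature already classifies well, mirroring the small gap between F1 and Fall in Table 4.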
Table 5. Effect of feature selection on the performance of the machine learning-based filter bank (MLFB) algorithm. Mean absolute error (MAE), computational time, and accuracy were evaluated for combinations of the features phase coherence (F1), POS standard deviation (F2), and POS amplitude (F3), and for all 11 features (Fall) listed in Figure 5. The Wilcoxon non-parametric test was used to determine statistically significant differences between the Fall case and the other feature combinations (***: p < 0.001).
| Evaluation | F1 | F1 + F2 | F1 + F2 + F3 | Fall |
|---|---|---|---|---|
| MAE (bpm) | 2.54 | 2.60 | 2.56 | 2.53 |
| Computational time (s) | 48.38 *** | 48.71 *** | 52.82 *** | 108.28 |

Note: bpm, beats per minute.

Share and Cite

Lee, J.; Joo, H.; Woo, J. Improved Remote Photoplethysmography Using Machine Learning-Based Filter Bank. Appl. Sci. 2024, 14, 11107. https://doi.org/10.3390/app142311107