Article

Robust Pulse Rate Measurements from Facial Videos in Diverse Environments

1 Department of Electrical and Computer Engineering, Sungkyunkwan University, 2066 Seobu-ro, Jangan-gu, Suwon-si 16419, Republic of Korea
2 School of Electronic and Electrical Engineering, Sungkyunkwan University, 2066 Seobu-ro, Jangan-gu, Suwon-si 16419, Republic of Korea
* Authors to whom correspondence should be addressed.
Sensors 2022, 22(23), 9373; https://doi.org/10.3390/s22239373
Submission received: 9 November 2022 / Revised: 21 November 2022 / Accepted: 28 November 2022 / Published: 1 December 2022
(This article belongs to the Section Biomedical Sensors)

Abstract

Pulse wave and pulse rate are important indicators of cardiovascular health. Technologies have been developed that check the pulse by contacting the skin with optical sensors built into smart devices. However, such contact may cause inconvenience, such as a foreign body sensation. Accordingly, non-contact pulse rate measurement using facial videos has been studied, but mostly indoors; as a result, pulse rate measurement error is high in outdoor environments, such as an outdoor bench, a driving car, or a flying drone. In this paper, to address this issue, we develop a robust pulse measurement method based on facial videos taken in diverse environments. The proposed method stably detects faces by removing the high-frequency components of face coordinate signals caused by fine body tremors and illumination conditions. It reduces illumination-caused noise by using the Cg color difference component, which is well suited to extracting skin color changes. The robust pulse wave is extracted from the Cg signal using FFT–iFFT with zero-padding, which effectively eliminates signal-filtering distortion. We demonstrate that the proposed method alleviates these pulse rate measurement problems, producing RMSEs of 3.36, 5.81, and 6.09 bpm for an outdoor bench, a driving car, and a flying drone, respectively.

1. Introduction

Pulse rate is an essential indicator of cardiovascular health. It can predict the incidence, progression, and mortality associated with cardiovascular disease (CVD). It can also contribute to the daily prevention of CVD and the evaluation of the rehabilitation progress of chronic diseases [1,2,3,4]. In hospitals and clinical institutions, physiological signals are traditionally used to provide users with valuable information about the cardiovascular system, including pulse, blood pressure, arterial oxygen saturation, breathing, and cardiac autonomic functions [5,6].
The importance of preventing diseases in daily life and the pursuit of human health are constantly increasing. In line with the increased interest in personal health, wearable and non-wearable technologies are also developing worldwide [4,7,8]. However, wearable technology may cause problems in pulse rate monitoring due to incomplete contact between the sensor and the skin [2,9]. The user must be in long-term contact with the sensor for monitoring, and it is necessary to continuously check whether the sensor is operating in tight contact with the user. The number of sensors must also match the number of individuals who are required to be measured. Therefore, wearable technology is unsuitable for certain individuals and environments, including newborns, drivers, exercisers, skin-damaged patients and disaster-affected patients [10].
To solve this problem, researchers are examining non-contact methods for measuring the pulse from facial video, which minimize user intervention and enable pulse rate monitoring while driving [11] or sleeping [12], and for stress monitoring [13].
The development of a non-contact method for measuring pulse rate that is quick, simple, and feasible in real-time has resulted in the creation of new applications for use in various fields [14,15]. The primary objective of non-contact pulse measurement studies is to analyze skin color changes in relation to heartbeats and facial movements using built-in cameras on smart devices such as mobile phones and tablets.
Previous research has demonstrated various methods for measuring pulse rate by applying independent component analysis (ICA), principal component analysis (PCA), the fast Fourier transform (FFT), or a band-pass filter (BPF) to RGB color data calculated in the region of interest (ROI) of facial images [1,7,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34], and by analyzing head movement and blinking [12,14,35] under controlled laboratory conditions. However, in external environments (outdoor bench, car, drone), the pulse rate cannot be accurately measured due to factors such as the user’s fine body tremor, illumination changes, and non-detection of the face.
In this paper, we present methods for robust and accurate pulse rate measurement in diverse environments. In summary, our procedures are as follows.
We propose a method for stable face detection by removing high-frequency components of face coordinate signals derived from fine body tremors and illumination conditions. A moving average filter (MAF) is applied to a specified number of X and Y coordinate samples for the detected face coordinates when a face is detected in facial videos collected in diverse environments. Then, the skin ROI is identified based on the facial coordinates that have been detected reliably. Through this method, it is possible to effectively solve the problem of uncalculated skin color values resulting from non-detection of the face in an external environment.
Further, we use the Cg color difference component to optimize the extraction of skin color changes by minimizing illumination-induced noise. The RGB color system of the ROI is converted into the YCgCo color system to extract the Cg signal over time. We use the Cg color because it is well suited to detecting changes in skin color, whereas the R, G, and B channels are unsuitable for extracting the pulse wave in diverse environments because their values vary significantly under different illumination conditions, such as incandescent lighting, fluorescent lighting, and sunlight [36,37].
We propose a method for robust pulse wave extraction from the Cg signal calculated over the skin ROI in diverse environments. Pulse rate measurement relies on the pulse wave, which is conventionally extracted with contact-type PPG and ECG devices. To extract the pulse wave from the Cg signal, the FFT is applied to the Cg signal calculated from the facial video to convert it into the frequency domain, and zero-padding is applied to all bands except those related to the pulse (0.67~3.34 Hz). The pulse wave is then extracted by applying the inverse FFT (iFFT) to the zero-padded frequency domain. This method effectively eliminates signal-filtering distortion.
The proposed method for measuring the robust pulse rate in diverse environments uses the average RR interval (the interval between peak points) calculated from the pulse wave signal.

2. Related Work

Remote non-contact pulse rate measurements are appealing for both commercial and academic applications [21,26]. Several papers from the past few years have proposed color-based methods for non-contact pulse rate measurement using general cameras to solve the problems of contact-type measuring devices [12,14,38]. Existing studies commonly perform face region detection and specify an ROI. The pulse wave is subsequently calculated from the RGB color channels extracted from facial videos taken by cameras. There are numerous ways to calculate the pulse wave and rate, which can be summarized as follows: (1) applying ICA to RGB color data calculated in the ROI and FFT to pulse-related frequency bands [1]; (2) detecting peak points on a signal calculated by applying ICA [33]; (3) detecting peak points on a pulse wave calculated by applying a BPF with a cut-off frequency set to the pulse-related frequency band [10]; (4) applying PCA to RGB color data calculated in the ROI and FFT to pulse-related frequency bands [26].
When measuring pulse rates using color data of an ROI from facial images, the majority of previous studies have had subjects remain stationary and have only considered a constant indoor environment with minimal illumination changes. Because a controlled environment produces simple and nearly noise-free data, all reported results have achieved high precision. However, if the pulse rate is measured in external environments (cafes, mountains, etc.), cars, or drones, the camera can shake continuously while detecting the user’s face, and illumination can change dramatically. Very few studies have examined how these methods perform on challenging data that include both illumination changes and the subject’s fine body tremor.
To address illumination changes in pulse rate measurements using facial videos, previous studies have used the following methods: (1) taking facial videos in a limited situation where light sources (LED, natural light, etc.) and subjects were immobilized [1,5,18,20,26]; (2) using H, S, and V color data calculated by converting the RGB color model of the detected ROI to the HSV color model; (3) using each component of the separated RGB obtained by applying ICA to the RGB color model [12,26,33]. To address the subject’s fine tremor, a specific number of landmarks were set in the detected face area, and pulse rate measurement experiments were conducted using RGB color data from the ROI detected on the set face [10,18,19,21]. In addition, none of the data is publicly accessible, so new methods must provide new datasets; this repetitive data collection wastes time. Moreover, differences between databases make it difficult to compare methods objectively.

3. Methods

3.1. Process of Robust Pulse Wave Extraction

This paper aims to measure pulse waves and pulse rates using facial videos in a way that is robust in diverse environments. The robust pulse rate measurement method proposed in this study consists of the following three steps: pre-processing, extraction of the robust pulse wave, and pulse rate measurement. In the first step, we stably detect faces in a facial video, specify the ROI within the stably detected face area, and calculate the Cg color signal by converting the RGB color scheme of the ROI into the YCgCo color scheme. In the next step, the robust pulse wave is calculated by applying FFT and inverse FFT (iFFT) to the Cg signal in pulse-related frequency bands. In the last step, robust pulse rates are calculated in diverse environments using the average of the intervals between detected peak points (RR intervals) of the calculated pulse waves. Figure 1 illustrates the flowchart of the proposed method.

3.1.1. Stable Face Detection

An important goal of research on pulse rate measurement using facial video is a technology that can measure the pulse rate anytime, anywhere, using popular smart devices, for people who have difficulty accessing hospitals or medical facilities. You only look once (YOLO), Haar-like features, the histogram of oriented gradients (HOG), single-shot detection (SSD), and MediaPipe are representative face detection algorithms. In this paper, for smartphone-based pulse rate measurement, we compared face detection using Haar, SSD, and YOLO. The most important factor for pulse rate measurement using smart device-based facial video is the face detection rate, since a higher rate yields more skin color samples per second. Experiments using a smartphone (Galaxy S10) with a 30 fps front camera showed an average of 28 fps for Haar, 22 fps (73%) for SSD, and 17 fps (56%) for YOLO. We therefore conducted the facial video-based pulse rate measurement using the Haar feature, which had the highest face detection rate per second on the smart device.
As shown in Figure 2, the Haar feature is a method for detecting objects based on brightness differences in regions of interest [39]. It performs differently depending on the characteristics of the objects. However, it demonstrates a rapid computational speed and a high accuracy with the face. The face is detected using the brightness difference between the eyes (Figure 2d) and the nose area and the brightness difference between the brows (Figure 2e).
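As a reference, Haar-based face detection might be set up as in the following OpenCV sketch; the cascade file and the detection parameters (scaleFactor, minNeighbors) are illustrative choices, not values reported in the paper:

```python
# Minimal Haar-cascade face detection sketch with OpenCV (assumed parameters).
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

cap = cv2.VideoCapture(0)                   # device front camera
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Each detection is an (X, Y, W, H) face box used by the later steps.
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in faces:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
```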
The X and Y coordinates of the face detected using the Haar feature were obtained at 30 frames per second, and the coordinate position changed from frame to frame. The X coordinates were detected unstably over time, as shown in Figure 3. This instability shifted the cheek-area coordinates derived from the face box, which hindered the extraction of color values from the same area in each frame.
To establish a stable ROI, the high-frequency component of the detected plane’s coordinates must be eliminated in real-time. A moving average filter (MAF) was utilized to stabilize each frame’s rapidly changing coordinate value.
As shown in Figure 4, n samples of the X (or Y) coordinates detected in facial images are applied to formula (1) to compensate for face coordinates that fluctuate from frame to frame even when the face does not move. In formula (1), $\bar{x}_k$ represents the corrected X (or Y) face coordinate, and $x_k$ represents the actually detected X (or Y) face coordinate.

$$\bar{x}_k = \frac{x_{k-n+1} + x_{k-n+2} + x_{k-n+3} + \cdots + x_k}{n} \tag{1}$$
Figure 5 shows the corrected X coordinates after applying the filter. Compared with Figure 3, the coordinate values change gently. This allows stable pulse waves to be extracted with the proposed method and improves pulse rate measurement accuracy when measuring the pulse rate outdoors with a handheld smartphone or tablet, or even in external environments such as a driving car or a flying drone.
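A minimal sketch of the moving average filter of formula (1), assuming the window size n = 15 used in Figure 5; the class name and streaming interface are our own:

```python
from collections import deque

class CoordinateMAF:
    """Moving average filter for face X (or Y) coordinates, formula (1)."""
    def __init__(self, n=15):            # n = 15, as in Figure 5 (assumed)
        self.window = deque(maxlen=n)    # keeps the most recent n samples

    def update(self, x_k):
        self.window.append(x_k)
        return sum(self.window) / len(self.window)

# Usage: one filter per coordinate, fed with each detection at 30 fps.
maf_x, maf_y = CoordinateMAF(), CoordinateMAF()
# x_stable = maf_x.update(x_detected); y_stable = maf_y.update(y_detected)
```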

3.1.2. Setting the Skin ROI

In this paper, the first step in extracting a pulse wave from a skin region was to define the region of interest (ROI) where the user’s face can be detected from the camera input of a smart device and the pulse wave can be extracted as closely as possible to the PPG. The majority of smart devices can acquire 30 images per second, which does not interfere with the pulse wave extraction process proposed in this paper. The typical human pulse rate ranges from 40 to 200 beats per minute, corresponding to frequencies up to 3.34 Hz. Consequently, a camera on a standard smart device with a sampling rate of 30 Hz can capture the pulse wave.
The pulse wave for calculating the pulse rate was extracted from the color value of the cheek area of the face. As shown in Figure 6, the cheek region has many arteries and a facial artery with the largest diameter of the facial blood vessels. Accordingly, a pulse wave could be obtained using the calculated color data after setting a cheek area as an area of interest in a facial image absorbing natural light or illumination.
Since illumination changes can affect pulse rate calculation accuracy [15,40], ROIs are designated on both cheeks ($ROI_1$, $ROI_2$), as shown in Figure 7, so that the amount of illumination change on each cheek can be taken into account as the environment changes. A high-accuracy pulse rate can then be calculated by applying a weighted average to the pulse rates calculated from both cheeks. The cheek ROIs are set within the facial coordinates detected over time, as given by formula (2):

$$
\begin{aligned}
x_{ROI_1} &= X + W \times 0.3, \qquad x_{ROI_2} = X + W \times 0.7,\\
y_{ROI_1,\,ROI_2} &= Y + H \times 0.5,\\
w_{ROI_1,\,ROI_2} &= W \times 0.08, \qquad h_{ROI_1,\,ROI_2} = W \times 0.08,
\end{aligned} \tag{2}
$$

where $(X, Y)$ is the origin and $W$ and $H$ are the width and height of the detected face box. In the cheek area, the x-coordinates $x_{ROI_1}$ and $x_{ROI_2}$ lie at 30% and 70% of the face width from the facial coordinate origin, and the y-coordinate is set at 50% of the face height. The width and height of each cheek ROI are set to 8% of the face width.
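Formula (2) could be implemented as a small helper along the following lines (the function name is ours; it assumes the (X, Y, W, H) face box returned by the detector above):

```python
def cheek_rois(X, Y, W, H):
    """Both-cheek ROIs from a detected face box (X, Y, W, H), formula (2)."""
    w = h = int(W * 0.08)               # ROI is 8% of the face width
    y = int(Y + H * 0.5)                # vertical center of the face
    roi1 = (int(X + W * 0.3), y, w, h)  # left cheek: 30% of face width
    roi2 = (int(X + W * 0.7), y, w, h)  # right cheek: 70% of face width
    return roi1, roi2
```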

3.1.3. Extraction of Robust Pulse Wave

The average Cg color of the ROI of the face is used to extract the pulse wave from the facial skin image. The Cg color proved superior to R, G, and B: using only the raw R, G, and B data of the ROI to calculate the pulse rate from facial images can reduce accuracy, because those channels are affected more by illumination changes than by the skin color change that is the most important factor for pulse rate calculation. Mathematically, the temporal $Cg^i$ signal corresponding to the $i$-th ROI is given by formula (3):

$$Cg^i\ signal = \left[\, cg^i_1,\; cg^i_2,\; \ldots,\; cg^i_f \,\right] \tag{3}$$
where $f$ is the total number of frames. The color conversion $cg^i_k$ from the R, G, and B colors is given by formula (4):

$$cg^i_k = -0.25 \times \bar{r}^i_k + 0.5 \times \bar{g}^i_k - 0.25 \times \bar{b}^i_k \tag{4}$$
where $\bar{r}^i_k$, $\bar{g}^i_k$, and $\bar{b}^i_k$ represent the averages of the R, G, and B color values over all pixels in the $k$-th frame of the $i$-th ROI, as given by formula (5):

$$\bar{r}^i_k = \frac{1}{p_i} \sum_{(x,\,y) \in R_i} r_k(x, y) \tag{5}$$

where $p_i$ is the total number of pixels in the $i$-th ROI, $R_i$ is the $i$-th ROI, $(x, y)$ is a pixel location, and $r_k$ is the R color value in the $k$-th frame. $\bar{g}^i_k$ and $\bar{b}^i_k$ are calculated in the same manner as $\bar{r}^i_k$.
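A sketch of formulas (4) and (5) for one frame, assuming OpenCV’s BGR channel order; appending the returned value frame by frame builds the Cg signal of formula (3):

```python
import numpy as np

def cg_value(roi_bgr):
    """Average Cg of one ROI frame, formulas (4)-(5).

    roi_bgr: H x W x 3 uint8 patch cropped from an OpenCV (BGR) frame.
    """
    b, g, r = roi_bgr[..., 0], roi_bgr[..., 1], roi_bgr[..., 2]
    r_bar, g_bar, b_bar = r.mean(), g.mean(), b.mean()   # formula (5)
    return -0.25 * r_bar + 0.5 * g_bar - 0.25 * b_bar    # formula (4)
```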
Pulse-induced oscillations are visible in the Cg signal shown in Figure 7 before the pulse wave is calculated. In the frequency domain shown in Figure 8, a pulse frequency of 1.256 Hz was confirmed within the pulse-related frequency band of 0.67~3.34 Hz.
After applying iFFT to the frequency-domain data limited to the pulse-related frequency band of Figure 8, pulse waves that are robust in diverse environments can be calculated by converting the result back into the time domain. Unlike existing studies that apply a BPF to R (or G or B) color data, this approach avoids the delay and distortion introduced by the filter.
Figure 9 shows the process of pulse wave extraction by applying FFT–iFFT based on zero-padding. The robust pulse wave is calculated by applying iFFT to the frequency domain, in which zero-padding is applied. Zero-padding is given by the following formula (6):
$$
x_{zp}[n] =
\begin{cases}
x[n], & f_{pulse\_min} \le n \le f_{pulse\_max}\\
0, & n < f_{pulse\_min} \ \text{or} \ n > f_{pulse\_max}
\end{cases} \tag{6}
$$

where $x$ represents the frequency values calculated by applying FFT to the Cg signal, $n$ indexes the samples, and $x_{zp}$ denotes the frequency values after zero-padding. Zero-padding is applied to the entire spectrum except the pulse-related frequency band ($f_{pulse\_min}$~$f_{pulse\_max}$ Hz). The iFFT of the zero-padded frequency values is given by formula (7):

$$X[n] = \frac{1}{N} \sum_{k=0}^{N-1} x_{zp}[k] \, e^{j \frac{2\pi}{N} nk}, \qquad n = 0, 1, \ldots, N-1 \tag{7}$$
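The FFT, zero-padding, iFFT chain of formulas (6) and (7) might be implemented as follows with NumPy; the function name and the 30 fps default are assumptions:

```python
import numpy as np

def extract_pulse_wave(cg_signal, fps=30.0, f_min=0.67, f_max=3.34):
    """Pulse wave via FFT, zero-padding outside the pulse band, and iFFT
    (formulas (6)-(7)); band limits follow the 0.67-3.34 Hz range above."""
    n = len(cg_signal)
    spectrum = np.fft.fft(cg_signal)
    freqs = np.abs(np.fft.fftfreq(n, d=1.0 / fps))       # Hz per FFT bin
    spectrum[(freqs < f_min) | (freqs > f_max)] = 0.0    # zero-padding, (6)
    return np.fft.ifft(spectrum).real                    # iFFT, (7)
```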

3.2. Robust Pulse Rate Measurement Method

We propose measuring pulse rates using the RR intervals of the pulse waves calculated through the process in Figure 9, to achieve high-accuracy pulse rate measurement from facial images. This section describes peak point detection, RR interval calculation, and pulse rate calculation using the RR interval for the proposed method, and also presents the comparison methods used to evaluate it: the short-time Fourier transform (STFT), the autocorrelation function (ACF), and combined ICA and BPF.

3.2.1. Peak Detection of Pulse Wave

To calculate the RR interval from the pulse wave, the peak points at the top of the pulse wave must be detected, as shown in Figure 10. In existing studies [9,27,30], the pulse rate is estimated using the number of peaks of the pulse wave, which is an important characteristic for calculating the pulse rate.

3.2.2. Pulse Rate Measurement Using the RR Interval

The RR interval is used not only for heart rate variability (HRV) analysis, but also for the analysis of pulse-related conditions, such as arrhythmia. The RR interval is the time difference between one peak point of the pulse wave and the next. If the pulse rate is calculated simply from the number of peaks in the pulse wave, unstable detection (or non-detection) of a peak point can greatly affect the result, so this paper calculates the pulse rate using the average RR interval. Figure 11 shows the calculation process of the RR interval.

$$rri_i = t(peak_i) - t(peak_{i-1}), \qquad i = 1, 2, \ldots, n \tag{8}$$
where $rri_i$ represents the $i$-th RR interval in seconds, $peak_i$ is the $i$-th peak, $t$ is the time function, and $t(peak_i)$ is the time of the $i$-th peak in seconds. The average $\overline{RRI}$ of the RR intervals of the calculated pulse wave is given by formula (9):

$$\overline{RRI} = \frac{1}{n} \sum_{i=1}^{n} rri_i \tag{9}$$
The RFPR (robust facial pulse rate) proposed in this paper uses the $\overline{RRI}$ calculated from the $rri_i$, as given by formula (10):

$$RFPR_{per\ minute} = \frac{60}{\overline{RRI}} \tag{10}$$
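A sketch of the peak detection of Section 3.2.1 and of formulas (8)–(10), using SciPy’s find_peaks; the minimum peak distance derived from the 3.34 Hz upper band limit is our assumption, as the paper does not state its peak detector’s parameters:

```python
import numpy as np
from scipy.signal import find_peaks

def rfpr(pulse_wave, fps=30.0):
    """Robust facial pulse rate from the pulse wave, formulas (8)-(10).
    Minimum peak spacing assumes a maximum pulse of 3.34 Hz (~200 bpm)."""
    peaks, _ = find_peaks(pulse_wave, distance=int(fps / 3.34))
    rri = np.diff(peaks) / fps     # RR intervals in seconds, formula (8)
    return 60.0 / rri.mean()       # average RRI (9) -> bpm, formula (10)
```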

3.3. Pulse Rate Measurement Method Using STFT

As an alternative pulse rate estimation method for comparison, the STFT (short-time Fourier transform) was applied to the Cg signal calculated in the ROI. The Cg signal is subdivided into regular intervals, as shown in Figure 12.
The STFT was applied to the subdivided Cg signal, and the pulse rate was calculated from the overall average of the frequencies with the greatest power within the pulse-related frequency band. The average of the dominant frequencies over the sections is given by formula (11):

$$\bar{f} = \frac{1}{n} \sum_{i=1}^{n} f_{max_i} \tag{11}$$
where $\bar{f}$ represents the average of the dominant frequencies over the sections, $f_{max_i}$ is the frequency with the largest power in the $i$-th section, and $n$ is the total number of sections. We measure the pulse rate using $\bar{f}$, as given by formula (12):

$$FPR_{per\ minute}\ (STFT) = \bar{f} \times 60 \tag{12}$$
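A possible reading of the STFT method of formulas (11) and (12); the 10 s segment length is an assumption, since the paper only says the Cg signal is subdivided at regular intervals:

```python
import numpy as np
from scipy.signal import stft

def fpr_stft(cg_signal, fps=30.0, f_min=0.67, f_max=3.34, seg_sec=10):
    """STFT comparison method, formulas (11)-(12)."""
    freqs, _, Z = stft(cg_signal, fs=fps, nperseg=int(seg_sec * fps))
    band = (freqs >= f_min) & (freqs <= f_max)
    # Dominant in-band frequency of each segment, then their mean, (11).
    f_max_i = freqs[band][np.argmax(np.abs(Z[band, :]), axis=0)]
    return f_max_i.mean() * 60.0   # bpm, formula (12)
```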

3.4. Pulse Rate Measurement Method Using ACF

As another comparison method, pulse rate estimation was performed by applying the ACF (autocorrelation function) to the pulse wave calculated in the ROI. The pulse wave is subdivided at regular intervals, the ACF is applied, and the pulse rate is calculated from the average interval between the peaks detected in the resulting signal, as shown in Figure 13.

$$ppi_i = peak_i - peak_{i-1}, \qquad i = 1, 2, \ldots, n \tag{13}$$
where $ppi_i$ represents the $i$-th peak interval in samples, and $peak_i$ is the $i$-th peak. The average $\overline{PPI}$ of the peak intervals of the signal obtained by applying the ACF to the pulse wave is given by formula (14):

$$\overline{PPI} = \frac{1}{n} \sum_{i=1}^{n} ppi_i \tag{14}$$
We measure the pulse rate using the $\overline{PPI}$ calculated from the $ppi_i$ and the frame rate ($Fps$) of the facial video, as given by formula (15):

$$FPR_{per\ minute}\ (ACF) = \frac{60 \times Fps}{\overline{PPI}} \tag{15}$$
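A sketch of the ACF method of formulas (13)–(15); the one-sided autocorrelation and the peak detector settings are our assumptions:

```python
import numpy as np
from scipy.signal import find_peaks

def fpr_acf(pulse_wave, fps=30.0):
    """ACF comparison method, formulas (13)-(15): the spacing of the
    autocorrelation peaks gives the pulse period in samples."""
    x = pulse_wave - pulse_wave.mean()
    acf = np.correlate(x, x, mode="full")[len(x) - 1:]   # one-sided ACF
    peaks, _ = find_peaks(acf, distance=int(fps / 3.34))
    ppi = np.diff(peaks)            # peak intervals in samples, (13)
    return 60.0 * fps / ppi.mean()  # formulas (14)-(15)
```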

3.5. Pulse Rate Measurement Method Using ICA and BPF for RGB Color Data

Pulse rate estimation is also performed by applying ICA and a BPF to the R, G, and B color data calculated from the ROI of the facial video. Independent components are derived by applying ICA to the R, G, and B color signals calculated in the ROI, as shown in Figure 14.
A BPF is applied to the derived components to calculate a pulse wave for each of R, G, and B, and the pulse rate is estimated by detecting the peak points of each calculated pulse wave. Figure 15 shows the R, G, and B pulse waves calculated by applying the BPF to the R, G, and B components, compared with time-synchronized PPG signals; we use the G pulse wave to evaluate the performance of the method proposed in this paper.
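The ICA–BPF comparison method might look as follows with scikit-learn’s FastICA and a SciPy Butterworth band-pass; the filter order and the choice of the component treated as “G” are assumptions, since ICA does not guarantee component order:

```python
import numpy as np
from scipy.signal import butter, filtfilt
from sklearn.decomposition import FastICA

def g_pulse_wave(rgb_signals, fps=30.0, f_min=0.67, f_max=3.34):
    """ICA-BPF comparison method: FastICA separates the R, G, B traces,
    then a Butterworth band-pass isolates the pulse band.

    rgb_signals: (T, 3) array of per-frame mean R, G, B values of the ROI.
    """
    sources = FastICA(n_components=3).fit_transform(rgb_signals)  # (T, 3)
    b, a = butter(3, [f_min, f_max], btype="bandpass", fs=fps)
    return filtfilt(b, a, sources[:, 1])   # index 1 as 'G' is an assumption
```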

3.6. Weighted Average Pulse Rate for the i-th ROI

In this paper, a weighted average was applied to the pulse rates calculated from the ROIs of both cheeks ($ROI_1$, $ROI_2$) to calculate pulse rates that are robust to illumination changes; the amount of illumination change across the skin as the environment changes can thus be taken into account. The facial pulse rate $FPR_{wa}$ calculated by applying the weighted average is given by formula (16):

$$FPR_{wa} = w_1 \times FPR_{ROI_1} + w_2 \times FPR_{ROI_2} + \cdots + w_n \times FPR_{ROI_n} \tag{16}$$
As shown in Figure 16, the ROIs in the facial image may come from various regions other than both cheeks. In the experiments, $w_1$ and $w_2$ were both set to 0.5.

4. Results

4.1. Recorded Dataset

In this paper, the performance of the proposed pulse rate measurement method was evaluated against the comparison methods for facial video-based pulse measurement described above, using PPG devices of the type commonly used in medical facilities, such as hospitals and nursing homes, as ground truth.

4.1.1. Indoor Dataset

For the indoor environment, a total of 375 facial videos of 60 s each were collected from 25 different subjects (20 males and 5 females). As shown in Figure 17, videos were acquired with smartphone cameras (Galaxy S10+). Subjects took facial videos while wearing a PPG device (PPG signal acquisition and facial video recording were time-synchronized and performed simultaneously). The videos thus obtained had a resolution of 640 × 480 pixels and a frame rate of 30 fps.

4.1.2. External Dataset

In the outdoor environment, a total of 60 facial videos of 60 s each were collected from six different subjects (five males and one female) while driving cars and flying drones. As shown in Figure 18 and Figure 19, subjects took facial videos using the camera from driving cars and flying drones while wearing PPG devices. The signal measured by the PPG device was collected via Bluetooth transmission.
In this paper, as shown in Figure 20, a user’s face color data were extracted using the camera of a smartphone attached to the drone to measure the pulse rate. However, there were cases where the face could not be detected due to drone shake and the surrounding lighting environment. When the face is not detected, the face color cannot be extracted, so interpolation is needed.
When flying a drone or driving a car, the face region is detected, the skin ROI is set, and the Cg signal is calculated; however, frames in which the face goes undetected leave gaps in the signal. To solve this problem, cubic spline interpolation was used in this paper. The process is shown in Figure 20, and the cubic spline is given by formulas (17) and (18).
A cubic spline smoothly connects a set of given points, joining each pair of adjacent points with a cubic polynomial ($a_0 + a_1 x + a_2 x^2 + a_3 x^3$); in other words, it bridges two points that are apart.
Given the known samples (the blue points in Figure 20), cubic spline interpolation can estimate the values between $x_i$ and $x_{i+1}$ and connect them with smooth curves. That is, the function values of adjacent spline segments must agree at each interior point, and their first and second derivatives must also agree there.
$$
\begin{aligned}
&S_i(x_i) = f(x_i) \ \text{and} \ S_i(x_{i+1}) = f(x_{i+1}), && i = 0, 1, \ldots, n-2\\
&S_{i+1}(x_{i+1}) = S_i(x_{i+1}), && i = 0, 1, \ldots, n-2\\
&S'_{i+1}(x_{i+1}) = S'_i(x_{i+1}), && i = 0, 1, \ldots, n-2\\
&S''_{i+1}(x_{i+1}) = S''_i(x_{i+1}), && i = 0, 1, \ldots, n-2
\end{aligned} \tag{17}
$$

$$S''(x_0) = S''(x_n) = 0, \qquad S(x_0) = f(x_0) \ \text{and} \ S(x_n) = f(x_n) \tag{18}$$
By applying a Cg signal with undetected regions to formulas (17) and (18), each segment $S_i$ becomes a third-order polynomial; the undetected parts are then filled in, as shown by the interpolated Cg signal in Figure 20, and the pulse rate can be calculated.
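The gap filling of formulas (17) and (18) maps directly onto SciPy’s natural cubic spline; the function name and the sample values in the usage sketch are illustrative:

```python
import numpy as np
from scipy.interpolate import CubicSpline

def fill_missing_cg(t, cg, t_full):
    """Fill undetected-face gaps in the Cg signal with a natural cubic
    spline, formulas (17)-(18); t are times where a face was detected,
    t_full the full 30 fps timeline."""
    spline = CubicSpline(t, cg, bc_type="natural")  # S'' = 0 at ends, (18)
    return spline(t_full)

# Usage sketch (assumed values): detected samples at irregular times.
t = np.array([0.0, 0.033, 0.100, 0.133])    # frames where the face was found
cg = np.array([1.2, 1.4, 1.1, 1.3])
cg_full = fill_missing_cg(t, cg, np.arange(0.0, 0.134, 1 / 30))
```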

4.2. Performance Evaluation

The performance evaluation used in our experiments is based on the FPR error ($y_i - \hat{y}_i$), where $y_i$ denotes the actual pulse rate calculated from the $i$-th PPG recording and $\hat{y}_i$ represents the facial pulse rate (FPR) calculated from the $i$-th facial video. The evaluation uses the MAE (mean absolute error), RMSE (root mean square error), and MAPE (mean absolute percentage error), given by formula (19):

$$MAE = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|, \qquad RMSE = \sqrt{\frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}{n}}, \qquad MAPE\ (\%) = \frac{100}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \tag{19}$$
where $|\cdot|$ denotes the absolute value and $n$ is the total number of facial videos. Lower MAE, RMSE, and MAPE values mean that the calculated FPR estimates are closer to the actual PR. As another evaluation method, the similarity between the FPR and the actual PR was evaluated through correlation coefficient analysis.
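Formula (19) in code form (the ×100 factor expresses MAPE as a percentage):

```python
import numpy as np

def evaluate(y_true, y_pred):
    """MAE, RMSE, and MAPE of formula (19) for FPR vs. PPG ground truth."""
    err = y_true - y_pred
    mae = np.abs(err).mean()
    rmse = np.sqrt((err ** 2).mean())
    mape = 100.0 * np.abs(err / y_true).mean()   # percent
    return mae, rmse, mape
```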

4.2.1. Performance Evaluation for Stable and Unstable Face Detection

Figure 21 shows the results of measuring the FPR by applying the proposed method to the Cg signal with and without stable face detection, and comparing the FPR with the actual pulse rate calculated by the PPG device.
Our experimental results on the stable and unstable datasets are presented in Table 1. Apart from the facial color change itself, noise such as head tremor has a substantial impact on pulse wave and pulse rate calculations, lowering the accuracy of pulse rate measurements.
On the indoor dataset, the pulse rate was estimated with the proposed method from Cg signals obtained with the stable and the unstable face detection methods and compared with the actual pulse rate calculated by the PPG device. The results confirm that stable face detection improves pulse rate measurement accuracy.

4.2.2. Performance Evaluation for Indoor Dataset

In this paper, we compared the pulse rate calculation results of three additional methods (STFT, ACF, and ICA–BPF), each with and without weighted averaging, to evaluate the performance of the proposed method. A graph of the results for each method is shown in Figure 22. The proposed RFPR (robust facial pulse rate) showed the highest correlation, r = 0.9472, and FPR (ICA–BPF) the lowest, r = 0.4648, because RGB colors can be greatly affected by illumination changes. Our experimental results on the indoor dataset are presented in Table 2.
As shown in Table 2, pulse rates were calculated using the Cg signals of both cheeks ($ROI_1$, $ROI_2$) for our proposed method and the three additional methods. The effects of illumination changes were considered by weighting the pulse rates calculated by each method with 0.5 for the left cheek and 0.5 for the right cheek. The FPR with the weighted average showed higher accuracy than the FPR without it. In particular, the MAPE of our proposed method (RFPR) improved by 0.51 percentage points, from 3.30% to 2.79%.

4.2.3. Performance Evaluation for External Dataset

When performing pulse rate measurement in an external environment, it is difficult to calculate the pulse rate with high accuracy from facial videos, because noise sources, such as illumination changes and head tremor, are more diverse than indoors. We compared the pulse rates calculated from facial videos taken from flying drones, driving cars, and outdoors with a PPG device. Our experimental results on the external environment dataset are presented in Table 3.
As shown in Table 3, the method proposed in this paper showed higher accuracy than the other three methods, with a MAPE of 3.34% outdoors, 6.33% for the driving car, and 6.50% for the flying drone. The pulse rates obtained through frequency domain analysis (STFT) had the lowest accuracy, with a MAPE of 8.02% outdoors, 8.02% for the driving car, and 10.11% for the flying drone. This confirms that frequency analysis of the pulse-related bands of facial videos is susceptible to the noise generated in external environments.

4.2.4. Performance Evaluation for Short-Time Pulse Rate Measurement

As shown in Table 4, the proposed method maintained high accuracy even for short-time pulse rate measurements: the MAPE was 4.08% at 30 s, 4.33% at 15 s, and 4.85% at 10 s.

4.2.5. Performance Evaluation on an Indoor Environment according to Distance

We also separately collected facial videos of targets farther than 50 cm away, using a webcam with an optical zoom, to evaluate pulse rate measurement accuracy according to distance. The distances were 1 m, 5 m, 15 m, 20 m, 25 m, 30 m, and 50 m, and a total of 42 facial videos were taken from three male subjects, six at each distance. Even when a face is detected at 1 m or more, the size of the ROI decreases as the distance increases; for this reason, an ordinary camera, such as that of a smartphone or another smart device, is limited in pulse rate measurement. To address this, the webcam’s optical zoom was used to enlarge the face to the same height and width as a face detected at close range by a smartphone camera. The pulse rate was estimated using the Cg signal calculated from the skin ROI of the enlarged facial video, and the MAPE at each distance was calculated to evaluate performance. The results are shown in Table 5.

5. Discussion

In summary, we implemented a stable face detection method and selected the optimal ROI, enabling effective analysis of skin color changes. In addition, the robust pulse wave and pulse rate were calculated using FFT–iFFT with zero-padding. By removing noise without introducing filter-induced signal distortion, our method measures pulse rates from facial videos collected in indoor and external environments more accurately than the other methods (STFT, ACF, ICA–BPF).
We observed possible limitations of our method during the experiments. Changes in facial angle and long-distance facial videos reduce pulse rate measurement accuracy, because our method calculates the pulse wave and rate from an ROI over the arterial blood in the cheek. Nevertheless, a technology that can measure pulse rates from facial videos even under poor conditions would be highly valuable in diverse fields. Our approach may be further improved in the future by incorporating deep learning-based face and ROI detection. Because this research focuses on a lightweight method for the cameras of smart devices available in diverse environments, it did not include deep learning-based methodology; smart devices have less computing power than PCs, which can lower the face detection rate when deep learning-based face detection and continuous skin color calculation are used. We will nevertheless conduct extensive experiments in the future combining our approach with deep learning methodology.

6. Conclusions

In this paper, we proposed a robust pulse rate measurement method for diverse environments. In addition to measuring the pulse rate in an indoor environment, as in existing studies, facial videos were collected outdoors, in cars, and from drones. Since pulse rate calculation from facial video is affected by noise, such as illumination changes and user movement, accuracy can be lowered. We therefore achieved precise detection of skin color changes using the stable face detection method, and calculated the pulse rate with high accuracy by applying the FFT–iFFT method, which prevents signal distortion and delay. Experimental results showed higher accuracy when the weighted average method, which accounts for the amount of illumination change on each cheek, was applied than when it was not. The proposed technology can help not only with health management in indoor environments, but also with analyzing the condition of drivers or accident victims in real time from cars and drones. Our future work will use cameras above 30 fps, along with more dataset collection, to observe more detailed color changes in facial skin areas; it will be used to measure various bio-signals, such as respiration, blood pressure, and body temperature, as well as more accurate pulse rates.

Author Contributions

Conceptualization, J.P. and K.H.; methodology, K.H.; software, J.P.; validation, J.P. and K.H.; formal analysis, K.H.; investigation, J.P.; resources, K.H.; writing—original draft preparation, J.P.; writing—review and editing, J.P. and K.H.; visualization, J.P.; supervision, K.H.; project administration, J.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki, and approved by the Institutional Review Board of Sungkyunkwan University (2022-11-046 and 1 November 2022).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The raw data used to support the findings of this study are available from the corresponding author upon request. However, sharing the videos requires the subjects’ permission.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Qi, L.; Yu, H.; Xu, L.; Mpanda, R.S.; Greenwald, S.E. Robust heart-rate estimation from facial videos using Project_ICA. Physiol. Meas. 2019, 40, 085007. [Google Scholar] [CrossRef]
  2. Allen, J. Photoplethysmography and its application in clinical physiological measurement. Physiol. Meas. 2007, 28, R1. [Google Scholar] [CrossRef] [Green Version]
  3. Cook, S.; Togni, M.; Schaub, M.C.; Wenaweser, P.; Hess, O.M. High heart rate: A cardiovascular risk factor? Eur. Heart J. 2006, 27, 2387–2393. [Google Scholar] [CrossRef]
  4. Jain, P.K.; Tiwari, A.K. Heart monitoring systems—A review. Comput. Biol. Med. 2014, 54, 1–13. [Google Scholar] [CrossRef]
  5. Yan, Y.; Ma, X.; Yao, L.; Ouyang, J. Noncontact measurement of heart rate using facial video illuminated under natural light and signal weighted analysis. Bio-Med. Mater. Eng. 2015, 26, S903–S909. [Google Scholar] [CrossRef] [Green Version]
  6. Scully, C.G.; Lee, J.; Meyer, J.; Gorbach, A.M.; Granquist-Fraser, D.; Mendelson, Y.; Chon, K.H. Physiological parameter monitoring from optical recordings with a mobile phone. IEEE Trans. Biomed. Eng. 2011, 59, 303–306. [Google Scholar] [CrossRef] [Green Version]
  7. Short, D.S. The syndrome of alternating bradycardia and tachycardia. Br. Heart J. 1954, 16, 208. [Google Scholar] [CrossRef] [Green Version]
  8. Da Costa, G. Optical remote sensing of heartbeats. Opt. Commun. 1995, 117, 395–398. [Google Scholar] [CrossRef]
  9. Jain, M.; Deb, S.; Subramanyam, A.V. Face video based touchless blood pressure and heart rate estimation. In Proceedings of the 2016 IEEE 18th International Workshop on Multimedia Signal Processing (MMSP), Montreal, QC, Canada, 21–23 September 2016; pp. 1–5. [Google Scholar]
  10. Gupta, P.; Bhowmick, B.; Pal, A. MOMBAT: Heart rate monitoring from face video using pulse modeling and Bayesian tracking. Comput. Biol. Med. 2020, 121, 103813. [Google Scholar] [CrossRef]
  11. Pilz, C.S.; Zaunseder, S.; Canzler, U.; Krajewski, J. Heart rate from face videos under realistic conditions for advanced driver monitoring. Curr. Dir. Biomed. Eng. 2017, 3, 483–487. [Google Scholar] [CrossRef]
  12. Balakrishnan, G.; Durand, F.; Guttag, J. Detecting pulse from head motions in video. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2013, Portland, OR, USA, 23–28 June 2013; pp. 3430–3437. [Google Scholar]
  13. Rapczynski, M.; Werner, P.; Saxen, F.; Al-Hamadi, A. How the region of interest impacts contact free heart rate estimation algorithms. In Proceedings of the 2018 25th IEEE International Conference on Image Processing (ICIP), Athens, Greece, 7–10 October 2018; pp. 2027–2031. [Google Scholar]
  14. Poh, M.Z.; McDuff, D.J.; Picard, R.W. Non-contact, automated cardiac pulse measurements using video imaging and blind source separation. Opt. Express 2010, 18, 10762–10774. [Google Scholar] [CrossRef]
  15. Verkruysse, W.; Svaasand, L.O.; Nelson, J.S. Remote plethysmographic imaging using ambient light. Opt. Express 2008, 16, 21434–21445. [Google Scholar] [CrossRef] [Green Version]
  16. Sanyal, S.; Nundy, K.K. Algorithms for monitoring heart rate and respiratory rate from the video of a user’s face. IEEE J. Transl. Eng. Health Med. 2018, 6, 2700111. [Google Scholar] [CrossRef]
  17. Garbey, M.; Sun, N.; Merla, A.; Pavlidis, I. Contact-free measurement of cardiac pulse based on the analysis of thermal imagery. IEEE Trans. Biomed. Eng. 2007, 54, 1418–1426. [Google Scholar] [CrossRef]
  18. Zhang, B.; Li, H.; Xu, L.; Qi, L.; Yao, Y.; Greenwald, S.E. Noncontact Heart Rate Measurement Using a Webcam, Based on Joint Blind Source Separation and a Skin Reflection Model: For a Wide Range of Imaging Conditions. J. Sensors 2021, 2021, 9995871. [Google Scholar] [CrossRef]
  19. Zheng, K.; Ci, K.; Cui, J.; Kong, J.; Zhou, J. Non-contact heart rate detection when face information is missing during online learning. Sensors 2020, 20, 7021. [Google Scholar] [CrossRef]
  20. Chen, X.; Cheng, J.; Song, R.; Liu, Y.; Ward, R.; Wang, Z.J. Video-based heart rate measurement: Recent advances and future prospects. IEEE Trans. Instrum. Meas. 2018, 68, 3600–3615. [Google Scholar] [CrossRef]
  21. Li, X.; Chen, J.; Zhao, G.; Pietikainen, M. Remote heart rate measurement from face videos under realistic situations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2014, Columbus, OH, USA, 23–28 June 2014; pp. 4264–4271. [Google Scholar]
  22. Lewandowska, M.; Rumiński, J.; Kocejko, T.; Nowak, J. Measuring pulse rate with a webcam—A non-contact method for evaluating cardiac activity. In Proceedings of the 2011 IEEE Federated Conference on Computer Science and Information Systems (FedCSIS), Szczecin, Poland, 18–21 September 2011; pp. 405–410. [Google Scholar]
  23. Rouast, P.V.; Adam, M.T.; Chiong, R.; Cornforth, D.; Lux, E. Remote heart rate measurement using low-cost RGB face video: A technical literature review. Front. Comput. Sci. 2018, 12, 858–872. [Google Scholar] [CrossRef]
  24. Kwon, S.; Kim, H.; Park, K.S. Validation of heart rate extraction using video imaging on a built-in camera system of a smartphone. In Proceedings of the 2012 Annual International Conference of the IEEE Engineering in Medicine and Biology Society, San Diego, CA, USA, 28 August–1 September 2012; pp. 2174–2177. [Google Scholar]
  25. Pursche, T.; Krajewski, J.; Moeller, R. Video-based heart rate measurement from human faces. In Proceedings of the 2012 IEEE International Conference on Consumer Electronics (ICCE), Las Vegas, NV, USA, 13–16 January 2012; pp. 544–545. [Google Scholar]
  26. Sahin, S.M.; Deng, Q.; Castelo, J.; Lee, D. Non-Contact Heart Rate Monitoring from Face Video Utilizing Color Intensity. J. Multimed. Inf. Syst. 2021, 8, 1–10. [Google Scholar] [CrossRef]
  27. Lomaliza, J.P.; Park, H. Improved heart-rate measurement from mobile face videos. Electronics 2019, 8, 663. [Google Scholar] [CrossRef]
  28. Fernandes, S.L.; Gurupur, V.P.; Sunder, N.R.; Arunkumar, N.; Kadry, S. A novel nonintrusive decision support approach for heart rate measurement. Pattern Recognit. Lett. 2020, 139, 148–156. [Google Scholar] [CrossRef]
  29. Hassan, M.A.; Malik, A.S.; Fofi, D.; Saad, N.; Karasfi, B.; Ali, Y.S.; Meriaudeau, F. Heart rate estimation using facial video: A review. Biomed. Signal Process. Control. 2017, 38, 346–360. [Google Scholar] [CrossRef]
  30. Martinez, N.; Bertran, M.; Sapiro, G.; Wu, H.T. Non-contact photoplethysmogram and instantaneous heart rate estimation from infrared face video. In Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 22–25 September 2019; pp. 2020–2024. [Google Scholar]
  31. Oviyaa, M.; Renvitha, P.; Swathika, R. Real Time Tracking of Heart Rate from Facial Video Using Webcam. In Proceedings of the 2020 IEEE Second International Conference on Inventive Research in Computing Applications (ICIRCA), Coimbatore, India, 15–17 July 2020; pp. 1–7. [Google Scholar]
  32. Takano, C.; Ohta, Y. Heart rate measurement based on a time-lapse image. Med. Eng. Phys. 2007, 29, 853–857. [Google Scholar] [CrossRef]
  33. Alghoul, K.; Alharthi, S.; Al Osman, H.; El Saddik, A. Heart rate variability extraction from videos signals: ICA vs. EVM comparison. IEEE Access 2017, 5, 4711–4719. [Google Scholar] [CrossRef]
  34. Yu, Z.; Li, X.; Niu, X.; Shi, J.; Zhao, G. Autohr: A strong end-to-end baseline for remote heart rate measurement with neural searching. IEEE Signal Process. Lett. 2020, 27, 1245–1249. [Google Scholar] [CrossRef]
  35. Lomaliza, J.P.; Park, H. Robust heart rate estimation from mobile face videos using an auto-encoder. Electron. Lett. 2020, 56, 1169–1172. [Google Scholar] [CrossRef]
  36. Bousefsaf, F.; Maaoui, C.; Pruski, A. Continuous wavelet filtering on webcam photoplethysmographic signals to remotely assess the instantaneous heart rate. Biomed. Signal Process. Control 2013, 8, 568–574. [Google Scholar] [CrossRef]
  37. Lee, D.; Plataniotis, K.N. A novel high dynamic range image compression scheme of color filter array data for the digital camera pipeline. In Proceedings of the 2012 19th IEEE International Conference on Image Processing, Orlando, FL, USA, 30 September–3 October 2012; pp. 325–328. [Google Scholar]
  38. Poh, M.Z.; McDuff, D.J.; Picard, R.W. Advancements in noncontact, multiparameter physiological measurements using a webcam. IEEE Trans. Biomed. Eng. 2010, 58, 7–11. [Google Scholar] [CrossRef] [Green Version]
  39. Lienhart, R.; Maydt, J. An extended set of haar-like features for rapid object detection. In Proceedings of the IEEE International Conference on Image Processing, Rochester, NY, USA, 22–25 September 2002; Volume 1, p. I. [Google Scholar]
  40. Liu, X.; Yang, X.; Wang, D.; Wong, A. Detecting pulse rates from facial videos recorded in unstable lighting conditions: An adaptive spatiotemporal homomorphic filtering algorithm. IEEE Trans. Instrum. Meas. 2020, 70, 4001215. [Google Scholar] [CrossRef]
Figure 1. Flowchart of our proposed method.
Figure 2. Extended set of Haar-like features: (a) edge features, (b) line features, (c) center-surround features. Face detection based on (d) the edge feature of Haar, (e) the line feature of Haar.
Figure 3. X coordinate position with time.
Figure 4. Example of n samples for facial X (or Y)-coordinates.
Figure 5. Calibrated coordinate result (n = 15).
Figure 6. (a) Blood vessel and region of interest (red rectangle) of the facial artery. (b) Detected cheek ROI (green rectangle) from the facial video.
Figure 7. Cg signal recorded for 60 s.
Figure 8. Frequency domain of the Cg signal (Figure 7).
Figure 9. Process of the pulse wave extraction.
Figure 10. Peak detection of pulse waves.
Figure 11. RR interval calculation of the pulse wave.
Figure 12. Separation of the Cg signal sections.
Figure 13. Peak-to-peak interval calculation of the signal with applied ACF to the pulse wave.
Figure 14. RGB components calculated by applying ICA to RGB color data.
Figure 15. R, G, B pulse waves calculated by applying BPF to the RGB component.
Figure 16. Selection of multiple ROIs.
Figure 17. Dataset collection environment for an indoor lab.
Figure 18. Dataset collection environment when driving a car.
Figure 19. Dataset collection environment when flying a drone (DJI WM3315).
Figure 20. Process of interpolation for a Cg signal.
Figure 21. Comparison of FPR calculation results by stable and unstable methods.
Figure 22. Graphs comparing RFPR with other FPR methods (STFT, ACF, ICA–BPF).
Table 1. Comparative results of the FPR on stable and unstable datasets (using Cg data of the left cheek).

Method          RMSE   MAE    MAPE (%)   r
Stable FPR      3.11   2.63   3.30       0.9266
Unstable FPR    5.83   4.93   6.21       0.7016
Table 2. Comparative results of the FPR on the indoor dataset.

Method                  RMSE   MAE    MAPE (%)   r
RFPR (proposed)         3.11   2.63   3.30       0.9266
STFT-FPR                6.14   4.56   5.56       0.7662
ACF-FPR                 4.86   3.85   4.87       0.8144
ICA and BPF-FPR         7.76   6.45   8.27       0.4082
Weighted average:
RFPR (proposed)         2.67   2.22   2.79       0.9472
STFT                    5.73   4.46   5.39       0.8162
ACF                     4.39   3.62   4.61       0.8573
ICA and BPF             7.50   6.29   8.10       0.4648
Table 3. Comparative results of the FPR on the external dataset (weighted average applied).

Condition       Metric     RFPR     STFT     ACF      ICA&BPF
Outdoor Bench   RMSE       3.36     7.52     3.97     5.98
                MAE        2.89     6.31     3.58     5.12
                MAPE (%)   3.34     8.02     3.85     6.27
                r          0.9033   0.4377   0.8630   0.7229
Driving Car     RMSE       5.81     7.35     5.99     6.31
                MAE        5.17     6.24     5.31     5.88
                MAPE (%)   6.33     8.02     6.36     7.07
                r          0.7426   0.4767   0.7189   0.6122
Flying Drone    RMSE       6.09     9.39     6.57     8.84
                MAE        5.32     7.89     5.96     6.84
                MAPE (%)   6.50     10.11    7.32     8.89
                r          0.7515   0.3698   0.6012   0.3914
Table 4. MAPE (%) of the short-time FPR on the indoor dataset (short-time rates computed as peak count × 60/n sec).

Duration        RFPR   STFT   ACF    ICA&BPF
60 s sample     2.79   5.39   4.61   8.10
30 s sample     4.08   6.46   4.62   8.32
15 s sample     4.33   7.12   5.43   8.76
10 s sample     4.85   7.24   6.64   9.25
Table 5. MAPE (%) of the FPR according to distance.

Method   50 cm   1 m    5 m    15 m   20 m   25 m   30 m   50 m
RFPR     2.83    3.11   3.85   7.07   6.95   8.04   8.23   10.6