#### *2.3. Preprocessing and VPG Signal Extraction*

The selected region of interest (ROI) is then used to calculate the average color intensities over the ROI for each subsequent image frame. These values are stored in a circular buffer of length N, forming the raw VPG signal $\mathbf{y}_0(n) = [R_0(n), G_0(n), B_0(n)]^T$. The raw VPG signal is then detrended using a simple method consisting of mean-centering and scaling [21] (Equation (1)):

$$y(n) = \frac{y\_0(n) - \mu(n, L)}{\mu(n, L)} \tag{1}$$

where μ(*n*, *L*) is an *L*-point running mean vector of the VPG signal and $\mathbf{y}(n) = [R(n), G(n), B(n)]^T$.
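The detrending of Equation (1) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `detrend_vpg` is ours, and the edge handling of the running mean (zero-padded convolution) is an assumption the paper does not specify.

```python
import numpy as np

def detrend_vpg(y0, L):
    """Detrend a raw RGB VPG signal by mean-centering and scaling
    with an L-point running mean, as in Equation (1).

    y0 : array of shape (n_samples, 3) holding the raw [R0, G0, B0] means
    L  : running-mean window length in samples
    """
    y0 = np.asarray(y0, dtype=float)
    kernel = np.ones(L) / L
    # L-point running mean of each colour channel (same length as the input;
    # the zero-padded edges are an implementation choice, not from the paper)
    mu = np.column_stack(
        [np.convolve(y0[:, c], kernel, mode="same") for c in range(y0.shape[1])]
    )
    return (y0 - mu) / mu
```

Away from the buffer edges, a constant signal maps to zero, which is the intended effect of the mean-centering and scaling.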

The strongest VPG signal is observed in the green (G) channel. Because the camera's RGB color sensors register a mixture of the reflected VPG signal and other sources of fluctuation, such as motion and changes in ambient lighting, various approaches to overcoming this problem have been reported in the literature. In [22] a robust alternative to the G method was presented: the green-red difference (GRD), which minimizes the impact of artifacts (Equation (2)):

$$GRD(n) = G(n) - R(n) \tag{2}$$

Some authors exploit the fact that each color sensor registers a mixture of the original source signals with slightly different weights and use independent component analysis (ICA) [17,34]. The ICA model assumes that the observed signals $\mathbf{y}(n)$ are linear mixtures of sources **s**(*n*). The aim of ICA is to find the separation matrix *W* whose output (Equation (3)):

$$\hat{\mathbf{s}}(n) = \mathbf{W} \cdot \mathbf{y}(n) \tag{3}$$

is an estimate of the vector **s**(*n*) containing the underlying source signals. The order in which ICA returns the independent components is random; thus, the component whose power spectrum contains the highest peak can be selected for further analysis. In this work, we used the FastICA implementation [35] and calculated the power spectrum in the range 35–180 bpm (which corresponds to 0.583–3.00 Hz).
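The component-selection step can be sketched as below. This is a minimal illustration under our own assumptions: the function name `select_pulse_component` is ours, the spectrum is computed with a plain FFT periodogram, and the `components` array would come from a FastICA implementation such as [35].

```python
import numpy as np

def select_pulse_component(components, fs, lo_bpm=35.0, hi_bpm=180.0):
    """Pick the ICA output whose power spectrum contains the highest
    peak inside the 35-180 bpm pulse band (0.583-3.00 Hz).

    components : array of shape (n_components, n_samples), ICA outputs
    fs         : sampling frequency (video frame rate) in Hz
    """
    components = np.asarray(components, dtype=float)
    n = components.shape[1]
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    band = (freqs >= lo_bpm / 60.0) & (freqs <= hi_bpm / 60.0)
    # Periodogram of each component, restricted to the pulse band
    power = np.abs(np.fft.rfft(components, axis=1)) ** 2
    best = np.argmax(power[:, band].max(axis=1))
    return components[best]
```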

In our research, we found that the greenness identification method [36] based on the excess green image component (ExG) amplifies the pulse signal and reduces the impact of noise, while being faster to compute than ICA. The ExG image representation is computed as follows. First, the normalized components r, g, and b are calculated using Equation (4):

$$r(n) = \frac{R(n)}{R(n) + G(n) + B(n)} \quad g(n) = \frac{G(n)}{R(n) + G(n) + B(n)} \quad b(n) = \frac{B(n)}{R(n) + G(n) + B(n)} \tag{4}$$

The excess green component ExG is defined by Equation (5):

$$\text{ExG}(n) = 2 \cdot g(n) - r(n) - b(n) \tag{5}$$

The refined VPG signal (G, GRD, ICA, or ExG) is then band-limited by a zero-phase digital filter (Bartlett-Hamming), yielding the signal VPG(*n*). A summary of the pre-processing, VPG signal extraction, and heart rate estimation steps is provided in Figure 3.
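The zero-phase band-limiting step can be sketched with a forward-backward FIR filter. This is only an illustration under stated assumptions: we use a Hamming-windowed FIR design as a stand-in for the paper's Bartlett-Hamming window, and the filter length `numtaps` and band edges are our choices.

```python
import numpy as np
from scipy import signal

def bandlimit_vpg(x, fs, lo_bpm=35.0, hi_bpm=180.0, numtaps=127):
    """Zero-phase band-pass filtering of the refined VPG signal.

    A window-method FIR band-pass (Hamming window here, approximating the
    Bartlett-Hamming design used in the paper) is applied forwards and
    backwards with filtfilt, so the net phase delay is zero.
    """
    taps = signal.firwin(numtaps, [lo_bpm / 60.0, hi_bpm / 60.0],
                         pass_zero=False, fs=fs)
    return signal.filtfilt(taps, 1.0, x)
```

The zero-phase property matters here because a phase delay would shift the detected pulse peaks and bias the time-based HR estimate.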

**Figure 3.** HR estimation algorithm outline.

#### *2.4. Heart Rate Estimation Algorithm*

To estimate the heart rate, we used three different algorithms. The first was based on the power spectral density (PSD) estimate of the signal VPG(*n*), calculated using the Welch algorithm and the filter bank approach. To find the pulse frequency, the highest peak in the PSD was located, yielding the heart rate estimate (denoted HR0 in this paper). An important aspect of this classic frequency-based approach is that the frequency resolution *fres* depends on the length of the signal buffer (Equation (6)):

$$f\_{\rm res} = \frac{F\_{\rm s}}{N} \tag{6}$$

where *N* is the length of the signal observation and *Fs* is the sampling frequency (the frame rate of the video).
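The first algorithm can be sketched as follows. The function name `estimate_hr_psd`, the Welch segment length, and the search band are our illustrative choices; the paper does not publish these parameters.

```python
import numpy as np
from scipy import signal

def estimate_hr_psd(vpg, fs, lo_bpm=35.0, hi_bpm=180.0):
    """HR0: locate the highest peak of the Welch PSD of VPG(n)
    inside the plausible pulse band and convert it to bpm."""
    freqs, psd = signal.welch(vpg, fs=fs, nperseg=min(len(vpg), 512))
    band = (freqs >= lo_bpm / 60.0) & (freqs <= hi_bpm / 60.0)
    f_peak = freqs[band][np.argmax(psd[band])]
    return 60.0 * f_peak
```

Note how Equation (6) shows up in practice: with `nperseg=512` at 60 FPS, neighbouring PSD bins are about 7 bpm apart, so HR0 is quantized to that grid.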

We also used a second algorithm based on autoregressive (AR) modelling. In the AR model, the input signal can be expressed by Equation (7):

$$y'(n) = -\sum\_{k=1}^{p} a\_k \cdot y'(n-k) + e(n) \tag{7}$$

where *p* is the model order, *ak* are the model coefficients, and *e*(*n*) is white noise.

Using the Yule-Walker method, we fitted the AR model to the input signal VPG(*n*) and obtained an estimate of the AR parameters *ak*. The frequency response of the resulting all-pole filter was then used to calculate the pulse rate (denoted HR1 in this paper). The HR1 value was estimated by detecting the highest peak of the filter frequency response in the selected range (50–180 bpm).
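A minimal sketch of the second algorithm is given below. The model order, the biased autocorrelation estimator, and the frequency-grid density are our assumptions; the paper does not state which values it used.

```python
import numpy as np
from scipy import signal, linalg

def estimate_hr_ar(vpg, fs, order=9, lo_bpm=50.0, hi_bpm=180.0):
    """HR1: fit an AR(p) model to VPG(n) via the Yule-Walker equations
    and take the highest peak of the all-pole frequency response."""
    x = np.asarray(vpg, dtype=float)
    x = x - x.mean()
    n = len(x)
    # Biased autocorrelation estimates r[0..p]
    r = np.array([np.dot(x[:n - k], x[k:]) / n for k in range(order + 1)])
    # Solve the symmetric Toeplitz Yule-Walker system R a = -r[1:]
    a = linalg.solve_toeplitz(r[:-1], -r[1:])
    # Frequency response of 1 / A(z), A(z) = 1 + a1 z^-1 + ... + ap z^-p
    w, h = signal.freqz([1.0], np.concatenate(([1.0], a)), worN=4096, fs=fs)
    band = (w >= lo_bpm / 60.0) & (w <= hi_bpm / 60.0)
    return 60.0 * w[band][np.argmax(np.abs(h[band]))]
```

Unlike the Welch-based HR0, the peak location here is not tied to the FFT bin spacing of Equation (6), since the AR response can be evaluated on an arbitrarily dense frequency grid.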

The third approach was time-based (denoted TIME in the article). Peaks were located in the filtered signal VPG(*n*) using a peak detection algorithm. The intervals between successive peaks were then calculated, and their median value was used to obtain the heart rate value (HR2).
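The time-based estimator can be sketched as below; the function name and the minimum-distance constraint derived from the maximum plausible HR are illustrative choices, as the paper does not specify its peak detector.

```python
import numpy as np
from scipy import signal

def estimate_hr_time(vpg, fs, max_bpm=180.0):
    """HR2: detect peaks in the filtered VPG(n), take the median
    inter-peak interval, and convert it to beats per minute."""
    # Enforce a minimum peak spacing implied by the fastest plausible HR
    min_distance = int(fs * 60.0 / max_bpm)
    peaks, _ = signal.find_peaks(vpg, distance=min_distance)
    if len(peaks) < 2:
        return None  # not enough peaks for an interval-based estimate
    intervals = np.diff(peaks) / fs          # seconds between beats
    return 60.0 / np.median(intervals)
```

The median (rather than the mean) of the intervals makes a single spurious or missed peak far less damaging to HR2.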

To minimize false detections caused by head movements and other sources of image variation, the estimated HR was further post-processed. A second heart rate buffer of length M was used to store the latest HR0, HR1, and HR2 values. The average of each HR buffer's contents was then calculated and used as a new estimate of the current heart rate (denoted HR0m, HR1m, and HR2m, respectively).
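The post-processing buffer amounts to a moving average over the last M estimates; one such buffer would be kept per HR stream. A minimal sketch (the class name `HRSmoother` is ours):

```python
from collections import deque

class HRSmoother:
    """Post-processing buffer of length M: store the latest estimates of
    one HR stream (e.g. HR0) and report their mean (e.g. HR0m)."""

    def __init__(self, m):
        self.buf = deque(maxlen=m)   # oldest value drops out automatically

    def update(self, hr):
        self.buf.append(hr)
        return sum(self.buf) / len(self.buf)
```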

#### *2.5. Evaluation Methodology*

Various metrics have been proposed in the literature for evaluating the accuracy of heart rate (HR) measurement methods. The most common is the root mean squared error, denoted *RMSE* (Equation (8)):

$$\text{RMSE} = \sqrt{\frac{1}{n} \sum\_{i=1}^{n} \left[ HR\_{error}[i] \right]^2} \tag{8}$$

$$HR\_{error} = HR\_{video} - HR\_{gt} \tag{9}$$

where *HRvideo* is the HR estimated from video and *HRgt* is the ground truth HR value.

Because *RMSE* is sensitive to extreme values and outliers, we additionally propose a metric that assesses for how long the accuracy of a given algorithm stays within the assumed error tolerance (Equation (10)). This is particularly relevant in medical applications, where measurement reliability is essential:

$$\text{sRate} = \frac{100}{n} \cdot \sum\_{i=1}^{n} \mathbb{1}\left( \left| HR\_{error}[i] \right| \le \text{tolerance} \right) \tag{10}$$

where $\mathbb{1}(\cdot)$ equals 1 when its argument is true and 0 otherwise.
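Both metrics are straightforward to compute; a minimal sketch (function names are ours, and the default tolerance of 3 bpm is an illustrative value, not one taken from the paper):

```python
import numpy as np

def rmse(hr_video, hr_gt):
    """Root mean squared error of the HR estimates, as in Equations (8)-(9)."""
    err = np.asarray(hr_video, dtype=float) - np.asarray(hr_gt, dtype=float)
    return np.sqrt(np.mean(err ** 2))

def s_rate(hr_video, hr_gt, tolerance=3.0):
    """Percentage of samples whose |HR_error| stays within the tolerance,
    as in Equation (10)."""
    err = np.abs(np.asarray(hr_video, dtype=float) - np.asarray(hr_gt, dtype=float))
    return 100.0 * np.mean(err <= tolerance)
```

A single large outlier inflates `rmse` quadratically but moves `s_rate` by at most one sample's share, which is exactly the robustness the text argues for.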

Little or no attention has been given in the literature to the effect of the information delay introduced by the algorithm on the error metrics. Assuming that the algorithm introduces a delay t0 and the measured ground truth HR values are also delayed by t1 (due to the acquisition and measurement method of the device), HRerror is biased. Direct comparison of HR values using HRerror is therefore not accurate (a systematic error is introduced). In addition, HRvideo and HRgt are usually sampled at different frequencies. For example, our camera's frame rate was 60 FPS, while the Polar H7 heart rate sensor provides a measurement approximately every three seconds.

To minimize the impact of delays and different sampling frequencies on the results of the HR comparison, we propose the following method. First, HRgt values are interpolated to match the sampling frequency of HRvideo using simple linear interpolation, resulting in HRgt2. An example of ground truth and measured HR time series is given in Figure 4. All results are available online at [37].
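The resampling and delay-compensation step reduces to a shifted linear interpolation; a minimal sketch (the function name `align_ground_truth` and the explicit `delay` parameter are our illustration of the procedure described above):

```python
import numpy as np

def align_ground_truth(t_video, t_gt, hr_gt, delay=0.0):
    """Linearly interpolate ground-truth HR onto the video timestamps
    (yielding HRgt2), optionally shifting the ground-truth time axis by
    an estimated delay in seconds to compensate for t0/t1."""
    return np.interp(t_video, np.asarray(t_gt, dtype=float) + delay, hr_gt)
```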

**Figure 4.** An example of HR time-series plots for algorithm No.1 (PSD) and ExG signal representation: (**a**) video No.5; (**b**) video No.9.

The delay introduced by the algorithm was estimated using an artificially generated signal of known frequency and known time of change. Here, we used a signal that changes from 80 to 120 bpm and has an amplitude similar to that of VPG(*n*). The resulting delays t0 do not include the delay t1 introduced by the Polar H7 device, which we assumed to be constant. Assuming that the delay introduced by the algorithm is also constant for a given algorithm and its parameters, the estimated delay t2 can be used to correctly evaluate the remaining sequences. Although this is a strong assumption, it improves the accuracy of the results. The algorithm delay could also be estimated using cross-correlation; however, this analysis is not included in the article, because the estimated delays strongly depended on the shape of the signal and the selected fragment. The results are summarized in Table 1.

**Table 1.** Results of the delay estimation for selected algorithms.


It is also worth mentioning that delay correction is useful for correctly positioning the beginnings of the individual parts of the experiment. For example, the impact of a user's head movements may become visible in the estimated pulse signal only after some time (equal to the algorithm delay).
