#### *3.2. Comparison of the VPG Signal Extraction Methods for Various Activities*

To see how individual activities affect heart rate detection, the *RMSE* and *sRate* values were compared for the following video parts: part 1 (sitting still), part 2 (reading text), part 4 (rewriting text) and part 6 (playing a game).


Because the *RMSE* and *sRate* values form a small sample (nine videos) and the effect of outliers can be significant, the median and IQR were used as statistical measures. Figures 9–20 show the results of the HR estimation and the comparison of the signal extraction methods for the selected algorithms and parts.
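
The choice of median and IQR over mean-based statistics can be illustrated with a minimal Python sketch. The nine values below are hypothetical, not taken from the study, and the helper name `median_iqr` is an assumption. Quartile conventions differ between tools, so the IQR computed here may not match other software exactly.

```python
from statistics import median, quantiles


def median_iqr(values):
    """Return (median, IQR) of a small sample."""
    # "inclusive" treats the data as the whole population of interest;
    # other quartile conventions give slightly different IQR values.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), q3 - q1


# Hypothetical RMSE values (bpm) for nine videos; the last one is an outlier.
rmse_per_video = [2.1, 2.4, 2.6, 2.9, 3.0, 3.3, 3.5, 3.8, 19.0]

med, iqr = median_iqr(rmse_per_video)
mean = sum(rmse_per_video) / len(rmse_per_video)
print(f"median={med:.2f}  IQR={iqr:.2f}  mean={mean:.2f}")
```

In this made-up sample the single outlier pulls the mean well above the median, while the median and IQR still describe the bulk of the videos.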

**Figure 9.** Comparison of signal extraction methods, algorithm No.1 (PSD), part 1: (**a**) box plots for *sRate*; (**b**) box plots for *RMSE*. Blue lines—IQR range, red line—median value.

**Figure 10.** Comparison of signal extraction methods, algorithm No.1 (PSD), part 2: (**a**) box plots for *sRate*; (**b**) box plots for *RMSE*. Blue lines—IQR range, red line—median value.

**Figure 11.** Comparison of signal extraction methods, algorithm No.1 (PSD), part 4: (**a**) box plots for *sRate*; (**b**) box plots for *RMSE*. Blue lines—IQR range, red line—median value.

**Figure 12.** Comparison of signal extraction methods, algorithm No.1 (PSD), part 6: (**a**) box plots for *sRate*; (**b**) box plots for *RMSE*. Blue lines—IQR range, red line—median value.

**Figure 13.** Comparison of signal extraction methods, algorithm No.2 (AR), part 1: (**a**) box plots for *sRate*; (**b**) box plots for *RMSE*. Blue lines—IQR range, red line—median value.

**Figure 14.** Comparison of signal extraction methods, algorithm No.2 (AR), part 2: (**a**) box plots for *sRate*; (**b**) box plots for *RMSE*. Blue lines—IQR range, red line—median value.

**Figure 15.** Comparison of signal extraction methods, algorithm No.2 (AR), part 4: (**a**) box plots for *sRate*; (**b**) box plots for *RMSE*. Blue lines—IQR range, red line—median value.

**Figure 16.** Comparison of signal extraction methods, algorithm No.2 (AR), part 6: (**a**) box plots for *sRate*; (**b**) box plots for *RMSE*. Blue lines—IQR range, red line—median value.

**Figure 17.** Comparison of signal extraction methods, algorithm No.3 (TIME), part 1: (**a**) box plots for *sRate*; (**b**) box plots for *RMSE*. Blue lines—IQR range, red line—median value.

**Figure 18.** Comparison of signal extraction methods, algorithm No.3 (TIME), part 2: (**a**) box plots for *sRate*; (**b**) box plots for *RMSE*. Blue lines—IQR range, red line—median value.

**Figure 19.** Comparison of signal extraction methods, algorithm No.3 (TIME), part 4: (**a**) box plots for *sRate*; (**b**) box plots for *RMSE*. Blue lines—IQR range, red line—median value.

**Figure 20.** Comparison of signal extraction methods, algorithm No.3 (TIME), part 6: (**a**) box plots for *sRate*; (**b**) box plots for *RMSE*. Blue lines—IQR range, red line—median value.

Considering algorithm No. 1 (PSD), the *RMSE* and IQR values are lowest for ICA in parts 1, 2 and 6 (sitting still, reading text and playing a game). For part 4 (rewriting text), the lowest *RMSE* is obtained with the ExG signal representation. In terms of *sRate*, ICA is the best representation for parts 1, 2 and 6, whereas for part 4 the highest *sRate* is obtained with ExG. However, the corresponding IQR values are lowest for ICA only in parts 1 and 6; for parts 2 and 4 the lowest IQR is obtained with the G and GRD representations, respectively.

The lowest *RMSE* values are obtained for parts 1 and 6 (sitting still and playing a game), in which facial actions and head movements were small. Part 2 (reading text) has the highest IQR values, which suggests that facial actions in some cases reduce the accuracy of HR estimation. The large head movements present in part 4 (rewriting text) affect the ExG signal extraction method the least.

Considering algorithm No. 2 (AR), the *RMSE* is lowest for ICA in parts 2, 4 and 6 (reading text, rewriting text and playing a game). However, the IQR values are not always lowest for ICA. For part 1 (sitting still), the lowest *RMSE* is obtained with the ExG representation, but with a high IQR. The *sRate* is highest for ICA in parts 2 and 4 (reading and rewriting text). For part 6 (playing a game) the best signal extraction method is ExG, and for part 1 (sitting still) it is the G image representation.

Given algorithm No. 3 (TIME), the *RMSE* values are lowest for ICA in all parts. However, *sRate* is highest for the ExG signal extraction method (parts 2 and 6) and for GRD (part 1). Because *RMSE* is sensitive to extreme values, this discrepancy suggests that outliers are present. The IQR of *sRate* is lowest for the G representation in almost all parts.

To test for statistical differences between the medians of the groups (signal extraction methods), a two-sided Wilcoxon rank sum test was used. The results are shown in Tables A7–A9 (Appendix A). The p-values of almost all combinations of signal extraction methods indicate that there is not enough evidence to reject the null hypothesis of equal medians at the default significance level of 5%. In statistical terms, the methods therefore provide similar results across the different activities. The exceptions are the comparisons between G and ICA for PSD and part 6, G and ICA for AR and part 6 (*RMSE* only), and G and ICA for TIME and parts 1 and 4 (*RMSE* only).
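
A two-sided rank sum comparison of two small groups can be sketched as follows. This is an illustrative Python version, not the code used in the study. It enumerates every way of assigning the pooled ranks to the first group, which is feasible for nine values per group, and reports twice the smaller tail probability as the p-value. The function names and the sample values are hypothetical.

```python
from itertools import combinations


def midranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_exact(x, y):
    """Return (rank sum of x, exact two-sided p-value)."""
    ranks = midranks(list(x) + list(y))
    n_x = len(x)
    observed = sum(ranks[:n_x])
    lower = upper = total = 0
    # Under the null hypothesis every subset of n_x ranks is equally likely.
    for combo in combinations(ranks, n_x):
        w = sum(combo)
        lower += w <= observed + 1e-9
        upper += w >= observed - 1e-9
        total += 1
    return observed, min(1.0, 2 * min(lower, upper) / total)


# Hypothetical per-video RMSE values (bpm) for two signal extraction methods.
rmse_g = [6.2, 7.9, 5.4, 9.1, 8.3, 6.8, 7.2, 10.5, 12.0]
rmse_ica = [3.1, 4.0, 2.8, 5.0, 3.6, 4.4, 2.9, 6.0, 11.0]

w, p = rank_sum_exact(rmse_g, rmse_ica)
print(f"rank sum = {w}, p = {p:.4f}, reject at 5%: {p < 0.05}")
```

Statistical packages may use a normal approximation or a different tie treatment, so their p-values can differ slightly from this exhaustive enumeration.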

#### *3.3. Comparison of the Different Algorithms and Activities*

The results of comparing different algorithms (PSD, AR, TIME) are shown in Table 3. Statistics were calculated for entire video sequences (including all participant activities).


**Table 3.** The median *sRate* and *RMSE* for selected algorithms and signal extraction methods.

Considering the median values, the best results (highest *sRate* and lowest *RMSE*) are observed for algorithm No. 1, based on power spectral density (PSD). The second-best algorithm is based on autoregressive modeling (algorithm No. 2). The worst results are obtained for direct analysis of the VPG signal in the time domain (algorithm No. 3). It is worth noting that video No. 9 has a significant impact on the results. ICA is the best signal extraction method in terms of *RMSE* values. However, in terms of *sRate* the best results are obtained with ExG.

To test for statistical differences between the medians of the groups (algorithms), a two-sided Wilcoxon rank sum test was used. The results are shown in Table A10 (Appendix A). The p-values of almost all combinations of algorithms and signal extraction methods indicate that there is insufficient evidence to reject the null hypothesis of equal medians at the default significance level of 5%. The only exceptions are ICA and G for PSD vs. TIME, where the p-values lead to rejection of the null hypothesis at the same significance level. This means that the most important issue for the ICA signal extraction method is choosing the right estimation algorithm.

#### *3.4. Analysis of the Impact of Average Lighting and User's Movement on the Results of Pulse Detection.*

To assess the effect of scene illumination on pulse detection accuracy, Pearson's correlation coefficient between the median *sRate* and the average scene lighting was calculated for all video sequences (Table A11 in Appendix A). The results show only one strong positive correlation (0.71), for algorithm No. 3 (TIME) and the GRD signal extraction method. For the other combinations of algorithms and signal extraction methods, no medium or strong correlations significant at the 0.05 level are present. This may be due to similar and poor lighting in most video sequences.
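
The correlation analysis can be sketched in a few lines of Python. The sketch below assumes one data point per video (n = 9) and tests significance through the usual t statistic with n - 2 degrees of freedom. The function names are hypothetical, and the constant 2.365 is the standard two-sided 5% critical value of Student's t for 7 degrees of freedom.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for the null hypothesis of no correlation (|r| < 1)."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


T_CRIT_DF7 = 2.365  # two-sided 5% critical value, 7 degrees of freedom

# Assumption: nine videos, so n = 9; r = 0.71 is the value reported above.
t = t_statistic(0.71, 9)
print(f"t = {t:.2f}, significant at 5%: {abs(t) > T_CRIT_DF7}")
```

Under these assumptions a correlation needs an absolute value of roughly 0.67 or more to be significant at the 5% level, which shows how little power nine data points provide.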

Similarly, to assess whether the user's movements affect the results, correlation coefficients were calculated between the median *sRate* and the standard deviation of the accelerations (measured by SensorTag) for the entire video sequences (Table A12 in Appendix A). The results show strong positive correlations (> 0.6) for:


Counterintuitively, *sRate* rises as the standard deviation of the accelerations increases. This might suggest that ballistocardiographic head movements generated by the flow of blood through the carotid arteries have a stronger impact than the subtle skin color variations caused by circulating blood. Only the ICA image representation is not sensitive to acceleration. It is also worth noting that this might be an effect of the sensor location (chest). However, this hypothesis requires further investigation. In addition, Pearson's correlation coefficient computed from a small sample may be inaccurate, although it can still provide useful information.
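
The movement measure used in this analysis can be sketched as follows. The text does not state how the three accelerometer axes were combined, so this sketch assumes the standard deviation of the acceleration magnitude over a whole sequence. The function name and the sample readings are hypothetical.

```python
import math
from statistics import pstdev


def acceleration_std(samples):
    """Standard deviation of the acceleration magnitude.

    `samples` is a sequence of (x, y, z) readings; combining the axes
    through the magnitude is an assumption, not the documented method.
    """
    magnitudes = [math.sqrt(x * x + y * y + z * z) for x, y, z in samples]
    return pstdev(magnitudes)


# Hypothetical readings in units of g: a still and a moving participant.
still = [(0.0, 0.0, 1.0), (0.0, 0.01, 1.0), (0.01, 0.0, 1.0)]
moving = [(0.2, 0.1, 1.1), (-0.3, 0.2, 0.8), (0.1, -0.4, 1.3)]
print(acceleration_std(still), acceleration_std(moving))
```

One such value per video would then be paired with the median *sRate* of that video to compute the correlation coefficient.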

#### **4. Discussion**

The main purpose of this research was to investigate the impact of human activity on the accuracy of the VPG heart rate algorithm. We focused on activities performed during typical human-computer interaction (HCI) scenarios (i.e., reading text, rewriting text and playing a game). Thus, the evaluation of the continuous HR estimation accuracy was carried out on several video sequences recorded in different places and under different conditions (illumination, person identity, distance from the computer screen and camera). We used a state-of-the-art face detection and tracking algorithm and compared various signal extraction methods, including the ExG image representation, which to our knowledge is used here for the first time. It is worth noting that the scene lighting for most of the videos was very poor, which corresponds to typical computer work conditions.

For the entire video sequence, and taking into account the *RMSE* metric, the ICA signal extraction method yields the smallest errors. However, when it comes to the reliability of measurements and keeping the accuracy of a given algorithm within the accepted error tolerance (*sRate* metric), the ExG representation seems to be a promising method. This is especially important in medical applications. It is also worth mentioning that the ExG method is much faster to calculate than ICA (about four times faster in a MATLAB implementation on an Intel i7 machine).
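
A short Python sketch shows how the two metrics can rank methods differently. It assumes that *sRate* is the percentage of estimates whose absolute error stays within a tolerance. The 3 bpm tolerance, the function names and the data are illustrative assumptions, not values taken from this study.

```python
import math


def rmse(estimated, reference):
    """Root-mean-square error between estimated and reference HR (bpm)."""
    squared = sum((e - r) ** 2 for e, r in zip(estimated, reference))
    return math.sqrt(squared / len(reference))


def success_rate(estimated, reference, tolerance_bpm):
    """Percentage of estimates whose absolute error is within the tolerance."""
    hits = sum(abs(e - r) <= tolerance_bpm for e, r in zip(estimated, reference))
    return 100.0 * hits / len(reference)


# Hypothetical reference HR and two hypothetical estimators (bpm).
reference = [70, 71, 72, 73, 74, 75, 76, 77, 78, 79]
steady = [r + 4 for r in reference]           # always off by 4 bpm
bursty = reference[:9] + [reference[9] + 20]  # exact except one 20 bpm error

TOLERANCE_BPM = 3  # assumed tolerance, for illustration only
for name, est in (("steady", steady), ("bursty", bursty)):
    print(name, round(rmse(est, reference), 2),
          success_rate(est, reference, TOLERANCE_BPM))
```

In this constructed case the estimator with a constant moderate error has the lower *RMSE* but never falls within the tolerance, while the estimator with a single large error has the higher *RMSE* and a high success rate.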

To check how individual activities affect heart rate detection, the following activities were compared: the participant sits still for a minimum of 60 seconds, the participant reads text, the participant rewrites text using the keyboard and the mouse, and the participant plays a game. In summary, considering algorithm No. 1 (PSD), the ICA signal extraction method works better in sequences without large head movements (sitting still and playing a game). For large head movements, the ExG representation gives better results. Facial actions (part 2, reading text) have a negative impact on the accuracy of HR estimation. Given algorithm No. 2 (AR), it is difficult to indicate the best signal extraction method. In general, ICA works better on parts with facial actions and head movements. For the other parts, the ExG method works well, but for the part in which the participant was sitting still, the simplest signal representation (G) is the best. Interestingly, this is the opposite of the PSD algorithm, for which ICA works better when there are no large head movements. Considering algorithm No. 3 (TIME), the ExG signal representation provides better reliability of measurements (*sRate*). The smallest *RMSE* is obtained with ICA, but the *RMSE* metric is more sensitive to the extreme values and outliers found in the collected data.

Based on the Wilcoxon rank sum test, almost all signal extraction methods provide statistically similar results, with the exception of the G and ICA comparisons. This means that, for the tested videos, it is impossible to indicate a single best method that works in all scenarios and lighting conditions. Collecting more data may help to identify a better method. Comparing the results obtained from different algorithms, we found that algorithm No. 1 (PSD) gives the best results, followed by algorithm No. 2 (AR). The accuracy of algorithm No. 3 (time-based) differs significantly from that of the other algorithms. In addition, the Wilcoxon rank sum test indicates that, for the ICA signal extraction method, selecting the appropriate estimation algorithm is the most important factor.

Taking into account individual activities, the highest average *sRate* applies to the activity in which participants sat still. The second highest average *sRate* is for the activity in which users were playing a game. The lowest *sRate* values apply to reading and typing text, respectively. Although the ICA method seems to provide better results, this is not always the case. There are several combinations of estimation algorithm and signal extraction method in which ExG is better (e.g., part No. 1 with TIME).

The presented analysis and results pave the way for other studies. The following directions of future research remain open:

