*3.1. Comparison of the VPG Signal Extraction Methods (G, GRD, ICA and ExG)*

To select appropriate statistical methods for comparing the results, a Shapiro-Wilk parametric hypothesis test of composite normality can be used. However, with a small sample size (9 videos), outliers can have a significant impact. Therefore, the median and interquartile range (IQR) were used as statistical measures.
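As a minimal sketch of why the median and IQR are preferred here, the following Python snippet computes both for a small sample that contains one outlier. The nine values are hypothetical and are not taken from the study. The quartile convention (`method="inclusive"`) is also an assumption, since other software may interpolate quartiles differently.

```python
import statistics


def median_and_iqr(values):
    """Return (median, IQR) for a sequence of numbers."""
    # quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q2, q3 - q1


# Hypothetical per-video RMSE values in bpm (nine videos, last one an outlier).
rmse_per_video = [2.1, 2.4, 1.9, 3.0, 2.6, 2.2, 2.8, 2.5, 14.0]

med, iqr = median_and_iqr(rmse_per_video)
mean = statistics.mean(rmse_per_video)

print(f"median = {med:.2f} bpm, IQR = {iqr:.2f} bpm, mean = {mean:.2f} bpm")
```

In this illustrative sample the single outlier pulls the mean well above the median, while the median and IQR remain close to the values of the other eight videos.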

Tables A3–A5 (Appendix A) show the results of HR estimation for various signal extraction methods and selected algorithms. The results were calculated for entire video sequences (including all participant activities). The *sRate* value is given for a threshold of 3.52 bpm (equal to the algorithm frequency resolution). Box plots (Figures 6–8) are also included to better illustrate *sRate* and *RMSE* distributions.
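The two metrics can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: *sRate* is assumed here to be the percentage of HR estimates whose absolute error does not exceed the threshold, and the use of `<=` rather than `<` is likewise an assumption. The function names and the sample values are hypothetical.

```python
import math


def rmse(estimated, reference):
    """Root mean square error between estimated and reference HR (bpm)."""
    squared = [(e - r) ** 2 for e, r in zip(estimated, reference)]
    return math.sqrt(sum(squared) / len(squared))


def success_rate(estimated, reference, threshold_bpm=3.52):
    """Percentage of estimates within threshold_bpm of the reference.

    Assumed definition of sRate; the default threshold is the 3.52 bpm
    value quoted in the text.
    """
    hits = sum(
        1 for e, r in zip(estimated, reference) if abs(e - r) <= threshold_bpm
    )
    return 100.0 * hits / len(reference)


# Hypothetical HR values in bpm, for illustration only.
reference_hr = [60.0, 62.0, 65.0, 70.0]
estimated_hr = [61.0, 62.0, 72.0, 69.0]

print(f"RMSE  = {rmse(estimated_hr, reference_hr):.2f} bpm")
print(f"sRate = {success_rate(estimated_hr, reference_hr):.1f} %")
```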

**Figure 6.** Comparison of signal extraction methods, algorithm No.1 (PSD): (**a**) box plots for *sRate*; (**b**) box plots for *RMSE*. Blue lines—IQR range, red line—median value.

**Figure 7.** Comparison of signal extraction methods, algorithm No.2 (AR): (**a**) box plots for *sRate*; (**b**) box plots for *RMSE*. Blue lines—IQR range, red line—median value.

**Figure 8.** Comparison of signal extraction methods, algorithm No.3 (TIME): (**a**) box plots for *sRate*; (**b**) box plots for *RMSE*. Blue lines—IQR range, red line—median value.

For algorithm No. 1 (PSD), the ICA signal extraction method yields the lowest median *RMSE* together with a low interquartile range (IQR). The second lowest *RMSE* values are obtained for the G and ExG representations. The worst results are for video No. 9. However, this video was recorded under artificial lighting with lights visible in the scene, which could have negatively affected the results. Also, the actual heart rate was low (about 50 bpm), which is close to the limit of the measured range (results below 50 bpm are considered incorrect). The *sRate* measure shows similar results: it is highest for the ICA signal extraction method. The ExG method has the highest IQR values.

For algorithm No. 2 (AR), the *RMSE* results are similar to those of the PSD algorithm. However, all IQR values are lower, which means that this algorithm gives more consistent outcomes for videos acquired under different conditions. As for *sRate*, the highest value is for the ExG signal extraction method, but with a large IQR.

For algorithm No. 3 (TIME), the lowest median *RMSE* with a small IQR is for ICA, followed by the ExG signal extraction method. All errors are higher for this algorithm than for PSD and AR. The *sRate* is highest for ExG, followed by GRD. However, the lowest *sRate* IQR values are for the ICA and G signal representations.

To test for statistically significant differences between the medians of the groups (signal extraction methods), a two-sided Wilcoxon rank sum test was used. The Wilcoxon rank sum test is a nonparametric test for the equality of population medians of two independent samples. It is used when the outcome is not normally distributed and the samples are small. The results are shown in Table A6 (Appendix A). For almost all combinations of signal extraction methods, the p-values indicate that there is not enough evidence to reject the null hypothesis of equal medians at the default significance level of 5%. In other words, no statistically significant differences between the methods were detected. The exception is the comparison of G and ICA for algorithm No. 3 (TIME), but only for the *RMSE* metric.
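The test described above can be sketched in plain Python as follows. This is a minimal illustration that uses the normal approximation with a tie correction and no continuity correction. For groups of only nine values an exact test may be more appropriate, and the text does not state which variant was used. The function names and the two data sets are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum test (normal approximation).

    Returns (W, z, p), where W is the sum of the ranks of sample x.
    """
    pooled = list(x) + list(y)
    n1, n2, n = len(x), len(y), len(pooled)
    w = sum(average_ranks(pooled)[:n1])
    mean_w = n1 * (n + 1) / 2
    # Tie correction: sum of (t^3 - t) over groups of t tied values.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var_w = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (w - mean_w) / math.sqrt(var_w)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return w, z, p


# Hypothetical per-video RMSE values (bpm) for two extraction methods.
rmse_method_a = [3.1, 2.8, 4.0, 3.6, 2.9, 3.3, 5.2, 3.0, 12.5]
rmse_method_b = [2.2, 2.0, 2.9, 2.7, 2.1, 2.5, 3.4, 2.3, 11.0]

w, z, p = rank_sum_test(rmse_method_a, rmse_method_b)
verdict = "reject" if p < 0.05 else "fail to reject"
print(f"W = {w}, z = {z:.3f}, p = {p:.4f}: {verdict} equal medians at 5%")
```

Because the samples are treated as independent, the sketch ignores the fact that the same videos are processed by every method. That matches the rank sum test named in the text, not a paired alternative.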
