**5. Results**

For each of the 18 participants, we captured 15 parameters either measured directly or derived from ECG. For all these parameters, we also computed a moving average as an outlier-smoothed version of the same signal in order to get a better understanding of the signal's overall robustness and reliability. Additionally, we computed low/high-pass filtered versions of the GSR signals, as well as the complex demodulation amplitudes from the ECG signals. For each participant, we analysed 22 pairs of physiological parameters of the same type regarding similarity (e.g., heart rate from BioHarness sensor and heart rate from VarioPort sensor), and another 136 pairs of parameters of different type regarding correlations (e.g., heart rate from BioHarness sensor and galvanic skin response from VarioPort sensor).

Since there are many different parameters, we defined a naming convention that includes the physiological parameter of interest, the platform used to measure it, plus an indication of whether a time series is a moving averaged and/or filtered version. For the naming of these parameters, we use the following notation:

For direct measurements:

> <*parameter*> <*platform*> [*filt.*] [(*mv. avg.*)]

where *parameter* can be GSR, HR, or IBI and *platform* can be BH, E4, or VP; *(mv. avg.)* indicates that this is the moving averaged version; *filt.* indicates that a first order high-pass (0.05 Hz) and first order low-pass (0.5 Hz) Butterworth filter has been applied to the original signal (this filter setting is used for further analysis to identify moments of stress [4,11]; however, this is not within the scope of this paper).

example: *GSR: VP (mv. avg.)* refers to the moving averaged version of the galvanic skin response measured by VarioPort.

For derived measurements:

> <derived parameter> from <original parameter> <platform> [(mv. avg.)]

where *derived parameter* can be HF, IBI, LF, or VLF; *original parameter* can be ECG; and *platform* can be BH or VP; *(mv. avg.)* indicates that this is the moving averaged version.

example: *IBI: from ECG BH* refers to the inter-beat interval derived from the electrocardiogram measured by BioHarness.
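The *(mv. avg.)* and *filt.* variants named above could be produced along the following lines. This is a minimal sketch, not the processing code used in the study. The window length, the trailing window, the sampling rate `fs`, the zero initial filter state, and all function names are assumptions made for illustration. Only the filter order and the two cut-off frequencies (0.05 Hz and 0.5 Hz) are taken from the text. The first-order Butterworth sections are discretised with the bilinear transform.

```python
import math


def moving_average(x, window):
    """Trailing moving average; the window shrinks at the start of the series."""
    out = []
    running = 0.0
    for i, value in enumerate(x):
        running += value
        if i >= window:
            running -= x[i - window]  # drop the sample that left the window
        out.append(running / min(i + 1, window))
    return out


def first_order_butterworth(x, cutoff_hz, fs, kind):
    """First-order Butterworth low- or high-pass (bilinear transform, zero initial state)."""
    k = math.tan(math.pi * cutoff_hz / fs)  # pre-warped cut-off
    a1 = (k - 1.0) / (k + 1.0)
    if kind == "low":
        b0, b1 = k / (1.0 + k), k / (1.0 + k)
    elif kind == "high":
        b0, b1 = 1.0 / (1.0 + k), -1.0 / (1.0 + k)
    else:
        raise ValueError("kind must be 'low' or 'high'")
    y, x_prev, y_prev = [], 0.0, 0.0
    for value in x:
        y_now = b0 * value + b1 * x_prev - a1 * y_prev
        y.append(y_now)
        x_prev, y_prev = value, y_now
    return y


def band_limit(x, fs, highpass_hz=0.05, lowpass_hz=0.5):
    """High-pass at 0.05 Hz followed by low-pass at 0.5 Hz (cut-offs from the text)."""
    high_passed = first_order_butterworth(x, highpass_hz, fs, "high")
    return first_order_butterworth(high_passed, lowpass_hz, fs, "low")


fs = 4.0                      # assumed sampling rate in Hz
constant = [2.0] * 2000       # synthetic signal with no dynamics
filtered = band_limit(constant, fs)
smoothed = moving_average([1, 2, 3, 4, 5], window=3)
```

On the synthetic constant signal, the high-pass stage removes the base level, so `filtered` decays towards zero, which is the behaviour one would expect when only changes in GSR are of interest.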

The following subsections are structured according to Figure 3. We use two representative time series, one HR and one GSR, as examples to guide the reader through the high number of physiological parameters investigated herein. These two examples are cross-referenced between several figures and thus provide views on the same data from different perspectives, thereby fostering the consolidation of a more holistic picture.

#### *5.1. Visualisation: Exploratory Plots*

The aim of the exploratory plots is to obtain a basic understanding of the temporal behaviour and the relationship of equal-type physiological parameters. For this first insight, we investigate three complementary plots that provide different views on the same data: a time series plot, a scatter plot, and a cross-correlation plot. These plots show two versions of the same parameter, namely a data-as-is version and a moving averaged version. Note that for the cross-correlation plots the second parameter is used as the independent one. To illustrate the methodology and exemplary results, we only show representative sample plots, which highlight the characteristic patterns of about 80% of all plots. Overall, we produced more than 1000 plots based on unscaled and rescaled data.

Figures 4 and 5 each show a time plot (a), a scatter plot (b), and two cross-correlation plots (c,d). In Figure 4, the physiological parameter of interest is HR, measured by BH and VP. The HR time plot (Figure 4a) shows two highly similar, almost identical curves. The blue curve has an offset, which is maximal in the low range and converges to zero in the high range. The corresponding error term seems to include a reciprocal component: the higher the actual measurement, the lower the error. The HR scatter plot (Figure 4b) shows a strong positive linear relationship with an R<sup>2</sup> of 0.971 for raw data and 0.997 for the moving averaged data. This means that 97.1% and 99.7%, respectively, of the data's total variance can be explained by a linear model. This plot also confirms what is seen in the time plot, namely that in the low range the residuals are higher than in the high range. Note that the higher residuals in red in the upper right quarter of the plot refer to the time plot at ~750 s, where the blue curve drops below the red curve (indicated by a black arrow in Figure 4a,b). The HR cross-correlation plots show the highest cross-correlation for the as-is version (Figure 4c) at a lag of 1 s, and the highest cross-correlation for the moving averaged version (Figure 4d) at a lag of 2 s. In other words, the local trend of the BH is lagging 1 and 2 s, respectively, "behind" the local trend of the VP on average.
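The lag reading of a cross-correlation plot can be illustrated with a small sketch. It assumes two equally long series on a common time grid (at 1 Hz, one sample of lag corresponds to 1 s) and uses a simplified estimator that recomputes the Pearson correlation on the overlapping part for each lag. The function names, the sign convention, and the synthetic data are assumptions for illustration, not the tooling used for Figures 4 and 5.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equally long sequences (0.0 if one is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def lag_of_max_correlation(x, y, max_lag):
    """Return (lag, r) maximising corr(x[t + lag], y[t]).

    y is treated as the independent series; lag > 0 means x trails y.
    """
    best_lag, best_r = 0, -2.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            xs, ys = x[lag:], y[:len(y) - lag]
        else:
            xs, ys = x[:len(x) + lag], y[-lag:]
        r = pearson(xs, ys)
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag, best_r


# Synthetic example: `delayed` is `reference` shifted by two samples.
reference = [math.sin(0.3 * t) for t in range(120)]
delayed = [0.0, 0.0] + reference[:-2]
lag, r = lag_of_max_correlation(delayed, reference, max_lag=10)
print(lag, round(r, 3))
```

With this convention, a positive lag for the pair (BH, VP) would correspond to the BH trend running "behind" the VP trend, as described above.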

**Figure 4.** Participant RP 5–14: time plot (**a**), scatter plot (**b**), cross-correlation plot (**c**), and cross-correlation plot of moving averages (**d**) of heart rate (HR) [beats per minute] measured by the BioHarness 3 (BH) sensor and the VarioPort (VP).

In Figure 5, the physiological parameter of interest is GSR, measured by the E4 and the VP. Generally speaking, measuring GSR is, in comparison to HR, a delicate undertaking due to the measurement principle, which is solely based on the electrical conductivity of the skin. This conductivity is highly dependent on (1) the participant's skin characteristics and (2) the contact between the sensor electrodes and the skin, especially during physical activity. Thus, these two factors can have a significant impact on the reliability, and thus on the comparability, of the sensor measurements. Additionally, the mounting of the sensors' electrodes can differ as well. For instance, the VP electrodes need an isotonic electrolyte gel to ensure reliable measurements, while the E4 does not require any. The GSR time plot (Figure 5a) shows that the blue curve (VP) increases gradually. The red curve (E4) increases faster than the blue one and shows a local maximum after the 5-min warm-up phase (at around 300 s). This increase is followed by a decrease for another 5 min (until around 600 s). Aside from a small drop at around 900 s, the red curve increases until the end of the cool-down phase. In general, the E4 seems to be more responsive to sweating associated with physical effort than the VP, which may be due to its lack of stabilizing isotonic electrolyte gel. The GSR scatter plot (Figure 5b) shows a positive correlation with an R<sup>2</sup> of 0.882 for raw data and 0.896 for the moving averaged data. This means that 88.2% and 89.6%, respectively, of the data's total variance can be explained by a linear model. The cross-correlation plots (Figure 5c,d) show the highest cross-correlation for the as-is version at a lag of 2 s, and the highest cross-correlation for the moving averaged version at a lag of 1 s. In other words, the local trend of the E4 sensor is lagging 2 and 1 s, respectively, "behind" the local trend of the VP sensor on average.

**Figure 5.** Participant RP 3–8: time plot (**a**), scatter plot (**b**), cross-correlation plot (**c**), and cross-correlation plot of moving averages (**d**) of galvanic skin response (GSR) measured by Empatica E4 and VarioPort.

#### *5.2. Quantitative Analysis*

The aim of the quantitative analysis is twofold. First, we assess the correlation and the similarity of equal-type parameters. Second, we assess potential associations in pairs of parameters of both equal and different types. In addition to global statistics, we also apply local measures to derive new information about the relationship between and among the different parameter pairs. This combination of global and local similarity and correlation measures on the individual level further enables a roll-up view on relationship patterns of physiological parameters among participants. Note that for similarity distance metrics, we rescaled measurements from the min ... max range of the original scale to 0 ... 1. For the correlation analyses, we used the original values of each parameter, in its original range and unit, in order to identify potential offsets.
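The rescaling from min ... max to 0 ... 1 mentioned above can be sketched as follows. The function name, the handling of a constant series, and the heart rate values are illustrative assumptions, not study data. The example also shows why the correlation analyses keep the original values: after rescaling, a constant offset between two sensors is no longer visible.

```python
def rescale_min_max(x):
    """Map a series linearly from [min(x), max(x)] to [0, 1]."""
    lo, hi = min(x), max(x)
    if hi == lo:
        # A constant series carries no range; map it to zeros by convention.
        return [0.0 for _ in x]
    return [(v - lo) / (hi - lo) for v in x]


# Made-up heart rates: sensor B reads a constant 5 bpm higher than sensor A.
hr_a = [60.0, 90.0, 120.0]
hr_b = [65.0, 95.0, 125.0]

print(rescale_min_max(hr_a))
print(rescale_min_max(hr_a) == rescale_min_max(hr_b))  # the offset disappears
```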

#### 5.2.1. Linear Regression and Coefficient of Determination R<sup>2</sup>

The first statistic of interest is the coefficient of determination R<sup>2</sup>, which quantifies the percentage of the variance in a given pair of parameters that can be explained by a linear regression model. In addition to the R<sup>2</sup> of individual pairs of parameters, as shown in the scatter plots (refer to Section 5.1, Figures 4b and 5b), we now investigate all pairs among all participants and explore the corresponding R<sup>2</sup> pattern. This pattern can be derived from the R<sup>2</sup> matrix shown in Figure 6. Furthermore, for each group of parameters, e.g., all IBI-related parameters, we calculate the total average per participant in order to get an impression of the impact of each participant's individual overall measured activity (GSR base level, skin contact of electrodes, etc.).
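For a single pair of parameters, R<sup>2</sup> of a simple linear regression can be computed as one minus the ratio of the residual to the total sum of squares. The following is a minimal sketch with made-up numbers; the function name and the data are assumptions for illustration. For a regression with one predictor, the result equals the squared Pearson correlation, so it does not depend on which parameter is treated as independent.

```python
def r_squared(x, y):
    """R^2 of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((u - mean_x) ** 2 for u in x)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((v - (intercept + slope * u)) ** 2 for u, v in zip(x, y))
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    return 1.0 - ss_res / ss_tot


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y_exact = [2.0 * v + 1.0 for v in x]   # perfectly linear
y_noisy = [3.1, 4.9, 7.2, 8.8, 11.0]   # linear trend with small deviations

print(r_squared(x, y_exact))
print(round(r_squared(x, y_noisy), 4))
```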


**Figure 6.** R<sup>2</sup> matrix of parameter pairs and participants; detail (**a**) complements Figure 4, detail (**b**) complements Figure 5; total average of individual pairs among participants is shown in the last column; total average of participants among individual parameter pairs is shown in the last row of each parameter group; colour: red indicates high correlation, blue indicates low correlation.

In the upper half of the R<sup>2</sup> matrix, the pairs of equal-type parameters measured by different sensors (or derived from another signal of the same sensor) show a high linear relationship across the majority of the participants. This relationship also indicates that these parameters are rather robust from a measuring point of view. However, the matrix also shows some cases with no relationship at all; see, for instance, IBI derived from ECG BH (moving averaged version) and IBI derived from ECG VP (moving averaged version) in row three for participants RP 8-20 and RP 1-2. This may indicate that one of the sensors did not have proper contact between the electrodes and the skin and thus failed to collect valid data.

The matrix shows that GSR in general, and IBI measured by Empatica E4, tend to have rather low or even no correlation, while some participants demonstrate the exact opposite (compare, for instance, RP 4-11 and RP 2-5).

Note that the R<sup>2</sup> matrix in Figure 6 is organized as follows: for each group of parameters, the top row shows the highest correlation among all participants, while the bottom row shows the lowest correlation. Further, the left column shows the participant with the highest correlations among all parameters, while the right column shows the participant with the lowest correlations among all parameters.
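One plausible reading of this layout is that rows and columns are sorted by their averages in descending order. The sketch below applies that reading to a single parameter group. The R<sup>2</sup> values, the participant labels, and the function name are hypothetical and are not results from the study. It also assumes that every pair has a value for every participant.

```python
from statistics import mean

# Hypothetical R^2 values (NOT study results): pair -> participant -> R^2
r2 = {
    "GSR: E4 vs GSR: VP": {"P1": 0.40, "P2": 0.88, "P3": 0.10},
    "HR: BH vs HR: VP": {"P1": 0.97, "P2": 0.99, "P3": 0.80},
}


def order_matrix(matrix):
    """Sort rows (pairs) and columns (participants) by descending average R^2."""
    participants = sorted(
        {p for row in matrix.values() for p in row},
        key=lambda p: mean(row[p] for row in matrix.values()),
        reverse=True,
    )
    pairs = sorted(matrix, key=lambda pair: mean(matrix[pair].values()), reverse=True)
    return pairs, participants


pairs, participants = order_matrix(r2)
for pair in pairs:
    print(pair, [r2[pair][p] for p in participants])
```

In this toy example, the HR pair ends up in the top row and the participant with the highest average R<sup>2</sup> in the left column.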

Figure 6 detail (a) refers to the HR example shown in Figure 4, and detail (b) refers to the GSR example shown in Figure 5.

In summary, the overall pattern shown in Figure 6 confirms that both the type of the parameter measured and participant-specific factors, such as skin characteristics, significantly influence the reliability of the measurements and thus the quality of further analysis.
