#### *3.2. Correlation Analysis*

Table 2 shows the P-Co-Cos between each pair of variables as a correlation matrix, with no variable transformed. Figure 3 visualises the results using a heat map.

**Table 2.** Correlation coefficients between each pair of variables (no transform).


**Figure 3.** Visualisation of the correlation matrix (no transform).

To provide an objective basis for the adopted measures (see Section 2.3), which take the data variables pairwise, every data variable should have the same length and be tested on the same set of testing days for the same set of HPC samples. Fortunately, the collected datasets satisfy these conditions. Inside the red boxes shown in Table 1, experimental data are provided for all data variables on day 28, day 56, and day 91, over all HPC samples. The result is eight variables, each with a data length of *n* = 36; these form the basis of the subsequent data analyses.

Each sub table in Table 3 shows the correlation matrix produced when variable *X* in each pair of variables is transformed using one of the methods discussed in Section 2.3.4. In these sub tables, the cells are coloured to visualise the results directly, which is analogous to using a separate heat map: dark green for 1, progressively lighter green for smaller positive values, white for 0, progressively deeper red for more negative values, and full red for −1. In addition, note that the diagonal elements are white boxes with '—' entries to indicate that they do not contain any meaningful information.
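The transform-then-correlate procedure described above can be sketched as follows. This is a minimal illustration rather than the study's implementation: the variable names and values are made up, 'log' is assumed to be the natural logarithm and 'lg' the base-10 logarithm, transform (d) is assumed to be 2 raised to the power *X*, and all data are assumed positive so that the roots and logarithms are defined.

```python
import math

# Transforms of X as listed in the Table 3 caption, plus the identity.
# Assumptions: "log" = natural log, "lg" = base-10 log, (d) = 2 to the power X.
TRANSFORMS = {
    "X": lambda x: x,
    "X^2": lambda x: x ** 2,
    "X^3": lambda x: x ** 3,
    "e^X": math.exp,
    "2^X": lambda x: 2.0 ** x,
    "sqrt(X)": math.sqrt,
    "cbrt(X)": lambda x: x ** (1.0 / 3.0),  # valid for positive x only
    "log(X)": math.log,
    "lg(X)": math.log10,
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def transformed_corr_matrix(data, transform):
    """Rows: transformed X; columns: untransformed Y; diagonal left as None."""
    names = list(data)
    matrix = {}
    for xn in names:
        tx = [transform(v) for v in data[xn]]
        matrix[xn] = {
            yn: (None if yn == xn else pearson(tx, data[yn])) for yn in names
        }
    return matrix


# Made-up positive values, purely for illustration (not the study's data).
toy = {
    "A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "B": [1.1, 3.9, 9.2, 15.8, 25.1],
    "C": [5.0, 4.1, 3.2, 1.9, 1.0],
}
m_sq = transformed_corr_matrix(toy, TRANSFORMS["X^2"])
m_id = transformed_corr_matrix(toy, TRANSFORMS["X"])
```

Because only the row variable is transformed, a matrix such as `m_sq` is generally not symmetric, whereas the untransformed matrix `m_id` is.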

**Table 3.** Correlation coefficients between each pair of variables, with different RHS variable transforms: (a) *X* = *X*<sup>2</sup>; (b) *X* = *X*<sup>3</sup>; (c) *X* = *e*<sup>*X*</sup>; (d) *X* = 2<sup>*X*</sup>; (e) *X* = √*X*; (f) *X* = ∛*X*; (g) *X* = log(*X*); (h) *X* = lg(*X*).




#### *3.3. Cosine Similarity Analysis*

Table 4 shows the Cos-Sim between each pair of variables in a matrix, with no variable transformations. Figure 4 visualises the same results using a heat map.


**Table 4.** The Cos-Sim indices between each pair of variables (no transformation).

**Figure 4.** Visualisation of the Cos-Sim matrix (no transform).

Some studies have successfully treated variables as vectors and used the Cos-Sim between two vectors (see Section 2.3) to confirm the correlation between two variables (but not vice versa), observing that a higher P-Co-Co (between −1 and 1) most often indicates a higher Cos-Sim (between 0 and 1). In Figure 4, most variable pairs with higher Cos-Sim indices also have higher P-Co-Cos in Figure 3 (relative to other pairs), so the findings of these previous studies are confirmed with respect to the P-Co-Co.
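The link between the two measures can be illustrated with a short sketch. It relies on a standard identity, namely that the Pearson coefficient equals the cosine similarity of the mean-centred vectors; the data values are made up and are not taken from the study.

```python
import math


def cos_sim(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def centre(u):
    """Subtract the mean from every component."""
    mean = sum(u) / len(u)
    return [a - mean for a in u]


def pearson(u, v):
    """Pearson coefficient computed as the Cos-Sim of the centred vectors."""
    return cos_sim(centre(u), centre(v))


# Made-up non-negative values: x increases while y decreases.
x = [10.0, 12.0, 15.0, 19.0, 24.0]
y = [30.0, 29.0, 27.0, 24.0, 20.0]

raw = cos_sim(x, y)   # stays within [0, 1] because all components are non-negative
r = pearson(x, y)     # negative, because the centred vectors point in opposite directions
```

In this toy case `raw` is high while `r` is strongly negative, which is one way to see why a high Cos-Sim on uncentred, non-negative data does not by itself establish a high correlation.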

Since this outcome provides justification for performing the subsequent analyses, results were not obtained in this study for the other eight transforms. The relevant tables and figures are omitted, as those outcomes are expected to be analogous.

#### *3.4. Regression Analysis*

This section summarises the results of estimating and establishing the 504 SLR models (see Section 2.3.4). The details are provided on the web page at the following URL: http://www.DDDM.nkust.edu.tw/download/HPCStudy2\_TheUltimateDataExperiments.html (accessed on 3 April 2022) (and in Appendix B), while some initial entries of Table A1 are listed in Table 5 for clarification. A guideline for reading these tables is provided below.


**Table 5.** Results from estimating and developing the models (some initial entries).

In Table A1, '*X*' determines the eight 'main phases', each defined by one of the eight variables serving as the independent variable. In the table, different background colours mark these phases as blocks. Within each main phase, which fixes the independent variable, there are seven subphases, each defined by one of the other variables serving as the dependent variable '*Y*'. Since each subphase involves nine transforms (refer to the transform descriptions in Section 2.3.4), each block in Table A1 contains 7 × 9 = 63 SLR models, giving 63 × 8 = 504 models in total.
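The counting argument above can be checked with a short enumeration. The sketch assumes that model numbers are assigned sequentially in the nested order main phase, subphase, transform, and it identifies variables only by hypothetical integer indices; neither detail is stated explicitly in the text.

```python
N_VARIABLES = 8   # each variable serves once as the independent variable X
N_TRANSFORMS = 9  # transform methods applied within each (X, Y) subphase


def enumerate_models(n_vars=N_VARIABLES, n_transforms=N_TRANSFORMS):
    """Yield (model_no, x_index, y_index, transform_no).

    model_no and transform_no are 1-based; the variable indices are 0-based
    placeholders. Sequential numbering in this nested order is an assumption.
    """
    model_no = 0
    for x in range(n_vars):                 # main phase: fixes X
        for y in range(n_vars):             # subphase: every other variable as Y
            if y == x:
                continue
            for t in range(1, n_transforms + 1):
                model_no += 1
                yield model_no, x, y, t


models = list(enumerate_models())
first_block = [m for m in models if m[1] == 0]
```

Under this assumed ordering, `len(models)` is 504, `len(first_block)` is 63, and model 10 is the first transform of the second subphase of the first main phase.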

For each model, M# is the unique model number assigned, followed by the P-Co-Co between the independent and dependent variables (*X* and *Y*). Next come the estimated model parameters, *α*<sup>∗</sup> (the estimated value of parameter *α*) and *β*<sup>∗</sup> (the estimated value of *β*); the *p* values for *α*<sup>∗</sup>, *β*<sup>∗</sup> and the entire SLR model, *p*(*α*<sup>∗</sup>), *p*(*β*<sup>∗</sup>) and *p*(M); and the R-squared values, *R*<sup>2</sup> and the adjusted value (*R*<sup>2</sup>)<sup>∗</sup>.
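As a rough illustration of how the parameter estimates and R-squared values reported for each model could be obtained, the sketch below fits *Y* = *α* + *β*·*f*(*X*) by ordinary least squares. The model form, the function name `fit_slr` and the toy data are assumptions made for illustration; the *p* values are left out because they require the *t* distribution, which a statistics library would normally supply.

```python
import math


def fit_slr(xs, ys, transform=lambda x: x):
    """Least-squares fit of Y = alpha + beta * transform(X) (assumed model form)."""
    tx = [transform(x) for x in xs]
    n = len(tx)
    mx, my = sum(tx) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in tx)
    sxy = sum((x - mx) * (y - my) for x, y in zip(tx, ys))
    beta = sxy / sxx                      # slope estimate
    alpha = my - beta * mx                # intercept estimate
    ss_res = sum((y - (alpha + beta * x)) ** 2 for x, y in zip(tx, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    r2 = 1.0 - ss_res / ss_tot
    # Standard adjusted R^2 for one predictor: (n - 1) / (n - 1 - 1).
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - 2)
    return {"alpha": alpha, "beta": beta, "r2": r2, "adj_r2": adj_r2}


# Made-up data for illustration only (not the study's measurements).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

plain = fit_slr(xs, ys)                        # no transform of X
rooted = fit_slr(xs, ys, transform=math.sqrt)  # X replaced by its square root
```

Fitting the same pair with different transforms, as with `plain` and `rooted`, mirrors how each subphase yields one model per transform.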

As an example, the model with M# = 10 in Table 5 is the model established in the main phase '*X* = CS' and the subphase '*Y* = FS' to identify the relationship between the CS and FS of the HPC samples. Since it is the first model of the '*X* = CS, *Y* = FS' subphase, transform method (1), *X* = *X*<sup>1</sup>, is used, which means that no transform is applied to the data of variable *X* in this model (see Section 2.3.4). From the table, the estimated parameters for this model are *α*<sup>∗</sup> = 4.243931 (meaning that the regression line intercepts the *Y* axis at *Y* = 4.243931) and *β*<sup>∗</sup> = 0.104528 (the slope of the regression line, meaning that a one-unit increase in CS corresponds to an increase of 0.104528 in FS). Therefore, the established model can be written as:

$$FS = 4.243931 + 0.104528 \times CS \tag{6}$$

For the same model, *p*(*α*<sup>∗</sup>) = *p*(*β*<sup>∗</sup>) = *p*(M) = 0. Although the *p* values in this table are truncated to four digits after the decimal point, so that '*p* = 0' may represent a very small value of *p*, it can still be inferred that both parameters estimated by the model are very significant and that the model itself is quite significant (i.e., it is very reliable and can be trusted to a large extent).

Moreover, for this model, *R*<sup>2</sup> = 0.679376 and (*R*<sup>2</sup>)<sup>∗</sup> = 0.669946, meaning that the data-model fitness is acceptable, since over 2/3 of the variability is explained by the established model. However, this is only slightly above the threshold of 0.6 adopted here for claiming data-model fitness; this threshold was chosen because of the study's scientific foundation (i.e., this is a natural science rather than a social science investigation, so using 0.6 rather than 0.4 or even 0.2 is more reasonable).
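The two R-squared values reported for the model with M# = 10 can be cross-checked, assuming that the second value is the standard adjusted *R*<sup>2</sup> for a model with *k* = 1 predictor fitted to *n* = 36 observations (the formula itself is not quoted in this section):

$$(R^2)^{*} = 1 - \left(1 - R^2\right)\frac{n - 1}{n - k - 1} = 1 - (1 - 0.679376) \times \frac{35}{34} \approx 0.669946$$

Under that assumption, the result agrees with the tabulated value to six decimal places.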

The full forms of all 504 models are provided in Table A2, which can also be accessed on the web page. For example, Equation (6) for the model with M# = 10 above is the same full form as displayed in Table A2.

#### *3.5. Additional Information*

Additional information about the SLR models is retrieved from Table A1 and organised by the transform method used to convert the data values of the independent variable, *X*. To conserve space, only the *p* and *R*<sup>2</sup> values of the models are considered as analytical targets. These values are chosen because the *p* value of a regression model represents its significance, and *R*<sup>2</sup> indicates the data-model fitness (or the model's explanatory power for the variations in the data), both of which are essential for qualifying SLR models.

The results for the *p* values of the entire model are listed in Table 6. Each sub table contains all pairs of variables (*X*, *Y*), where *X* and *Y* are the RHS (independent) and LHS (dependent) variables in Equation (5), respectively. In the sub tables, the rows are identified by *X* and the columns by *Y*. The significant results, which indicate that the SLR model is reliable and that its predictive power can be trusted, are shown in red font. Significance is determined using the threshold *p* < 0.10, which is typically the most relaxed condition accepted by statisticians. In addition, '*p* = 0' may denote a very small value of *p*.

**Table 6.** SLR models' *p* values for each pair of variables, by different RHS variable transform methods: (a) *X* = *X*<sup>2</sup>; (b) *X* = *X*<sup>3</sup>; (c) *X* = *e*<sup>*X*</sup>; (d) *X* = 2<sup>*X*</sup>; (e) *X* = √*X*; (f) *X* = ∛*X*; (g) *X* = log(*X*); (h) *X* = lg(*X*).







The results for the *R*<sup>2</sup> values of the entire model are listed in Table 7, where each sub table contains the *R*<sup>2</sup> values of the SLR models for all pairs of variables (*X*, *Y*) when a given transform is applied to *X*. As in Table 6, the rows in Table 7 are identified by *X* and the columns by *Y*. However, unlike the correlation values in Table 3, which may range from −1 to 1, the R-squared values only range from 0 to 1. Therefore, the cells in Table 7 are coloured according to the following convention: dark green for 1, progressively lighter green for smaller positive values, and white for 0.
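The colouring convention can be expressed as a simple linear interpolation. The sketch below is illustrative only: the exact RGB triples for 'dark green' and 'full red' are assumptions, and the negative branch is needed only for correlation values, not for R-squared values.

```python
WHITE = (255, 255, 255)
DARK_GREEN = (0, 100, 0)   # assumed shade for a value of 1
FULL_RED = (255, 0, 0)     # assumed shade for a value of -1


def value_to_rgb(value):
    """Map a value in [-1, 1] to an (r, g, b) cell colour.

    0 maps to white, 1 to dark green and -1 to full red; values in between
    are linearly blended between white and the corresponding end colour.
    """
    if not -1.0 <= value <= 1.0:
        raise ValueError("value must lie in [-1, 1]")
    target = DARK_GREEN if value >= 0 else FULL_RED
    weight = abs(value)
    return tuple(round(w + (t - w) * weight) for w, t in zip(WHITE, target))


r2_colour = value_to_rgb(0.679376)   # a mid-to-dark green cell
corr_colour = value_to_rgb(-0.25)    # a light red cell
```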

**Table 7.** SLR models' *R*<sup>2</sup> values for each pair of variables, by different RHS variable transform methods: (a) *X* = *X*<sup>2</sup>; (b) *X* = *X*<sup>3</sup>; (c) *X* = *e*<sup>*X*</sup>; (d) *X* = 2<sup>*X*</sup>; (e) *X* = √*X*; (f) *X* = ∛*X*; (g) *X* = log(*X*); (h) *X* = lg(*X*).



To conserve space, the important results are highlighted with different shades of background colour in the table itself. In this manner, the SLR models with better data-model fitness can be easily identified. To evaluate the fitness, many scientific studies use the threshold *R*<sup>2</sup> > 0.4 (i.e., a model above this threshold explains more of the variation in the data, so predictions made with it are more accurate). The results for other observations, namely the *p* values of the estimated parameters *α*<sup>∗</sup> and *β*<sup>∗</sup> and the adjusted *R*<sup>2</sup> value of the model, are summarised in Tables A3–A5, respectively, in Appendix C. These may also be accessed on the web page.
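The two qualification criteria, model significance and data-model fitness, can be combined into a simple screening step. In the sketch below, the record layout and the function name are hypothetical; only the entry for M# = 10 uses values reported in the text, and the other two entries are invented for illustration. The R-squared threshold is left as a parameter because both 0.4 and 0.6 are mentioned in this section.

```python
def screen_models(models, p_max=0.10, r2_min=0.4):
    """Return the numbers of models with p below p_max and R^2 above r2_min."""
    return [m["no"] for m in models if m["p"] < p_max and m["r2"] > r2_min]


candidates = [
    {"no": 10, "p": 0.0, "r2": 0.679376},   # reported for M# = 10 (p truncated to 0)
    {"no": 901, "p": 0.25, "r2": 0.05},     # hypothetical: not significant
    {"no": 902, "p": 0.03, "r2": 0.30},     # hypothetical: significant, weak fit
]

relaxed = screen_models(candidates)               # R^2 threshold of 0.4
strict = screen_models(candidates, r2_min=0.6)    # stricter threshold of 0.6
```

With these inputs, only model 10 passes both the relaxed and the strict screening; the hypothetical model 902 would pass only if the R-squared threshold were lowered below 0.30.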

