The Established 'Knowledge Base' Is Novel and Benefits Future HPC Sample Testing

The information relating to the established models (see Tables A1 and A2) is valuable because it can be used to create a true 'knowledge base' for practical applications. Given this knowledge base, if one experimental variable of an HPC sample can be used to predict another variable through a known mathematical relationship and with a known prediction accuracy, then the number of test items that are truly necessary can be reduced. This is particularly relevant today, as stakeholders in the construction and civil engineering industries have reduced the time and resources allotted to complete a project, so the time available to test HPC samples is limited. This should be considered the primary and original contribution of this study.

A Method Is Provided to Explore the Insights into the Variables That Can Practically Be Used to Predict Another Variable and to Determine How Accurate the Prediction Will Be

Table A1 lists nine SLR models as a group, in which eight models are derived from the first 'base model'. In such a design, it is critical to examine every model in the group in order to identify the one that offers both the best predictive power and the best data-model fitness.
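As a minimal sketch of how such a group of models can be produced from one base model, the following Python fragment fits *Y* = *α* + *β*·*f*(*X*) by ordinary least squares for several transforms *f*. Only the transforms named in this section ('no transform', *e*<sup>*X*</sup> and 2<sup>*X*</sup>) are included; the function names and the demonstration data are illustrative assumptions, not the study's implementation or measurements. The model significance *p*(M) is not computed here because it requires an F or t distribution (available, for example, through `scipy.stats.linregress`).

```python
import math

def fit_slr(x, y):
    """Least-squares fit of y = alpha + beta * x; returns (alpha, beta, r2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    r2 = (sxy ** 2) / (sxx * syy)  # squared correlation equals R^2 in SLR
    return alpha, beta, r2

# Only the transforms named in the text; the others are not reproduced here.
TRANSFORMS = {
    "none": lambda v: v,
    "exp": math.exp,
    "pow2": lambda v: 2.0 ** v,
}

def fit_group(x, y):
    """Fit one SLR model per transform of X and collect the results."""
    return {name: fit_slr([f(v) for v in x], y) for name, f in TRANSFORMS.items()}

# Synthetic placeholder data (exactly linear), NOT measurements from the study.
x_demo = [1.0, 2.0, 3.0, 4.0, 5.0]
y_demo = [5.0, 8.0, 11.0, 14.0, 17.0]

results = fit_group(x_demo, y_demo)
for name, (alpha, beta, r2) in results.items():
    print(f"{name}: alpha={alpha:.4f}, beta={beta:.4f}, R2={r2:.4f}")
```

Comparing the *R*<sup>2</sup> entries of `results` across transforms mirrors the comparison made within each group of models in the tables.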

Examining the SLR models in which variable *X* is CS, for example (M# = {1, 2, 3, ..., 27} in Table 5), the first nine models are established for predicting TS (*Y*) from CS (*X*). If *p*(M) < 0.05 is the threshold to confirm a model's significance, the M# = 4 (*X* = *e<sup>X</sup>*) and M# = 5 (*X* = 2<sup>*X*</sup>) models do not qualify as effective (*p*(M) = 0.0558 and *p*(M) = 0.0540, respectively). In addition, these models fit the data poorly, because *R*<sup>2</sup> = 0.103447 for M# = 4 and *R*<sup>2</sup> = 0.104875 for M# = 5 (far below the levels of data-model fitness of the seven other models), meaning that these two models may not provide good prediction accuracy. Based on these two results, it is evident that the two SLR models with the 'power *X*' transforms are inadequate. A negative implication of this finding could be that no true relation between CS and TS exists on this basis, but a positive implication could be that seven models (M# = {1, 2, 3, 6, 7, 8, 9}) can be recommended for practice or for future research.
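The significance screening described above can be sketched in a few lines of Python. The sketch assumes a threshold of 0.05 for *p*(M), which is the order of threshold under which the reported values of 0.0558 and 0.0540 fail. The two (*p*(M), *R*<sup>2</sup>) pairs are the ones reported in the text for M# = 4 and M# = 5; the names `P_THRESHOLD`, `reported` and `is_significant` are hypothetical.

```python
P_THRESHOLD = 0.05  # assumed significance threshold for p(M)

# (M#, p(M), R^2) as reported in the text for the two 'power X' models
reported = [
    (4, 0.0558, 0.103447),
    (5, 0.0540, 0.104875),
]

def is_significant(p_model, threshold=P_THRESHOLD):
    """A model is treated as significant when p(M) is below the threshold."""
    return p_model < threshold

for m, p, r2 in reported:
    status = "significant" if is_significant(p) else "not significant"
    print(f"M# = {m}: p(M) = {p}, R2 = {r2} -> {status}")
```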

Turning to the two subsequent (*X*, *Y*) combinations, the models M# = {10, 11, ..., 18} are established for predicting FS (*Y*) from CS (*X*). The resulting situation is similar to using CS to predict TS: the two 'power *X*' SLR models (M# = {13, 14}) are inadequate, and "no true relation between CS and FS exists on this basis". However, the claim can also be made that "another seven models (M# = {10, 11, 12, 15, 16, 17, 18}) can be effective in practice or for future research".

The models M# = {19, 20, ..., 27} are established for predicting USPV (*Y*) from CS (*X*). However, no model qualifies to build a predictive relation between these two variables. Two SLR models (M# = {22, 23}) are ineffective (*p* > 0.1), and every model's *R*<sup>2</sup> value is poor (*R*<sup>2</sup> < 0.2), meaning that the models provide insufficient data-model fitness. Thus, we conclude that the value of USPV cannot be anticipated from the value of CS in the experiments performed to test HPC samples, and that further research is recommended to find a method to predict USPV using CS.

This analysis is not continued here for the whole of Table A1. However, the above process can be repeated for all other models in Table A1 to gain further insights regarding the variables that can be used to predict other variables and the accuracy of those predictions. In addition to the empirical insights that can be expected, the experimental design that enables such explorations is the second contribution of this work.

Another Perspective to View the Model Information Is Offered to Differentiate and Recommend the Appropriate Transforms to Be Used for a Variable

An alternative method to evaluate the information relating to the established SLR models can provide additional insights. In Section 3.5 and Appendix C, the significance of the entire model (*p*(M)), the significance of the estimated *α*<sup>∗</sup> value (i.e., *p*(*α*<sup>∗</sup>)), the significance of the estimated *β*<sup>∗</sup> value (i.e., *p*(*β*<sup>∗</sup>)), the R square value (i.e., *R*<sup>2</sup>), and the adjusted R square value (i.e., (*R*<sup>2</sup>)<sup>∗</sup>) of the SLR models are systematically presented as separate tables (Tables 6, 7, A1 and A2) according to the transform applied on the RHS variable in the SLR model. This provides another perspective for the model information which can be used in addition to the previous analytical viewpoint that presents a group of models derived from a base model at the same time (e.g., Tables 5, A1 and A2). Therefore, offering two complementary perspectives is another contribution of this research. The following discussion highlights the positive outcomes of this new data-viewing perspective.

Based on the results in Tables 6 and 7 (for the *p*(M) values and the *R*<sup>2</sup> values, respectively), the following insights are gained:


Regarding the 'worse variable transforms', these can be identified by interpreting Table 6 in detail (together with Tables A1 and A2, as required): for a pair of (*X*, *Y*) in which the 'no transform' model shows good results (e.g., *p*(M) < 0.05 and/or *R*<sup>2</sup> > 0.6), the worse transforms are those whose model(s) show poor results (e.g., *p*(M) > 0.1 and/or *R*<sup>2</sup> < 0.2).

Examining the highly correlated variable group {CS, TS, FS} (see Section 4.1.2), the SLR models established between all pairs of these variables using each of the other transforms (for *X*) offer good data-model fitness for providing accurate predictions (i.e., *R*<sup>2</sup> ≥ 0.6) (shown in the dark cells in the upper left of Table 7a,b,e–h); however, this is not the case for the two 'power *X*' methods (i.e., *X* = *e<sup>X</sup>* and *X* = 2<sup>*X*</sup>) (shown in the light cells in the upper left part of Table 7c,d).

Further evaluation of Table 7 reveals that, for all pairs of variables, using these two 'power *X*' methods generally depresses the *R*<sup>2</sup> value and makes it difficult to uncover the relevant information (e.g., to determine which model provides better data-model fitness). It is therefore recommended to omit both 'power *X*' methods in future studies for all pairs of test variables.
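The rule for spotting 'worse variable transforms' described in the insights above can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used in the study: the "and/or" in the example criteria is read as a logical OR, the function names are hypothetical, and the 'no transform' entry in the demonstration is a placeholder. Only the two 'power *X*' entries use values reported in the text (M# = 4 and M# = 5).

```python
def is_good(p_model, r2):
    # "and/or" in the text is read here as a logical OR (an assumption)
    return p_model < 0.05 or r2 > 0.6

def is_poor(p_model, r2):
    return p_model > 0.1 or r2 < 0.2

def worse_transforms(group, baseline="none"):
    """Return the transforms that are poor while the baseline model is good."""
    if not is_good(*group[baseline]):
        return []  # without a good baseline, the comparison is not informative
    return [name for name, stats in group.items()
            if name != baseline and is_poor(*stats)]

# transform -> (p(M), R^2) for one (X, Y) pair
cs_to_ts = {
    "none": (0.001, 0.75),       # hypothetical placeholder, not a reported value
    "exp": (0.0558, 0.103447),   # reported for M# = 4
    "pow2": (0.0540, 0.104875),  # reported for M# = 5
}

print(worse_transforms(cs_to_ts))
```

Under the OR reading, both 'power *X*' entries are flagged because their *R*<sup>2</sup> values fall below 0.2, even though their *p*(M) values do not exceed 0.1.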
