**1. Introduction**

Many types of experiments can be used to test a concrete sample. For example, in a study [1] on the selection of high-performance concrete (HPC) admixtures for offshore wind farm construction, the candidate experiments were classified into the following three categories:


In that study, slump flow and the time required to flow through a V-shaped funnel were included in (Cat 1); compressive strength (CS), ultrasonic pulse velocity (USPV), and electrical resistivity on surface (ERoS) were included in (Cat 2); and anti-sulphate capability (ASC) and rapid chloride permeability (RCP) were included in (Cat 3). Using these categories, the researchers identified the superior admixtures (comprising cement, fly ash, silica fume, superplasticiser, and water) for grouting before building (non-floating) wind turbines whose foundations are constructed in the sea.

**Citation:** Zhuang, Z.-Y.; Kuo, W.-T. Unravelling the Relations between and Predictive Powers of Different Testing Variables in High Performance Concrete Experiments: The Data-Driven Analytical Methods. *Buildings* **2022**, *12*, 1545. https://doi.org/10.3390/buildings12101545

Academic Editors: Ahmed Senouci and Jan Fořt

Received: 13 August 2022; Accepted: 20 September 2022; Published: 27 September 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

One analysis, mentioned only in that paper's Appendix, was of particular interest because it concluded that the relationship between CS and RCP could be identified (and established) as follows:

$$\text{RCP} = 7966.72 - 97.76\,\text{CS}.\tag{1}$$

This equation was formulated and validated through an extensive series of exploratory data analyses in that study; further details are provided in Appendix A of this study. Figure 1 visualises the final 'effective process' that yielded Equation (1), which combined K-means clustering (an unsupervised machine learning approach) with simple linear regression (SLR) modelling.

**Figure 1.** Relationship between CS and RCP: (**a**) a rough trend was observed between CS and RCP; (**b**) visualising the result of clustering in the previous study. (Data source: Re-plot and New Plot).
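The two-stage process behind Equation (1) and Figure 1 can be sketched in plain Python. This section does not state exactly how the clusters fed into the regression, so the sketch assumes the simplest arrangement: standardise the (CS, RCP) pairs, group them with K-means, and fit one SLR of RCP on CS within each cluster. The function names, the choice k = 2, and the synthetic data in the usage block are illustrative assumptions, not the earlier study's code or data.

```python
import random


def standardise(values):
    """Return z-scores so that CS and RCP contribute comparably to distances."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / sd for v in values]


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's K-means on 2-D points; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: (p[0] - centroids[c][0]) ** 2 + (p[1] - centroids[c][1]) ** 2)
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append((sum(m[0] for m in members) / len(members),
                                      sum(m[1] for m in members) / len(members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: the centroids no longer move
        centroids = new_centroids
    return labels


def fit_slr(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sxy / sxx
    return my - b1 * mx, b1


def cluster_then_regress(cs, rcp, k=2, seed=0):
    """Cluster standardised (CS, RCP) pairs, then fit an SLR of RCP on CS within each cluster."""
    labels = kmeans(list(zip(standardise(cs), standardise(rcp))), k, seed=seed)
    models = {}
    for c in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        if len({cs[i] for i in idx}) >= 2:  # an SLR needs at least two distinct CS values
            models[c] = fit_slr([cs[i] for i in idx], [rcp[i] for i in idx])
    return labels, models


if __name__ == "__main__":
    # Synthetic illustration only: two well-separated groups, the first lying exactly on Equation (1).
    cs = [40, 42, 44, 46, 48, 80, 82, 84, 86, 88]
    rcp = [7966.72 - 97.76 * x for x in cs[:5]] + [500 + 5 * x for x in cs[5:]]
    labels, models = cluster_then_regress(cs, rcp, k=2)
    for c, (b0, b1) in models.items():
        print(f"cluster {c}: RCP = {b0:.2f} + ({b1:.2f}) CS")
```

Standardising first is a deliberate choice: without it, the much larger numerical scale of RCP (compare the intercept in Equation (1)) would dominate the Euclidean distances, and the clustering would largely ignore CS.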

However, a practical benefit of this process has not been described in the literature: with Equation (1), one of the two experimental results (e.g., RCP) can be anticipated from the other (e.g., CS), which reduces the time and effort spent on sample testing. Nevertheless, the practical value of this encouraging result remains limited because it was validated for only *one pair* of variables. Therefore, this study seeks to answer the following question: among the *numerous pairs* of parameters tested for the HPC samples, does *any other pair* exist in which one parameter can be used to predict another?
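As a worked example of anticipating one result from another, Equation (1) turns a measured CS value directly into an estimated RCP value; for a hypothetical CS of 60, it gives 7966.72 - 97.76 × 60 = 2101.12. The sketch below only encodes Equation (1). The input values are hypothetical rather than taken from the study's dataset, and this section does not state the CS range on which the equation was fitted, so any real use should stay within that range.

```python
# Coefficients of Equation (1), as reported from the earlier study [1].
RCP_INTERCEPT = 7966.72
RCP_SLOPE = -97.76


def predict_rcp(cs):
    """Anticipate an RCP result from a measured CS value via Equation (1).

    The estimate is only meaningful inside the CS range of the data the
    equation was fitted on; extrapolating beyond it is not supported.
    """
    return RCP_INTERCEPT + RCP_SLOPE * cs


if __name__ == "__main__":
    # Hypothetical CS inputs, for illustration only.
    for cs in (50.0, 60.0, 70.0):
        print(f"CS = {cs:.1f} -> anticipated RCP = {predict_rcp(cs):.2f}")
```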

In this study, the full set of experimental data is sourced, and a data-driven analysis is performed following a pairwise comparison approach (PCA). To a large extent, this study identifies *all pairs* of concrete sample parameters in the available datasets sourced from an HPC laboratory, providing in-depth, cross-categorical views of the relationship within each variable pair and offering scientifically grounded insights into which experimental values can be used to anticipate others.
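Enumerating every variable pair, as the pairwise comparison approach requires, is a small combinatorial step: p variables give p(p - 1)/2 unordered pairs. The sketch below computes a Pearson correlation for each pair and assembles the symmetric matrix that a heatmap would display. It assumes each variable's measurements are aligned sample by sample; the names `X1` to `X3` and their values in the usage block are placeholders, not the study's data.

```python
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def pairwise_correlations(data):
    """data maps variable name -> measurements on the same samples, in the same order."""
    return {(a, b): pearson(data[a], data[b]) for a, b in combinations(data, 2)}


def correlation_matrix(data):
    """Symmetric correlation matrix (ones on the diagonal), ready for a heatmap."""
    names = list(data)
    matrix = [[1.0] * len(names) for _ in names]
    for (a, b), r in pairwise_correlations(data).items():
        i, j = names.index(a), names.index(b)
        matrix[i][j] = matrix[j][i] = r
    return names, matrix


if __name__ == "__main__":
    toy = {"X1": [1.0, 2.0, 3.0, 4.0], "X2": [2.1, 3.9, 6.2, 7.8], "X3": [4.0, 3.5, 2.2, 1.1]}
    names, matrix = correlation_matrix(toy)
    for name, row in zip(names, matrix):
        print(name, " ".join(f"{r:+.3f}" for r in row))
```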

A systematic data analysis methodology is proposed that combines several methods: the correlation coefficient, cosine similarity, SLR modelling, and dimensional alternation (domain transformation) of the variables, which is applied before the SLR parameters are estimated so that each established model is linear. Supplementary methods are also used (e.g., heatmap visualisation and viewing the results from different perspectives). Using this methodology, the analysis reveals essential information about all Cat 2 and Cat 3 variables: the relationship between each pair of test variables, each variable's ability to predict the others, and the accuracy of those predictions. The resulting body of information (i.e., the 'knowledge base') can benefit both researchers and practitioners.
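Dimensional alternation can be illustrated as a search over candidate transforms of the independent variable: fit an SLR of y on f(x) for each candidate f and keep the most linear fit. The candidate set below (identity, log, square, square root, reciprocal) and the use of R² as the selection criterion are assumptions made for this sketch; this section does not list the transforms the study actually applies.

```python
import math

# Candidate transforms of the independent variable x (an assumed, illustrative set).
TRANSFORMS = {
    "identity": lambda x: x,
    "log": math.log,
    "square": lambda x: x * x,
    "sqrt": math.sqrt,
    "reciprocal": lambda x: 1.0 / x,
}


def fit_slr(xs, ys):
    """OLS fit of y = b0 + b1 * x; returns (b0, b1, R^2)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    b1 = sxy / sxx
    return my - b1 * mx, b1, sxy * sxy / (sxx * syy)


def best_transform(xs, ys):
    """Fit y against each transformed x; return (name, b0, b1, R^2) of the best fit, or None."""
    best = None
    for name, f in TRANSFORMS.items():
        try:
            b0, b1, r2 = fit_slr([f(x) for x in xs], ys)
        except (ValueError, ZeroDivisionError):
            continue  # transform undefined for some x, or no spread in x or y
        if best is None or r2 > best[3]:
            best = (name, b0, b1, r2)
    return best
```

The fitted model then reads y = b0 + b1·f(x), so a new x must pass through the same transform before it is used for prediction.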

Because every variable pair whose fitted model has sufficient predictive power (in terms of data-model fit and model significance or effectiveness) can be summarised from the knowledge base, the results of some tests can be anticipated from the results of others (where regulations permit omitting some test items). This could significantly benefit material testers, whose time is valuable, and could indirectly help reduce the cost and complexity of construction projects.
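Screening the knowledge base for pairs with sufficient predictive power can be sketched with two standard SLR descriptors: R² for data-model fit and the overall F statistic, F = R²(n - 2)/(1 - R²), for model significance. Both thresholds are parameters because this section does not state the study's cut-offs; the values in the usage block are illustrative, and `screen_pairs` is a hypothetical helper, not the study's procedure.

```python
from itertools import combinations


def slr_fit_stats(xs, ys):
    """R^2 and the overall F statistic (df = 1, n - 2) of an SLR of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    r2 = sxy * sxy / (sxx * syy)
    f_stat = float("inf") if r2 >= 1.0 else r2 * (n - 2) / (1.0 - r2)
    return r2, f_stat


def screen_pairs(data, r2_min, f_critical):
    """Keep variable pairs whose SLR clears both an R^2 floor and an F critical value."""
    kept = []
    for a, b in combinations(data, 2):
        r2, f_stat = slr_fit_stats(data[a], data[b])
        if r2 >= r2_min and f_stat >= f_critical:
            kept.append((a, b, r2, f_stat))
    return kept


if __name__ == "__main__":
    toy = {"x": [1, 2, 3, 4, 5], "y": [2, 4, 6, 8, 10.1], "z": [5, 1, 4, 2, 3]}
    # 10.13 is roughly the F(1, 3) critical value at alpha = 0.05 (n = 5); illustrative only.
    for a, b, r2, f_stat in screen_pairs(toy, r2_min=0.8, f_critical=10.13):
        print(f"{a} ~ {b}: R^2 = {r2:.4f}, F = {f_stat:.1f}")
```

Plain SLR gives the same R² whichever variable is treated as the predictor, so unordered pairs suffice here; once the independent variable is transformed (as in dimensional alternation), the direction matters and ordered pairs would be needed.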

This paper also discusses the insights gained, particularly for the identified pairs of variables with positive results (i.e., effective information). Implications are drawn from several aspects of the analysis, such as selecting a proper dimensional alternation method to transform the independent variable data for the established SLR model; performing a 'double check' and a 'third check' of the main results in the total correlation matrix using the cosine similarity index and the relevant statistical descriptors of the SLR models; confirming theories in the existing literature or standards concerning the test variables; and revealing the actual relationships between the different destructive and nondestructive tests. The paper also discusses in detail how the developed knowledge base can be used, future applications of the proposed methodological framework, and the time that can be saved on material testing in each case (e.g., when the construction schedule is tight). These discussions link the results to both theory and practice.
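The 'double check' and 'third check' rest on identities that can be verified numerically: for a single variable pair, the SLR R² equals the squared Pearson r, and the cosine similarity of the mean-centred vectors equals r. Whether the study applies the cosine similarity index to raw or centred data is not stated in this section, so the sketch below reports both; note that with all-positive measurements, the raw cosine can be positive even when r is negative. The function `cross_check` and its tolerance are illustrative choices.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def centred(v):
    """Subtract the mean from every element."""
    m = sum(v) / len(v)
    return [x - m for x in v]


def pearson_r(xs, ys):
    """Sample Pearson correlation from covariance and standard deviations (n - 1 denominators)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / (n - 1))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / (n - 1))
    return cov / (sx * sy)


def slr_r2(xs, ys):
    """R^2 = 1 - SSE/SST from an explicit least-squares fit of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    b0 = my - b1 * mx
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - my) ** 2 for y in ys)
    return 1.0 - sse / sst


def cross_check(xs, ys, tol=1e-9):
    """Compare r, raw and centred cosine similarity, and SLR R^2 for one variable pair."""
    r = pearson_r(xs, ys)
    cos_centred = cosine_similarity(centred(xs), centred(ys))
    r2 = slr_r2(xs, ys)
    return {
        "r": r,
        "cos_raw": cosine_similarity(xs, ys),
        "cos_centred": cos_centred,
        "r2": r2,
        "consistent": abs(cos_centred - r) < tol and abs(r2 - r * r) < tol,
    }
```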

Section 2 reviews the literature related to the primary research question and to the methodologies used in this study, including the main data analysis methods applied. Section 3 presents the results, and Section 4 discusses the primary positive outcomes in terms of the application of methods, the practical insights gained, and the theoretical aspects of the study. Finally, Section 5 concludes this paper.
