**4. Discussion**

### *4.1. Comparison of Methods in the Context of Uncertainty*

The comparison of the similar model-based test methods (Equations (1) and (2)) indicates a large disparity in the reliability of uncertainty determination.

Importantly, the majority of uncertainty components in tests performed in accordance with ISO 527 [31] can be controlled, as opposed to the bond strength test. This can be observed by comparing Figures 4 and 5. Points min1–max1 indicate the extent of uncertainty that results from the accuracy of the measuring devices used (measurement of sample strength and geometry). This is the measurement uncertainty and is epistemic in nature (according to the Walker matrix [14]). Normally, the measurement uncertainties stated in the test method requirements by the author of the method are used (as is the case in this paper). This is the easiest way and provides the most consistent uncertainty data. In some cases, this uncertainty may be overestimated, as the actual measuring device used in a given laboratory may be more accurate than the one referred to in the standard. More accurate devices can therefore also be used to reduce uncertainty.

The component *A*σ presented in this paper is an uncertainty of aleatory nature associated with external and internal interactions (inherent in the test model), which remain outside the scope of control. If introduced, this component determines the limits of the min2–max2 interval in Figures 4 and 5. The impact of individual components on uncertainty is shown in Figure 6. While the uncontrolled component (*s*\* = (∂σ/∂*A*σ)·*u*(*A*σ)) in the test carried out according to ISO 527 [31] is of the same order as the other components, in the bond strength test the uncontrolled component is many times higher, although there are also some results with a randomly smaller dispersion.
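As a sketch of how the controlled and uncontrolled contributions combine, the following assumes a round-stamp bond strength model σ = 4*F*/(π*D*²); the formula, the function name, and all numerical values are illustrative assumptions, not figures from the study:

```python
import math

def bond_strength_uncertainty(F, u_F, D, u_D, s):
    """Combined standard uncertainty of bond strength sigma = 4F/(pi*D^2),
    propagated from force F and stamp diameter D, plus an uncontrolled
    component s (the A_sigma term). All values are illustrative."""
    dsig_dF = 4.0 / (math.pi * D**2)       # sensitivity to force
    dsig_dD = -8.0 * F / (math.pi * D**3)  # sensitivity to diameter
    contributions = {
        "F*": dsig_dF * u_F,
        "D*": dsig_dD * u_D,
        "s*": s,
    }
    u_c = math.sqrt(sum(c**2 for c in contributions.values()))
    return u_c, contributions

# Illustrative numbers only: F in N, D in mm, so sigma is in MPa (N/mm^2)
u_c, parts = bond_strength_uncertainty(F=200.0, u_F=1.0, D=50.0, u_D=0.1, s=0.05)
```

With these placeholder values the uncontrolled component `s*` dominates the combined uncertainty, mirroring the situation described above for the bond strength test.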

The presence of large, variable uncontrolled components in bond strength tests may undermine confidence in both the test results and the reported uncertainty. Confidence in the results is expressed and demonstrated by the error risk assessment. Type I errors directly increase production costs. Type II errors may affect the safety of material users. In the case above, there may also be a high risk of type III errors related to the test model, which sets the assessment criteria incorrectly.

Compared to the bond strength test, the tensile strength test method according to ISO 527 demonstrates much better precision (repeatability and reproducibility). Uncertainty components resulting from uncontrolled interactions contribute less to the total uncertainty. It can thus be concluded that the definition of the tested value in the tensile strength test minimizes uncontrolled interactions, in contrast to the definition of bond strength. This is related to the fact that the bond strength test is more complex and involves more operations and more factors affecting variability.

**Figure 6.** Contributions of input standard uncertainties: (**a**) *D*, *F*, and *A*σ contributions to the bond strength combined uncertainty. Sets of bars 1 and 2 were obtained for two laboratories with different actual standard deviations (approach II); set 3 was obtained for laboratory 2 using the reproducibility standard deviation. *D*\* = (∂σ/∂*D*)·*u*(*D*), *F*\* = (∂σ/∂*F*)·*u*(*F*), and *A*\* = (∂σ/∂*A*σ)·*u*(*A*σ). (**b**) *a*, *b*, *F*, and *A*σ contributions to the ABS tensile strength combined uncertainty. Set 1 is based on the reproducibility standard deviation from ISO 527-2; set 2 is based on the reproducibility standard deviation obtained in the DRRR inter-laboratory experiment; set 3 is based on the actual standard deviation of an exemplary laboratory. *a*\* = (∂σ/∂*a*)·*u*(*a*), *b*\* = (∂σ/∂*b*)·*u*(*b*), *F*\* = (∂σ/∂*F*)·*u*(*F*), and *s*\* = (∂σ/∂*A*σ)·*u*(*A*σ).

### *4.2. Effects Related to Material Assessment*

According to current JCGM guidelines [18], there are three main ways to account for uncertainty when assessing the conformity of test results with the tolerance limits delineated in the material requirement: guarded acceptance, simple acceptance, and guarded rejection (Figure 7).


**Figure 7.** Methods of conformity assessment with the use of uncertainty values, based on the example of the lower tolerance limit *TL*. (**a**)—guarded acceptance; (**b**)—simple acceptance; (**c**)—guarded rejection. *U*—expanded uncertainty value.

When the principle of guarded rejection is applied, as is sometimes done in factory production quality assessment, the lower acceptance limit might be (0.08 − *U*) MPa. For uncertainty estimated using approach II by laboratories 1, 2, and 3, this limit is 0.014, 0.069, and 0.063 MPa, respectively (adhesive "n"). Note that the first value is more than five times lower than the tolerance limit (0.08 MPa).

If guarded acceptance is used, a sample tested by any of laboratories 1–3 estimating uncertainty according to approach III would have to demonstrate a bond strength of 0.15 MPa and, thus, almost double the required value (0.08 MPa). This would involve a significant change in the adhesive composition; in polymer-cement adhesives, this would most likely result in a significant increase in the amount of polymer in relation to cement. Table 4 presents decisions about the rejection or acceptance of an adhesive used for bonding insulation material to a wall, depending on the laboratory approach used for uncertainty evaluation and the way uncertainty is used in the assessment.
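The three decision rules of Figure 7 can be sketched as a small helper. The function and the 0.05 MPa test result are hypothetical; TL = 0.08 MPa and the 0.014 MPa guarded-rejection limit for laboratory 1 come from the text above:

```python
def assess(result, TL, U, rule):
    """Conformity decision against a lower tolerance limit TL,
    for the three rules of Figure 7 (hypothetical helper)."""
    if rule == "simple":              # compare the result to TL directly
        return result >= TL
    if rule == "guarded_acceptance":  # accept only above TL + U
        return result >= TL + U
    if rule == "guarded_rejection":   # reject only below TL - U
        return result >= TL - U
    raise ValueError(f"unknown rule: {rule}")

TL = 0.08   # MPa, lower tolerance limit for bond strength
U1 = 0.066  # MPa, so that TL - U1 = 0.014 MPa (laboratory 1, approach II)

# A hypothetical result of 0.05 MPa passes under guarded rejection only:
decisions = {r: assess(0.05, TL, U1, r)
             for r in ("simple", "guarded_acceptance", "guarded_rejection")}
```

The same sample is thus rejected or accepted depending purely on the rule chosen, which is why Table 4 tabulates both the uncertainty approach and the assessment rule.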

**Table 4.** Decisions about rejection or acceptance of the adhesive used for bonding of insulation material to a wall, depending on the laboratory approach used for uncertainty evaluation and the way of using uncertainty in assessment (based on the example of the "n" adhesive).


### *4.3. Difference between Initial Performance Tests and Routine Compliance Tests in the Context of Test Method Validation*

Validation before method implementation is normally used to confirm the suitability of a test method for a particular application. The first validation stage is setting the requirements a test method should meet. If the method is supposed to provide a material assessment against established criteria, e.g., tolerance limits (*TL*—lower limit, *TU*—upper limit, tolerance range width *T* = *TU* − *TL*) assigned to a given material class, this parameter is described as

$$\alpha = \frac{U}{T} \tag{10}$$

the measurement accuracy factor. This factor should be sufficiently small to make it possible to evaluate whether the criteria are met on the basis of the measurement results. Metrological practice customarily assumes that the factor is not greater than 0.1 for measurements of greater responsibility and 0.2 for measurements of smaller responsibility. In the bond strength test, where *T* = 0.08 MPa, the uncertainty should therefore be around 0.008 MPa (or 0.016 MPa). Uncertainty at the level of 0.1*T* would only be achievable with approach I, i.e., when the budget includes only component uncertainties arising from the requirements regarding the accuracy of measuring devices. Uncertainty of 0.2*T* is sometimes achievable when the budget includes the dispersion of results resulting from repeatability. Given the large differences in the standard deviations obtained in tests carried out by different laboratories for different materials, however, the uncertainty estimated in this way seems unreliable.
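A minimal sketch of the Equation (10) check, using *T* = 0.08 MPa from the bond strength example; the 0.066 MPa value is an illustrative larger expanded uncertainty of the order reported in Section 4.2:

```python
def accuracy_factor(U, T):
    """Measurement accuracy factor alpha = U / T (Equation (10))."""
    return U / T

T = 0.08  # MPa, tolerance range for the bond strength criterion
# Customary metrological targets: alpha <= 0.1 for measurements of greater
# responsibility, alpha <= 0.2 for measurements of smaller responsibility.
alphas = {U: accuracy_factor(U, T) for U in (0.008, 0.016, 0.066)}
suitable = {U: a <= 0.2 for U, a in alphas.items()}
```

Only the 0.008 and 0.016 MPa levels meet the customary targets; an uncertainty of 0.066 MPa gives α far above 0.2 and rules the method out for this tolerance range.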

The examples above clearly indicate the deficiencies of the test method when it is used for material evaluation; these deficiencies are manifested by differences in results. From a mathematical point of view, a rational assessment becomes impossible. In addition, the assessment is unreliable because there are no unified rules for uncertainty estimation and no commonly agreed assessment principles adopted by all parties (Table 4).

As already mentioned, the reasons for the test method deficiencies are mainly related to the large number of interactions which impact the result. This stems from attempts to develop a method that resembles natural conditions.

The substrate to which the tested adhesive is applied in order to assess adhesion is one of these interactions. Two cases can be considered: the use of the actual substrate of the intended application, or the use of a standardized reference substrate.


In the test methods cited in this paper, the second solution is used. This means that the result variability related to the impact of the substrate is limited. However, the assessment of how the adhesive will behave in practical application on the actual substrate remains outside the scope of this model.

If *R* denotes an abstract difference between the material performance in actual conditions (*Yu*) and the test result (*Yt*), the use of the actual substrate in tests reduces *R* but increases the testing costs (many substrates for different material applications). It may also increase uncertainty, because without a specification of the substrate type, laboratories may differ in their choice of a test substrate that approximates the substrate of intended use.

The shape of the adhesive layer on the substrate is another example of an interaction whose contribution to result variability has been limited. For instance, tests based on ETAG 004 [30] require the use of a square stamp with a 50 mm side in the adhesive removal test. For adhesive property tests carried out according to previous recommendations in Poland, the method described in point 2 was used: a round stamp of 50 mm diameter. Theoretically, this should not affect adhesion which, by definition, is the force related to a unit of surface. Practical tests at the Building Research Institute, however, demonstrated different results for the same adhesives and substrates (Table 5).


**Table 5.** Mean bond strengths obtained according to ÖNORM (round stamp) and ETAG 004 (square stamp).

The surprisingly large differences in average values shown in Table 5 may also be attributed to the poor repeatability of the method, as discussed in Section 3.1.1.

The test method requirements impose specific stamp dimensions and shapes. At the actual construction site, however, the adhesive is applied to several places on a foamed polystyrene board. The shapes and surfaces of the adhesive layer are uncontrolled (or controlled within wide tolerances, as the instructions are not very precise and rely on workers' practical experience). Considering this variability, there is no rational possibility of establishing the relation between material performance in actual conditions (*Yu*) and test results (*Yt*). In the case of the stamp, the imposed shape may only reduce the dispersion of test results.

It is apparent that whenever the test method does not perfectly duplicate the actual conditions of use, the variability introduced by the test method itself also increases the risk incurred when test results are transferred to the conditions of use. Interactions within the test and interactions in actual conditions do not compensate each other, because they are random. Therefore, the variation of final properties in actual conditions of use must include the test result variability resulting from the test interactions, as well as the variability resulting from the actual conditions of use. This can be expressed symbolically by the equation

$$D^2(R) = D^2(Y_u) + D^2(Y_t) \tag{11}$$

where *D*<sup>2</sup> means variance.
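Equation (11) can be checked numerically: for independent random effects, the variance of the difference *R* = *Yu* − *Yt* is the sum of the individual variances. The distributions and variance values below are arbitrary illustrations:

```python
import random
import statistics

random.seed(7)
N = 100_000

# Independent "actual use" and "test" effects with known variances
Yu = [random.gauss(1.0, 0.3) for _ in range(N)]  # D^2(Yu) = 0.09
Yt = [random.gauss(1.0, 0.4) for _ in range(N)]  # D^2(Yt) = 0.16
R = [yu - yt for yu, yt in zip(Yu, Yt)]

var_R = statistics.pvariance(R)  # approaches 0.09 + 0.16 = 0.25
```

The simulated variance of *R* converges on the sum 0.25, confirming that the two sources of variability add rather than compensate.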

Hence, the greater the variability of the test method itself, the greater the variability when the method is transposed to the actual conditions of use and, thus, the greater is the risk of erroneous assessment.

It should be noted, however, that both *Yu* and *U*(*Yu*) are virtually unknowable and, thus, can only be the subject of modeling, burdened with further uncertainties of the model and its implementation. As for the test result and its uncertainty, the examples presented in this paper also demonstrate an important property of test methods: the 'unknowability' of 'true' results. The issue of unknowability in relation to measurements and measurement uncertainty is extensively discussed by Grégis [36]. Unlike measurement methods, the level of unknowability of test results is even greater—there is no 'true' value. There is no reference value (RV) either. While one can imagine a 'real' length value (although this 'real' still belongs to the sphere of abstraction), as there are reference values against which the trueness of measurement results can be checked, it is very difficult to imagine an RV for test methods, especially those used for construction materials. The bond strength test seems to be one of the simpler examples, where a test standard is not completely unimaginable (although it would be destroyed with each test). It would be extremely difficult, however, to create a test standard of the resistance of a curtain wall to heavy body impact or of the fire resistance of a door. Measurements are included in each of the tests mentioned above (e.g., force measurement, the weight of the impact body, temperature), and these may have RVs; however, these are only components whose influence on the final test result is in most cases unknown due to the very large number of other interactions.

Considering the above, the unknowability of the properties in actual conditions of use (*Yu*), of the test result (*Yt*), and of the variances of these values undermines the possibility of material assessment by means of laboratory tests. Nevertheless, laboratory tests are the basis for material assessment, and material property criteria for actual use are developed on their basis. This is the domain of reality modeling. Attempts to achieve the best convergence between the model and reality are universal, and this should also apply to test methods. The actual purpose of the test, however, should be considered.

As already mentioned, bringing the test model closer to actual conditions reduces the *R* value (Figure 8a); however, it may increase test result uncertainty because of the increased number of interactions that impact the result. The ideal situation we seem to be striving for is both a small test result uncertainty and a maximum approximation to actual conditions. As shown in Figure 8b, this is practically impossible: as *R* and *D*<sup>2</sup>(*Yt*) are concurrently reduced, a point is reached where the common part of the fields (*Yu*, *D*<sup>2</sup>(*Yu*)) and (*Yt*, *D*<sup>2</sup>(*Yt*)) is also reduced.

Thus, this optimization issue can only be resolved with respect to an objective function (Figure 8c). If the comparability of test results of individual materials obtained by different laboratories is the objective, the method should aim to minimize *D*<sup>2</sup>(*Yt*). If the method is of a cognitive nature, where reality modeling and 'revealing the truth' is the objective [37], it is necessary to minimize the *R* value. Following this logic, the initial material test in the design phase should use a different method from the routine test method used in the assessment and verification of constancy of performance (AVCP) [38] or any conformity assessment against standards.

The initial test of materials and elements should be a model of reality and should simulate actual conditions. The individual application of such methods reduces uncertainty mainly to aspects related to modeling. Verification and validation of models is a slightly different issue than the validation of simple, repeatable test methods. The inclusion of uncertainty in model verification is described in the publications of Scheiber et al. [39] and Oberkampf et al. [40].

**Figure 8.** Illustration of the discrepancy between the material's properties in actual conditions of use (*Yu*) and the test result (*Yt*), *R* = *Yu* − *Yt*. *D*<sup>2</sup>(*Yt*)—variability of test results. *D*<sup>2</sup>(*Yu*)—symbolic variability of actual conditions. (**a**) Test result uncertainty reduction, which results in an increased difference relative to actual conditions. (**b**) Simultaneous minimization of *R* and *D*<sup>2</sup>(*Yt*). (**c**) Test modeling depending on the purpose.

A routine test should only be a model of a particular aspect of reality, simple enough to deliver predictable results. For conformity assessment tests in AVCP processes, the test development phase should include estimation of the risk of incorrect assessment resulting from the simplified test model, as well as the risk of incorrect assessment resulting from the method's precision. Determining a rational level of the required uncertainty *U* should be the responsibility of the organization which develops the method.

The test method principles should therefore be formulated as:

$$u_c^2(y) = \sum \left(\frac{\partial f}{\partial x_i}\right)^2 u^2(x_i) + s^2 < (\alpha T)^2 \tag{12}$$

where:

*α*—targeted accuracy factor related to standard uncertainty and tolerance limits

*T*—material properties tolerance range

*f*—function which describes the measurement model; all components related to the test devices used and other controlled factors can be included in it.

$$u_k^2 = \sum \left(\frac{\partial f}{\partial x_i}\right)^2 u^2(x_i) \tag{13}$$

The method suitability condition can, therefore, be represented by the equation

$$s^2 < (\alpha T)^2 - u_k^2 \tag{14}$$

which would limit the dispersion of results not attributed to the specific, known components of uncertainty described by Equation (13).
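The suitability condition of Equations (12)–(14) can be expressed as a short budget check; the sensitivity coefficients and input uncertainties below are placeholders chosen only to exercise the condition:

```python
def method_suitable(sensitivities, u_inputs, s, alpha, T):
    """Check Equations (12)-(14): u_k^2 = sum((df/dx_i * u(x_i))^2);
    the method is suitable when s^2 < (alpha*T)^2 - u_k^2."""
    u_k2 = sum((c * u) ** 2 for c, u in zip(sensitivities, u_inputs))
    s2_limit = (alpha * T) ** 2 - u_k2  # allowed uncontrolled variance (Eq. 14)
    return s ** 2 < s2_limit

# Placeholder sensitivities df/dx_i and input uncertainties u(x_i)
sens = [5e-4, 4e-3]
u_in = [1.0, 0.1]
ok_small_s = method_suitable(sens, u_in, s=0.010, alpha=0.2, T=0.08)
ok_large_s = method_suitable(sens, u_in, s=0.020, alpha=0.2, T=0.08)
```

With α = 0.2 and *T* = 0.08 MPa, the budget (α*T*)² = 2.56 × 10⁻⁴ admits an uncontrolled dispersion of about 0.016 MPa; a dispersion of 0.02 MPa fails the condition.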

This validation method can be relevant for the organization which develops the method in order to ensure that the reproducibility variance in the final validation experiment does not exceed the value *s*<sup>2</sup>. It is also important for laboratories, which would have to show a sufficiently small dispersion of results with the use of the particular method. In specific tests, the laboratories would therefore reject results whose deviations demonstrate a higher variance, since with high probability such variance is related to the material properties rather than to the test. As can be seen from the above, material assessment criteria based on tests should relate not only to the values but also to their variance.

The presented approach to improving conformity test models by supplementing them with conditions regarding the required variability of results offers benefits. The ambiguities associated with uncertainty estimation are eliminated. Instead of reporting uncertainty associated with a high risk of error, as has been shown, the laboratories would have to demonstrate fulfilment of the test method's individual requirements. This, in consequence, would lead to uniform uncertainty levels and uniform material assessments in different assessment bodies.

### *4.4. Sustainable Test Methods*

Test method development, for both the initial material assessment in the design phase and the routine assessment, is based on the balance between economy (research costs) and safety (error risk). Both aspects relate to energy and raw material consumption, waste generation, and environmental and social costs in the event of an accident.

Sustainable test methods ensure a balance between broadly defined testing and evaluation costs and the safety, reliability, and stability of the material or building.

Considering the general risk assessment principle:

$$\text{risk} = \text{(probability of undesirable outcome)} \times \text{(effects of undesirable outcome)} \tag{15}$$

where effects can be broadly defined. One should weigh the probability and effects (economic, environmental, social, and other costs) associated with unnecessary spending on the development of a well-validated assessment method (an increased number of samples and higher equipment accuracy) on the one hand, against the risk of failure as a result of incorrect assessment (the product of the failure probability and the failure effects) on the other. Such a comparison differs among material types and applications. This explains the high expenditure and great attention paid to assessment methods for the load capacity of construction materials and the limited attention paid to test methods for finishing materials. In the first case, the effects of failure impact human safety.
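Equation (15) applies to both sides of this trade-off. The probabilities and cost figures below are entirely hypothetical, chosen only to show how a rare but severe failure can dominate the comparison:

```python
def risk(probability, effect_cost):
    """Equation (15): risk = probability x effects of the undesirable outcome."""
    return probability * effect_cost

# Hypothetical figures (arbitrary currency units):
extra_validation = risk(1.0, 5_000)         # certain, moderate validation spend
failure_if_skipped = risk(0.02, 2_000_000)  # rare but very costly failure
worth_validating = failure_if_skipped > extra_validation
```

Under these assumed numbers, the expected cost of a failure after an incorrect assessment exceeds the certain cost of validation by an order of magnitude, which is the sense in which thorough validation can pay off even for "less important" materials.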

If we consider the effects associated with building durability, raw material and energy consumption, and repair costs, however, well-justified and validated testing and assessment methods should, from a sustainability point of view, also be applied to materials which are less important.

Test method optimization to ensure their 'sustainability', as shown in Figure 8, can be achieved by separating the test models designed to learn about material or element behavior in actual conditions from the problems of method precision. Thus, test methods should be divided into initial and routine ones.

### *4.5. Final Remarks*

A model only approximates reality. There is a common goal, however, to bring a model as close to reality as possible; this is also the objective of science. As Czarnecki et al. [37] noted, a picture of reality depends on the reference system used. Construction materials engineering uses reality models and may use different reference systems; the latter should be appropriate for the purpose for which the model has been created. A test method used to assess future material behavior in actual conditions is one of the sub-models which constitute the final model of material properties. Real conditions, however, are associated with a full range of influences. Some of the interactions are controlled by the model; others may be partially known, yet not included in the model structure, because this knowledge has not yet been researched. There is also a group of interactions that were outside the research scope when the model was created. The creation of test methods which simulate the actual conditions of use often leads to a situation in which too many aspects remain outside the model's control in real conditions. An attempt to clarify these aspects increases method precision but, at the same time, makes the method diverge from real conditions. This can be optimized by developing a set of test methods used depending on the purpose: as an initial test when designing a material, or as a routine test when assessing its legal conformity.

Considering the risks and costs associated with the use of certain types of test-based assessment methods, increased uncertainty of the test results used for assessment increases the likelihood of making a wrong decision and results in more adverse effects associated with that wrong decision. The associated tipping points are as follows:


Considering erroneous assessments, the risk of type III errors cannot be ignored, as demonstrated in this paper, because of non-validated test methods, the lack of unambiguous findings regarding uncertainty estimation in relation to given tests, and the lack of uniform rules of material assessment which account for uncertainty. All adverse effects of erroneous assessments also impact the environment through raw material and energy consumption and waste generation. Inadequate durability of construction materials falls short of future building users' expectations and results in higher future renovation costs.

The bond strength study example shows that inappropriate test methods, which do not ensure adequate reliability of results, can increase the assessment risk area so much that the assessment becomes irrational. It should also be noted that the simple examples shown in this paper constitute only a fraction of the uncertainty problem. The study considers methods which produce quantitative results. There is a plethora of methods, however, which produce results on a nominal scale or offer qualitative results (such as ageing methods for durability assessment, resistance tests, etc.), where uncertainty, and hence the risk of incorrect assessment, is extremely difficult to estimate.

Given the importance of sustainability in all areas of human activity, it also seems reasonable to refer to this aspect in test methods in the construction sector. As shown in this paper, a sustainable test method is one which does not generate undesirable economic, environmental, and social effects because of incorrect assessments. To be considered sustainable, a method must first of all be fit for its intended purpose. Whenever necessary (initial tests of innovative products), it ought to represent the actual conditions of use. In other cases (methods used routinely in assessments), the method should offer sufficiently good precision for the assessments of materials and elements to be unambiguous. The development of such methods and the understanding of the relationship between initial and routine assessment and actual conditions of use are scientific issues.

An applicable and reasonable basis for the uncertainty level is an important aspect of test methods, primarily routine ones. This uncertainty level should be the same for all laboratories involved in AVCP and any other conformity assessment, established on assessment criteria and methods which take the previously established uncertainty into account, and available to all stakeholders. Therefore, the sustainable test method model should also include an uncertainty estimation procedure which ensures a uniform code of conduct for all those who use the method. Universal methods of estimating uncertainty can be interpreted in many ways, as shown in point 3; therefore, the only solution is to specify a strict uncertainty assessment procedure addressed to the specific method. This procedure should be developed by the institution that develops the test method, during its validation (which is currently very rare), and used by the method's users. The levels of controlled and uncontrolled interactions should be determined. Controlled interactions should be presented as requirements (e.g., related to the metrological properties of the equipment, the test environment, etc.). Uncontrolled interactions and their acceptable levels, both within a single test series under repeatability conditions and under reproducibility conditions, should be evaluated during the validation experiment. Laboratories which use the test method should be able to prove their capacity to carry out the test by showing a dispersion of results not greater than that given in the procedure. Then, in the case of results with a larger dispersion, a material can be rejected.

In the construction sector, there are a number of test methods aimed at the assessment of construction product performance; however, only a limited number of methods meet the conditions described above, although the method development costs (largely the costs of validation experiments) may be incomparably smaller than the adverse effects resulting from incorrect assessment. Development in this area should include better characterization (model improvement) of traditionally used test methods, as well as the development of new ones.
