1. Introduction
As the offshore wind power industry grows, so too do the technical demands on the metal frames and primary structures that support it. These structures are subject to enormous dynamic stresses from their moving parts, wind, currents, tides, and waves. Within this sector, the quality control of the welded joints of these structures is of the utmost importance, since welding defects are widely regarded as potential initiation sites for structural failure [1]. The fracture-mechanics study of parameters such as the crack tip opening displacement (CTOD), in the context of crack nucleation and fatigue crack growth, has become essential for manufacturers, designers, classification societies, and inspectors. Fatigue life calculation occupies a prominent place in codes, standards, and rules [4]. Such fatigue analysis is based either on "rule-based" methods or on direct calculation from stress-cycle data models, determined by fatigue testing of the welded details considered and the linear damage hypothesis.
As this approach is rarely possible (due to the full fatigue test required for the welded details), the fatigue analysis may alternatively be based on fracture mechanics. The classification societies' crack growth models use the classic formulation of the Paris–Erdogan law, with developments for the classical plastic hinge models (first developed by the British Standards Institution and published in 1979). According to the extensive work of Zhu and Joyce [5], the stress intensity factor K [6], the crack tip opening displacement (CTOD) [7], the J-integral [8], and the crack tip opening angle (CTOA) (developed for thin-walled materials) are the most relevant parameters used in fracture mechanics. Of these parameters, which describe how a material responds to the formation and propagation of cracks or defects, the critical CTOD at a given distance from the crack tip is the best suited for modeling stable crack growth and instability during the fracture process [9]. Currently, the tests are carried out by discarding the plastic hinge model and adopting the J-conversion, using recognized standards such as BS-7910 (British Standard), API-579 (American Petroleum Institute), and ASTM E1290 (American Society for Testing and Materials).
CTOD testing requires the preparation of a notch with a specific geometry that promotes the nucleation of a stable and uniform crack in a delimited area [10]. The crack grows under the action of dynamic mechanical forces that are generally applied by large oleo-hydraulic equipment and controlled by precision extensometers. The uncertainty of the test methods, together with their sensitivity to any internal defect, makes it necessary to carry out several of these tests to guarantee representative values.
The CTOD tests are expensive, as they require significant investments in testing machinery, software, expertise, and outsourcing of services [11]. The destruction of large quantities of ad hoc welded material is also required (ASTM E1290-08e1c (2008) [12]). Additionally, the turnaround times offered by the testing laboratories are longer than the average for other quality control tests on welded joints. In the case of welded joints, in addition to the properties of the base material, dozens of other variables related to the welding process can affect the properties of the final welded material. Therefore, if the CTOD test result does not fulfil the requirements, it is very difficult for technicians to infer which changes in the variables could lead to an improvement of the CTOD results.
The aim of the present work is to evaluate the possibility of using multivariate mathematical models to correlate the CTOD parameter with other test results that are simpler and cheaper to measure, and also well known by the parties involved.
3. Modeling
We observed a set of K variables in a set of n elements of a population and wanted to summarize the values of the variables and describe their dependency structure. Each of these K variables is called a scalar or univariate variable, and the set of these K variables forms a vector or multivariate variable. All these values can be represented in a matrix, $X$, of dimensions $n \times K$, called a data matrix, where each row represents the values of the K variables for individual i, and each column represents the corresponding scalar variable measured in the n elements of the population. In the element $x_{ij}$, i denotes the individual and j the variable.
Next, we proceed to the multivariate analysis of the observations. To do this, we calculate the vector of means
$$\bar{\mathbf{x}} = \frac{1}{n} X^{\top} \mathbf{1}$$
of dimension $K$, whose components are the means of each of the $K$ variables, and the covariance matrix. From the matrix of centered data
$$\tilde{X} = X - \mathbf{1}\,\bar{\mathbf{x}}^{\top}$$
the symmetric and positive semidefinite covariance matrix
$$S = \frac{1}{n} \tilde{X}^{\top} \tilde{X}$$
is calculated.
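As a minimal sketch of the three quantities just described (mean vector, centered data matrix, covariance matrix), the following uses only the Python standard library. The data matrix is a toy example, not data from the study, and the divisor $n$ in the covariance is an assumption (the unbiased estimator would use $n - 1$).

```python
# Toy data matrix X: n = 4 observations (rows), K = 2 variables (columns).
# The numbers are illustrative only, not measurements from the study.
X = [
    [1.0, 2.0],
    [2.0, 4.0],
    [3.0, 5.0],
    [4.0, 9.0],
]
n, K = len(X), len(X[0])

# Vector of means: one component per variable (column).
mean = [sum(row[j] for row in X) / n for j in range(K)]

# Centered data matrix: subtract the column mean from every entry.
Xc = [[row[j] - mean[j] for j in range(K)] for row in X]

# Covariance matrix S = (1/n) * Xc^T Xc (divisor n is an assumption here).
S = [
    [sum(Xc[i][a] * Xc[i][b] for i in range(n)) / n for b in range(K)]
    for a in range(K)
]

print(mean)
print(S)
```

The resulting matrix is symmetric by construction, since entry (a, b) and entry (b, a) are the same sum of products.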
The objective of describing multivariate data is to understand the dependence between the objective variable and the explanatory variables. For this we studied:
The relationship between pairs of variables;
Dependence between the objective variable and all the explanatory variables;
Dependence between the objective variable and the explanatory ones, but eliminating the effect of some of them.
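The pairwise dependence between the variables is measured by the symmetric and positive semidefinite correlation matrix
$$R = D^{-1/2} S D^{-1/2}$$
where $D$ is the diagonal matrix of the variances, so that there is an exact linear relationship between the variables $x_j$ and $x_k$ when $|r_{jk}| = 1$.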
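The correlation matrix can be obtained from a covariance matrix by dividing each covariance by the product of the two standard deviations. The sketch below assumes this standard definition; the covariance values are hypothetical.

```python
import math

# Illustrative 2x2 covariance matrix (hypothetical values).
S = [[1.25, 2.75],
     [2.75, 6.50]]
K = len(S)

# Standard deviations are the square roots of the diagonal of S.
sd = [math.sqrt(S[j][j]) for j in range(K)]

# r_jk = s_jk / (s_j * s_k); the diagonal of R is 1 by construction.
R = [[S[j][k] / (sd[j] * sd[k]) for k in range(K)] for j in range(K)]

print(R)
```

An off-diagonal entry close to 1 or -1 signals a nearly exact linear relationship between the two variables.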
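It may happen that some variables are strongly dependent on others, in which case it is convenient to measure their degree of dependence. Assuming that $y$ is the variable of interest, and calling $\hat{y}$ the variable used to estimate $y$, the best linear predictor from the other variables, called the explanatory variables, is:
$$\hat{y}_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik} \tag{8}$$
where the parameter vector $\beta = (\beta_0, \beta_1, \dots, \beta_k)$ is determined through the data that we have at our disposal. The problem is finding the set of parameters that minimizes the mean squared error $\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$. Setting its derivative with respect to $\beta_0$ to zero leads to $\beta_0 = \bar{y} - \sum_{j=1}^{k} \beta_j \bar{x}_j$, and defining the residuals $e_i = y_i - \hat{y}_i$, we have $\sum_{i=1}^{n} e_i = 0$, and Equation (8) can be written as follows:
$$\hat{y}_i = \bar{y} + \beta_1 (x_{i1} - \bar{x}_1) + \dots + \beta_k (x_{ik} - \bar{x}_k)$$
Since minimizing $\frac{1}{n}\sum_{i} e_i^2$ is equivalent to minimizing $\sum_{i} e_i^2$, by differentiating this sum with respect to the $\beta_j$ parameters, we obtain a system of $k$ equations that can be written as follows:
$$\sum_{i=1}^{n} e_i\, x_{ij} = 0, \qquad j = 1, \dots, k \tag{9}$$
Equation (9) indicates that the prediction errors must not be correlated with the explanatory variables, so that the covariance of both is zero; equivalently, the residual vector must be orthogonal to the space generated by the explanatory variables. By defining the matrix $X_R$, of size $n \times k$, obtained by eliminating from the centered data matrix $\tilde{X}$ the column corresponding to the variable that we want to predict, $y$, the parameters are calculated by the normal equation system as follows
$$\hat{\beta} = (X_R^{\top} X_R)^{-1} X_R^{\top} \tilde{y}$$
where $\tilde{y}$ is the centered vector of $y$, and Equation (10), with these coefficients, is the multiple regression equation between variable $y$ and the remaining variables $x_1, \dots, x_k$. To express this result based on the original variables of Equation (8), we must consider
$$\hat{\beta}_0 = \bar{y} - \sum_{j=1}^{k} \hat{\beta}_j \bar{x}_j$$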
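The least-squares procedure described in this passage can be illustrated with a small sketch. It assumes two explanatory variables so that the normal equations form a 2x2 system solvable in closed form; the data are hypothetical and the variable names are chosen only for this example.

```python
# Hypothetical data: y depends on two explanatory variables x1 and x2.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0]
y = [3.1, 3.9, 7.2, 7.8, 11.0]


def mean(v):
    return sum(v) / len(v)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Center every variable before building the normal equations.
c1 = [a - mean(x1) for a in x1]
c2 = [a - mean(x2) for a in x2]
cy = [a - mean(y) for a in y]

# Normal equations (X_R^T X_R) beta = X_R^T y, written out for the 2x2 case.
a11, a12, a22 = dot(c1, c1), dot(c1, c2), dot(c2, c2)
b1, b2 = dot(c1, cy), dot(c2, cy)
det = a11 * a22 - a12 * a12
beta1 = (b1 * a22 - a12 * b2) / det
beta2 = (a11 * b2 - a12 * b1) / det

# Intercept expressed in terms of the original (uncentered) variables.
beta0 = mean(y) - beta1 * mean(x1) - beta2 * mean(x2)

# Residuals e_i = y_i - y_hat_i.
resid = [yi - (beta0 + beta1 * u + beta2 * v) for yi, u, v in zip(y, x1, x2)]

print(beta0, beta1, beta2)
# Orthogonality check: both sums should be zero up to rounding error.
print(dot(resid, x1), dot(resid, x2))
```

The final two sums verify numerically that the residuals are orthogonal to each explanatory variable, which is the condition the derivation arrives at.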
The square of the multiple correlation coefficient (which can be greater than, less than, or equal to the sum of the squares of the simple correlations between variable y and each of the explanatory variables) [49] between the variable $y$ and the rest is
$$R^2 = 1 - \frac{1}{s_{jj}\, s^{jj}}$$
where $s_{jj}$ is the j-th diagonal element of the covariance matrix $S$ and $s^{jj}$ is the j-th diagonal element of the $S^{-1}$ matrix, whose reciprocal represents the residual variance of a regression between the j-th variable and the rest. As each time a variable is added to the model the number of degrees of freedom is reduced and the fit improves, it is necessary to correct this coefficient and calculate the adjusted $R^2$:
$$\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1}$$
where n is the total number of observations and k is the number of model variables; that is, the same calculation is made, but weighted by the degrees of freedom of the residuals, $n - k - 1$, and of the total variation, $n - 1$.
The R-squared is a descriptive measure of the predictive capacity of the model, and for a single explanatory variable is the square of the simple correlation coefficient between the two variables.
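The statement that, for a single explanatory variable, R-squared equals the squared simple correlation can be checked numerically. The sketch below assumes the standard expression $R^2 = 1 - 1/(s_{jj}\, s^{jj})$ and the usual adjusted form $1 - (1 - R^2)(n - 1)/(n - k - 1)$; the covariance values and the sample size are hypothetical.

```python
# Illustrative covariance matrix of (y, x); hypothetical values.
S = [[6.50, 2.75],
     [2.75, 1.25]]

# Inverse of a 2x2 matrix in closed form.
det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
S_inv = [[S[1][1] / det, -S[0][1] / det],
         [-S[1][0] / det, S[0][0] / det]]

# R^2 of y (variable j = 0) on the rest: 1 - 1 / (s_jj * s^jj).
j = 0
r2_multiple = 1.0 - 1.0 / (S[j][j] * S_inv[j][j])

# Squared simple correlation between y and x.
r2_simple = S[0][1] ** 2 / (S[0][0] * S[1][1])


def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k explanatory variables."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


print(r2_multiple, r2_simple)  # the two values agree up to rounding
print(adjusted_r2(r2_multiple, n=20, k=1))  # n = 20 is a hypothetical sample size
```

The adjusted value is always below the unadjusted one when k is at least 1 and R-squared is below 1, which reflects the penalty for added variables.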
3.1. Previous Data Processing
Correlation coefficients were determined among the study variables. A high degree of correlation between toughness (CVN) and microstructure was observed, which is well supported in the literature. This relationship also depends on other variables that were not considered in this experiment, such as temperature, stress state, or specimen geometry. Therefore, this particular relationship between the two variables is specific to this experiment and cannot be generalized.
Figure 2 shows the correlation and scatterplot diagrams between all the variables (objective and explanatory) taken two-by-two. The kernel density estimation (KDE) of each variable is also shown as an estimate of its probability density function. A strong correlation can be observed between the CTOD and the explanatory variables, particularly toughness, microstructure, and chemical composition. Apart from the chemical composition, the variables do not seem to follow a normal distribution.
Figure 3 shows the quantiles of input samples (explanatory variables) versus standard normal quantiles (theoretical quantiles from a normal distribution). If the distribution of the explanatory variable is normal, the plot will be close to linear. Except for the chemical composition and toughness, the rest of the independent variables (the mechanical strength, hereafter M. Strength, and microstructure) do not seem to follow a normal distribution, so it would be advisable to apply a transformation (for example, logarithmic) before carrying out a multiple regression analysis. This can be explained by observing the KDE of the corresponding variable in Figure 2, where the M. Strength variable is positively skewed, with most values at the lower end, and the microstructure shows a slightly bimodal distribution (this effect is eliminated through a logarithmic transformation after the outlier exclusion).
To discard outliers that could unduly influence the analysis, the Mahalanobis distance [50] was used for their detection, and ten complete data sets were excluded (14%).
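A minimal sketch of outlier screening with the Mahalanobis distance follows. It is restricted to two variables so that the covariance matrix can be inverted in closed form. The data are hypothetical, and the cut-off (the 97.5% quantile of a chi-square distribution with 2 degrees of freedom, which equals $-2\ln(0.025)$) is an assumption, since the text does not state the threshold that was used.

```python
import math

# Hypothetical two-variable sample: a 3x3 grid of inliers plus one distant point.
inliers = [(float(a), float(b)) for a in (-1, 0, 1) for b in (-1, 0, 1)]
data = inliers + [(6.0, 6.0)]
n = len(data)

# Sample means and covariance entries (divisor n is an assumption).
mx = sum(p[0] for p in data) / n
my = sum(p[1] for p in data) / n
sxx = sum((p[0] - mx) ** 2 for p in data) / n
syy = sum((p[1] - my) ** 2 for p in data) / n
sxy = sum((p[0] - mx) * (p[1] - my) for p in data) / n
det = sxx * syy - sxy ** 2


def mahalanobis_sq(p):
    """Squared distance (p - mean)^T S^{-1} (p - mean) via the 2x2 inverse."""
    dx, dy = p[0] - mx, p[1] - my
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


# Assumed cut-off: 97.5% chi-square quantile with 2 degrees of freedom.
cutoff = -2.0 * math.log(0.025)

d2 = [mahalanobis_sq(p) for p in data]
outliers = [p for p, d in zip(data, d2) if d > cutoff]

print(outliers)
```

Because the covariance is estimated from the same data, a single extreme point inflates it and partially masks itself, so very small samples may never exceed the cut-off.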
3.2. Linear Regression Models
3.2.1. Linear Model 1
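$Y$ is considered as the study variable, which may be linearly related to $K$ explanatory variables $X_1, \dots, X_K$ through the parameters $\beta_0, \beta_1, \dots, \beta_K$ (regression coefficients). A multiple linear regression model can be written as:
$$Y = \beta_0 + \beta_1 X_1 + \dots + \beta_K X_K + e$$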
where e is the difference between the fitted relationship and the observations [
Using Equations (12) and (13), the values of the parameters are calculated. The coefficients for the multiple linear regression (Equation (16)) can be found in Table 4. All coefficients are significantly different from zero, with toughness having the coefficient with the highest absolute value. In this case, the number of observations is 63, and the error degrees of freedom are 58.
The root mean square error (RMSE) is 0.216, which can be compared to the range of the values of $Y$ through the ratio
$$\frac{\mathrm{RMSE}}{Y_{\max} - Y_{\min}}$$
which provides an estimate of the possible error with respect to the real values of the CTOD variable. In Figure 2, it can be observed that the correlation coefficient between CTOD and toughness is 0.799. Considering all the independent variables, the R-squared (RSQ) is 0.866 and the adjusted RSQ value is 0.856, so there is a limited improvement over considering only the CTOD-toughness (or CTOD-microstructure) correlation.
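The figures reported for linear model 1 can be cross-checked with a few lines of arithmetic. The sketch below assumes four explanatory variables plus an intercept and the usual adjusted R-squared formula; the CTOD range passed to `relative_rmse` is a placeholder, since the observed range is not stated in the text.

```python
# Values reported in the text for linear model 1.
n_obs = 63    # observations after outlier removal
n_vars = 4    # explanatory variables (assumed: the four independent variables)
r2 = 0.866    # reported R-squared (rounded)
rmse = 0.216  # reported RMSE

# Error degrees of freedom: observations minus fitted coefficients
# (one slope per variable plus the intercept).
df_error = n_obs - (n_vars + 1)

# Adjusted R-squared recomputed from the rounded R-squared.
adj_r2 = 1.0 - (1.0 - r2) * (n_obs - 1) / df_error


def relative_rmse(rmse_value, y_min, y_max):
    """RMSE expressed as a fraction of the observed range of Y."""
    return rmse_value / (y_max - y_min)


print(df_error)
print(round(adj_r2, 3))
# Placeholder range (0.0 to 2.0): NOT taken from the study.
print(relative_rmse(rmse, 0.0, 2.0))
```

The recomputed degrees of freedom match the reported 58, and the recomputed adjusted value agrees with the reported 0.856 to within the rounding of the inputs.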
Henceforth, for the models shown, the t-statistic (tStat) and F-statistic will be calculated and included. The first of them, tStat, calculated as the estimate divided by its standard error (SE), tests the null hypothesis that the corresponding coefficient is zero against the alternative that it is different from zero. To evaluate this coefficient, the corresponding p-value associated with a Student's t distribution (for n observations) is calculated and compared with a significance level of 5% (95% confidence). If the p-value is less than 0.05, we can conclude that the variable is significant for the model.
Analogously, the F-statistic, calculated as:
$$F = \frac{SSR / k}{SSE / (n - k - 1)}$$
where SSR is the regression sum of squares and SSE is the residual sum of squares, tests the null hypothesis that all the regression coefficients are zero against the alternative that one or more of them are different from zero (meaning a significant linear regression relationship exists for the whole model). This value is compared with an F-distribution for a given confidence level (95%) and is evaluated in the same way as the t-statistic (associated p-value less than 0.05). The F-distribution is more appropriate than Chi-square tests for small data sets [
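To make the two statistics concrete, the sketch below computes them for a regression with a single explanatory variable, assuming the standard definitions (tStat as estimate over standard error, F as the ratio of the regression and residual mean squares). The data are hypothetical. The p-values are omitted because they require the t and F distribution functions, which the Python standard library does not provide.

```python
import math

# Hypothetical data with one explanatory variable.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 5.9]
n, k = len(x), 1

mx, my = sum(x) / n, sum(y) / n
sxx = sum((a - mx) ** 2 for a in x)
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))

# Least-squares estimates.
slope = sxy / sxx
intercept = my - slope * mx

fitted = [intercept + slope * a for a in x]
sse = sum((b - f) ** 2 for b, f in zip(y, fitted))  # residual sum of squares
ssr = sum((f - my) ** 2 for f in fitted)            # regression sum of squares

# Residual mean square, with n - k - 1 error degrees of freedom.
mse = sse / (n - k - 1)

# tStat: estimate divided by its standard error.
se_slope = math.sqrt(mse / sxx)
t_stat = slope / se_slope

# F-statistic: regression mean square over residual mean square.
f_stat = (ssr / k) / mse

print(t_stat, f_stat)
```

With a single explanatory variable the two tests are equivalent, and the F-statistic equals the square of the slope's t-statistic.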
Two different methods were used to verify that the obtained model was independent of the chosen data population: cross-validation and training-test samples.
The cross-validation was calculated with the LeaveMOut method (see the crossvalind Matlab function) with an M value of 1, which randomly selects one observation and excludes it from the evaluation. This process was repeated 50 times and helps to verify that the statistical analysis is independent of the data set. The number of observations used in each fit was 62. The results are shown in Table 5.
The training-test validation was done with 500 executions, each using a training sample of 50 observations (randomly selected from the whole data set) and a test sample of the remaining 13 observations. The RMSE and RSQ results were averaged over all executions.
Table 6 contains the values of RSQ and RMSE obtained with the reference model (linear model 1), cross-validation, and training test. As the values are similar (less than 5% discrepancy), we can conclude that the relation between the CTOD and the explanatory variables is independent of the data set.
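The two validation schemes can be sketched as follows. This is not the Matlab procedure used in the study: it uses a single explanatory variable, synthetic data standing in for the 63 observations, and an exhaustive leave-one-out loop instead of 50 random exclusions. Only the split sizes (50 for training, 13 for test) and the 500 repetitions follow the text.

```python
import math
import random


def fit_line(xs, ys):
    """Least-squares intercept and slope for one explanatory variable."""
    m = len(xs)
    mx, my = sum(xs) / m, sum(ys) / m
    sxx = sum((a - mx) ** 2 for a in xs)
    slope = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sxx
    return my - slope * mx, slope


def rmse(xs, ys, b0, b1):
    """Root mean square error of the line b0 + b1 * x on the given sample."""
    return math.sqrt(sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(xs, ys)) / len(xs))


rng = random.Random(0)
# Synthetic stand-in data: a linear trend plus Gaussian noise (hypothetical).
xs = [i / 10 for i in range(63)]
ys = [0.5 + 0.3 * a + rng.gauss(0, 0.2) for a in xs]

# Repeated random train/test split: 500 runs of 50 training and 13 test points.
test_rmse = []
for _ in range(500):
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    train, test = idx[:50], idx[50:]
    b0, b1 = fit_line([xs[i] for i in train], [ys[i] for i in train])
    test_rmse.append(rmse([xs[i] for i in test], [ys[i] for i in test], b0, b1))
mean_rmse = sum(test_rmse) / len(test_rmse)

# Leave-one-out: fit on all but one observation, predict the excluded one.
sq_errors = []
for i in range(len(xs)):
    b0, b1 = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
    sq_errors.append((ys[i] - (b0 + b1 * xs[i])) ** 2)
loo_rmse = math.sqrt(sum(sq_errors) / len(sq_errors))

print(mean_rmse, loo_rmse)
```

If both out-of-sample errors stay close to the error of the model fitted on the full data set, the fit does not depend strongly on which observations were used.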
3.2.2. Linear Model 2
The significance of all the explanatory variables was checked, and it was observed that the microstructure was highly correlated with toughness. For that reason, a new model (linear model 2) was proposed, in which the microstructure was eliminated from the original model.
Table 7 shows the values of the parameters calculated for linear model 2; the fit obtained was similar to that of the previous model (linear model 1).
3.2.3. Linear Models 3 and 4
As the value of the coefficient of the mechanical strength in linear model 1 was small compared to the values of the rest of the parameters, it was proposed that the corresponding variable be eliminated to obtain a new model (linear model 3), considering that its contribution to the value of the CTOD variable was small. The values of the coefficients of linear model 3 are given in Table 8.
The quality of the fit is almost the same as that of the model with the four independent variables.
Figure 4 shows the residuals of linear model 3, which can be considered as normally distributed.
Finally, a new model (linear model 4) is adopted that includes the square of the first variable, and the contribution of the independent variables to the CTOD variable is checked (see Table 9). In this case, the coefficient of determination is larger than in the purely linear model.
Other tests have been done with different interactions between variables, but they do not improve the results.
3.3. Multivariate Adaptive Regression Splines (MARS)
Multivariate adaptive regression splines (MARS) is a non-parametric modeling method that extends the linear model (incorporating nonlinearities and interactions). It is a generalization of recursive partitioning regression (RPR), which splits the space of the explanatory variables into different subregions. MARS generates cut points (knots) for the variables. These knots are identified through basis functions, which indicate the beginning and end of a region.
In each region into which the space is divided, a linear basis function of one variable is fitted. The final model is a combination of the generated basis functions [
The general expression of the model is:
$$\hat{f}(x) = \sum_{i=1}^{k} c_i B_i(x)$$
where $c_i$ is the constant coefficient and $B_i$ is the basis function.
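The structure of such a model, a weighted sum of basis functions, can be illustrated with piecewise-linear hinge functions. This is a simplified sketch: the coefficients and the knot are hypothetical, no fitting is performed, and the hinges are linear rather than the cubic splines used in the study.

```python
def hinge(x, knot, direction):
    """Hinge basis function: max(0, x - knot) for direction +1,
    max(0, knot - x) for direction -1."""
    return max(0.0, direction * (x - knot))


# Hypothetical model: a constant term plus a mirrored pair of hinges at knot 2.0.
# Each entry is (coefficient c_i, basis specification); None denotes B(x) = 1.
terms = [
    (1.5, None),
    (0.8, (2.0, +1)),
    (-0.4, (2.0, -1)),
]


def predict(x):
    """Evaluate the sum of c_i * B_i(x) over all terms."""
    total = 0.0
    for coef, spec in terms:
        basis = 1.0 if spec is None else hinge(x, *spec)
        total += coef * basis
    return total


print(predict(1.0))  # left of the knot: only the second hinge is active
print(predict(3.0))  # right of the knot: only the first hinge is active
```

Each hinge is zero on one side of its knot, so the model has a different slope on either side, which is how the knots delimit regions.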
A MARS model was applied using cubic splines. This method considers nonlinear relationships between the CTOD variable and the explanatory ones using a spline fit. The model was also evaluated with a training sample of 50 observations and a test sample of 13. Additional information may be found in Figure 5, where the MARS model is plotted for two of the explanatory variables together with two analysis of variance (ANOVA) functions (this visualizes the contribution of the ANOVA functions for the pairs CTOD-M. Strength and CTOD-microstructure in the MARS model).
Again, the values obtained do not improve on those of the previous models.
3.4. Other Models
Other models were studied in order to observe a possible improvement with respect to the initial model (linear model 1).
In the first place, we proposed a generalized linear model with a Gaussian distribution and an identity link function, the parameters of which are included in Table 10 (generalized linear regression model 1, GLM1). The p-value of the mechanical strength is greater than 0.05; therefore, this variable may not be significant.
For this reason, a generalized linear model was calculated without the mechanical strength (GLM2), whose results are shown in Table 11. These values do not improve on those obtained with previous models.
In the second place, we considered a regression tree model [54]. To make a prediction for a given observation, we used the mean (or the mode) of the observations that were in the same region of the multidimensional space of predictors. The rules that were used to divide the predictor space can be represented as a tree [
The order of importance of the predictive variables, from highest to lowest, is microstructure, toughness, mechanical strength, and chemical composition. Therefore, microstructure is the variable that provides the most information about the dependent variable (CTOD) when its value is smaller than 0.26; otherwise, toughness carries more information. Nevertheless, the values associated with each subtree for the training sample (13) are between 3 and 7 times larger than those of the test sample (50), which indicates poor behavior of the model.
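The basic step of a regression tree, splitting the predictor space and predicting with the mean of each region, can be sketched for one predictor and one split. The data and the function name `best_split` are hypothetical, and a full tree would apply this step recursively over all predictors.

```python
def best_split(xs, ys):
    """Return (threshold, left_mean, right_mean) for the single split
    that minimizes the total squared error of the two regions."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold can separate identical predictor values
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
        thr = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        if best is None or sse < best[0]:
            best = (sse, thr, lm, rm)
    return best[1:]


# Hypothetical predictor and response values.
xs = [0.10, 0.15, 0.20, 0.30, 0.35, 0.40]
ys = [0.2, 0.3, 0.25, 0.9, 1.0, 1.1]

thr, left_mean, right_mean = best_split(xs, ys)


def predict(x):
    """Predict with the mean response of the region containing x."""
    return left_mean if x < thr else right_mean


print(thr, left_mean, right_mean)
print(predict(0.12), predict(0.38))
```

Comparing the squared error of such a tree on training and test samples is one way to detect the kind of unstable behavior reported above.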
5. Conclusions
The use of multivariate analysis has been proven viable for relating complex fracture mechanics parameters to well-known material properties. The industrial suitability of the methodology depends on the experimental set, specifically the availability of samples, the number of tests, and the choice of variables.
The chosen variables are significantly related to the CTOD (see the p-values for linear regression model 1). In addition, the manufacturing industry has extensive experience relating these variables to actual changes in the welding process. For example, there is a wide body of knowledge on how the shielding gas, the welding speed, or the bead scheme affect the grain size or the hardness of a given welded joint. Using the proposed model, the industry can transfer this knowledge to anticipate how these variables may affect the CTOD value.
The final model is precise and functional, with an estimated error of ~10% (within the limits covered by the experimental set). This error is compatible with the current uncertainty of the CTOD testing process. In addition, the model does not depend on which subgroup of data is used for the modeling process. It is proposed to use this final model predictively, using the results of the tests for the explanatory variables (which are cheaper, simpler, and more readily available than the CTOD test) to compute an estimate of the CTOD value. If this value (considering the mentioned error) is greater than the critical value (acceptance criterion) specified in the design code, rule, or standard, the expensive CTOD test can be dispensed with.
The usefulness of the model has been proven within the limits of the experimental set for offshore steel welded joints of high thickness. Nevertheless, the influences of other variables not explicitly considered in this work were not tested, even for the mentioned category, and are out of the scope of the presented model. Future developments of the model could include, among others, the influence of testing temperature, different positioning or shape of the notch, post-weld heat treatments, or type of failure category (brittle, ductile-brittle, and ductile).