Interpretable Machine Learning-Based Influence Factor Identification for 3D Printing Process–Structure Linkages

Liu, Fuguo; Chen, Ziru; Xu, Jun; Zheng, Yanyan; Su, Wenyi; Tian, Maozai; Li, Guodong

doi:10.3390/polym16182680


by Fuguo Liu ^1,3, Ziru Chen ^2, Jun Xu ^4, Yanyan Zheng ^4, Wenyi Su ^2, Maozai Tian ^3,5 and Guodong Li ^2,3,6,*

^1 School of Statistics and Data Science, Xinjiang University of Finance and Economics, Urumqi 830012, China

^2 School of Mathematics and Computing Science, University of Electronic Technology, Guilin 541002, China

^3 Department of Mathematics and Data Science, Changji University, Changji 831100, China

^4 Department of Chemical Engineering, Tsinghua University, Beijing 100084, China

^5 School of Statistics, Renmin University of China, Beijing 100872, China

^6 Center for Applied Mathematics of Guangxi (GUET), Guilin 541002, China

^* Author to whom correspondence should be addressed.

Polymers 2024, 16(18), 2680; https://doi.org/10.3390/polym16182680

Submission received: 19 August 2024 / Revised: 5 September 2024 / Accepted: 10 September 2024 / Published: 23 September 2024

(This article belongs to the Section Polymer Physics and Theory)


Abstract

Three-dimensional (3D) printing is a rapid prototyping technology that has been widely used in manufacturing. The printing parameters have an important impact on the printing effect, so they need to be optimized to obtain the best result. To better understand this impact, to give a theoretical explanation from the perspective of mathematical models, and to clarify the rationality of parameters that previous experience regards as important, this study uses machine learning methods to predict the impact of 3D printing parameters on the printing effect. Specifically, we used four machine learning algorithms: (1) SVR (support vector regression), a regression method that uses the principle of structural risk minimization to find a hyperplane in a high-dimensional space that best fits the data, with the goal of minimizing the generalization error bound; (2) random forest, an ensemble learning method that constructs a multitude of decision trees and outputs the mode of the classes (classification) or the mean prediction (regression) of the individual trees; (3) GBDT (gradient boosting decision tree), an iterative ensemble technique that combines multiple weak prediction models (decision trees) into a strong one by sequentially minimizing the loss function, with each subsequent tree built to correct the errors of the previous trees; and (4) XGB (extreme gradient boosting), an optimized and efficient implementation of gradient boosting that adds techniques such as regularization and sparsity-aware splitting. We used feature importance and SHAP (Shapley additive explanation) values to compare the influence of the print parameters on the results and to determine which parameters have the greatest impact on the printing effect. In the experiment, we used a dataset with multiple parameters and divided it into a training set and a test set. Through Bayesian optimization and grid search, we determined the best hyperparameters for each algorithm and used the best model to make predictions for the test set. We compare the predictive performance of each model and confirm that the extrusion expansion ratio, elastic modulus, and elongation at break have the greatest influence on the printing effect, which is consistent with experience. In the future, we will continue to study methods for optimizing 3D printing parameters and explore how interpretable machine learning can be applied to the 3D printing process to achieve more efficient and reliable printing results.

Keywords:

three-dimensional printing; interpretive machine learning; SVR; integrated learning; SHAP value

1. Introduction

Three-dimensional printing, also known as additive manufacturing, is a process of generating a three-dimensional object using a digital file. The 3D printer is equipped with different “printing materials” such as metal, ceramic, plastic, and sand, which serve as tangible raw materials. When connected to a computer, the printer can stack these “printing materials” layer by layer under computer control, ultimately transforming the blueprint on the computer into a physical object. There are various technologies involved in 3D printing, distinguished by the different materials and forming methods used [1]. Common materials used in 3D printing include thermoplastic plastics, metal powders, ceramic powders, edible materials, gypsum materials, aluminum materials, titanium alloys, stainless steel, and rubber-like materials. Based on the different printing materials, 3D printing technologies are generally categorized as fused deposition modeling (FDM), selective laser melting (SLM), digital light processing (DLP), etc. [2].

Additive manufacturing has been growing and has become a pillar in many major industries such as the automotive industry, aerospace industry, and sustainable construction. Most industrial sectors choose to utilize artificial intelligence to increase revenue and reduce working hours, and the additive manufacturing industry is no exception. The application of machine learning (ML) in 3D printing has been a focus of researchers worldwide, mainly aiming to improve the overall design and manufacturing processes, especially in the era of Industry 4.0. It is an emerging technology that optimizes systems by intelligently and efficiently utilizing products, materials, and services. Machine learning in 3D printing can reduce manufacturing time, minimize costs, and improve quality. Currently, ML has a wide range of applications in 3D printing, including design optimization, process improvement, on-site monitoring, cloud 3D printing platforms, and security inspection. ML has proven to be a powerful tool for executing data-driven numerical simulation, design feature recommendation, real-time anomaly detection, and network security [3,4].

In the literature, machine learning has been applied to print design, process optimization [5,6,7,8,9], dimensional accuracy analysis [10,11,12,13], manufacturing defect detection [14,15,16], and material performance prediction [17,18,19,20]. Print design is an important research topic that requires a comprehensive understanding of the capabilities and limitations of 3D printing technology, serving as a critical step in the workflow. ML algorithms are mainly used for feature recognition in print design. For example, Yao et al. designed a hybrid algorithm to recommend additive manufacturing features, using hierarchical clustering to identify the similarity between AM design feature groups and target components. They obtained a tree-shaped diagram that can be pruned into subclusters and trained the SVM classifier using existing industrial application instances. The trained classifier is then used to determine the cutoff line of the tree diagram, which defines the final subcluster containing the recommended AM design features [21]. Additionally, AM encourages the development of new designs, and ML algorithms have been proven applicable in this field, particularly in adjusting material properties and generating new designs. Gu et al. applied machine learning to composite material systems and demonstrated its accurate and efficient prediction of mechanical properties, including toughness and strength, surpassing the existing composite materials in the dataset [22]. Furthermore, during the development of new materials or processes, process optimization is commonly carried out. The characteristics of 3D-printed parts with varying process parameters can be obtained through AM algorithms.

Whether it is the initial print design, feature selection prediction, mid-term process parameter optimization, or troubleshooting and process monitoring during technical operations, the application of ML technology in 3D printing is becoming more mature and can effectively guide actual production. Understanding the relationship between different printing parameters is crucial for optimizing the 3D printing process (extrusion, injection molding, and vat polymerization). However, there is a lack of overall explanation behind the results of machine learning, especially in design optimization, where the identification and selection of optimization parameters lack explanation, including the extent to which scaling adjustments affect printing results. In the study by Jin, ZQ, Zhang, ZZ, et al., a convolutional neural network method was used to detect defects in transparent hydrogel-based bioprinting materials based on layered sensor images and machine learning algorithms, inspired by cooperative game theory. Advanced image processing and enhancement techniques were utilized to detect extracted small image blocks, resulting in high accuracy in anomaly detection. With the prediction of various anomalies, the filling pattern category and location information on the image patches can be accurately determined [23]. Other fields and printing stages rely heavily on ML technology, but there is a significant lack of explanation regarding the application and results of ML technology. Therefore, this paper focuses on the 3D printing of polymers, exploring the influence of material formulation and physical parameters on printing results and providing explanations.

Given the limitations of poor interpretability in the application of ML technology across various fields, and drawing on innovation practices in some disciplines, the literature suggests that the theory of interpretable machine learning is both feasible and logical, particularly with regard to SHAP theory. SHAP is an additive interpretation model constructed by Lundberg in 2017, inspired by cooperative game theory. Its core involves calculating the SHAP values of each feature to reflect their contribution to the predictive ability of the entire model [24]. Building on this theoretical foundation, the ML-SHAP model has become increasingly mature in application, such as the use of XGBoost-SHAP by Dong et al. [25] to explain the relationship between driving behavior and vehicle emission levels, and the use of XGBoost-SHAP by Liao et al. [26] to explain the main factors determining athlete value. The XGBoost-SHAP model has already been applied to quantitatively analyze the contribution of influencing factors.

This paper aims to address the lack of correlation screening and optimization capabilities between driving factors and target variables in previous models, resulting in poor interpretability or even a lack thereof for some influential factors. To do so, we establish an interpretable data-driven model, optimize data combinations, and improve model interpretability. Using 3D printing data obtained from the Polymer Processing Laboratory at Tsinghua University’s Department of Chemical Engineering as an example, we use a combination of machine learning (ML) modules and interpretable SHAP modules to calculate the contribution of each driving factor to the 3D printing result. This aims to provide a basic foundation for identifying and explaining the relevant influential factors in 3D printing. The general research idea of this paper is shown in Figure 1.

2. Research Data and Methodology

2.1. Description of the Dataset Used

This study establishes an interpretable data model, the ML-SHAP model, to explore the correlation between material formulation and physical performance indicators and printing effects, in order to improve model interpretability. The ML module is coupled with the SHAP module to calculate the contribution of each driving factor to the printing effect, using 3D printing data from the Polymer Processing Laboratory, Department of Chemical Engineering at Tsinghua University as an example to provide a foundation for identifying and explaining relevant influential factors in 3D printing. The evaluation indicators for 3D printing performance recorded in 2018 and 2019 are average length deformation rate, average width variation rate, average thickness variation rate, and average warpage. The indicators for 2020 are spline volume and spline warpage, while those for 2021 are bonding strength, spline volume, and spline warpage. Considering the different printing parameters and evaluation indicators at different times, this paper focuses on analyzing the spline warpage indicator for 3D printing.

The data collected from 2018 and 2020 were consolidated to form the final dataset. The feature set for machine learning training consists of six formulation parameters and physical performance indicators: PLA (%), ADR 4468 chain extender (CE), experimental situation of twin-screw blending, die swell ratio, elasticity modulus, and impact strength. One evaluation indicator, spline warpage, serves as the label value for the learner. The performance indicators are briefly explained as follows. First, PLA stands for polylactic acid, a biodegradable material widely used in 3D printing. In this study, PLA refers not only to the material itself but also to its percentage content in the printed material. Next, the ADR 4468 chain extender is a chemical additive for polymer materials, mainly used to increase the molecular weight and the degree of intermolecular crosslinking of polymers. In 3D printing, this chain extender can enhance the mechanical properties of printed materials, such as toughness and ductility.

Twin-screw extruder: HK26 model, screw diameter 26 mm, length-to-diameter ratio of 40, maximum main unit speed of 600 rpm, manufactured by Nanjing Keya Chemical Complete Equipment Co., Ltd., Nanjing, China.

The die swell ratio (DSR), also called the extrusion expansion ratio, refers to the cross-section expansion ratio of the material after it leaves the extrusion die. This phenomenon is usually related to the rheological properties of the material, especially its elastic recovery after shear and pressure release. In 3D printing, control of the die swell ratio is very important for printing accuracy and interlayer bonding. By optimizing the printing parameters and material formulation, the die swell ratio can be adjusted to obtain a more accurate and uniform print layer.

Dual-nozzle industrial-grade 3D printer: UP350 D model, single-nozzle build area of 350 mm × 350 mm × 350 mm, dual-nozzle build area of 335 mm × 335 mm × 350 mm, manufactured by GuoHang Technology Co., Ltd., China. For printing, we used the self-made filament with a nozzle diameter of 0.4 mm, a layer height of 0.2 mm, and a print speed of 50 mm/s. The print temperature was set to about 200 °C, with a bed temperature of 60 °C. The infill was 100% with a grid pattern, and supports were generated automatically. Retraction was set at 6 mm with a speed of 40 mm/s. The print orientation was optimized for minimal support material, and a heated bed with an adhesive sheet was used for first-layer adhesion. A total of 23 valid data rows were collected from the laboratory data, and 25% of them were selected as the test dataset, resulting in 17 and 6 samples for the training and test datasets, respectively. Table 1 shows the input data description collected during the training and testing phases.
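As a rough illustration of the hold-out arithmetic, the sketch below splits 23 rows with a 25% test ratio. It assumes the test count is rounded up, which gives 6 test rows and 17 training rows; the function name `holdout_split`, the seed, and the integer stand-ins for data rows are hypothetical and not taken from the study.

```python
import math
import random


def holdout_split(rows, test_ratio=0.25, seed=0):
    """Shuffle the rows and hold out ceil(test_ratio * n) of them as the test set."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = math.ceil(test_ratio * len(shuffled))  # assumption: round up
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(23))  # hypothetical stand-ins for the 23 valid data rows
train_rows, test_rows = holdout_split(rows)
print(len(train_rows), len(test_rows))
```

With so few samples, the particular rows that land in the test set can change the reported scores noticeably, so fixing the seed is mainly useful for reproducibility.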

As is well known, the distribution of the input variables can affect the performance of the model. The input variables are the main parameters that strongly affect the 3D printing effect. Table 1 reports, for each input variable, the total number of data points, minimum, mean, standard deviation, maximum, and the 25th, 50th, and 75th percentiles. The percentiles describe the distribution of the data by giving the value below which a specific percentage of the observations fall: 25% of the observed values are less than or equal to the 25th percentile (the first quartile, Q1), 50% are less than or equal to the 50th percentile (the median), and 75% are less than or equal to the 75th percentile (the third quartile, Q3).
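The summary statistics described above can be sketched with the Python standard library. The helper `describe` and the sample values are hypothetical; the quartiles here use linear interpolation between observations (`method="inclusive"`), which is one common convention and may differ from the one used to build the table.

```python
import statistics


def describe(values):
    """Count, min, mean, sample standard deviation, quartiles, and max."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "min": min(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),
        "25%": q1,
        "50%": median,
        "75%": q3,
        "max": max(values),
    }


summary = describe([1.0, 2.0, 3.0, 4.0, 5.0])  # hypothetical measurements
print(summary)
```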

In addition, before machine learning training, this paper makes a preliminary check of the validity of the research by drawing the multiple correlation matrix, which also provides guidance for the subsequent factor identification. The Pearson correlation coefficient is used to measure the strength of the linear relationship between two variables. It is calculated as

$$r = \frac{\sum (x_{i} - \bar{x}) (y_{i} - \bar{y})}{\sqrt{\sum (x_{i} - \bar{x})^{2} \sum (y_{i} - \bar{y})^{2}}}$$

where $x_{i}$ and $y_{i}$ are the observed values of the two variables and $\bar{x}$ and $\bar{y}$ are their average values. Moreover, this study follows the common practice in statistics and defines a correlation coefficient greater than 0.7 as high correlation. This threshold is based on the consensus of a large number of statistical analyses and studies; Mao Shisong also discusses the calculation and interpretation of correlation coefficients in Probability Theory and Mathematical Statistics [27], which provides us with a theoretical foundation. During the experiment, we collected data on various input variables of the 3D printing process, including PLA content, elastic modulus, and other chemical and physical properties. The data were imported into a Python environment, cleaned, and standardized, and the statistical observations above were made. The corrcoef function was used to calculate the Pearson correlation coefficients, and the heatmap function of the seaborn library was used to draw the heatmap from the resulting correlation coefficient matrix. The multiple correlation matrix (heat map) of the studied input and output points is shown in Figure 2. Different colors represent different correlation values.
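To make the correlation formula concrete, here is a minimal pure-Python sketch that evaluates it term by term and applies the 0.7 threshold. The function name `pearson_r` and the PLA and warpage numbers are hypothetical illustrations, not the laboratory data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: sum of co-deviations over the root of the product of squared deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    co_dev = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sq_dev_x = sum((xi - mean_x) ** 2 for xi in x)
    sq_dev_y = sum((yi - mean_y) ** 2 for yi in y)
    # Note: a constant input makes the denominator zero and r undefined.
    return co_dev / math.sqrt(sq_dev_x * sq_dev_y)


HIGH_CORRELATION = 0.7  # threshold stated in the text

pla_content = [0.6, 0.7, 0.8, 0.9]  # hypothetical values
warpage = [1.2, 1.5, 2.1, 2.4]      # hypothetical values
r = pearson_r(pla_content, warpage)
print(round(r, 3), r > HIGH_CORRELATION)
```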

It can be observed that (1) the correlations of PLA and elasticity modulus with warpage are relatively high, with correlation coefficients of about 0.8; (2) there is a high degree of multicollinearity among PLA, elasticity modulus, and impact strength; (3) the correlation coefficients of the ADR 4468 chain extender, twin-screw blending experiment, and die swell ratio with warpage are small, and these variables are also only weakly related to the other three input variables. These results are consistent with previous research findings, where PLA content primarily characterizes the chemical properties of printing materials, while elasticity modulus and impact strength describe physical property features. The chemical properties of printing materials play a decisive role in determining the physical properties of printing results, and the physical property features determine the spline warpage results of 3D printing. Therefore, it is appropriate to use all input variables to improve model accuracy and confirm the impact of each variable on the estimated value of spline warpage.

The influence of input variables on the 3D printing result, specifically spline warpage, was visualized using a hexagonal contour plot, as shown in Figure 3. Regions with higher color intensity indicate the most useful data points, representing a concentrated range of parameter values that lead to the desired printing outcome in practical experiments. In this study, three parameters with relatively high correlation coefficients were selected from the heatmap: PLA content (0.82), elasticity modulus (0.82), and impact strength (−0.82). Their relationships with the output parameter were plotted using the Seaborn Python package, and the resulting contour plot highlights the goal of smaller spline warpage values. Darker colors indicate that the corresponding spline warpage values are closer to zero under the current parameter settings. It can be observed that each input variable has its own characteristic range of values and its own optimal range for achieving this goal. Compared to the other two parameters, the range of PLA content is narrower, with 0.6–0.9 being most effective for bringing spline warpage close to zero. The elasticity modulus ranges from 1000 to 3000 MPa, but values are concentrated between 1800 and 2200 MPa, where spline warpage is closer to zero and the printing outcomes align better with expectations. Impact strength is most effective within the range of 3.5–5 kJ/m², leading to experimental conditions that approach the ideal, with a retention rate of 0–1. The correlation coefficients between the input and output parameters were visualized using the Seaborn heatmap function, which draws the correlation matrix between inputs and outputs as a heatmap.

2.2. Support Vector Regression

When the SVR method is used to identify the factors influencing the 3D printing effect, the function form is $\hat{\pi}_{t} = w^{T} \phi (X_{t}) + b$, where $\phi (\cdot)$ is the basis function for the nonlinear transformation of the influencing factors $X_{t}$, $w$ is the weight of the basis function, and $b$ is the bias. The basis function transformation enables the SVR model to identify nonlinear relationships between the printing effect and its influencing factors [28,29]. The idea of SVR estimation is to optimize the distance from each sample point to the support vector. This gives the SVR model an $L_{2}$ penalty and reduces the negative effects of overfitting and multicollinearity of variables. SVR also adds a tolerance for error to the loss function: only when the 3D printing result $\hat{\pi}_{t}$ predicted by SVR deviates from the actual print result $\pi_{t}$ by more than a certain degree $\varepsilon$ is the error recorded in the loss function. The objective optimization function of SVR is

$$\begin{array}{l} \underset{w, b}{\arg \min} \ \frac{1}{2} \|w\|^{2} + C \sum_{t = 1}^{T} (\xi_{t} + \xi_{t}^{*}) \\ \text{s.t.} \begin{cases} \pi_{t} - w^{T} \phi (X_{t}) - b \leq \varepsilon + \xi_{t} \\ \pi_{t} - w^{T} \phi (X_{t}) - b \geq - \varepsilon - \xi_{t}^{*} \\ \xi_{t} \geq 0, \ \xi_{t}^{*} \geq 0 \end{cases} \end{array}$$ (1)
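The error tolerance in Equation (1) can be sketched in a few lines. At the optimum the slack variables sum to max(0, |actual − predicted| − ε) for each sample, so the objective can be evaluated directly. This sketch assumes the identity basis function φ(x) = x for simplicity; the function names and all numbers are hypothetical.

```python
def epsilon_insensitive(actual, predicted, epsilon):
    """Slack needed for one sample: zero inside the epsilon tube, linear outside it."""
    return max(0.0, abs(actual - predicted) - epsilon)


def svr_objective(weights, bias, samples, targets, C, epsilon):
    """0.5 * ||w||^2 + C * sum of slacks, assuming phi(x) = x (illustration only)."""
    penalty = 0.5 * sum(w * w for w in weights)
    total_slack = 0.0
    for x, target in zip(samples, targets):
        prediction = sum(w * xi for w, xi in zip(weights, x)) + bias
        total_slack += epsilon_insensitive(target, prediction, epsilon)
    return penalty + C * total_slack


# Hypothetical one-feature example: the first point lies inside the tube,
# the second misses by 1.0 and contributes a slack of 0.9.
value = svr_objective([2.0], 0.0, [[1.0], [2.0]], [2.0, 5.0], C=5.0, epsilon=0.1)
print(value)
```

A larger C makes deviations outside the tube more expensive relative to the weight penalty, which is the trade-off the penalty coefficient controls.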

2.3. Integration Method Based on Regression Tree

When the regression-tree-based method is used to identify the influencing factors of the 3D printing parameters, the function form is $\hat{\pi}_{t} = \sum_{i} \bar{\pi}_{i, t} \cdot I (X_{t} \in R_{i})$. According to the relevant influencing factors, the sample space is divided into different regions $R_{i}$, indexed by $i$, and the mean value of the samples in a region is taken as the predicted value of any sample falling into that region. The regions divided by the regression tree imply the relationship between the various influencing factors and the volume retention rate, especially the influence of interactions between influencing factors on the printing effect [30]. The three methods selected in this paper, RF, gradient boost, and XGBoost, are all ensemble learning methods based on regression trees, derived from the ensemble ideas of bagging and boosting [31,32]. Although the calculation processes of the three methods differ, the way they recognize the relationship between print parameters and volume retention rate is essentially similar to that of the regression tree method.

The RF method is based on the bagging integration method, which integrates the prediction results of multiple regression trees and thereby effectively improves the prediction effect of the model. Specifically, bagging [33,34] trains multiple machine learning models on different training samples constructed through resampling and takes the mean of all model predictions as the final predicted value. The RF model improves on bagging: when a tree in RF splits a node, it randomly selects only some of the factors affecting the printing effect as candidates for dividing regions, thereby reducing the similarity between the individual models and improving the overall prediction performance and robustness of the model [35].

Gradient boost and XGBoost are based on the boosting integration method. The idea of boosting is to improve predictive performance by training multiple models in sequence and integrating their results. Gradient boost and XGBoost [36] are both improvements based on this idea. In each iteration of gradient boost, the new model fits the prediction residual of the existing integrated model, so that the overall prediction loss function is finally minimized [37,38].

In this paper, suppose that the regression tree model of the m-th iteration of gradient boost is $h_{m} (X_{t})$. The loss function is

$$L (\pi_{t}, \hat{\pi}_{t, m - 1} (X_{t}) + v \cdot \beta_{m} h_{m} (X_{t})) = (\pi_{t} - \hat{\pi}_{t, m - 1} (X_{t}) - v \cdot \beta_{m} h_{m} (X_{t}))^{2}$$

where $\hat{\pi}_{t, m - 1} (X_{t})$ is the volume retention rate predicted by the integrated model from the sample $X_{t}$ of influencing factors after the (m − 1)-th iteration, and $v$ is the learning rate, which regularizes the model. The hyperparameters of this model include the number of regression trees, the depth of each regression tree, the learning rate, etc., and their optimal values can be obtained through certain methods; the specific methods are described below. XGBoost improves the training efficiency of gradient boost, utilizes the second-order information of the loss function, and adds a penalty term for model complexity to the loss function $L$. In this paper, the loss function optimized by XGBoost is as follows:

$$L_{m} = \sum_{t = 1}^{T} \left[ g_{t m} h_{m} (X_{t}) + \frac{1}{2} S_{t m} h_{m}^{2} (X_{t}) \right] + \Omega (h_{m})$$ (2)

where $g_{t m}$ and $S_{t m}$ are, respectively, the first and second derivatives of the loss function with respect to $\hat{\pi}_{t, m - 1} (X_{t})$, and the penalty term is as follows:

$$\Omega (h_{m}) = \alpha W_{m} + \frac{\lambda}{2} \|w_{m}\|^{2}$$ (3)

which penalizes the norm of the node weights $w_{m}$ of the regression tree.
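The boosting iteration described in this section can be sketched under simplifying assumptions: a single input feature, depth-one regression trees (stumps), the squared loss, and the step coefficient fixed to 1 so that only the learning rate scales each tree. Under the squared loss the negative gradient is proportional to the residual, so each new stump is fitted to the current residuals. The function names and the data are hypothetical.

```python
def fit_stump(xs, residuals):
    """Least-squares regression tree with one split on a single feature."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # keep both sides non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def gradient_boost(xs, ys, n_trees=50, learning_rate=0.1):
    """Start from the mean, then add shrunken stumps fitted to the residuals."""
    base = sum(ys) / len(ys)
    trees = []
    predictions = [base] * len(ys)
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, predictions)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        predictions = [p + learning_rate * tree(x) for p, x in zip(predictions, xs)]
    return lambda x: base + learning_rate * sum(tree(x) for tree in trees)


def mean_squared_error(ys, predictions):
    return sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys)


xs = [1, 2, 3, 4, 5, 6]              # hypothetical feature values
ys = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]  # hypothetical targets
model = gradient_boost(xs, ys)
baseline_mse = mean_squared_error(ys, [sum(ys) / len(ys)] * len(ys))
boosted_mse = mean_squared_error(ys, [model(x) for x in xs])
print(baseline_mse, boosted_mse)
```

The sketch omits the second-order terms and the complexity penalty of Equations (2) and (3), which are what distinguish XGBoost from plain gradient boosting.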

2.4. Feature Importance

The purpose of feature importance is to rapidly identify and compare the most significant factors. Its limitations are that it cannot show whether a factor influences the model results positively or negatively and that it does not account for interaction effects. Generally, feature importance can be obtained from the feature_importances_ attribute [39] of the fitted model.

The feature_importances_ in xgboost, for instance, can be calculated either by the number of times a feature is split or by the gain obtained from splitting on that feature. This is also one of the advantages of using gradient boosting, as it allows for convenient retrieval of importance scores for each attribute after constructing the boosted trees. The underlying idea is that the significance of a feature lies in its ability to reduce the uncertainty of the prediction target. Features that can more effectively reduce this uncertainty are considered more important. In other words, the calculation of feature importance is based on the mean decrease impurity, which measures the information gain (Gini coefficient) before and after splitting on a particular feature during the decision tree construction process.

In the context of this study, machine learning methods are applied to predict the effects of 3D printing, and the contribution of each feature during the construction of the decision trees is observed in the same way. Typically, feature importance provides a score indicating the usefulness or value of each feature in building the boosted decision trees of the model. The more an attribute is used to make key decisions in the decision trees, the higher its relative importance. This importance is explicitly calculated for each attribute in the dataset, allowing attributes to be ranked and compared. For an individual decision tree, the importance of an attribute is computed from the amount by which each of its split points improves the performance metric, weighted by the number of observations handled by the node. The performance metric can be a measure of purity (such as the Gini coefficient [40]) used for selecting split points or another more specific error function. Finally, the feature importances are averaged across all decision trees in the model to obtain concrete values for identifying the factors influencing the outcome.
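The weighted impurity decrease behind this kind of importance score can be sketched as follows. Because the target here is continuous, the sketch uses variance as the impurity measure instead of the Gini coefficient; that substitution, the helper names, and the feature names and numbers are assumptions for illustration only.

```python
def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def impurity_decrease(parent, left, right):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    return (variance(parent)
            - len(left) / n * variance(left)
            - len(right) / n * variance(right))


def normalized_importance(decreases_by_feature):
    """Sum each feature's decreases over its splits and scale the totals to sum to 1."""
    totals = {name: sum(values) for name, values in decreases_by_feature.items()}
    grand_total = sum(totals.values())
    return {name: total / grand_total for name, total in totals.items()}


# Hypothetical node: splitting separates the targets perfectly.
decrease = impurity_decrease([1.0, 1.0, 3.0, 3.0], [1.0, 1.0], [3.0, 3.0])

# Hypothetical per-split decreases collected for two features.
importance = normalized_importance({"pla": [3.0], "die_swell_ratio": [0.5, 0.5]})
print(decrease, importance)
```

As the text notes, a score of this kind ranks features but carries no sign, so it cannot say whether a feature raises or lowers the prediction.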

2.5. Shapley Additive Explanation

The SHAP value interpretation method was proposed by Lundberg and Lee [24] to explain the contribution of each influencing factor in a machine learning model to the target predicted value. SHAP is a post hoc model interpretation method. Its core idea is to calculate the marginal contribution of each feature to the model output and then explain the “black box model” at the global and local levels. SHAP builds an additive interpretive model in which all features are considered “contributors”. For each predicted sample, the model produces a predicted value, and the SHAP value is the value assigned to each feature in that sample.

Let the set $F$ denote the feature set of all influencing factors; its elements are called features, and $|F|$ denotes the number of elements in $F$. The predicted value $\hat{f} (x_{*})$ of a machine learning model $f (\cdot)$ at a particular sample point $x_{*}$ can be decomposed into the following:

$$\hat{f} (x_{*}) = \phi_{0} + \sum_{j = 1}^{|F|} \phi_{*}^{j}$$ (4)

where $\phi_{0}$ is the base value of the model prediction, generally taken as the mean prediction $E [\hat{f} (x)]$, and $\phi_{*}^{j}$ is the size of the influence of the j-th influencing factor on the prediction for the sample $x_{*}$, i.e., the SHAP value. The larger $| \phi_{*}^{j} |$ is, the greater the influence of the influencing factor on the predicted value of the target; $\phi_{*}^{j} > 0$ and $\phi_{*}^{j} < 0$ indicate, respectively, that the influencing factor has a positive or negative impact on the predicted value. The SHAP value $\phi_{*}^{j}$ of the j-th factor is calculated by the following formula:

$$\phi_{*}^{j} = \sum_{S \subseteq F \setminus \{j\}} \frac{|S|! (|F| - |S| - 1)!}{|F|!} \left( v_{*} (S \cup \{j\}) - v_{*} (S) \right)$$ (5)

where $S$ is a subset of $F$ that does not contain the j-th factor and $|S|$ is the number of its elements. Based on the feature set $F$, the subsets that do not contain the j-th factor can be built, whose number is $\sum_{S \subseteq F \setminus \{j\}} \frac{|F|!}{|S|! (|F| - |S| - 1)!}$. The term $v_{*} (S \cup \{j\}) - v_{*} (S)$ is the expected influence of factor $j$ on the predicted value, given that the combination of control variables is the subset $S$.
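Equations (4) and (5) can be checked numerically on a small example by enumerating every subset. The sketch below assumes one particular value function: the features in the subset are fixed to the sample's values and the remaining features are averaged over a background set. The toy model, the sample, and the background rows are all hypothetical, and exhaustive enumeration is only practical for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, background):
    """Exact Shapley values by enumerating all subsets that exclude each feature."""
    n = len(x)

    def value(subset):
        # Assumed value function: fix the features in `subset` to x and
        # average the model over background rows for the remaining features.
        total = 0.0
        for row in background:
            mixed = [x[k] if k in subset else row[k] for k in range(n)]
            total += model(mixed)
        return total / len(background)

    phis = []
    for j in range(n):
        others = [k for k in range(n) if k != j]
        phi = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                phi += weight * (value(members | {j}) - value(members))
        phis.append(phi)
    return value(set()), phis  # base value and one SHAP value per feature


def toy_model(z):
    """Hypothetical model: additive in z[0], with an interaction between z[1] and z[2]."""
    return 2.0 * z[0] + z[1] * z[2]


x_star = [1.0, 2.0, 3.0]
background = [[0.0, 1.0, 1.0], [2.0, 0.0, 2.0], [4.0, 2.0, 0.0]]
base_value, phis = shapley_values(toy_model, x_star, background)
print(base_value, phis)
```

The base value plus the sum of the SHAP values reproduces the model prediction at the sample, which is the additive decomposition in Equation (4).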

The calculation of the SHAP value in this paper is based on the trained machine learning model, so the calculation of the SHAP value of each factor according to Aas et al. [41] still uses the information of all training samples when calculating SHAP value of each sample. The specific research ideas and calculation steps are as follows.

Assume that the sample set is

x = \{x_{1}, \dots, x_{n}\}

, and a 3D printing experiment has been carried out. The SHAP value

\emptyset_{i}^{j}

of the j-th factor on the sample is calculated according to the method, that is, the influence of the factor

j

on the 3D printing effect. Based on the SHAP value of the factor

j

in the sample

x_{i}

, the SHAP value sequence is

\emptyset^{j} = \{\emptyset_{1}^{j}, \emptyset_{2}^{j}, \dots, \emptyset_{n}^{j}\}

, which reflects the influence of the factor on the whole sample set, and the contribution size comparison of each feature is obtained in the set.
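To make Equations (4) and (5) concrete, the following minimal Python sketch enumerates every subset for a toy three-feature model. The model `toy_model`, the `BACKGROUND` samples, and the value function `value` (the mean prediction with the features in S fixed to the sample and the remaining features drawn from the background) are illustrative assumptions, not the models or data of this paper.

```python
from itertools import combinations
from math import factorial


def toy_model(x):
    # Hypothetical model with an interaction term; not the paper's model.
    return 2.0 * x[0] + x[1] * x[2]


# Hypothetical background samples used to average out features outside S.
BACKGROUND = [(0.0, 1.0, 2.0), (1.0, 0.0, 1.0), (2.0, 2.0, 0.0), (1.0, 1.0, 1.0)]


def value(subset, x_star):
    """v_*(S): mean prediction when the features in S are fixed to x_star."""
    total = 0.0
    for b in BACKGROUND:
        mixed = [x_star[k] if k in subset else b[k] for k in range(len(x_star))]
        total += toy_model(mixed)
    return total / len(BACKGROUND)


def shapley_values(x_star):
    """Exact SHAP values by enumerating all subsets S of F without j (Equation (5))."""
    n = len(x_star)
    phis = []
    for j in range(n):
        others = [k for k in range(n) if k != j]
        phi = 0.0
        for size in range(n):  # |S| runs from 0 to |F| - 1
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for s in combinations(others, size):
                gain = value(set(s) | {j}, x_star) - value(set(s), x_star)
                phi += weight * gain
        phis.append(phi)
    return phis


x_star = (3.0, 2.0, 1.0)
phi_0 = value(set(), x_star)        # base value: mean prediction over the background
phis = shapley_values(x_star)
reconstructed = phi_0 + sum(phis)   # right-hand side of Equation (4)
```

Under these assumptions, `reconstructed` matches `toy_model(x_star)`, which is the additive decomposition stated in Equation (4). Full enumeration is only practical for a handful of features because the number of subsets doubles with each added feature.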

2.6. Models Performance Measurements

In model performance evaluation, it is important to implement uncertainty analysis to evaluate the performance of machine learning models [42]. The training dataset is used to construct the prediction model, and the test dataset is used to evaluate its accuracy. Four statistical evaluation measures, namely the coefficient of determination (R2), the explained variance score (explained_variance_score), the root mean square error (RMSE), and the mean absolute error (MAE), are used to evaluate the performance of the models [43,44]. These metrics are good indicators of overall prediction accuracy; their mathematical expressions and typical ranges are shown in Table 2.
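The four measures can be written out directly from their standard definitions. The sketch below uses only the Python standard library; the function names and the toy values in `y_true` and `y_pred` are illustrative assumptions rather than results from this study. In practice, the corresponding functions in `sklearn.metrics` would normally be used.

```python
from math import sqrt
from statistics import mean, pvariance


def r2_score(y, y_hat):
    # Coefficient of determination: 1 - SS_res / SS_tot.
    y_bar = mean(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - y_bar) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def explained_variance(y, y_hat):
    # 1 - var(residuals) / var(y); a constant bias is not penalized.
    residuals = [a - b for a, b in zip(y, y_hat)]
    return 1.0 - pvariance(residuals) / pvariance(y)


def mae(y, y_hat):
    return mean([abs(a - b) for a, b in zip(y, y_hat)])


def rmse(y, y_hat):
    return sqrt(mean([(a - b) ** 2 for a, b in zip(y, y_hat)]))


# Toy values: every prediction is too high by exactly 0.5.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.5, 3.5, 4.5]

scores = {
    "R2": r2_score(y_true, y_pred),
    "explained_variance": explained_variance(y_true, y_pred),
    "MAE": mae(y_true, y_pred),
    "RMSE": rmse(y_true, y_pred),
}
```

The toy case shows why R2 and the explained variance score are reported separately: a constant offset lowers R2 but leaves the explained variance score at 1.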

3. Results and Discussion

3.1. Model Fitness Analysis Based on Machine Learning Prediction

3.1.1. Determination of Model Parameters

After data preprocessing and feature engineering of the original dataset, the important hyperparameters of each model need to be determined. Based on previous experience, the hyperparameters of support vector regression are determined by Bayesian optimization [45], while the three regression tree-based ensemble methods use grid search.

The hyperparameters of support vector regression (SVR) mainly include the penalty coefficient C and the kernel function, which is the inner product of two samples after transformation by the basis function. Common kernel functions include the linear, polynomial, sigmoid, and RBF (radial basis function) kernels [46]. After searching with the Bayesian optimization algorithm, the penalty coefficient C is set to approximately 5 (C = 4.9284, Table 3), and the RBF kernel is selected. The RBF kernel maps the samples to a higher-dimensional space and can handle cases where the relationship between labels and features is nonlinear. The formula is as follows:

K(x, y) = e^{-\gamma \| x - y \|^{2}}

(6)
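Equation (6) can be evaluated by hand for a few points to see how the kernel behaves. In the sketch below, the function name `rbf_kernel`, the sample points, and the value gamma = 0.5 are assumptions chosen for illustration; the paper does not report the gamma that was used.

```python
from math import exp


def rbf_kernel(x, y, gamma):
    # K(x, y) = exp(-gamma * ||x - y||^2), as in Equation (6).
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return exp(-gamma * sq_dist)


same = rbf_kernel([1.0, 2.0], [1.0, 2.0], gamma=0.5)  # squared distance 0
near = rbf_kernel([1.0, 2.0], [1.0, 3.0], gamma=0.5)  # squared distance 1
far = rbf_kernel([1.0, 2.0], [4.0, 6.0], gamma=0.5)   # squared distance 25
```

The kernel equals 1 for identical points and decays toward 0 as the points move apart, and a larger gamma makes the decay faster.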

The regression tree-based ensemble methods (random forest, gradient boosting, and extreme gradient boosting) are similar, and their important parameters [47] include max_depth (maximum depth of the tree), learning_rate (learning rate, used by the two boosting methods), and n_estimators (number of base estimators). GBDT and XGBoost have another important hyperparameter, subsample (the proportion of samples drawn for each tree); because the dataset in this study is small, the default value of 1 is used. Compared with gradient boosting, the XGBoost model has two additional penalty terms: reg_alpha (the weight of the L1 regularization term) and reg_lambda (the weight of the L2 regularization term). In this paper, a subset of these hyperparameters is tuned, and the optimal combination for each of the three methods is shown in Table 3.
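A grid search over the tree hyperparameters named above might be set up as in the following sketch, which uses scikit-learn's `GridSearchCV` with `GradientBoostingRegressor`. The synthetic data, the candidate values in `param_grid`, the three-fold cross-validation, and the scoring choice are all assumptions for illustration; the paper does not state the grid that was searched.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (25 samples, 7 features); not the paper's dataset.
rng = np.random.default_rng(0)
X = rng.uniform(size=(25, 7))
y = 3.0 * X[:, 0] + np.sin(4.0 * X[:, 5]) + rng.normal(scale=0.05, size=25)

# Hypothetical candidate values for the hyperparameters discussed in the text.
param_grid = {
    "max_depth": [2, 3],
    "n_estimators": [10, 18, 50],
    "max_features": [4, 7],
}

search = GridSearchCV(
    GradientBoostingRegressor(subsample=1.0, random_state=0),  # subsample left at 1
    param_grid,
    cv=3,
    scoring="neg_mean_absolute_error",
)
search.fit(X, y)
best_params = search.best_params_  # best combination found on this toy data
```

With so few samples the cross-validated scores are noisy, so the selected combination can change with the random seed or the number of folds.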

3.1.2. The Fitting Results of Four Machine Learning Methods

Before the empirical analysis, the prediction performance of the four machine learning methods selected in this paper needs to be compared. It is common practice to use the explained_variance_score of the training set and the mean square error (MSE) and mean absolute error (MAE) of the test set as the evaluation indexes of model prediction performance.

This article uses Python 3.10 with the SVR, RandomForestRegressor, and GradientBoostingRegressor classes of the sklearn package and the XGBRegressor class of the xgboost package. The hyperparameter settings of these classes are listed in Table 3. In addition, train_test_split was used to divide the data into a test set and a training set. The sample size was 25, and the ratio of the two sets was about 1:3, with 7 and 18 samples, respectively.
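The split and fitting step can be sketched as follows. The data are synthetic, the `test_size=0.25` and `random_state` values are assumptions, and only the model hyperparameters are taken from Table 3; this is an illustration of the workflow, not the authors' script.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Synthetic stand-in data (25 samples, 7 features); not the paper's dataset.
rng = np.random.default_rng(1)
X = rng.uniform(size=(25, 7))
y = 2.0 * X[:, 0] + X[:, 5] + rng.normal(scale=0.05, size=25)

# Assumed split: one quarter of the samples held out for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Hyperparameters as listed in Table 3. If the xgboost package is installed,
# xgboost.XGBRegressor(max_depth=2, n_estimators=18, reg_lambda=1.4423)
# could be added to this dictionary in the same way.
models = {
    "SVR": SVR(C=4.9284, kernel="rbf"),
    "RF": RandomForestRegressor(
        max_depth=3, max_features=5, n_estimators=422, random_state=0
    ),
    "GBDT": GradientBoostingRegressor(
        max_depth=3, max_features=4, n_estimators=18, random_state=0
    ),
}

predictions = {
    name: model.fit(X_train, y_train).predict(X_test)
    for name, model in models.items()
}
split_sizes = (len(X_test), len(X_train))
```

With 25 samples and a test fraction of 0.25, `train_test_split` rounds the test set up, so the split contains 7 test samples and 18 training samples.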

For the fitting results, this paper first draws a line chart comparing the predicted and true values on the training set and adds an error bar chart for more intuitive observation. Figure 4a–d show the proxy models fitted by the SVR, RF, GBDT, and XGB models, respectively, to predict the “volume retention rate”. The horizontal axis represents the sample points; the left vertical axis gives the errors, which are drawn as a column chart; and the right vertical axis gives the volume retention value, for which the predicted and true values are drawn as two line charts.

The results show a clear difference between the predicted and true values of the SVR model: the predicted value stays almost constant, while the true value fluctuates strongly. The SVR model therefore has a large error in predicting the “volume retention rate” and cannot capture the differences between samples. A possible reason is that the SVR model, as fitted here, does not adequately capture the nonlinear relationships in the data, which leads to an unsatisfactory fit. In contrast, the RF, GBDT, and XGB models fit better in Figure 4b–d: the predicted and true curves are similar, and the errors are small. This suggests that the ensemble models built on decision trees predict the “volume retention rate” more accurately and capture how it changes, possibly because these models handle nonlinear relationships well and have a strong fitting ability. To some degree, this comparison provides a reference for choosing an appropriate machine learning model and guidance for further improving the prediction model.

Based on the fitting charts of the training set, this paper further analyzes the evaluation metrics on the test set. Table 4 shows that, after parameter tuning, the machine learning models as a whole perform very well in predicting the 3D printing effect, with R2 above 0.80 for every model. Because this paper describes the importance of variables on the basis of the prediction models, the models should explain and reproduce the data as well as possible. The models show high accuracy and describe the relationships in the data well, so it is reasonable to use them to investigate the degree of influence of each feature. XGBoost and GBDT perform best: the R2 of the former is close to 0.98, and the MAE and RMSE of the latter are close to 0. Both learn the relationship between features and labels well, so the identification of influencing factors focuses on these two proxy models. In addition, the three regression tree ensembles score better than SVR; specifically, their test-set RMSE and MAE are smaller, and their R2 is higher. Moreover, because features are evaluated and screened to choose the split nodes while a tree model is built, the feature scores inherent in the tree models are a useful reference for the subsequent evaluation of influencing factors and are analyzed in the next section.

3.2. Print Factor Recognition Based on Interpretative Machine Learning Method

Based on the prediction models established above, we found that SVR performs slightly worse than the ensemble methods based on decision trees. Therefore, we compare the importance of the features using the node-splitting principle that the three decision tree algorithms apply during tree construction.

This article uses Python 3.10 and the RandomForestRegressor and GradientBoostingRegressor classes of the sklearn package, together with the XGBRegressor class of the xgboost package, to output the feature importance scores (the feature_importances_ attribute). The program draws a series of importance score bar charts in which the horizontal axis is the importance score and the vertical axis is the feature. The features “PLA”, “PBS”, “ADR 4468 chain extender”, “elastic modulus”, “breaking strength”, “elongation at break”, and “impact strength” are named “f0”, “f1”, “f2”, “f3”, “f4”, “f5”, and “f6”, respectively; the results are shown in Figure 5.
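Reading the impurity-based importance scores from fitted tree ensembles can be sketched as below. The data are synthetic (the target is constructed so that column f5 dominates), and the hyperparameters are placeholders; none of this reproduces the paper's dataset or Figure 5.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor

# Synthetic stand-in data in which f5 drives most of the target by construction.
rng = np.random.default_rng(2)
X = rng.uniform(size=(25, 7))
y = 3.0 * X[:, 5] + 0.5 * X[:, 0] + rng.normal(scale=0.05, size=25)
names = [f"f{k}" for k in range(7)]

models = {
    "RF": RandomForestRegressor(n_estimators=200, max_depth=3, random_state=0),
    "GBDT": GradientBoostingRegressor(n_estimators=50, max_depth=3, random_state=0),
}

importances = {}
ranking = {}
for label, model in models.items():
    model.fit(X, y)
    scores = model.feature_importances_   # normalized to sum to 1
    importances[label] = scores
    order = np.argsort(scores)[::-1]      # indices from most to least important
    ranking[label] = [names[k] for k in order]
```

Because the scores of each model are normalized, they are comparable as shares within a model, but their absolute values are not directly comparable across different algorithms.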

For the SHAP interpretation method, this paper uses the SHAP package in Python 3.9. SHAP (Shapley additive explanation) is a model explanation package developed in Python [48] that can explain the output of any machine learning model. The variable shap_values holds the SHAP values of the factors in the model for each sample; bar charts are drawn from it for easy observation and for comparison with the feature importance bar charts. The horizontal axis is mean_shap_value, that is, the mean SHAP value, and the vertical axis is the feature, where “PLA”, “PBS”, “ADR 4468 chain extender”, “elastic modulus”, “breaking strength”, “elongation at break”, and “impact strength” are named feature 0–feature 6. The results are shown in Figure 6.
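The aggregation from per-sample SHAP values to one bar per feature can be illustrated without the SHAP package. The sketch below assumes that the bars show the mean absolute SHAP value, and it substitutes a hypothetical linear model, for which the SHAP value of feature j on sample i has the closed form w_j (x_ij - mean_j) when features are treated as independent. The weights and data are invented for illustration.

```python
import numpy as np

# Synthetic stand-in data and a hypothetical linear model (not the paper's models).
rng = np.random.default_rng(3)
X = rng.uniform(size=(25, 7))
weights = np.array([0.5, 0.0, 0.1, 1.0, 2.0, 3.0, 1.5])
bias = 0.2
predictions = X @ weights + bias

# Base value: the mean prediction over the sample set.
base_value = predictions.mean()

# Closed-form SHAP values for a linear model with independent features:
# one row per sample, one column per feature.
shap_matrix = (X - X.mean(axis=0)) * weights

# One score per feature: the mean absolute SHAP value over all samples.
mean_abs_shap = np.abs(shap_matrix).mean(axis=0)
order = np.argsort(mean_abs_shap)[::-1]   # feature indices, most important first
```

Each row of `shap_matrix` plus `base_value` sums to the prediction for that sample, and a feature with zero weight receives a score of exactly zero.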

Figure 5 shows a clear pattern. The three methods give a fairly consistent ordering of importance: feature 4 (elastic modulus), feature 0 (PLA), and feature 5 (impact strength) are highlighted as having a large impact on the print effect (spline warpage), and the scores are also similar across methods. In summary, the elastic modulus maintains the highest contribution, followed by PLA and impact strength, which have a greater influence on sample warpage. This conclusion is confirmed by the correlation coefficients in the heat map (Figure 2): the chemical composition of the printing material plays a decisive role in its physical properties, and the physical properties in turn determine the spline warpage of the 3D printed part. Although the models differ slightly in how they assess the contributions of these two aspects, they all identify the importance of PLA and elastic modulus. The application of interpretable machine learning to the influencing factors of 3D printing parameters is further explored through the SHAP value method, as shown in Figure 6.

First, the results of each method are compared horizontally. The rankings obtained from feature_importance and from the SHAP values are highly consistent, which again supports the rationality and credibility of the evaluation results. Second, the estimates of the three methods are compared vertically, with the bars arranged from the highest to the lowest importance score. The three methods differ in places but share more common points. The methods consider f5 (elongation at break) to have the highest contribution; even in GBDT, where it ranks second, it closely follows the top-ranked variable. In addition, f4 (breaking strength) is also near the top, and the methods agree that f0 (PLA), f1 (PBS), and f2 (chain extender ADR 4468) contribute weakly and have no great impact on the printing effect. The evaluation of f3 (elastic modulus) and f6 (impact strength) is less uniform. Random forest and XGB rank them higher, placing them in the top three; however, the observed SHAP values are only about 0.03–0.05. For impact strength, GBDT differs from the other two methods: it ranks the feature third, with a SHAP value of 0.05, while the other two methods give about 0.02. On the whole, however, the SHAP contributions are similar. Taking into account the performance comparison of the prediction models, the XGBoost model has relatively high reliability and interpretability. In general, elongation at break, breaking strength, elastic modulus, and impact strength have a strong impact on the 3D printing effect, in decreasing order. The importance scores of the features are shown in Table 5.

4. Conclusions

This study successfully applied machine learning prediction methods to explore the influencing factors of 3D printing in the field of polymers. Despite a limited dataset, the models achieved high accuracy in capturing hidden relationships and established a solid foundation for future feature impact assessments. Considering the nonlinear relationships between 3D printing-related features and printing outcomes, this study not only employed classical and efficient machine learning methods to characterize the relationships between features and labels but also used interpretable machine learning methods to explain complex relationships that cannot be described by explicit mathematical functions. The methodologies employed in this study can be applied to more advanced black-box models, offering valuable insights for subsequent research. The main research findings are as follows:

Through correlation coefficient analysis, we found that among the input variables, the PLA content and elastic modulus showed the highest correlation with warpage, with a correlation coefficient of about 0.8. PLA content, elastic modulus, and warpage were also strongly correlated with one another. In contrast, the ADR 4468 chain extender, twin-screw blending conditions, and extrusion swell ratio were only weakly correlated with warpage and with the other three input variables.
In terms of model selection, we employed four machine learning algorithms, namely, support vector regression (SVR), random forest (RF), gradient boosting decision trees (GBDTs), and extreme gradient boosting (XGBoost), to predict “spline warpage,” achieving satisfactory results. These results were obtained by tuning on a small dataset; the models nevertheless showed good generalization and may be applicable to larger-scale datasets.
Additionally, we introduced the SHAP (Shapley additive explanations) interpretable machine learning framework to explain the predictions of the models. Through SHAP value analysis, we found that elongation at break, breaking strength, elastic modulus, and impact strength have significant impacts on 3D printing outcomes, with the influence decreasing in that order. This conclusion is consistent with practical experience and aligns with our preliminary finding that chemical properties affect physical features, which, in turn, determine printing outcomes.

However, it should be noted that in constructing the prediction model for 3D printing outcomes, this study treated indicators such as bonding strength and spline volume as individual labels for machine learning models, training and examining them separately. The study did not consider the interaction effects between parameters, which may introduce subjectivity. Furthermore, given the continuous advancements in experimental equipment and materials, the input features of the 3D printing experimental prediction models may vary over time. This study focused on individual years (2018, 2019, 2020, and 2021) without aggregating and analyzing the results, potentially missing out on underlying physical laws. These limitations require further research to overcome.

Author Contributions

Methodology, F.L., Z.C., Y.Z. and G.L.; Software, F.L., Z.C. and J.X.; Validation, F.L. and W.S.; Formal analysis, Z.C.; Investigation, M.T.; Writing—original draft, F.L. and Z.C.; Writing—review & editing, F.L., J.X., Y.Z., M.T. and G.L.; Visualization, J.X. and W.S. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Science and Technology Project of Guangxi (Guike AD23023002); Natural Science Foundation of Guangxi province of Guodong Li (Grant No. 2022 gxnsfaa 035554); the Innovation Project of Guangxi Graduate Education of Yujuan Gu (Grant No. YCSW2023302); Changji College School level Discipline Construction Project, Special Fund for Scientific and Technological Bases and Talents of Guangxi (Grant No. Guike AD21075008); and the Key Laboratory of Data Analysis and Computation in Universities in Guangxi Autonomous Region.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflict of interest.

References

Meredig, B.; Agrawal, A.; Kirklin, S.; Saal, J.E.; Doak, J.; Thompson, A.; Zhang, K.; Choudhary, A.; Wolverton, C. Combinatorial screening for new materials in unconstrained composition space with machine learning. Phys. Rev. B 2014, 89, 094104.
Hanakata, P.Z.; Cubuk, E.D.; Campbell, D.K.; Park, H.S. Accelerated search and design of stretchable graphene kirigami using machine learning. Phys. Rev. Lett. 2018, 121, 255304.
Ward, L.; Agrawal, A.; Choudhary, A.; Wolverton, C. A general-purpose machine learning framework for predicting properties of inorganic materials. npj Comput. Mater. 2016, 2, 16028.
Compton, B.G.; Lewis, J.A. 3D-printing of lightweight cellular composites. Adv. Mater. 2014, 26, 5930–5935.
Aoyagi, K.; Wang, H.; Sudo, H.; Chiba, A. Simple Method to Construct Process Maps for Additive Manufacturing Using a Support Vector Machine. Addit. Manuf. 2019, 27, 353–362.
Menon, A.; Póczos, B.; Feinberg, A.W.; Washburn, N.R. Optimization of Silicone 3D Printing with Hierarchical Machine Learning. 3D Print. Addit. Manuf. 2019, 6, 181–189.
He, H.; Yang, Y.; Pan, Y. Machine Learning for Continuous Liquid Interface Production: Printing Speed Modelling. J. Manuf. Syst. 2019, 50, 236–246.
Stavroulakis, P.; Chen, S.; Delorme, C.; Bointon, P.; Tzimiropoulos, G.; Leach, R. Rapid Tracking of Extrinsic Projector Parameters in Fringe Projection Using Machine Learning. Opt. Lasers Eng. 2019, 114, 7–14.
Baturynska, I.; Semeniuta, O.; Martinsen, K. Optimization of Process Parameters for Powder Bed Fusion Additive Manufacturing by Combination of Machine Learning and Finite Element Method: A Conceptual Framework. In Procedia CIRP; Elsevier B.V.: Heidelberg, Germany, 2018; pp. 227–232.
Francis, J.; Bian, L. Deep Learning for Distortion Prediction in Laser-Based Additive Manufacturing Using Big Data. Manuf. Lett. 2019, 20, 10–14.
Khanzadeh, M.; Rao, P.; Jafari-Marandi, R.; Smith, B.K.; Tschopp, M.A.; Bian, L. Quantifying Geometric Accuracy with Unsupervised Machine Learning: Using Self-Organizing Map on Fused Filament Fabrication Additive Manufacturing Parts. J. Manuf. Sci. Eng. 2018, 140, 301011.
Zhu, Z.; Anwer, N.; Huang, Q.; Mathieu, L. Machine Learning in Tolerancing for Additive Manufacturing. CIRP Ann. 2018, 67, 157–160.
Tootooni, M.S.; Dsouza, A.; Donovan, R.; Rao, P.K.; Kong, Z.; Borgesen, P. Classifying the Dimensional Variation in Additive Manufactured Parts from Laser-Scanned Three-Dimensional Point Cloud Data Using Machine Learning Approaches. J. Manuf. Sci. Eng. 2017, 139, 091005.
Scime, L.; Beuth, J. Using Machine Learning to Identify in situ Melt Pool Signatures Indicative of Flaw Formation in a Laser Powder Bed Fusion Additive Manufacturing Process. Addit. Manuf. 2019, 25, 151–165.
Caggiano, A.; Zhang, J.; Alfieri, V.; Caiazzo, F.; Gao, R.; Teti, R. Machine Learning-based Image Processing for On-Line Defect Recognition in Additive Manufacturing. CIRP Ann. 2019, 68, 451–454.
Zhang, B.; Liu, S.; Shin, Y.C. In-Process Monitoring of Porosity During Laser Additive Manufacturing Process. Addit. Manuf. 2019, 28, 497–505.
Gu, G.X.; Chen, C.-T.; Richmond, D.J.; Buehler, M.J. Bioinspired Hierarchical Composite Design Using Machine Learning: Simulation, Additive Manufacturing, and Experiment. Mater. Horiz. 2018, 5, 939–945.
Hamel, C.M.; Roach, D.J.; Long, K.N.; Demoly, F.; Dunn, M.L.; Qi, H.J. Machine Learning Based Design of Active Composite Structures for 4D Printing. Smart Mater. Struct. 2019, 28, 065005.
Li, Z.; Zhang, Z.; Shi, J.; Wu, D. Prediction of Surface Roughness in Extrusion-Based Additive Manufacturing with Machine Learning. Robot. Comput. Integr. Manuf. 2019, 57, 488–495.
Jiang, J.; Hu, G.; Li, X.; Xu, X.; Zheng, P.; Stringer, J. Analysis and Prediction of Printable Bridge Length in Fused Deposition Modelling Based on Back Propagation Neural Network. Virtual Phys. Prototyp. 2019, 14, 253–266.
Yao, X.; Moon, S.K.; Bi, G. A hybrid machine learning approach for additive manufacturing design feature recommendation. Rapid Prototyp. J. 2017, 23, 983–997.
Gu, G.X.; Chen, C.T.; Buehler, M.J. De novo composite design based on machine learning algorithm. Extrem. Mech. Lett. 2018, 18, 19–28.
Jin, Z.; Zhang, Z.; Shao, X.; Gu, G.X. Monitoring Anomalies in 3D Bioprinting with Deep Neural Networks. ACS Biomater. Sci. Eng. 2021, 9, 3945–3952.
Lundberg, S.M.; Lee, S.I. A Unified Approach to Interpreting Model Predictions. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 4765–4774.
Dong, J.; Hu, D.; Yan, Y.; Peng, L.; Zhang, P.; Niu, Y.; Duan, X. Interpretability machine learning based urban O3 driver mining. J. Environ. Sci. 2023, 44, 3660–3668.
Liao, B.; Wang, Z.; Li, M. Prediction and Characteristic Analysis Method of Football players’ worth based on XGBoost and SHAP Model. J. Comput. Sci. 2019, 49, 195–204.
Mao, S.; Zhou, J.; Zhang, R. Probability Theory and Mathematical Statistics, 4th ed.; China Statistics Press: Beijing, China, 2020.
Hameed, M.M.; AlOmar, M.K.; Baniya, W.J.; AlSaadi, M.A. Prediction of high-strength concrete: High-order response surface methodology modeling approach. Eng. Comput. 2021, 38, 1655–1668.
Naderpour, H.; Rafiean, A.H.; Fakharian, P. Compressive strength prediction of environmentally friendly concrete using artificial neural networks. J. Build. Eng. 2018, 16, 213–219.
Xiao, Z.; Chen, K.; Chen, X.; Chen, Y. Inflation factors influencing recognition—And inspection based on machine learning method. Stat. Res. 2022, 33, 132–147.
Oza, N.C. Online Bagging and Boosting. In Proceedings of the 2005 IEEE International Conference on Systems, Man and Cybernetics, Waikoloa, HI, USA, 12 October 2005; pp. 2340–2345.
Wang, G.W.; Zhang, C.X.; Guo, G. Investigating the Effect of Randomly Selected Feature Subsets on Bagging and Boosting. Commun. Stat.-Simul. Comput. 2015, 44, 636–646.
Kaloop, M.R.; Kumar, D.; Samui, P.; Hu, J.W.; Kim, D. Compressive strength prediction of high-performance concrete using gradient tree boosting machine. Constr. Build. Mater. 2023, 264, 120198.
Huang, J.; Sun, Y.; Zhang, J. Reduction of computational error by optimizing SVR kernel coefficients to simulate concrete compressive strength through the use of a human learning optimization algorithm. Eng. Comput. 2021, 38, 3151–3168.
Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
Bentéjac, C.; Csörgő, A.; Martínez-Muñoz, G. A comparative analysis of gradient boosting algorithms. Artif. Intell. Rev. Int. Sci. Eng. J. 2021, 54, 1937–1967.
Baykasoglu, A.; Oeztas, A.; Oezbay, E. Prediction and multi-objective optimization of high-strength concrete parameters via soft computing approaches. Expert. Syst. Appl. 2009, 36, 6145–6155.
Qin, S.; Wang, K.; Ma, X.; Wang, W.; Li, M. Ensemble Learning-Based Wind Turbine Fault Prediction Method with Adaptive Feature Selection. In Proceedings of the Data Science: Third International Conference of Pioneering Computer Scientists, Engineers and Educators, ICPCSEE 2017, Changsha, China, 22–24 September 2017.
Hey, J.D.; Lambert, P.J. Relative Deprivation and the Gini Coefficient: Comment. Q. J. Econ. 1980, 95, 567–573.
Aas, K.; Jullum, M.; Løland, A. Explaining Individual Predictions when Features are Dependent: More Accurate Approximations to Shapley Values. Artif. Intell. 2021, 298, 103502.
Anyaoha, U.; Zaji, A.; Liu, Z. Soft computing in estimating the compressive strength for high-performance concrete via concrete composition appraisal. Constr. Build. Mater. 2020, 257, 119472.
Alabdullah, A.A.; Iqbal, M.; Zahid, M.; Khan, K.; Amin, M.N.; Jalal, F.E. Prediction of rapid chloride penetration resistance of metakaolin based high strength concrete using light GBM and XGBoost models by incorporating SHAP analysis. Constr. Build. Mater. 2022, 345, 128296.
Farooq, F.; Nasir Amin, M.; Khan, K.; Rehan Sadiq, M.; Javed, M.F.; Aslam, F.; Alyousef, R. A Comparative Study of Random Forest and Genetic Engineering Programming for the Prediction of Compressive Strength of High Strength Concrete (HSC). Appl. Sci. 2020, 10, 7330.
Kuenneth, C.; Rajan, A.C.; Tran, H.; Chen, L.; Kim, C.; Ramprasad, R. Polymer informatics with multi-task learning. Patterns 2021, 2, 100238.
Kim, C.; Batra, R.; Chen, L.; Tran, H.; Ramprasad, R. Polymer design using genetic algorithm and machine learning. Comput. Mater. Sci. 2021, 186, 110067.
Mannodi-Kanakkithodi, A.; Pilania, G.; Ramprasad, R. Critical assessment of regression-based machine learning methods for polymer dielectrics. Comput. Mater. Sci. 2016, 125, 123–135.
Safwan, A.; Maysa, A.; Ala, H. Artificial neural network modeling to evaluate polyvinylchloride composites’ properties. Comput. Mater. Sci. 2018, 153, 1–9.
Ekanayake, I.U.; Meddage, D.P.; Rathnayake, U. A novel approach to explain the black-box nature of machine learning in compressive strength predictions of concrete using Shapley additive explanations (SHAP). Case Stud. Constr. Mater. 2022, 16, e01059.

Figure 1. The outline of the proposed framework.

Figure 2. Thermal map of correlation coefficients between print parameters and spline warpage.

Figure 3. The hexagon contour plot of parameters and spline warpage.

Figure 4. Prediction performances of the rate of volume retention by machine learning models. Subgraphs (a–d) are the proxy models for predicting “volume retention rate” fitted by SVR, RF, GBDT, and XGB models, respectively, where the horizontal axis is the sample points, the left vertical axis is the errors, the right vertical axis is the volume retention value, and the volume retention rate is drawn as two line graphs.

Figure 5. Three-dimensional printing impact factors assessment columnar comparison chart.

Figure 6. A columnar comparison chart of the factors influencing the evaluation of the integrated approach model.

Table 1. The statistics table of spline warpage.

	PLA (100%)	Chain Extender ADR4468 (100%)	Twin-Screw Extrusion Experimental Conditions	Die Swell Ratio (r/r0) (100%)	Elastic Modulus (GPa)	Impact Strength (kJ/m²)	Warping (100%)
Count	23	23	23	23	23	23	23
Mean	0.652	0.002608696	3.695652174	1.689130435	1584.827391	5.11821739	1.8786957
Std	0.279	0.00255377	1.329209697	0.512559647	654.5020297	1.68434678	1.9694007
Min	0	0	1	1	392.8	3.334	0.23
0.25	0.5	0	2.5	1.275	1110.065	3.858	0.61
0.5	0.7	0.005	4	1.8	1657	4.477	1.16
0.75	0.85	0.005	5	2	2015.395	6.1065	1.96
Max	1	0.005	5	3.3	2708.84	9.94	7.8

Table 2. The evaluation index table of fit degree of machine learning model.

Assessment Criteria	Standard Range
$R^{2}(y, \hat{y}) = 1 - \frac{\sum_{i = 1}^{n} (y_{i} - \hat{y}_{i})^{2}}{\sum_{i = 1}^{n} (y_{i} - \bar{y})^{2}}$, $y_{i}$: observed data, $\hat{y}_{i}$: predicted data, and $\bar{y}$: mean of the observed data	0 to 1
$\mathrm{explained\_variance\_score}(y, \hat{y}) = 1 - \frac{\mathrm{var}(y_{i} - \hat{y}_{i})}{\mathrm{var}(y_{i})}$, $y_{i}$: observed data, $\hat{y}_{i}$: predicted data	0 to 1
$MAE(y, \hat{y}) = \frac{1}{n} \sum_{i = 1}^{n} |y_{i} - \hat{y}_{i}|$, $y_{i}$: observed data, $\hat{y}_{i}$: predicted data, and $n$: number of observations	0 is the best value
$RMSE(y, \hat{y}) = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} (y_{i} - \hat{y}_{i})^{2}}$, $y_{i}$: observed data, $\hat{y}_{i}$: predicted data, and $n$: number of observations	0 is the best value

Table 3. The main parameter configuration table of the forecast “volume retention rate” model.

Model Name	Parameter Configuration
SVR	C = 4.9284, Kernel = RBF
Random Forest	max_depth = 3, max_features = 5, n_estimators = 422
GBDT	max_depth = 3, max_features = 4, n_estimators = 18
XGBoost	max_depth = 2, n_estimators = 18, reg_lambda = 1.4423

Table 4. Evaluation index results of four machine learning algorithms.

	SVR	RF	GBDT	XGB
$R^{2}$	0.8096	0.8498	0.9369	0.9794
explained_variance_score	0.8179	0.8509	0.9377	0.9794
MAE	0.3367	0.2364	0.0556	0.2742
RMSE	0.2026	0.1038	0.0043	0.1919

Table 5. Feature importance characteristic score.

	RF	GBDT	XGB
PLA	0.212644311	0.328358355	0.5503053
ADR 4468 chain extender	0.00365122	0.002253861	0.010976699
Modulus of elasticity	0.002183382	0.002425572	0.030449962
Breaking strength	0.056911609	0.023243075	0.015605606
Elongation at break	0.631832284	0.393985351	0.37159628
Impact strength	0.092777195	0.249733786	0.021066085
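The per-model rankings implied by Table 5 can be read off programmatically. The sketch below copies the scores from Table 5 into a dictionary; the names `TABLE_5`, `ranking`, and `column_sum` are introduced here purely for illustration.

```python
# Importance scores copied from Table 5 (columns: RF, GBDT, XGB).
TABLE_5 = {
    "PLA": {"RF": 0.212644311, "GBDT": 0.328358355, "XGB": 0.5503053},
    "ADR 4468 chain extender": {"RF": 0.00365122, "GBDT": 0.002253861, "XGB": 0.010976699},
    "Modulus of elasticity": {"RF": 0.002183382, "GBDT": 0.002425572, "XGB": 0.030449962},
    "Breaking strength": {"RF": 0.056911609, "GBDT": 0.023243075, "XGB": 0.015605606},
    "Elongation at break": {"RF": 0.631832284, "GBDT": 0.393985351, "XGB": 0.37159628},
    "Impact strength": {"RF": 0.092777195, "GBDT": 0.249733786, "XGB": 0.021066085},
}


def ranking(model):
    """Features sorted from the highest to the lowest score for one model."""
    return sorted(TABLE_5, key=lambda feature: TABLE_5[feature][model], reverse=True)


def column_sum(model):
    """Sum of the scores of one model over all listed features."""
    return sum(row[model] for row in TABLE_5.values())


rankings = {model: ranking(model) for model in ("RF", "GBDT", "XGB")}
sums = {model: column_sum(model) for model in ("RF", "GBDT", "XGB")}
```

Each column sums to approximately 1, so the scores can be read as shares of the total importance within a model.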

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
