Communication

An Effective Framework for Predicting Performance of Solid-Solution Copper Alloys Using a Feature Engineering Technique in Machine Learning

Tiehan Fan, Jianxin Hou and Jian Hu
1 National Frontiers Science Center for Industrial Intelligence and Systems Optimization, Northeastern University, Shenyang 110819, China
2 Institute of Materials Intelligent Technology, Liaoning Academy of Materials, Shenyang 110004, China
3 School of Materials Science and Engineering, East China Jiaotong University, Nanchang 330013, China
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Metals 2023, 13(10), 1641; https://doi.org/10.3390/met13101641
Submission received: 4 September 2023 / Revised: 21 September 2023 / Accepted: 23 September 2023 / Published: 25 September 2023

Abstract

Utilized extensively across a wide range of industries, solid-solution copper alloys are prized for their superior electrical conductivity and mechanical properties. However, optimizing these often mutually exclusive properties poses a challenge, especially considering the complex interplay of alloy composition and processing techniques. To address this, we introduce a novel computational framework that employs advanced feature engineering within machine learning algorithms to accurately predict the alloys' microhardness and electrical conductivity. Our methodology demonstrates a substantial enhancement over traditional data-driven models, raising R² scores from 0.939 to 0.971 for microhardness predictions and from −1.05 to 0.934 for electrical conductivity. Through machine learning, we also identify the key determinants that significantly influence the overall performance of solid-solution copper alloys, providing actionable insights for future alloy design and material optimization.

1. Introduction

Solid-solution copper alloys (SSCAs), such as brass and tin bronze, play a pivotal role in electrical and electronic applications. Their significance is attributed to their ease of manufacture, cost-effectiveness, and an optimal balance between mechanical strength and electrical conductivity (EC) [1]. However, these alloys present an inherent trade-off: improvements in tensile strength often come at the expense of EC, and vice versa [2]. Traditional methods of strengthening metallic materials involve the introduction of defects like point defects, dislocations, and grain boundaries. Although these defects can enhance mechanical attributes, they also introduce increased electron scattering, thus reducing the EC of the material [2,3,4]. The interplay between mechanical strength and EC is not merely a theoretical concern but has significant ramifications in practical applications where both attributes are essential.
Advancements in the mechanical and electrical properties of copper alloys rely heavily on strategic optimization of chemical composition and process parameters. These factors govern the evolution of microstructure and defects, which in turn determine the material’s performance [1,5,6]. Experimental results have shown that microhardness can increase from an initial 0.7 GPa to 2.1 GPa under specific deformation conditions [7,8]. Moreover, copper alloyed with 0.2 wt% magnesium has shown a marked increase in microhardness, surpassing the 2.0 GPa threshold after multi-pass equal channel angular pressing. This, however, comes at the cost of a diminished EC, dropping to 84.5% International Annealed Copper Standard (IACS) [9].
Despite these advancements, the intricate interplay of alloying elements and processing conditions renders traditional empirical approaches inefficient, costly, and time-consuming [10,11]. In this landscape, data-driven machine learning (ML) offers a compelling solution for optimizing material properties and facilitating the design of innovative materials. ML algorithms have demonstrated their ability to rapidly identify data patterns and make accurate predictions [11]. For example, Lookman et al. [12,13] employed ML models in tandem with adaptive learning strategies to successfully develop high-performance shape-memory alloys. Likewise, Xie et al. [14] used data-driven methodologies to create a property-oriented alloy composition design system, which achieved the inverse composition design of high-performance copper alloys. Unfortunately, the effectiveness of such ML models depends strongly on the quality of the features used to characterize the materials. ML algorithms that take the contents of specific elements as inputs often lack transparency [15] and perform poorly when data are scarce [16,17].
To address these limitations, the field of materials science has begun adopting advanced feature engineering (FE) techniques [18]. These techniques transform materials into composition vectors or incorporate other physically meaningful metallurgical attributes for ML-based property predictions [17]. The essence of FE lies in leveraging domain expertise to craft pertinent features, thus enhancing both the performance and interpretability of ML models [19,20].
Unlike steels, which often have large datasets sourced from manufacturing plants [21,22], data for SSCAs are not as readily available, posing a challenge for training robust predictive models. This paper aims to bridge this gap by introducing a robust framework that combines FE techniques with ML to predict the performance of SSCAs accurately. The proposed method encompasses feature extraction, screening, and model selection, aimed at refining prediction precision and dependability. Ultimately, our study elucidates the key features influencing the performance of SSCAs, shedding light on the complex relationships between material characteristics and functional outcomes. These insights hold promise for more informed decisions in the design and optimization of SSCAs.

2. Methods

2.1. Data Collection

Data collection serves as the cornerstone of this research framework and lays the groundwork for subsequent analyses. The data for this study were sourced exclusively from a single peer-reviewed publication [1], ensuring a high degree of data comparability and integrity. The dataset comprises 105 observations, each containing chemical compositions, process parameters, Vickers microhardness (HV), and EC. It is noteworthy that the EC of SSCAs is predominantly influenced by alloy composition and is less sensitive to process parameters [23,24]. Accordingly, the dataset used for EC analysis was trimmed to 21 samples, exemplifying a classic small-data scenario. The full dataset is provided in Table S1 of the Supplementary Materials.

2.2. Feature Engineering

FE serves as the fulcrum of this investigation. It leverages domain-specific knowledge to develop new features that can enhance the efficacy of the ML algorithms. Employing a strategic scheme, 170 alloy factors are generated by calculating the mean values and variances for 85 distinct material properties of the alloying elements using Formulas (1) and (2). A detailed list of these 85 material properties can be found in Supplementary Table S2.
$$f_i^{\mathrm{mean}} = \sum_{j=1}^{m} f_{ij}\, p_{ij} \qquad (1)$$

$$f_i^{\mathrm{var}} = \sum_{j=1}^{m} \left( f_{ij} - f_i^{\mathrm{mean}} \right)^2 p_{ij} \qquad (2)$$

The alloy factors are denoted $f_i^{\mathrm{mean}}$ and $f_i^{\mathrm{var}}$, where $i$ indexes the elemental property, $m$ is the number of major metal elements in the current sample, $p_{ij}$ is the atomic percentage content of element $j$ in the sample, and $f_{ij}$ is the value of property $i$ for element $j$.
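To make Formulas (1) and (2) concrete, the following minimal Python sketch (not the authors' code) computes the mean and variance alloy factors for a given composition. The ELEMENT_PROPS table, its property names, and the numerical values in it are illustrative placeholders for the 85 elemental properties listed in Supplementary Table S2.

```python
# Minimal sketch (not the authors' code) of Formulas (1) and (2): composition-weighted
# mean and variance alloy factors. ELEMENT_PROPS and its values are illustrative
# placeholders for the 85 elemental properties of Supplementary Table S2.
import numpy as np

ELEMENT_PROPS = {
    # element: {property name: value}  -- illustrative values only
    "Cu": {"P26_pauling_electronegativity": 1.90, "P18_debye_temperature_K": 343.0},
    "Mg": {"P26_pauling_electronegativity": 1.31, "P18_debye_temperature_K": 400.0},
}

def alloy_factors(composition, prop_names):
    """composition: {element: atomic fraction (normalized p_ij)} -> {feature: value}."""
    elements = list(composition)
    fracs = np.array([composition[el] for el in elements])
    feats = {}
    for prop in prop_names:
        vals = np.array([ELEMENT_PROPS[el][prop] for el in elements])
        mean = float(np.sum(vals * fracs))                 # Formula (1)
        var = float(np.sum((vals - mean) ** 2 * fracs))    # Formula (2)
        feats[f"{prop}_mean"] = mean
        feats[f"{prop}_var"] = var
    return feats

# e.g., a dilute Cu-Mg solid solution (0.2 at.% Mg)
print(alloy_factors({"Cu": 0.998, "Mg": 0.002},
                    ["P26_pauling_electronegativity", "P18_debye_temperature_K"]))
```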
The initial modeling input comprises these 170 alloy factors and 5 process parameters, including the heat-treated temperature (HT). However, combining high-dimensional features with limited data can lead to overfitting, hindering model training and degrading predictive accuracy [15]. Variable selection is therefore employed with a three-fold objective: to enhance predictive performance, to enable faster and more cost-effective predictions, and to provide deeper insight into the data-generating process [25]. A three-pronged strategy combining a variance threshold, correlation screening, and recursive feature elimination is adopted to mitigate these challenges.
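A minimal sketch of the first and third selection stages (variance threshold and recursive feature elimination) using scikit-learn follows; the choice of RandomForestRegressor as the ranking estimator is an assumption, not the authors' exact setting. The correlation-screening stage is sketched in Section 3.1.

```python
# Sketch of selection stages 1 and 3 with scikit-learn; estimator and defaults are
# assumptions. Stage 2 (correlation screening) is sketched in Section 3.1.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE, VarianceThreshold

def variance_filter(X: pd.DataFrame, threshold: float) -> pd.DataFrame:
    """Stage 1: drop features whose variance falls below the threshold."""
    vt = VarianceThreshold(threshold=threshold).fit(X)
    return X.loc[:, vt.get_support()]

def recursive_elimination(X: pd.DataFrame, y, n_keep: int) -> pd.DataFrame:
    """Stage 3: recursively eliminate features down to n_keep survivors."""
    rfe = RFE(RandomForestRegressor(random_state=0),
              n_features_to_select=n_keep).fit(X, y)
    return X.loc[:, rfe.support_]
```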

2.3. Machine Learning Modeling and Evaluation Criteria

An ensemble model is constructed for the ML task, capitalizing on its superiority over its individual base and component learners [15,26]. The ensemble incorporates an array of ML algorithms, including XGBoost, CatBoost, LightGBM, Random Forest (RF), K-Nearest Neighbors (KNN), and Artificial Neural Networks (ANN). To prevent overfitting, 20% of the data are randomly set aside as testing data, while 8-fold cross-validation is applied to the remaining training data. Model performance is assessed via diverse metrics, including the coefficient of determination (R²), root-mean-square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE), as defined in Formulas (3)–(6).
$$R^2 = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2} \qquad (3)$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2} \qquad (4)$$

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right| \qquad (5)$$

$$\mathrm{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \qquad (6)$$

Here, $n$ is the number of data points, $y_i$ is the true value of the $i$th sample, $\hat{y}_i$ is the predicted value of the $i$th sample, and $\bar{y}$ is the mean of the true values of the corresponding samples.
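The evaluation protocol described above can be sketched as follows. The component learners, their hyperparameters, and the use of scikit-learn's VotingRegressor (in place of the paper's full XGBoost/CatBoost/LightGBM/RF/KNN/ANN ensemble) are assumptions made to keep the example self-contained.

```python
# Sketch of the modeling and evaluation protocol: an averaging ensemble, an 80/20
# hold-out split, 8-fold cross-validation on the training portion, and the metrics
# of Formulas (3)-(6). Component models and hyperparameters are assumptions.
import numpy as np
from sklearn.ensemble import RandomForestRegressor, VotingRegressor
from sklearn.metrics import (mean_absolute_error, mean_absolute_percentage_error,
                             mean_squared_error, r2_score)
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor

ensemble = VotingRegressor([
    ("rf", RandomForestRegressor(random_state=0)),
    ("knn", KNeighborsRegressor(n_neighbors=3)),
    ("ann", MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=5000, random_state=0)),
])

def evaluate(X, y, seed=0):
    """Hold out 20% for testing, cross-validate on the rest, report test metrics."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=seed)
    cv_r2 = cross_val_score(ensemble, X_tr, y_tr, cv=8, scoring="r2")  # 8-fold CV
    y_hat = ensemble.fit(X_tr, y_tr).predict(X_te)
    return {
        "cv_R2_mean": float(np.mean(cv_r2)),
        "R2": r2_score(y_te, y_hat),                                   # Formula (3)
        "RMSE": mean_squared_error(y_te, y_hat) ** 0.5,                # Formula (4)
        "MAE": mean_absolute_error(y_te, y_hat),                       # Formula (5)
        "MAPE": 100.0 * mean_absolute_percentage_error(y_te, y_hat),   # Formula (6)
    }
```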

3. Results and Discussion

3.1. Feature Engineering Process

Figure 1a offers a comprehensive visualization of the FE process, delineating its influence on the reduction of the feature set. A preliminary step in the FE process is the elimination of features with limited variance, as they contribute minimally to the predictive power of ML models. Specifically, attributes such as sample thickness and reduction ratio display insufficient variability, rendering them ineffective for pattern learning. Features with a variance below a specified threshold (0.11 for EC and 0.07 for HV) are thus excluded to improve the dataset’s overall quality. To prevent feature redundancy and model overfitting, the subsequent stage of FE entails the removal of highly correlated features [15]. When the correlation coefficient between two features exceeds 0.95, only the feature with the higher correlation with the target variable is retained. The final step employs recursive feature elimination to determine the number of features that best predicts HV and EC. Figure 1b,c show how the prediction error varies with the number of retained features, illustrating how feature quantity impacts model accuracy. Following this examination, four key features for EC and five key features for HV were isolated, as they yielded the lowest prediction errors on the test set. Additional information on the features retained after each screening phase can be consulted in Supplementary Table S3. It is worth noting that excluding HT as a feature in the HV model resulted in a significant increase in the HV MAPE, underscoring the crucial role of HT in determining HV values.
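A hedged sketch of the correlation-screening rule (|r| > 0.95, keeping the feature more correlated with the target) and of the scan over candidate feature counts that produces curves like Figure 1b,c is given below. It reuses the hypothetical evaluate() and recursive_elimination() helpers from the Section 2 sketches.

```python
# Hedged sketch of the correlation screening (|r| > 0.95, keep the feature more
# correlated with the target) and the scan over feature counts behind Figure 1b,c.
# Reuses the hypothetical evaluate() and recursive_elimination() helpers above.
import pandas as pd

def correlation_screen(X: pd.DataFrame, y: pd.Series, limit: float = 0.95) -> pd.DataFrame:
    corr = X.corr().abs()
    target_corr = X.corrwith(y).abs()
    dropped = set()
    cols = list(X.columns)
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            if a in dropped or b in dropped:
                continue
            if corr.loc[a, b] > limit:
                # Keep whichever of the pair is more correlated with the target.
                dropped.add(a if target_corr[a] < target_corr[b] else b)
    return X.drop(columns=sorted(dropped))

def best_feature_count(X, y, max_features=10):
    """Return the feature count with the lowest test MAPE and the full error curve."""
    errors = {k: evaluate(recursive_elimination(X, y, n_keep=k), y)["MAPE"]
              for k in range(1, max_features + 1)}
    return min(errors, key=errors.get), errors
```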

3.2. Machine Learning Modeling

Two ensemble models are built to predict the HV and EC values of SSCAs using the key features selected in the FE process. Each ensemble integrates a variety of ML models, including XGBoost, RF, and ANN, as component learners. While tree-based models such as XGBoost and RF exhibit robust data-fitting capabilities, their performance falters in regression tasks with small sample sizes: their predictions are effectively piecewise constant and cannot extend beyond the range of the training data. In contrast, neural networks such as the ANN provide more flexible decision boundaries, thus enhancing model performance [27].
To quantify the benefits of FE, the performance of these models is compared to baseline models that use the original composition and processing parameters as input, without FE. Figure 2 offers a visual performance comparison, making it evident that the models without FE suffer from inferior predictive capability. On integrating FE, however, a marked improvement is observed in predictive performance, especially for EC, which has the smaller dataset [28]. Notably, the R² values of the ML models for HV and EC prediction increase from 0.939 to 0.971 and from −1.05 to 0.934, respectively, on the testing set. The HV model performs satisfactorily even without FE, primarily because it is trained on a more expansive dataset that includes HT, a critical variable affecting HV. To assess model robustness, the data were partitioned into training and test sets ten times, each time with a unique random seed. Table 1 reports the mean and variance of the performance metrics before and after FE, illustrating that the FE-incorporated models outperform those without FE. This enhancement underscores the improved accuracy and reliability achievable through feature engineering in the ML modeling process [19].
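The robustness check behind Table 1 can be reproduced in outline as follows; this sketch reuses the hypothetical evaluate() helper from Section 2.3 and summarizes each metric over ten random splits (standard deviation is used here as the measure of spread).

```python
# Sketch of the robustness comparison behind Table 1: repeat the random train/test
# split ten times and summarize each metric; reuses the hypothetical evaluate().
import numpy as np

def repeated_evaluation(X, y, n_repeats=10):
    """Mean and spread (std here) of each metric over repeated random splits."""
    runs = [evaluate(X, y, seed=s) for s in range(n_repeats)]
    return {m: (float(np.mean([r[m] for r in runs])),
                float(np.std([r[m] for r in runs]))) for m in runs[0]}

# e.g., compare raw composition/process inputs against the FE-selected features:
# stats_without_fe = repeated_evaluation(X_raw, y_hv)
# stats_with_fe    = repeated_evaluation(X_selected, y_hv)
```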

3.3. Effects of Key Features on the Properties of SSCAs

The utility of FE extends beyond performance enhancement to enrich the interpretability of ML models [20]. Methods like feature importance ranking and SHAP (SHapley Additive exPlanations) values offer a robust framework for understanding which features significantly contribute to model predictions [29], as shown in Figure 3. Figure 3a,b reveal that HT is the most influential feature for HV prediction, corroborating earlier observations from Figure 1b. For EC prediction, the variable P25_var emerges as the most important, as shown in Figure 3c,d, suggesting that P25 (Martynov and Batsanov electronegativity) should be the primary consideration when designing SSCAs for high EC.
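As an illustration of how such an interpretability analysis can be generated, the sketch below ranks features with a Random Forest's impurity-based importances and computes SHAP values with the third-party shap package; the input variables are placeholders, and this is not necessarily the authors' exact workflow.

```python
# Sketch of the interpretability analysis: impurity-based importances from a Random
# Forest plus SHAP values via the third-party shap package (pip install shap).
# Inputs are placeholders; this is not necessarily the authors' exact workflow.
import pandas as pd
import shap
from sklearn.ensemble import RandomForestRegressor

def explain_model(X_selected: pd.DataFrame, y) -> pd.Series:
    """Return RF feature importances (Figure 3a/c style) and show a SHAP summary
    plot (Figure 3b/d style)."""
    rf = RandomForestRegressor(random_state=0).fit(X_selected, y)
    importance = pd.Series(rf.feature_importances_, index=X_selected.columns)
    shap_values = shap.TreeExplainer(rf).shap_values(X_selected)
    shap.summary_plot(shap_values, X_selected)
    return importance.sort_values(ascending=False)
```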
Augmented data from ML models are leveraged to illuminate the relationships between material characteristics and alloy performance, as illustrated in Figure 4. For example, HV remains relatively stable when HT is below 200 °C but decreases when HT exceeds 300 °C, in line with the recovery and recrystallization processes found in cold-worked copper alloys [30]. Furthermore, increasing values of P77_var and P26_var are associated with higher HV, marking P77 (Waber metal radii) and P26 (Pauling electronegativity) as key indicators for high-strength SSCAs. Conversely, EC decreases as values of P25_var, P18_var, and P6_var rise. This finding implies that to achieve high EC, SSCAs should incorporate elements with P25 (Martynov and Batsanov electronegativity), P18 (Debye temperature), and P6 (atomic concentration) values closely matching those of copper.
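One way to trace single-feature trends of this kind, assuming the fitted ensemble and selected feature matrix from the earlier sketches are available, is a partial dependence plot; the original work may have generated its augmented data differently.

```python
# Sketch: one-dimensional partial dependence of the predicted property on a single
# key feature (e.g., HV vs. HT, Figure 4 style), assuming a fitted model such as
# the ensemble sketched in Section 2.3 and the selected feature matrix.
import matplotlib.pyplot as plt
from sklearn.inspection import PartialDependenceDisplay

def plot_feature_trend(fitted_model, X, feature_name):
    """Plot how the model's prediction varies with one feature, averaged over the data."""
    PartialDependenceDisplay.from_estimator(fitted_model, X, features=[feature_name])
    plt.show()

# e.g., plot_feature_trend(ensemble, X_selected, "HT")
```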
Figure 5 elucidates the properties of various SSCAs (Cu-Mg, Cu-Sn, Cu-Zn, Cu-Al) under different processing conditions (HT). Cu-Mg alloys outperform the other solid-solution strengthened alloys, both in the as-deformed and the as-annealed states, corroborating previous research [1,31]. While Cu-Al alloys may exhibit higher strength owing to the high solid solubility of Al in Cu (17 at.% at room temperature) [32], their EC diminishes significantly because the high solute content leads to elevated variance values of P25 (Martynov and Batsanov electronegativity), P18 (Debye temperature), and P6 (atomic concentration). These insights offer invaluable direction for the design and optimization of SSCAs. By understanding the key features and their impact on alloy properties, researchers and engineers can make more informed decisions in selecting appropriate compositions and processing conditions.

4. Conclusions

This research offers an integrative approach that marries FE with ML algorithms to predict the performance of SSCAs. The study demonstrates that FE significantly boosts the predictive accuracy and generalizability of the ML models. Specifically, the R² values for predicting HV and EC rose from 0.939 to 0.971 and from −1.05 to 0.934, respectively, when tested on an independent dataset. Beyond accuracy enhancements, the study underscores the importance of incorporating domain-specific knowledge into ML models to bolster their interpretability. By doing so, the work identifies the critical features that profoundly influence the HV and EC properties of SSCAs. These discoveries illuminate the intricate interplay between material attributes and alloy performance, thereby providing invaluable guidance for the design and optimization of SSCAs. This work paves the way for more informed decisions on alloy composition and processing, offering a robust framework for future research and practical applications in materials science.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/met13101641/s1, Table S1: The original data of SSCAs used in this work; Table S2: Comprehensive description of the elements’ properties; Table S3: Remaining features after each screening step.

Author Contributions

Conceptualization, J.H. (Jianxin Hou) and J.H. (Jian Hu); formal analysis, T.F., J.H. (Jianxin Hou) and J.H. (Jian Hu); funding acquisition, J.H. (Jian Hu); investigation, T.F.; methodology, T.F. and J.H. (Jianxin Hou); project administration, J.H. (Jian Hu); software, T.F.; supervision, J.H. (Jianxin Hou) and J.H. (Jian Hu); validation, J.H. (Jianxin Hou); visualization, T.F.; writing—original draft, T.F. and J.H. (Jianxin Hou); writing—review and editing, J.H. (Jianxin Hou) and J.H. (Jian Hu). All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (grant numbers 72192830, 72192831, and 51961012) and the Science Fund for Distinguished Young Scholars of Jiangxi Province (grant number 20212ACB214001).

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Maki, K.; Ito, Y.; Matsunaga, H.; Mori, H. Solid-Solution Copper Alloys with High Strength and High Electrical Conductivity. Scr. Mater. 2013, 68, 777–780.
2. Lu, L.; Shen, Y.; Chen, X.; Qian, L.; Lu, K. Ultrahigh Strength and High Electrical Conductivity in Copper. Science 2004, 304, 422–426.
3. Pry, R.H.; Hennig, R.W. On the Use of Electrical Resistivity as a Measure of Plastic Deformation in Copper. Acta Metall. 1954, 2, 318–321.
4. Andrews, P.V.; West, M.B.; Robeson, C.R. The Effect of Grain Boundaries on the Electrical Resistivity of Polycrystalline Copper and Aluminium. Philos. Mag. 1969, 19, 887–898.
5. Li, X.; Lu, K. Playing with Defects in Metals. Nat. Mater. 2017, 16, 700–701.
6. Han, K.; Walsh, R.P.; Ishmaku, A.; Toplosky, V.; Brandao, L.; Embury, J.D. High Strength and High Electrical Conductivity Bulk Cu. Philos. Mag. 2004, 84, 3705–3716.
7. Hou, J.X.; Li, X.Y.; Lu, K. Orientation Dependence of Mechanically Induced Grain Boundary Migration in Nano-Grained Copper. J. Mater. Sci. Technol. 2021, 68, 30–34.
8. Hou, J.; Li, X.; Lu, K. Formation of Nanolaminated Structure with Enhanced Thermal Stability in Copper. Nanomaterials 2021, 11, 2252.
9. Ma, A.; Zhu, C.; Chen, J.; Jiang, J.; Song, D.; Ni, S.; He, Q. Grain Refinement and High-Performance of Equal-Channel Angular Pressed Cu-Mg Alloy for Electrical Contact Wire. Metals 2014, 4, 586–596.
10. Li, J.; Cao, B.; Chen, H.; Li, L. Accelerated Design of Chromium Carbide Overlays via Design of Experiment and Machine Learning. Mater. Lett. 2023, 333, 133672.
11. Lookman, T.; Balachandran, P.V.; Xue, D.; Yuan, R. Active Learning in Materials Science with Emphasis on Adaptive Sampling Using Uncertainties for Targeted Design. npj Comput. Mater. 2019, 5, 21.
12. Xue, D.; Balachandran, P.V.; Hogden, J.; Theiler, J.; Xue, D.; Lookman, T. Accelerated Search for Materials with Targeted Properties by Adaptive Design. Nat. Commun. 2016, 7, 11241.
13. Xue, D.; Xue, D.; Yuan, R.; Zhou, Y.; Balachandran, P.V.; Ding, X.; Sun, J.; Lookman, T. An Informatics Approach to Transformation Temperatures of NiTi-Based Shape Memory Alloys. Acta Mater. 2017, 125, 532–541.
14. Wang, C.; Fu, H.; Jiang, L.; Xue, D.; Xie, J. A Property-Oriented Design Strategy for High Performance Copper Alloys via Machine Learning. npj Comput. Mater. 2019, 5, 87.
15. Gao, J.; Wang, Y.; Hou, J.; You, J.; Qiu, K.; Zhang, S.; Wang, J. Phase Prediction and Visualized Design Process of High Entropy Alloys via Machine Learned Methodology. Metals 2023, 13, 283.
16. Shen, C.; Wang, C.; Wei, X.; Li, Y.; van der Zwaag, S.; Xu, W. Physical Metallurgy-Guided Machine Learning and Artificial Intelligent Design of Ultrahigh-Strength Stainless Steel. Acta Mater. 2019, 179, 201–214.
17. Murdock, R.J.; Kauwe, S.K.; Wang, A.Y.-T.; Sparks, T.D. Is Domain Knowledge Necessary for Machine Learning Materials Properties? Integr. Mater. Manuf. Innov. 2020, 9, 221–227.
18. Zhang, H.; Fu, H.; He, X.; Wang, C.; Jiang, L.; Chen, L.-Q.; Xie, J. Dramatically Enhanced Combination of Ultimate Tensile Strength and Electric Conductivity of Alloys via Machine Learning Screening. Acta Mater. 2020, 200, 803–810.
19. Cai, J.; Luo, J.; Wang, S.; Yang, S. Feature Selection in Machine Learning: A New Perspective. Neurocomputing 2018, 300, 70–79.
20. Zhong, X.; Gallagher, B.; Liu, S.; Kailkhura, B.; Hiszpanski, A.; Han, T.Y.-J. Explainable Machine Learning in Materials Science. npj Comput. Mater. 2022, 8, 204.
21. Xie, Q.; Suvarna, M.; Li, J.; Zhu, X.; Cai, J.; Wang, X. Online Prediction of Mechanical Properties of Hot Rolled Steel Plate Using Machine Learning. Mater. Des. 2021, 197, 109201.
22. Guo, S.; Yu, J.; Liu, X.; Wang, C.; Jiang, Q. A Predicting Model for Properties of Steel Using the Industrial Big Data Based on Machine Learning. Comput. Mater. Sci. 2019, 160, 95–104.
23. Zhang, Z.Y.; Sun, L.X.; Tao, N.R. Nanostructures and Nanoprecipitates Induce High Strength and High Electrical Conductivity in a CuCrZr Alloy. J. Mater. Sci. Technol. 2020, 48, 18–22.
24. Zhang, Y.; Li, Y.S.; Tao, N.R.; Lu, K. High Strength and High Electrical Conductivity in Bulk Nanograined Cu Embedded with Nanoscale Twins. Appl. Phys. Lett. 2007, 91, 211901.
25. Guyon, I.; Elisseeff, A. An Introduction to Variable and Feature Selection. J. Mach. Learn. Res. 2003, 3, 1157–1182.
26. Pintelas, P.; Livieris, I.E. Special Issue on Ensemble Learning and Applications. Algorithms 2020, 13, 140.
27. Mendoza, H.; Klein, A.; Feurer, M.; Springenberg, J.T.; Urban, M.; Burkart, M.; Dippel, M.; Lindauer, M.; Hutter, F. Towards Automatically-Tuned Deep Neural Networks. In Automated Machine Learning; Springer: Cham, Switzerland, 2019; pp. 135–149.
28. Xu, P.; Ji, X.; Li, M.; Lu, W. Small Data Machine Learning in Materials Science. npj Comput. Mater. 2023, 9, 42.
29. Yang, C.; Ren, C.; Jia, Y.; Wang, G.; Li, M.; Lu, W. A Machine Learning-Based Alloy Design System to Facilitate the Rational Design of High Entropy Alloys with Enhanced Hardness. Acta Mater. 2022, 222, 117431.
30. Rollett, A.; Humphreys, F.J.; Rohrer, G.S.; Hatherly, M. Recrystallization and Related Annealing Phenomena; Elsevier: Amsterdam, The Netherlands, 2004; ISBN 978-0-08-054041-2.
31. Ma, M.; Li, Z.; Qiu, W.; Xiao, Z.; Zhao, Z.; Jiang, Y. Microstructure and Properties of Cu–Mg–Ca Alloy Processed by Equal Channel Angular Pressing. J. Alloys Compd. 2019, 788, 50–60.
32. Lee, S.; Im, Y.-D.; Matsumoto, R.; Utsunomiya, H. Strength and Electrical Conductivity of Cu-Al Alloy Sheets by Cryogenic High-Speed Rolling. Mater. Sci. Eng. A 2021, 799, 139815.
Figure 1. (a) The overall process of FE. Results of recursive elimination for (b) HV and (c) EC. Five key features affecting HV and four features affecting EC are selected, respectively, after FE. These five key features affecting HV include HT (heat-treated temperature), P77_var (variance value of Radii metal—Waber), P56_var (variance value of Mendeleev Pettifor), P24_var (variance value of Electronegativity—Allred–Rochow), and P26_var (variance value of Electronegativity—Pauling). On the other hand, the four key features influencing EC are P25_var (variance value of Electronegativity—Martynov and Batsanov), P39_var (variance value of Hardness—Brinell), P18_var (variance value of Debye temperature), and P6_var (variance value of Atomic concentration).
Figure 2. Scatter plots of experimental values and those predicted by the ensemble models: (a) HV without FE, (b) EC without FE, (c) HV with FE, and (d) EC with FE. The red dashed line (slope of one) represents the ideal fit, where the predicted values perfectly match the experimental values.
Figure 3. Feature importance ranked by RF models for (a) HV and (c) EC. SHAP value distribution for (b) HV and (d) EC.
Figure 4. The correlation between key features and SSCAs’ performance: (a–c) depict the correlation between key features and HV, while (d–f) show the relationship between key features and EC.
Figure 5. Combinations of EC and HV of various SSCAs (Cu-Mg, Cu-Sn, Cu-Zn, Cu-Al) under different processing conditions predicted by the ML models.
Table 1. Comparison between the ensemble models with and without FE.
Metric | Data split | HV model (with FE) | HV model (without FE) | EC model (with FE) | EC model (without FE)
R² | Training set | 0.983 ± 0.008 | 0.971 ± 0.016 | 0.941 ± 0.048 | 0.846 ± 0.099
R² | Testing set | 0.949 ± 0.017 | 0.929 ± 0.026 | 0.792 ± 0.178 | −1.08 ± 3.05
RMSE | Training set | 8.82 ± 2.40 | 11.4 ± 3.1 | 4.95 ± 2.31 | 8.26 ± 3.29
RMSE | Testing set | 15.9 ± 2.4 | 18.6 ± 3.6 | 7.51 ± 2.94 | 19.7 ± 6.6
MAE | Training set | 6.26 ± 1.74 | 8.19 ± 2.64 | 3.02 ± 1.50 | 5.39 ± 2.26
MAE | Testing set | 11.2 ± 1.8 | 14.0 ± 2.6 | 6.41 ± 1.74 | 16.6 ± 6.0
MAPE | Training set | (4.37 ± 1.43)% | (5.36 ± 2.00)% | (1.55 ± 0.95)% | (3.19 ± 1.60)%
MAPE | Testing set | (7.77 ± 2.30)% | (11.0 ± 3.2)% | (5.81 ± 1.54)% | (16.6 ± 6.6)%