Article

An Expandable Yield Prediction Framework Using Explainable Artificial Intelligence for Semiconductor Manufacturing

1 Department of Semiconductor and Display Engineering, Sungkyunkwan University, Suwon-si 16419, Republic of Korea
2 Semiconductor R&D Center, Samsung Electronics Co., Ltd., Hwaseong-si 18448, Republic of Korea
3 College of Information and Communication Engineering, Sungkyunkwan University, Suwon-si 16419, Republic of Korea
* Author to whom correspondence should be addressed.
Appl. Sci. 2023, 13(4), 2660; https://doi.org/10.3390/app13042660
Submission received: 30 January 2023 / Revised: 11 February 2023 / Accepted: 15 February 2023 / Published: 18 February 2023
(This article belongs to the Special Issue Intelligent Control Using Machine Learning)

Abstract

Enormous amounts of data are generated and analyzed in the latest semiconductor industry. Established yield prediction studies have dealt with one type of data or a dataset from one procedure. However, semiconductor device fabrication comprises hundreds of processes, and various factors affect device yields. This challenge is addressed in this study by using an expandable input data-based framework to include divergent factors in the prediction and by adapting explainable artificial intelligence (XAI), which utilizes model interpretation to modify fabrication conditions. After preprocessing the data, the procedure of optimizing and comparing several machine learning models is followed to select the best performing model for the dataset, which is a random forest (RF) regression with a root mean square error (RMSE) value of 0.648. The prediction results enhance production management, and the explanations of the model deepen the understanding of yield-related factors with Shapley additive explanation (SHAP) values. This work provides evidence with an empirical case study of device production data. The framework improves prediction accuracy, and the relationships between yield and features are illustrated with the SHAP value. The proposed approach can potentially analyze expandable fields of fabrication conditions to interpret multifaceted semiconductor manufacturing.

1. Introduction

In the highly competitive semiconductor manufacturing industry, yield analysis plays a vital role in increasing production efficiency and reducing operating costs. Yield is one of the most critical factors in profit, and yield analysis results are applied to enhance yield and adjust settings for poor process conditions. If yield could be predicted in advance, production planning efficiency would increase, and the accomplishment of yield-enhancing plans could be confirmed promptly. Since modern electronic devices are scaled and complicated, the overall manufacturing procedure takes several weeks from the input of wafers to chip packaging. Briefly, the fabrication of a device generally consists of three stages: (1) the wafer fabrication procedure constructs integrated circuits on each wafer via hundreds of precise and well-controlled processes; (2) the assembly stage utilizes wafer probing test results to sort dice and encases each die through multiple steps; and (3) the final test stage includes electrical testing to verify the reliability and quality of the produced chips. When a problem that lowers yield occurs during the wafer fabrication stage but its cause is only detected in the final test phase, profit declines and it takes considerable time to normalize the whole manufacturing procedure [1]. Therefore, at the end of the wafer fabrication phase, there are probing and testing steps to exclude wafers expected to have a low final test yield. Developing a model that predicts the yield and enables timely countermeasures against yield-lowering conditions, based on the induced knowledge, is one way to effectively utilize the intermediate test results. Such prediction models can strengthen the competitiveness of a semiconductor manufacturer.
In work-site operations, a substantial part of yield analysis is performed manually by experienced domain engineers. For instance, wafers expected to show yield improvement are analyzed to confirm yield-enhancing process conditions. After confirmation, wafers processed with the modified conditions are predicted to have the same yield gain. Conversely, if an instrument malfunctions, the wafers processed in that machine during a certain period are predicted to have a lower yield. This method has limits because engineers may miss yield-affecting circumstances, and the same condition in one step may affect each wafer differently. Moreover, there is far more data than can be monitored and examined manually in practical yield prediction. The various monitoring systems in production lines include metrology and inspection tools for monitoring processes, sensors on instruments to check process conditions, and tests of properties for each wafer, such as electric die sorting (EDS) yield, wafer acceptance test (WAT), and final test (FT) [2]. Owing to limitations of time, capacity, and cost, these systems monitor only a limited number of wafers, a part of the area on chips, or certain conditions. Therefore, they are not sufficient to identify the multifaceted factors that enhance or lower yield. Another limitation is that the various data are stored separately in several systems for individual areas and are organized in different formats, spanning many kinds of data, including numerical, categorical, and serial types. Additionally, practical yield analysis is repeatedly conducted for each stored dataset to screen yield-related features. Consequently, yield analysis must cover wide-ranging data and needs to be expandable.
An overview of the related literature is presented in this paragraph. Semiconductor yield analysis has a long history, and research on yield modeling is summarized in [3]. Various perspectives include defective yield loss [4], statistical process control [5], and integrated process specification systems [6]. Recently, machine learning techniques have attracted much attention as they enable researchers to handle large-scale data and automate analysis. In numerous prediction studies in diversified fields, including information technology, geographical science, industrial manufacturing, and energy production, regression models suited to each research purpose are established through comparison and selection among various machine learning and deep learning models [7,8,9,10,11,12]. In our study, we adopted this procedure of choosing the most fitting model for the prediction. Machine learning-based methods have also been applied in various studies in the semiconductor industry [2,13,14,15,16,17,18]. In our view, yield research mainly focuses on analysis with one kind of input variable. Various types of independent variables exist; however, the variables used are most often of one type or from a dataset obtained in one test procedure. One study focuses on non-normally distributed test parameters to estimate the yield using principal component analysis [19]. Considering the delays of gates and paths on a chip, researchers predict parametric yield using statistical timing analysis [20]. Machine learning-based root cause detection approaches have been developed to target a specific circuit test yield [21]. Existing approaches that use metrology data for yield classification focus on the imputation of missing data and counteracting imbalanced classes [22]. There are studies on back-end FT yield regression and classification modeling based on front-end WAT data [1,23].
Some studies preprocess categorical input data using one-hot or other encoders to utilize non-numerical data as input for machine learning models, where the categorical and numerical data are recorded in the same step [1,24]. Previous research on big data analysis for low yield validated its detection efficiency through simulation, especially for the development of new devices [25]. Another study modifies a gradient boosting algorithm to analyze multi-step data from semiconductor manufacturing [26].
Regarding XAI, a few studies have applied this method to semiconductor manufacturing. As discussed in the literature, XAI has been studied to apply artificial intelligence with more transparency while maintaining high performance [27]. Additionally, explainability makes the continuous improvement of models possible and justifies AI-based decisions. The SHAP value method has been adopted to improve process quality in the real production of semiconductor devices [28]. SHAP analysis has been used to rank the features that affect the electrical test scores of a device [24]. To the best of our knowledge, few studies cover machine learning-based modeling that uses various kinds of wafer fabrication data to predict EDS yield and explains each wafer and feature with the XAI method. This study uses different types of data from the whole wafer fabrication stage as input data to examine yield impact. Multiple machine learning models are trained to compare prediction performance for wafer yield, and the top model is explained on the basis of SHAP values. Brief explanations of the utilized machine learning algorithms and the SHAP value method are given in the proposed framework section of this manuscript.
Less attention has been focused on analyzing multiple kinds of data from the front-end production stage to predict EDS yield. Therefore, this paper aims to predict wafer yield based on combined fabrication data, in order to include possible causes, investigate as many fields as possible, and speed up the feedback. The framework is expanded by interpreting the model to determine yield-lowering factors and by explaining the relationship between factors and yields. SHAP, an XAI technique, decomposes the individual attributions of target-affecting factors [29,30]. The main motivations for this research are to build an automated yield prediction model, to enhance the effectiveness of production scheduling, to report an analysis for adjusting problematic process conditions, and to deepen the understanding of fabrication conditions. In summary, the work presented here provides the highlights listed below:
  • The yield prediction framework utilizes various types of fabrication data, allowing input data expansion;
  • XAI technology, i.e., SHAP, is implemented to explain the best performing model;
  • A demonstration on a real-world dataset is analyzed with SHAP values, including the discovery of factors affecting yield.
Moreover, this work possibly contributes to developing procedures of advanced devices, including frequent evaluation of various process modifications in the wafer fabrication, with the determination of yield-affecting factors.
The remainder of this paper is organized as follows: Section 2 describes the framework and methods used in this study. Section 3 provides the main findings of a case study and further discussion. Finally, Section 4 concludes with possible applications of this proposal and suggestions for future studies.

2. Proposed Framework

The previous section reveals that earlier studies have rarely included broadly diversified fabrication data in modeling or applied XAI techniques to explain the models. To solve these problems, we propose a method that builds models with four different types of data to predict EDS yield and interprets the chosen model with SHAP values. Although the input and output data are selected for practical use in the field, the framework can be adopted for expansion and various investigations. The framework includes (1) data preparation with preprocessing, (2) model optimization and selection, and (3) prediction of yield and explanation of the model. Figure 1 shows the overall framework proposed in this paper. To confirm the stability of the expandable framework, the framework is applied to each dataset, and the extracted yield-related features are compared.

2.1. Data Preparation with Preprocessing

The input variables consist of various data subsets related to wafer fabrication, including process operating conditions, time spans, equipment units, and some sensor parameters. These are common fields of investigation for practical yield analysis. Various fabrication data are combined with each wafer to organize the input variables into a two-dimensional form.
The numerical data include the time spans from process steps to a designated step, sensor parameters measured for each processed wafer at several steps, and EDS yield data as the target variable. Each data item has a different scale of values; hence, standardization is necessary to prevent models from being biased. The categorical data include the settings of conditions for process operation and the units of instruments for several processes. These are string-type data that need to be converted into numeric values for machine learning applications [1]. The one-hot encoding method is used to transform the data while maintaining the information in the column names, which is useful in the explanation stage. One-hot encoding is chosen because most of the categorical data have no rank order, and ordinal encoding, which imposes equally spaced levels, could deliver unintended meaning. The Pearson correlation coefficients are calculated, and the dimensions of the input data are reduced when coefficient values exceed 0.95 [23]. We split the dataset into training and test sets and used only the training set for model optimization and selection.
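As an illustrative sketch of these preprocessing steps, the following uses a hypothetical toy table: the column names such as "step_aa_T" and "step_s_R" only mimic the naming in this paper, while the correlation threshold of 0.95 follows the text.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical wafer-level table: numerical time spans/sensor values plus
# a categorical process condition; all values are illustrative only.
df = pd.DataFrame({
    "step_aa_T": [12.1, 15.3, 11.8, 14.0],
    "step_ax_P": [0.98, 1.02, 0.95, 1.01],
    "step_s_R": ["52", "53", "53", "52"],
    "eds_yield": [0.91, 0.95, 0.93, 0.90],
})

y = df.pop("eds_yield")  # target variable

# One-hot encode the categorical column, keeping informative column names
# (e.g., "step_s_R_52") for the later explanation stage.
X = pd.get_dummies(df, columns=["step_s_R"], prefix="step_s_R", dtype=float)

# Standardize numerical columns so differing scales do not bias the models.
num_cols = ["step_aa_T", "step_ax_P"]
X[num_cols] = StandardScaler().fit_transform(X[num_cols])

# Drop the later feature of each pair whose |Pearson correlation| exceeds 0.95.
corr = X.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [c for c in upper.columns if (upper[c] > 0.95).any()]
X = X.drop(columns=to_drop)

# Hold out a test set; only the training set enters model selection.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
```

Note that the two one-hot columns of a binary category are perfectly (negatively) correlated, so the correlation filter drops one of them, as the paired "step_s_R_52"/"step_s_R_53" discussion in Section 3.2 also suggests.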

2.2. Model Optimization and Selection

In this research, ten popular and high-performance regression models were employed for comparison. The ten machine learning algorithms consist of a linear-based learner, four tree-based models, two kernel-based learners, a neural network-based learner, an instance-based learner, and a sample consensus algorithm [8]. The training dataset was used for training and validation, adopting the cross-validation method, which splits the training set multiple times to obtain different training and validation datasets and avoids overfitting. The lasso algorithm is a linear model with an L1 regularizer, which is effective at excluding irrelevant variables [31]. The adaptive boosting (AdaBoost) regression model is a decision tree-based model that uses multiple regressors trained according to the errors of the prior regressors [32]. Two more tree-based boosting models are extreme gradient boosting (XGBoost) regression and light gradient boosting model (LightGBM) regression. XGBoost uses a gradient boosting mechanism and is well known for efficient computing [33]. LightGBM is well known for its speed and leaf-wise tree growth strategy [34]. The RF regression algorithm uses bagging techniques to build trees from subsamples and random subsets of predictors, and RF aggregates multiple tree models to avoid overfitting [35]. The two kernel-based regression models are the support vector regression (SVR) model and the Gaussian process regression (GPR) model. SVR, i.e., support vector machine-based regression, employs the kernel trick of mapping input vectors to higher-dimensional feature spaces [36]. GPR is a nonparametric machine learning model based on the Bayesian approach, which is beneficial for measuring uncertainty over predictions [37]. The multilayer perceptron (MLP) is a feedforward artificial neural network algorithm that trains models with the backpropagation technique [38].
K nearest neighbor (KNN) regression predicts the target by neighborhood interpolation [39]. Random sample consensus (RANSAC) is an algorithm that generates putative solutions with the most points in a consensus set through random sampling iterations and is suitable for datasets with many outliers [40]. In this study, scikit-learn (sklearn) in Python was used for Lasso, AdaBoost, RF, SVR, GPR, MLP, KNN, and RANSAC modeling. The Python packages xgboost and lightgbm were used for XGBoost and LightGBM modeling.
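A minimal sketch of the comparison procedure follows, assuming a synthetic dataset in place of the proprietary wafer data; only a subset of the ten models (those available in scikit-learn's core API) is shown, and XGBoost and LightGBM are omitted because they require external packages.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor, RandomForestRegressor
from sklearn.linear_model import Lasso, RANSACRegressor
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR

# Synthetic stand-in for the proprietary wafer dataset.
X, y = make_regression(n_samples=200, n_features=20, noise=0.3, random_state=0)

models = {
    "Lasso": Lasso(alpha=0.1),
    "AdaBoost": AdaBoostRegressor(random_state=0),
    "RF": RandomForestRegressor(random_state=0),
    "SVR": SVR(),
    "KNN": KNeighborsRegressor(),
    "RANSAC": RANSACRegressor(random_state=0),
}

# 5-fold cross-validated RMSE for each candidate (lower is better).
scores = {
    name: -cross_val_score(
        m, X, y, cv=5, scoring="neg_root_mean_squared_error"
    ).mean()
    for name, m in models.items()
}
best = min(scores, key=scores.get)
```

On a real dataset, the winner depends entirely on the data; the paper's case study selects RF, but this synthetic linear data may favor a linear learner.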
The RMSE and mean absolute error (MAE) are the metrics for calculating model performance [7,12]. The formulas are as follows:
\( \mathrm{RMSE} = \sqrt{\dfrac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \)  (1)
\( \mathrm{MAE} = \dfrac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right| \)  (2)
where n denotes the total number of data points, \( y_i \) denotes the empirical yield values in the dataset, and \( \hat{y}_i \) denotes the predicted yield values from the model. Grid search with cross-validation is used to determine the optimal hyper-parameters for each model. The RMSE and MAE values of the hyper-parameter-tuned models are compared to select the best model. The scores are measured multiple times with the N-fold cross-validation method to enhance the robustness of the models.
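The two metrics and the grid search step can be sketched as follows; the functions mirror Equations (1) and (2), while the dataset and hyper-parameter grid are purely illustrative.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# RMSE and MAE exactly as in Equations (1) and (2).
def rmse(y_true, y_pred):
    d = np.asarray(y_true) - np.asarray(y_pred)
    return float(np.sqrt(np.mean(d ** 2)))

def mae(y_true, y_pred):
    d = np.asarray(y_true) - np.asarray(y_pred)
    return float(np.mean(np.abs(d)))

# Grid search with cross-validation over an illustrative hyper-parameter grid.
X, y = make_regression(n_samples=150, n_features=10, noise=0.3, random_state=0)
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 5]}
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    scoring="neg_root_mean_squared_error",
    cv=5,
)
search.fit(X, y)
best_params = search.best_params_
```

Because RMSE squares the residuals before averaging, it always satisfies RMSE ≥ MAE, which is consistent with the score ordering reported in Section 3.1.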

2.3. Prediction and Explanation

The tuned models are applied to predict the test dataset, which is held out from the model optimization and selection procedure. The prediction performance scores can prove the feasibility of the model selection procedure. The best performing model is combined with the XAI method, i.e., SHAP. Originating from game theory in economics, the Shapley value is the relative contribution of a factor to the outcome [29]. The SHAP value is an attribution value calculated by combining conditional expectation with the Shapley value [30]. As shown in the following Equation (3), the SHAP value for each feature is defined as
\( \phi_i = \sum_{S \subseteq F \setminus \{i\}} \dfrac{|S|!\,\left(|F| - |S| - 1\right)!}{|F|!} \left[ f_{S \cup \{i\}}\left(x_{S \cup \{i\}}\right) - f_S\left(x_S\right) \right] \)  (3)
where \( F \) is the set of input features, \( S \subseteq F \setminus \{i\} \) is a subset that excludes feature \( i \), \( f \) is the model prediction, and \( f_{S \cup \{i\}}(x_{S \cup \{i\}}) - f_S(x_S) \) denotes the marginal contribution of feature \( i \) to the prediction. There are several reasons for paying attention to SHAP, such as local accuracy, consistency, its model-agnostic nature, and the ability to visualize interpretations. The computed SHAP values show the contribution of each feature to the prediction for each instance. The explanation model \( \hat{f}(x) \), which approximates the original model, is a summation of feature attributions, and by the local accuracy property, the sum equals the model output for a single instance \( x \) [30,41,42,43]:
\( f(x) \approx \hat{f}(x) = \phi_0 + \sum_{i=1}^{M} \phi_i(x) \)  (4)
where \( \phi_0 = \mathrm{E}[\hat{f}(x)] \) is the expected value of the model output. The method also ranks features, specifically by the average of the absolute SHAP values for each feature. The SHAP values of a specific parameter illustrate the relationship between that parameter and the target values in the model [44]. The Python package shap is used in this manuscript.
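Equation (3) can be illustrated with a brute-force computation on a toy three-feature linear model; the model and the reference (background) input are assumptions for illustration only, and a real SHAP implementation such as TreeSHAP computes these values far more efficiently.

```python
from itertools import combinations
from math import factorial

# Illustrative linear model standing in for the trained predictor:
# f(x) = 2*x0 + 3*x1 - x2 (purely hypothetical coefficients).
def model(z):
    return 2 * z[0] + 3 * z[1] - z[2]

background = [0.0, 0.0, 0.0]  # reference input for "absent" features
x = [1.0, 2.0, 3.0]           # instance to explain
n = len(x)

def f_S(S):
    """Model evaluated with features in S from x, others from background."""
    return model([x[i] if i in S else background[i] for i in range(n)])

def shapley(i):
    """Shapley value phi_i following Equation (3)."""
    others = [j for j in range(n) if j != i]
    total = 0.0
    for r in range(n):  # subset sizes 0 .. n-1
        for S in combinations(others, r):
            weight = factorial(r) * factorial(n - r - 1) / factorial(n)
            total += weight * (f_S(set(S) | {i}) - f_S(set(S)))
    return total

phi = [shapley(i) for i in range(n)]
phi_0 = f_S(set())  # base value: model output with all features at the reference
# Local accuracy (Equation (4)): phi_0 + sum(phi) equals the model output f(x).
```

For a linear model with a zero background, each attribution reduces to coefficient times feature value, and the base value plus the attributions reproduces the prediction exactly, which is the local accuracy property used in the waterfall charts of Section 3.2.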

3. Results and Discussion

The purpose of this experiment is to confirm the proposed method with recent empirical data. The fabrication and yield data are provided from the device processes of a semiconductor manufacturer. For proprietary reasons, the exact names of the variables and the device are not revealed. The data are restricted to wafers of a specific device that have EDS yield data, so they can be used in supervised modeling. The organized dataset has 352 parameters and 327 wafers for the production of this specific device. After input data preprocessing, including one-hot encoding and Pearson correlation coefficient-based dimension reduction, the number of input parameters becomes 983. Table 1 summarizes the counts of the overall dataset and each type of dataset during preprocessing and after the train–test split. The distributions of the target variables are similar and not skewed, as statistically summarized in Table 1.

3.1. Model Selection and Prediction

During the model optimization and selection procedure, grid search tunes hyper-parameters, which vary depending on the model algorithms, as shown in Table 2. The hyper-parameters include the number of estimators, max depth, subsampling size, learning rate, kernels, regularization factors, activation functions, max iterations, etc. The ten tuned models are compared using MAE and RMSE scores, obtained dozens of times by cross-validation, to select the best performing model. As shown in Figure 2, the RF model has the smallest MAE and RMSE averages, and the standard deviation of the RF scores also ranks well among the ten models. The KNN model shows the second-best performance on the training and test datasets. In terms of validation and prediction scores, the RANSAC and MLP models rank at the bottom. The other models, namely SVR, LightGBM, GPR, XGBoost, AdaBoost, and Lasso, show similar validation and prediction performance for estimating EDS yield with the combined fabrication data.
The prediction of EDS yield with the tuned models is executed on the test dataset. The MAE and RMSE values of each model are compared, as shown in Figure 3. The prediction results reveal that the RF model performs best among all the models in both metrics, with MAE and RMSE values of 0.520 and 0.648, respectively. The result follows a similar order to the earlier cross-validated scores in the model selection phase. RMSE values are consistently larger than MAE values because RMSE uses squared differences, so larger differences increase RMSE more. As a reference, a statistical baseline estimator predicts target values as the average of the training dataset. The scores of this estimator are 0.744 for MAE and 0.919 for RMSE, inevitably poorer than the scores of the machine learning models, as presented in Table 2.
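The baseline comparison can be sketched as follows; a synthetic dataset stands in for the proprietary one, so the absolute scores differ from those reported above, but the mean-of-training-set baseline is the same idea.

```python
from sklearn.datasets import make_regression
from sklearn.dummy import DummyRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the proprietary wafer dataset.
X, y = make_regression(n_samples=300, n_features=10, noise=0.5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Mean-of-training-set baseline, analogous to the reference estimator above.
baseline = DummyRegressor(strategy="mean").fit(X_tr, y_tr)
model = RandomForestRegressor(random_state=0).fit(X_tr, y_tr)

mae_base = mean_absolute_error(y_te, baseline.predict(X_te))
mae_model = mean_absolute_error(y_te, model.predict(X_te))

# Relative MAE improvement over the baseline, as reported in percent.
improvement = 100.0 * (mae_base - mae_model) / mae_base
```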
To conclude the yield model building part, the case study shows how the multi-source fabrication data are delivered as input data and how the optimization and comparison of models work. Through the suggested framework, the EDS yield of the device is predicted 30.1% more accurately in terms of MAE compared with the simple statistical prediction. The RF algorithm-based model ranks first for this case and is explained with SHAP values in the following section. The XAI method illustrates the chosen model and itemizes important features with visualization.

3.2. Explanation of the Model Using the SHAP Value Method

In the following interpretation part, the selected RF model is analyzed with SHAP, specifically TreeSHAP, a method specialized for tree-based models [30,41]. Much research is actively conducted on XAI because there is uncertainty in decision-making based on machine learning results. In this study, the SHAP value method is adopted, enabling granular explanations of the contribution of each feature [24,27]. Figure 4a is the SHAP summary plot, showing how the top parameters affect yield prediction. The plot lists features in descending order of the average of absolute SHAP values and shows the correlation between input variables and the output. As an example of a numerical feature, as the "step_aa_T" parameter value decreases, the SHAP values decrease to −0.06, where a negative SHAP value means a negative impact on the prediction, as shown in Equation (4). To examine partial dependency, SHAP value scatterplots describe the effect of changing an individual feature [41,42]. In Figure 5, the SHAP value scatterplot for the "step_aa_T" parameter shows the details of the nonlinear relationship, in which the EDS yield is roughly proportional to the parameter values and deviates only near the peak value. As shown in the other plot of Figure 5, the "step_ax_P" feature implies that both low and high sensor values are associated with relatively lower yield. As an example of a categorical parameter, the "step_s_R_53" parameter shows its yield-enhancing influence, with SHAP values up to 0.04, in the summary plot of Figure 4a and in the individual feature's SHAP value scatterplot in Figure 5. The "step_s_R_52" parameter is the other category of the processing condition "step_s_R_53", produced by the one-hot encoder, and the two show opposite responses in the summary plot, as shown in Figure 4a.
As shown in the waterfall chart of Figure 4b, these two categorical parameters share SHAP values, indicating that the wafer is processed with the operating condition represented by "52" instead of "53." As shown in Figure 4 and Figure 5, the "53" process condition is expected to increase yield in the prediction model. Practically, these parameters represent a process design change, and the influence on yield is consistent with the domain knowledge that "53" is the advanced version of the step.
The waterfall charts indicate how each feature influences the expected target prediction for a specific data point, as shown in Figure 4b–d. In other words, SHAP waterfall charts illustrate how the explanation model decomposes the model output for an instance (i.e., a wafer) into a sum of positive and negative SHAP values of each feature, following Equation (4). The base prediction of the RF model is \( \mathrm{E}[\hat{f}(x)] \), and the predicted yield according to the summation of SHAP values for each example wafer is \( \hat{f}(x) \) [30,44]. This approach can help analysts investigate specific low-yield wafers. For example, Figure 4c shows that a specific low-yield wafer is processed by a specific unit, as represented by the positive value for "step_ac_U_8", which has a yield-lowering effect of −0.04. The sum of all the SHAP values of the 983 parameters for the wafer, \( \hat{f}(x) \) in Figure 4c, is −0.694, and the yield of the wafer predicted by the RF model, \( f(x) \), is −0.673. The SHAP value method decodes the model well for this data point and illustrates the stepwise prediction. The top parameters identified in the expandable framework overlap with those of the individual dataset-based models, as shown in Figure S1. The proposed approach can therefore replace iterative modeling for each dataset and save effort and time.

3.3. Discussion and Limitation

The proposed framework for predicting EDS yield with extensive wafer fabrication data is demonstrated with real-world data. The dataset consists of diverse data applied to practical yield analysis in the field. Ten different machine learning models are optimized and compared to select the best performing prediction model for the data. The chosen RF model shows improved prediction scores of 0.520 for MAE and 0.648 for RMSE. We employ the XAI method, i.e., SHAP, to explain the model and present the relationship between the key features and the yield. Thus, this study raises the possibility of practical yield prediction with expandable fabrication data and interpretation through SHAP analysis.
The cautious application of feature analysis is necessary because the relationships inferred from the SHAP value scatterplots over feature values do not guarantee causality. Therefore, counteractions in the fabrication process should be considered carefully, with domain knowledge and proper experiments, to establish possible causation. Knowledge derived from the XAI analysis can be considered a controlling factor in the wafer fabrication processes, although the physical or chemical background of the phenomenon needs to be discussed and examined through further research. The framework goes beyond simply predicting the yield and listing important features; that is, XAI shows how each feature is reflected in the yield prediction and how the yield of each wafer is predicted based on the correlations in the model. Therefore, the XAI method increases the transparency of the model and improves its usability.
The limitations of this study must be acknowledged. For new trial operating conditions, modification and retraining of the model are necessary. Some datasets from wafer fabrication are not included in the analysis because of their characteristics. As a typical example, metrology and inspection data have a high missing rate, making their utilization challenging in this framework. Nevertheless, there is research proposing a method for identifying the key steps in fabrication processes using missing value imputation [22] and other studies focusing on advanced imputation mechanisms, such as virtual metrology (VM) [45,46]. Moreover, there is various information related to quality control, e.g., line condition, equipment maintenance [17,47], source material changes, and engineers' notes. Without wafer-level information, these datasets necessitate a complicated conversion to serve as input for this modeling. To counteract the expansion of input variables in future studies, principal component analysis can improve dimension reduction efficiency during preprocessing [13,19,48]. Another limitation is that yield-improving action based on the analysis results needs to consider various aspects of manufacturing, such as production efficiency, equipment maintenance costs, and serial changes to other features. For example, some time span parameters affect the yield, as shown in Figure 5; however, restricting these features would affect prior and later steps and require advanced scheduling [49,50]. In addition, deep learning models are not studied in this paper due to the limited number of data points but should be included in future research [12,51]. Furthermore, a study of the SHAP method with non-tree-based models is required, considering other candidate models such as KNN and SVR [42,52].

4. Conclusions

This paper proposes an EDS yield prediction and interpretation framework with different types of variables amassed from the wafer fabrication procedure. To test and validate the framework, real fabrication data from a semiconductor manufacturer, covering general investigation fields of yield analysis, are used. In this case study, the model using the RF algorithm is selected as the best performing prediction model, following the evaluation process. The results provide experimental evidence of improved prediction scores, and the XAI-based analysis specifically provides insight into the relationship between wafer fabrication conditions and EDS yield. The analysis results identify the key features, explain low- and high-yield wafers, and describe the relationship between features and yield values. Furthermore, the granular SHAP values enhance the understanding of the influence of the principal features on yield.
The findings of this study have several important implications for improving the manufacturing yields of various products with diverse fabrication data. Using different kinds of data as input variables is important not only to increase prediction performance but also to gather knowledge from every corner related to wafer fabrication. This method is expandable to cover the broad databases of complex manufacturing environments. Moreover, XAI-assisted scrutiny would bring progress in yield enhancement and manufacturing management.
The limitations of this study are explained here. Despite the inclusion of various manufacturing factors, numerous factors possibly related to yield are not part of this investigation. Some are mentioned at the end of the previous section; datasets with missing values and other factors, including the condition of the fabrication line, the condition of source materials, and information on preventive maintenance or equipment excursions, still require suitable handling methods. Another limitation is that the identified yield-lowering factors, such as specific units of equipment, were not experimentally verified in real chip production.
In future work, these various factors should be explored and included to build more expansive models. Practical experiments in semiconductor manufacturing are also suggested to validate the yield prediction model and the impact of the features, so that the possibility of yield enhancement can be tested in the field. Another objective for further research is the development of an integrated yield enhancement system with automated yield analysis built on the basis of this study. Such a system can help domain engineers in the field, including non-expert machine learning users, make data-driven decisions.

Supplementary Materials

The supporting files can be downloaded at: https://www.mdpi.com/article/10.3390/app13042660/s1, Figure S1: SHAP value plots for the top parameters of individual dataset models, Example code.

Author Contributions

Conceptualization, Y.L.; methodology, Y.L.; software, Y.L.; validation, Y.L.; formal analysis, Y.L.; investigation, Y.L.; resources, Y.L.; data curation, Y.L.; writing—original draft preparation, Y.L.; writing—review and editing, Y.L. and Y.R.; visualization, Y.L.; supervision, Y.R.; project administration, Y.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

The authors would like to thank Enago (www.enago.co.kr accessed on 31 October 2022) for the English language review. Y.L. is indebted to Eunmi Park at Samsung Electronics, who provided great advice during the paper revision process. The authors would also like to acknowledge Samsung Electronics for providing the real-world database. Finally, the authors would like to express gratitude to the reviewers and editors for their very helpful suggestions and insightful comments in reviewing the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

Figure 1. Overall flowchart of yield prediction and explanation with multi-type fabrication data.
Figure 2. Distribution of the 20 validated (a) MAE and (b) RMSE values for the RF, KNN, LightGBM, SVR, GPR, XGBoost, AdaBoost, Lasso, RANSAC, and MLP models using the cross-validation method, with statistical tables of the average (Avg) and standard deviation (StdDev) values for each model.
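The 20-value validation scheme behind Figure 2 can be reproduced in outline with repeated k-fold cross-validation (5 splits × 4 repeats = 20 scores per model). The sketch below uses synthetic data and only two of the ten models; the dataset, sample sizes, and model settings are assumptions for illustration.

```python
# Hedged sketch: repeated k-fold CV producing 20 MAE/RMSE values per model,
# then the Avg/StdDev summary shown in Figure 2. Data and settings are
# synthetic stand-ins, not the paper's fabrication dataset.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import RepeatedKFold, cross_validate

X, y = make_regression(n_samples=250, n_features=10, noise=10.0, random_state=0)
cv = RepeatedKFold(n_splits=5, n_repeats=4, random_state=0)  # 5 x 4 = 20 scores

models = {
    "RF": RandomForestRegressor(n_estimators=100, random_state=0),
    "KNN": KNeighborsRegressor(n_neighbors=8),
}
summary = {}
for name, model in models.items():
    scores = cross_validate(
        model, X, y, cv=cv,
        scoring={"mae": "neg_mean_absolute_error",
                 "rmse": "neg_root_mean_squared_error"},
    )
    # Scores are negated by sklearn's "higher is better" convention.
    summary[name] = {
        "mae_avg": -scores["test_mae"].mean(),
        "mae_std": scores["test_mae"].std(),
        "rmse_avg": -scores["test_rmse"].mean(),
        "rmse_std": scores["test_rmse"].std(),
        "n_scores": len(scores["test_rmse"]),
    }
```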
Figure 3. Comparison results of the performances of 10 regression models (RF, KNN, SVR, LightGBM, GPR, XGBoost, AdaBoost, Lasso, RANSAC, and MLP) for an expandable yield prediction.
Figure 4. (a) SHAP value plot of feature attribution for top parameters by the RF model. The color corresponds to the range of feature values from high (red) to low (blue). A positive SHAP value means a positive impact on prediction, leading the model to predict a high yield for the wafer. Waterfall plots of example wafers (b) with mid yield, (c) with low yield, and (d) with high yield. The color corresponds to the SHAP value: positive (red) and negative (blue).
Figure 5. SHAP value scatterplots according to the values of six example features.
Table 1. Summary of the dataset.
Preprocessing        | Count of Parameters
                     | Numerical Data | Categorical Data | Total Data
Original dataset     | 176            | 176              | 352
One-hot encoding     | -              | 1276             | -
Dimension reduction  | 62             | 921              | 983
Dataset Symbol | Brief Explanation   | Type        | Original (Steps) | Original (Max Label/Step) | One-Hot Encoding (Steps) | Dimension Reduction (Features)
R              | Operating Condition | Categorical | 142              | 7                         | 37                       | 75
U              | Equipment Unit      | Categorical | 34               | 58                        | 34                       | 846
T              | Process Time        | Numerical   | 142              | -                         | 28                       | 28
P              | Sensor Parameter    | Numerical   | 34               | -                         | 34                       | 34
Preprocessed Target Data | Training Data | Test Data
Count                    | 261           | 66
Average                  | 0.012         | −0.048
Standard deviation       | 1.021         | 0.926
Table 2. Prediction performance of 10 regression models and their tuned hyper-parameters.
Models                  | MAE   | RMSE ** | Tuned Hyper-Parameters
RF                      | 0.520 | 0.648   | n_estimators = 400, min_samples_leaf = 2, max_features = ‘sqrt’, max_depth = 12
KNN                     | 0.542 | 0.653   | n_neighbors = 8, p = 1, weights = ‘distance’
SVR                     | 0.531 | 0.682   | kernel = ‘sigmoid’, gamma = ‘auto’, coef0 = 0, C = 0.5
LightGBM                | 0.557 | 0.692   | colsample_bytree = 0.4, learning_rate = 0.01, max_depth = 5, n_estimators = 200
GPR                     | 0.549 | 0.693   | kernel = RationalQuadratic(alpha = 1, length_scale = 1), alpha = 0.5
XGBoost                 | 0.559 | 0.696   | learning_rate = 0.03, n_estimators = 100, subsample = 0.25
AdaBoost                | 0.566 | 0.716   | learning_rate = 0.1, n_estimators = 300
Lasso                   | 0.567 | 0.726   | alpha = 0.1, tol = 0.001, max_iter = 2000, selection = ‘random’
RANSAC                  | 0.614 | 0.800   | stop_probability = 0.999, min_samples = 5, max_trials = 500
MLP                     | 0.667 | 0.825   | max_iter = 200, hidden_layer_sizes = (200, 2), activation = ‘logistic’
Statistical estimator * | 0.744 | 0.919   | -
* The statistical estimator predicts the target values of the test dataset as the average value of the training dataset. ** The table is sorted in ascending order by RMSE.
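The "statistical estimator" baseline of Table 2 can be sketched against an RF configured with Table 2's tuned hyper-parameters. The data below is a synthetic stand-in shaped like Table 1's split (261 training / 66 test wafers); only the RF hyper-parameter values are taken from the paper.

```python
# Hedged sketch: mean-of-training-set baseline vs. RF with Table 2's tuned
# hyper-parameters, evaluated by MAE and RMSE on synthetic stand-in data.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=327, n_features=20, noise=5.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=66, random_state=0)

# "Statistical estimator": predict every test wafer as the training-set mean.
baseline_pred = np.full(len(y_te), y_tr.mean())
baseline_rmse = mean_squared_error(y_te, baseline_pred) ** 0.5

# RF with the tuned hyper-parameters reported in Table 2.
rf = RandomForestRegressor(
    n_estimators=400, min_samples_leaf=2,
    max_features="sqrt", max_depth=12, random_state=0,
).fit(X_tr, y_tr)
rf_pred = rf.predict(X_te)
rf_mae = mean_absolute_error(y_te, rf_pred)
rf_rmse = mean_squared_error(y_te, rf_pred) ** 0.5
```

A model is only worth deploying if it clears this baseline: the mean predictor's RMSE equals the test-set spread around the training mean, so any learned structure should push RMSE below it, as Table 2 shows for all ten models.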
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

