1. Introduction
With their fast switching speed, low driving power consumption, and simple driving circuits, IGBT modules are key devices in power electronic systems [1]. Wire bonding is the main interconnection method of IGBT modules and is used to connect chips and external circuits [2]. In practical applications, the bonded interface is subjected to large thermal stresses due to the difference between the coefficients of thermal expansion (CTEs) of the Al bonding wires and the Si chip. As the service time increases, cracks develop at the bonded interface and gradually expand, degrading the mechanical properties of the bonded interface [3]. This degradation seriously affects the conductive and heat transfer characteristics of the IGBT module and ultimately leads to module failure. Studies have shown that cracking of the bonded interface is one of the main failure mechanisms of IGBT modules [4]. Therefore, studying the mechanical properties of bonded interfaces is crucial for assessing the reliability of IGBT modules.
The mechanical properties of interfaces have been studied primarily through experimental measurements or numerical simulations. The fracture toughness of pure type I/II interfaces is typically measured experimentally via uniaxial tensile tests, shear tests, and end-notched flexural (ENF) tests [5]. The mixed-mode bending (MMB) test is used to measure the interfacial fracture characteristics of the mixed mode [6]. However, as the problem becomes more complex, especially under certain practical working conditions, measuring the mechanical properties of the interface through experiments becomes very difficult and expensive [7]. Therefore, several researchers have adopted numerical simulation methods to characterize the mechanical properties of interfaces. The cohesive zone modeling (CZM) method proposed by Dugdale [8] and Barenblatt [9] is the most popular method and has been widely used to address interface problems involving different structures [10,11,12,13].
In the CZM method, the mechanical behavior of the interface is described by the traction-separation law (TSL). Different TSLs, including the bilinear law [14]; the trapezoidal law [15]; the polynomial law [16]; the exponential potential-based law [17]; and the Park, Paulino, and Roesler (PPR) law [18], have different shapes and corresponding parameters. Xu [19] noted that the effect of the TSL shape generally cannot be ignored, especially for type II fractures. The literature [20,21,22] has summarized the general rules for the influence of the TSL shape. Depending on the material and interface to be simulated, the appropriate TSL (both shape and parameters) must be chosen to make reasonable predictions. Once the TSL shape and parameters of the bonded interface of IGBT modules are accurately predicted, the health status of the module can be assessed, and further guidance for service reliability can be provided.
Several researchers have used the CZM method to study the bonded interfaces of IGBT modules. Halouani [23] analyzed the degradation behavior of bonding wires using the bilinear law. Luo [24] combined the bilinear law with a multiscale approach to study the crack extension of bonding wires. Shqair [25] established the relationship between the parameters of the bilinear law and the microstructural properties and predicted the crack paths of the bonded interfaces. However, current studies on the bonded interfaces of IGBT modules select TSLs empirically, without analyzing the effect of the TSL shape, which may lead to errors.
Methods for determining the TSL parameters can be divided into direct and indirect methods [26,27]. Direct methods use experimental tests, such as double cantilever beam (DCB) tests [26], to determine the TSL parameters. This approach is not applicable here because fabricating specimens with double-cantilever-beam structures at the bonded interface of IGBT modules is difficult. The indirect method is an inverse analysis method [19,28,29] based on finite element (FE) simulation, in which the appropriate TSL is determined by comparing the numerical solution with the experimental measurements. Maier [30,31], using an inverse analysis strategy, estimated residual stresses and identified elastic–plastic material parameters. However, this method requires continuous trial and error, is computationally inefficient, and depends on the initial parameter estimates. In addition, extensive numerical simulations are required for each experimental result. This has motivated the search for more generalized methods for determining TSL parameters.
Machine learning (ML) technology has developed rapidly in recent years. Its fast learning speed, simple model structure, and ability to handle large data volumes and nonlinear mappings make it a promising method for identifying TSL parameters; this method may also be described as an inverse analysis strategy. Su [32] used an artificial neural network (ANN) model based on the exponential law to predict the TSL parameters of the interface between fiber-reinforced polymers. Hou [33] used a generalized regression neural network (GRNN) model to predict the TSL parameters of a mixed-mode thermal barrier coating system based on the bilinear law. Similarly, the Gaussian process regression (GPR) algorithm [34], the random forest regression (RFR) model [35], and the dynamic convolutional neural network (DCNN) [36] have been used to predict various TSL parameters. The ML models in the above studies achieve strong TSL parameter-prediction results. However, most of the currently established ML models consider only a single TSL shape and neglect the effect of the shape itself. In addition, the load–displacement (F–δ) curve is strongly time dependent, and recent research fails to exploit the time-series characteristics of the F–δ curve as an input. As a result, the established ML models have limited application potential. Long short-term memory (LSTM) algorithms, which can effectively handle sequential data and learn its long-term dependencies [37], have been used to account for the time-series features of F–δ curves. However, Long’s study [38] revealed that when LSTM networks were used alone for prediction, the results were only acceptable and did not meet expectations.
To solve the above problems, in this paper, a CNN-LSTM architecture combining a convolutional neural network (CNN) and LSTM [39] is used for the first time to estimate TSLs at the bonded interface of IGBT modules. Utilizing the powerful feature-extraction capability of CNNs, the features of the original data are first extracted and subsequently used as inputs to the LSTM for prediction. The proposed method can thereby fully utilize the time-series features of the recorded data, extracting both the data features and the long-term dependencies of the curves.
To accurately characterize the mechanical properties of the bonding wire interfaces of IGBT modules, in this work, an experimental study is first conducted on the bonded interfaces to obtain their F–δ curve responses. Then, based on the tests, the FE-CZM model of the bonded region is established. Three different TSL shapes are used, i.e., the bilinear law, the exponential law, and the polynomial law. Based on the numerical simulations, 1800 datasets are obtained to train the CNN-LSTM architecture, with F–δ curves as the input parameters and TSL shapes and parameters as the output parameters. The prediction performance of the constructed CNN-LSTM model is evaluated using the coefficient of determination (R2), the root mean square error (RMSE), the mean absolute percentage error (MAPE), and the error-rate metric. The results of the proposed model are compared with those of the CNN and LSTM architectures. Finally, based on the experimentally obtained F–δ-curve responses, the TSL shapes and parameters characterizing the mechanical properties of the bonded interface of the IGBT modules are discussed, and the effectiveness of the methodology is verified by comparing the experimental data with the predicted results. The research scheme used in this study is depicted in Figure 1.
2. Experimental Procedure
To evaluate the mechanical properties of the bonded interface of the IGBT modules, shear tests were conducted on the bonding wires. The F–δ-curve responses obtained from the tests were used to compare the predicted results and those of the numerical simulations to validate the prediction ability of the CNN-LSTM architecture.
In the shear tests, the device utilized was a 1200 V/450 A IGBT module. The module consists of a plastic case, silicone gel, a Cu backplate, a solder layer, direct bonding copper (DBC) ceramic base plates, bonding wires, a freewheel diode (FWD) chip, and an IGBT chip, as shown in Figure 2a. Before the shear test, the plastic case, silicone gel, and Cu backplate were removed for ease of operation. The samples for the shear test were obtained as shown in Figure 2b.
The test equipment used was a DAGE-4000 tensile/shear tester (Nordson, Westlake, OH, USA). The shear height was 10 μm, and the shear speed was 100 μm/s, in accordance with the DVS-2811 standard [40]. The fabricated specimen was placed on the tester, as shown in Figure 2c. During the test, the applied load and the displacement of the shear chisel were recorded in real time, as shown in Figure 2d.
4. Machine Learning Framework
ML has been widely used in different fields, and many different types of ML network architectures exist. The most significant advantage of ML models is that they can make predictions about future data, provided that the model performs well on a testing dataset. However, the ability of a machine learning algorithm to perform well depends on whether the algorithm matches the problem itself. Therefore, based on the problem to be solved in this paper, we choose an ML architecture that combines a CNN and LSTM to fully utilize the advantages of these two algorithms. In this section, the CNN, LSTM, and CNN-LSTM algorithms are described, and the corresponding network architectures are built. In addition, the implementation process of the ML architecture, including data collection, the network architectures, and the performance metrics, is presented.
4.1. Data Collection
Collecting sufficient data plays a significant role in the ML model training process. The dataset generated by the FE-CZM model, considering different combinations of TSL shapes and parameters, is used to train the developed ML architecture. The stiffness, maximum traction, and critical fracture energy are randomly selected in the ranges of 10–900 N/mm3, 1–90 MPa, and 0.01–15 N/mm, respectively. Notably, the randomly chosen parameters follow a normal distribution, which helps ensure the generalizability of the proposed ML architecture. Through numerical simulations of the FE-CZM model based on the bilinear, exponential, and polynomial laws, 1000, 400, and 400 datasets, respectively, were collected.
For ML training, different datasets must be selected depending on whether the output is the TSL shape or the TSL parameters. When the output is the TSL shape, all 1800 sets of data are used, with 70% forming the training dataset and 30% forming the testing dataset. When the output is a bilinear law parameter, the 1000 sets of data obtained from the bilinear-law simulations are used, again split 70%/30% into training and testing datasets. Similarly, for the exponential and polynomial law parameters, the 400 sets of data obtained from the simulations performed with each respective law are split 70%/30% for training and testing.
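The sampling of TSL parameters and the 70%/30% split described above can be sketched in pure Python as follows. This is only an illustration: the function names are ours, uniform draws are used for brevity (the paper states that the parameters follow a normal distribution), and in the actual study each parameter set was fed through an FE-CZM simulation to produce the corresponding F–δ curve.

```python
import random

random.seed(0)

# Parameter ranges stated in the text: stiffness (N/mm^3),
# maximum traction (MPa), critical fracture energy (N/mm).
RANGES = {"stiffness": (10, 900), "traction": (1, 90), "energy": (0.01, 15)}

def sample_tsl_parameters():
    """Draw one random TSL parameter set within the stated ranges."""
    return {name: random.uniform(lo, hi) for name, (lo, hi) in RANGES.items()}

def train_test_split(dataset, train_fraction=0.7):
    """Shuffle a dataset and split it 70/30, as done for each TSL law."""
    data = dataset[:]
    random.shuffle(data)
    cut = int(len(data) * train_fraction)
    return data[:cut], data[cut:]

# e.g., the 1000 bilinear-law parameter sets
bilinear_sets = [sample_tsl_parameters() for _ in range(1000)]
train, test = train_test_split(bilinear_sets)
```

The same split would then be repeated for the 400 exponential-law and 400 polynomial-law datasets.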
The inputs of the ML architecture are the F–δ response curves obtained from the FE-CZM simulations. The F–δ curves are represented as equally spaced (δ/n) one-dimensional arrays {x1, x2, x3 … xn}, where n is the dimensionality of the response, i.e., the number of features, which is set as n = 200 here. The outputs of the ML architecture are the shapes and parameters of the TSLs. Since the shapes are discrete values and the parameters are continuous values, they are predicted using classification and regression techniques, respectively. To improve the convergence speed and accuracy of the network, all the features are normalized before training is conducted, as follows:

x̃i = (xi − xi,min)/(xi,max − xi,min)

where xi and x̃i are the original and scaled features, respectively, and xi,min and xi,max are the minimum and maximum values of the ith feature in the training dataset, respectively.
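The min–max normalization described above can be sketched as follows (a minimal pure-Python illustration; the function name is ours). Note that, as in the text, the minima and maxima are taken from the training dataset only and are then reused to scale any other features:

```python
def min_max_scale(train_features, features):
    """Scale each feature to [0, 1] using the per-feature min/max of the
    TRAINING data, matching the normalization formula above."""
    n = len(train_features[0])  # number of features (n = 200 in the paper)
    mins = [min(row[i] for row in train_features) for i in range(n)]
    maxs = [max(row[i] for row in train_features) for i in range(n)]
    scaled = []
    for row in features:
        scaled.append([
            (row[i] - mins[i]) / (maxs[i] - mins[i]) if maxs[i] > mins[i] else 0.0
            for i in range(n)
        ])
    return scaled
```

Applying the training-set statistics to the testing set avoids leaking information from the testing data into the scaling.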
4.2. ML Architecture
In this section, three different ML architectures are created, namely, the CNN, LSTM, and CNN-LSTM architectures. These architectures are described below. Each network architecture is used separately for classification (TSL shape prediction) and regression (TSL parameter prediction) analyses. Compared to the ML architecture for regression, the ML architecture for classification has an additional Softmax layer (after the fully connected layer); the rest of the structure is identical.
4.2.1. CNN Architecture
A CNN generally consists of convolutional, pooling, and fully connected layers and has two important properties: local connectivity and weight sharing. The convolutional layer is employed to extract the features of a local region, and the pooling layer is employed to reduce the number of features to avoid overfitting. The CNN is constructed using the convolution operation as follows:

y(l) = w(l) ⊗ a(l−1) + b(l)

where y(l) is the net input of layer l; a(l−1) is the activity value of layer l − 1; w(l) is the convolutional kernel of layer l, i.e., the weight vector that can be learned; K is the size of the convolutional kernel; and b(l) is the learnable bias term of layer l.
Figure 5a shows the CNN architecture built in this paper, which consists of two convolutional layers and one fully connected layer.
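The convolution operation above, together with the ReLU activation and pooling mentioned in this section, can be sketched for the 1D case as follows (function names are ours; as is common in deep learning, "convolution" is implemented as cross-correlation, without kernel flipping):

```python
def conv1d_valid(a, w, b):
    """Valid 1D convolution: y[t] = sum_k w[k] * a[t + k] + b,
    the elementwise form of the convolutional-layer formula above."""
    K = len(w)  # kernel size K
    return [sum(w[k] * a[t + k] for k in range(K)) + b
            for t in range(len(a) - K + 1)]

def relu(x):
    """ReLU activation applied to a feature vector."""
    return [max(0.0, v) for v in x]

def max_pool(x, size=2):
    """Non-overlapping max pooling to reduce the number of features."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, size)]
```

A 200-point F–δ array passed through such a layer yields a shorter feature vector, which is what the fully connected layer then consumes.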
4.2.2. LSTM Architecture
LSTM [35] is a variant of the recurrent neural network that is better able to learn data with long-term dependencies and can achieve optimal performance on challenging sequence-processing tasks. The main arithmetic process of the LSTM network architecture is as follows: first, using the external state h(t−1) of the previous moment and the input x(t) of the current moment, the input gate i(t), the forget gate f(t), the output gate o(t), and the candidate state c̃(t) are computed, as shown in Equation (14). Notably, the value of a “gate” in the LSTM is in the range of (0, 1). Then, the internal state c(t) is updated by incorporating the memory cell c(t−1) of the previous moment, as shown in Equation (15). Finally, by incorporating the output gate o(t), the information of the internal state is transferred to the external state h(t), as shown in Equation (16):

i(t) = σ(Wi x(t) + Ui h(t−1) + bi)
f(t) = σ(Wf x(t) + Uf h(t−1) + bf)(14)
o(t) = σ(Wo x(t) + Uo h(t−1) + bo)
c̃(t) = tanh(Wc x(t) + Uc h(t−1) + bc)

c(t) = f(t) ⊙ c(t−1) + i(t) ⊙ c̃(t)(15)

h(t) = o(t) ⊙ tanh(c(t))(16)

where σ is the logistic sigmoid function; ⊙ denotes element-wise multiplication; b is the bias; x(t) is the input value at the current moment; and W and U are the weights of the three gates and the candidate state.
Figure 5b shows the LSTM architecture built in this paper, which consists of one LSTM layer and one fully connected layer.
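One time step of the LSTM recurrence in Equations (14)–(16) can be sketched for scalar states as follows (a minimal illustration with our own function name and parameter dictionary; a real layer operates on vectors with weight matrices):

```python
import math

def sigmoid(z):
    """Logistic sigmoid, so each gate value lies in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step: gates from the current input x(t) and the previous
    external state h(t-1), then the internal state c(t) and external
    state h(t), following Equations (14)-(16)."""
    i_t = sigmoid(p["Wi"] * x_t + p["Ui"] * h_prev + p["bi"])        # input gate
    f_t = sigmoid(p["Wf"] * x_t + p["Uf"] * h_prev + p["bf"])        # forget gate
    o_t = sigmoid(p["Wo"] * x_t + p["Uo"] * h_prev + p["bo"])        # output gate
    c_tilde = math.tanh(p["Wc"] * x_t + p["Uc"] * h_prev + p["bc"])  # candidate
    c_t = f_t * c_prev + i_t * c_tilde   # Equation (15): update internal state
    h_t = o_t * math.tanh(c_t)           # Equation (16): expose external state
    return h_t, c_t
```

Iterating this step over the 200 features of an F–δ curve is what lets the network carry long-term information along the sequence.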
4.2.3. CNN-LSTM Architecture
Figure 5c shows the proposed CNN-LSTM architecture. This architecture primarily consists of an input layer, a CNN block, an LSTM block, and an output layer. The CNN block consists of a convolutional network with two paths. One path comprises two convolutional layers, i.e., Convolution 1 and Convolution 2, and the other path comprises a convolutional layer, Convolution 1, and a squeeze-and-excitation (SE) block. The convolutional layers utilize a 2D convolution with a convolutional kernel size of (3, 1) and a ReLU activation function. The SE block [43] consists of a global average pooling layer, a fully connected layer, a ReLU layer, a second fully connected layer, and a sigmoid activation function. The effectiveness of this block has been demonstrated, and optimal performance has been achieved in several tasks [39]. The block is primarily employed to improve the performance and representation ability of the network by adaptively adjusting the weights of each channel. The output of the CNN block is obtained by multiplying the outputs of the two paths, i.e., those of the SE block and of Convolution 2. The data features of the F–δ curve are extracted by the CNN block and input into the LSTM block as feature vectors. The LSTM block consists of an LSTM layer and a fully connected layer for learning the relationships between the F–δ-curve data features.
The Adam algorithm is used to optimize the network parameters in the ML architecture and to obtain the optimal solution. The learning rate and number of epochs are set to 0.01 and 500, respectively.
4.3. Performance Metrics
To evaluate and compare the performances of the architectures used, we employed three error metrics, i.e., the coefficient of determination (R2), the root mean square error (RMSE), and the mean absolute percentage error (MAPE), to assess the accuracy of the regression analysis results, and we employed the error-rate metric to assess the accuracy of the classification analysis results. These metrics are defined as follows:

R2 = 1 − Σi(yi − ŷi)2 / Σi(yi − ȳ)2

RMSE = √[(1/m) Σi(yi − ŷi)2]

MAPE = (100%/m) Σi |(yi − ŷi)/yi|

where yi, ŷi, and ȳ are the actual test values, predicted test values, and average actual values, respectively, and m is the number of samples in the dataset.
For each metric, R2 reflects the degree of agreement between the actual data and the fitted function; the closer the R2 value is to 1, the better the fit and the better the prediction results of the architecture. RMSE and MAPE indicate the deviation of the predicted value from the actual value; the closer these values are to 0, the better the prediction results of the architecture.
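The three regression metrics above can be computed directly from their definitions, as in the following sketch (the function name is ours):

```python
import math

def r2_rmse_mape(y_true, y_pred):
    """Compute R^2, RMSE, and MAPE (%) as defined above, where y_true are
    the actual test values and y_pred the predicted test values."""
    m = len(y_true)
    y_bar = sum(y_true) / m  # average of the actual values
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - y_bar) ** 2 for yt in y_true)
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / m)
    mape = 100.0 / m * sum(abs((yt - yp) / yt)
                           for yt, yp in zip(y_true, y_pred))
    return r2, rmse, mape
```

A perfect prediction gives R2 = 1 with RMSE and MAPE equal to 0, matching the interpretation given above.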
6. Conclusions
In this work, we establish an FE-CZM model for describing the mechanical behavior of the bonded interface of IGBT modules and combine it with a constructed ML architecture to accurately identify TSL shapes and corresponding parameters. The prediction performances of three architectures, CNN, LSTM, and CNN-LSTM, are compared, the influence of the TSL shape is discussed, and the effectiveness of the CNN-LSTM architecture is verified using the results of shear tests. The following conclusions are drawn from this work:
The proposed CNN-LSTM architecture accurately recognizes suitable TSL shapes, achieving an error rate of 0.186%. Compared to the CNN and LSTM architectures, this architecture achieves the lowest error rate and improves the test performance by a factor of 2–4.
Based on the obtained TSL shapes, the architecture can more accurately predict the corresponding TSL parameters. The TSL parameters of the three TSL shapes are predicted, and the analysis results reveal that the R2 value is greater than 0.9514, the RMSE is less than 3.6704, and the MAPE is less than 3.5044%. This strongly suggests that the CNN-LSTM architecture proposed in this work excels in recognizing TSL parameters. The CNN-LSTM architecture has the best prediction ability for each TSL parameter relative to the separate CNN and LSTM architectures.
The shear test results of the IGBT modules are used as inputs to the CNN-LSTM architecture to accurately predict the TSL shapes and parameters suitable for describing the mechanical behaviors of the bonded interfaces of IGBT modules. Comparing the F–δ curves of the FE-CZM results with the experimental results reveals an RMSE value of 0.3815, indicating the accuracy of this prediction method.
The influence of the TSL shape is also discussed. The analysis results reveal that the TSL shape significantly impacts the prediction results when the interface parameters of IGBT modules are predicted. The results yielded by using the polynomial law align best with the experimental results.
The main contribution of this paper is the proposed ML architecture, which enables suitable TSL shapes and corresponding TSL parameters to be accurately predicted and identified by experimentally obtaining the F–δ response curves of the bonding interfaces of IGBT modules. A promising and effective solution is provided for describing the mechanical behaviors of IGBT-module bonded interfaces.
In service, IGBT modules withstand stresses and loads in different directions. However, only the load in the shear direction was considered in this paper; therefore, complex load conditions should be considered in future work.