1. Introduction
Semiconductor manufacturing technologies have advanced rapidly, driven by the demand for smaller, faster, and more reliable electronic devices. However, this advancement has also brought a significant technical challenge: the physical limits of transistor scaling make it enormously difficult to continue along the path of Moore’s Law [1]. In the post-Moore era, the concept of “More than Moore”, based on heterogeneous integration using new packaging technologies [2,3,4,5,6,7,8], is becoming increasingly critical. Flip-chip chip-scale packaging (FCCSP) offers a high I/O count, miniaturization, and excellent electrical performance, and has thus become one of the promising packaging solutions for realizing heterogeneous system integration.
Although FCCSP is one of today’s mainstream packaging technologies (see, e.g., [6,7,8]), several technical issues still need to be addressed, such as yield, reliability, thermal performance, and warpage. Among them, the warpage induced during the manufacturing process is particularly important, as it can cause problems in subsequent process steps, such as handling, registration, and alignment, eventually resulting in yield and throughput losses [9,10]. It is thus essential to have a thorough understanding of the warpage behavior of FCCSP at the initial design stage. In the literature, few studies have been reported on the characterization and management of the warpage behavior of FCCSP during fabrication through theoretical analysis, such as finite element analysis (FEA), and experiments [8]. Compared with experimental approaches, theoretical analysis can be not only more efficient and cost-effective but also capable of giving better insight into the physical mechanisms.
To address the aforementioned challenges and to reduce the prediction uncertainty and modeling error introduced by less experienced engineers, researchers have begun to integrate simulation with machine learning (see, e.g., [11,12,13,14,15,16]). Owing to the rapid advance of computer technologies and learning algorithms, machine learning has evolved into a critical tool for addressing a wide range of real-world problems, with applications covering medical diagnosis, transportation, space exploration, defense systems, and various engineering fields. Deep learning is a branch of machine learning built on deep artificial neural networks (NNs). These NNs are composed of layers of neurons, each of which performs a simple mathematical function of its inputs. By stacking these layers, deep learning models can learn complex information and features from raw data, such as images, audio, text, and sensor readings. This has enabled the development of deep learning-based prediction models for timely and effective predictive analysis of very complex systems. Simulation-based deep learning prediction models have been extensively applied in advanced microelectronic packaging for quick and accurate assessment of thermal-mechanical behavior, such as thermal performance [12,13] and reliability [14,15,16]. For example, Law et al. [12] developed a deep learning model based on an artificial NN (ANN) for predicting the thermal performance of quad flat no-lead (QFN) packages. Subbarayan et al. [14] applied an ANN algorithm to build a reliability prediction model for a ball grid array (BGA) package. Yuan et al. [15] applied an ANN-based simulation framework to investigate the solder joint reliability of a wafer-level chip-scale package, where the initial parameters of the ANN model, namely, the weights and biases, were obtained using a genetic algorithm (GA). Hsiao and Chiang [16] combined FEA with random forest (RF) to explore the solder joint reliability of wafer-level packaging subjected to thermal cycling.
In addition to conventional gradient-based back-propagation (BP) approaches, evolutionary algorithms (EAs), such as GA, evolutionary strategy (ES), and particle swarm optimization (PSO), have been extensively applied to network topology design and connection weight adaptation [17,18,19,20], mainly because of their advantages over the conventional approaches, such as conceptual simplicity and flexibility, the ability to solve problems without human expertise, and a higher probability of reaching a global optimum. White and Ligomenides [17] proposed a two-stage approach to explore the network topology and connection weights of an NN model by combining a GA and a BP approach. The underlying idea is that if the GA is unable to obtain an appropriate network solution, the BP approach with an MP algorithm is further performed to locally search for the optimal weights, using the connection weights calculated by the GA as initial values. A similar approach can also be found in Ding et al. [18]. Juang [19] introduced an evolutionary recurrent network for a temporal sequence production problem using an evolutionary learning algorithm based on a hybrid of GA and PSO. Ahmadizar et al. [20] developed an ANN model using an evolutionary algorithm that integrates grammatical evolution (GE) for the network topology design and GA for better weight adaptation.
The performance of an NN prediction model can alternatively be improved through hyperparameter tuning [21,22,23,24,25]. Hyperparameters are crucial for the performance of a machine learning model because they govern both the architecture of the neural network and its training process. Well-tuned hyperparameters can also prevent the model from overfitting or underfitting (see, e.g., [26]). In the literature, the hyperparameters were mostly tuned using trial-and-error parametric analysis (one factor at a time) [21], grid search [22], and random search [23]. Of these, one-factor-at-a-time analysis is unable to account for the interaction effects of hyperparameters, and grid search is computationally expensive, especially for models with a large number of hyperparameters and a huge search space. Random search can be more efficient and cost-effective; however, theoretically, it is less likely to find the optimal hyperparameter setting. EAs, such as GAs [24,25], are a feasible alternative for determining the best set of hyperparameters. Even though Erpolat Taşabat and Aydin [25] found that, for hyperparameter optimization, GAs can be computationally more efficient than grid search, such heuristic algorithms are prone to premature convergence and may therefore fail to reach an optimal or even a good result; in addition, their slow convergence makes them computationally costly. Consequently, a more effective and cost-effective hyperparameter optimization approach is needed.
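To make the cost comparison above concrete, the following minimal sketch counts the trials generated by one-factor-at-a-time analysis, grid search, and random search. The search space, its level values, and all function names are hypothetical and chosen only for illustration; they are not the settings used in this work.

```python
import itertools
import random

# Hypothetical search space: four hyperparameters with three levels each
# (illustrative values only, not the settings used in this work).
space = {
    "hidden_layers": [1, 2, 3],
    "neurons": [8, 16, 32],
    "learning_rate": [1e-3, 1e-2, 1e-1],
    "batch_size": [16, 32, 64],
}


def grid_candidates(space):
    """Grid search: every combination of every level (full factorial)."""
    names = list(space)
    return [dict(zip(names, combo))
            for combo in itertools.product(*space.values())]


def ofat_candidates(space, baseline):
    """One factor at a time: vary a single hyperparameter, hold the rest."""
    trials = [dict(baseline)]
    for name, levels in space.items():
        for level in levels:
            if level != baseline[name]:
                trial = dict(baseline)
                trial[name] = level
                trials.append(trial)
    return trials


def random_candidates(space, n, seed=0):
    """Random search: n independent draws from the search space."""
    rng = random.Random(seed)
    return [{name: rng.choice(levels) for name, levels in space.items()}
            for _ in range(n)]


baseline = {name: levels[0] for name, levels in space.items()}
grid = grid_candidates(space)
ofat = ofat_candidates(space, baseline)
rand = random_candidates(space, n=9)

print(len(grid), len(ofat), len(rand))  # 81 9 9
```

With four three-level hyperparameters, the grid already requires 3^4 = 81 trainings, whereas the one-factor-at-a-time plan needs only 9 but never changes two hyperparameters together, which is why it cannot reveal interaction effects.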
According to the above literature survey, studies on the development of warpage prediction models for electronic packaging, and for FCCSP in particular, remain very limited. Thus, this work attempts to develop a prediction model using a proposed FEA-based ANN approach to facilitate an effective and quick estimate of the process-induced warpage behavior of FCCSP for use in subsequent fabrication process design. To improve the prediction accuracy and training performance of the model, an ANN algorithm integrating a novel subdomain-based sampling strategy and Taguchi hyperparameter optimization is proposed for prediction model design and training. To simulate the fabrication process, an FEA-based process modeling approach is proposed, which takes into account the viscoelastic behavior of the epoxy molding compound (EMC) and the temperature dependence of the thermal-mechanical properties of the materials in FCCSP. To validate the proposed process modeling approach, the simulated warpage results are compared against warpage measurement data. Moreover, a warpage parametric analysis is performed to identify the crucial factors that mainly influence the warpage behavior. These factors are then used to construct the ANN prediction model. The benefits of the proposed sampling and hyperparameter tuning techniques are shown by comparison with other existing approaches. Furthermore, the feasibility of the developed warpage prediction model is evaluated using the validation dataset.
3. Theoretical Model of ANN
ANNs are data processing models designed to mimic the human nervous system [12]. The architecture and behavior of ANNs are inspired by the biological NNs in human brains, which process information in a parallel and distributed manner. A typical ANN model mainly comprises three types of layers, namely, the input layer, hidden layers, and output layer, as shown in Figure 4. The input layer, i.e., the first layer of an ANN model, is primarily responsible for receiving the external inputs. The hidden layers, i.e., the intermediate layers between the input layer and the output layer, handle the ANN’s data processing and computation. Increasing the number of hidden layers enhances the capability to capture more complex and nonlinear features and behaviors, while raising the computational complexity and effort and potentially causing overfitting and poor prediction performance. The output layer, the last layer of an ANN model, is in charge of providing predictions based on the computations performed in the hidden layers. The links connecting neurons in an ANN model carry connection weights, which are determined through optimization.
Figure 4 illustrates the process of passing information through an NN having two inputs ($x_1$, $x_2$), two outputs ($o_1$, $o_2$), and one hidden layer with three neurons ($z_1$, $z_2$, $z_3$), where $w$ is the weight, $b$ the bias of the layer, and $\sigma$ the activation function of the layer. The goal of training an ANN model is to adjust the weights through an optimization or learning process so as to minimize the discrepancy between the ANN outputs and the target data. Additionally, the setting of the hyperparameters of an ANN model, including the optimizer, number of hidden layers and neurons, activation function, learning rate, and batch size, is critical to the prediction model’s performance. The most commonly used hyperparameter optimization methods include trial-and-error parametric analysis [21], grid search [22], random search [23], and EAs such as GA [24,25]. However, these methods have various drawbacks (see Introduction). Consequently, a more effective and cost-effective approach using the Taguchi method is proposed to determine the optimal hyperparameter setting for constructing the best-fitted prediction model.
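The information flow described for the network of Figure 4 can be sketched as a single forward pass through a 2-3-2 network, followed by the discrepancy measure that training would minimize. This is a minimal illustration under stated assumptions: the weight and bias values are arbitrary placeholders, the sigmoid hidden activation, linear output layer, and mean squared error are assumed choices, and the function names are hypothetical.

```python
import math


def sigmoid(v):
    """Assumed hidden-layer activation function (sigma)."""
    return 1.0 / (1.0 + math.exp(-v))


def dense(inputs, weights, biases, activation):
    """One layer: activation(w . inputs + b), one weight row per neuron."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


# Placeholder parameters (arbitrary values, not trained weights).
W1 = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]  # 3 hidden neurons x 2 inputs
b1 = [0.0, 0.1, -0.1]
W2 = [[0.3, -0.6, 0.9], [0.7, 0.2, -0.4]]    # 2 outputs x 3 hidden neurons
b2 = [0.05, -0.05]


def forward(x):
    """Propagate inputs (x1, x2) to hidden (z1..z3) and outputs (o1, o2)."""
    z = dense(x, W1, b1, sigmoid)
    o = dense(z, W2, b2, lambda v: v)  # linear output layer (assumption)
    return z, o


def mse(outputs, targets):
    """Mean squared discrepancy between ANN outputs and target data."""
    return sum((o - t) ** 2 for o, t in zip(outputs, targets)) / len(outputs)


z, o = forward([1.0, 2.0])
loss = mse(o, [0.5, 0.5])
print(z, o, loss)
```

Training would repeatedly adjust `W1`, `b1`, `W2`, and `b2` to reduce `loss`, while the hyperparameters (layer sizes, activation, learning rate, batch size) are fixed beforehand.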
4. Process Modeling
An FEA-based process modeling approach that integrates the ANSYS element birth and death technique with nonlinear FEA is introduced to evaluate the warpage of the FCCSP during the fabrication process. Because of the symmetry of the package, a quarter-symmetric FEA model of the FCCSP is adopted, with symmetric boundary conditions imposed on the two symmetry planes, i.e., the nodal displacements normal to each symmetry plane are set to zero. In addition, to prevent rigid body motion, the bottom node on the line of intersection of the two symmetry planes is constrained in the z-direction. The FEA model of the FCCSP is primarily composed of a coreless substrate, an EMC, Cu pillar bumps (CPBs), and a silicon die, as shown in Figure 5 together with the imposed boundary conditions. Hexahedral solid elements in ANSYS, i.e., SOLID185, are adopted. Table 2 lists the number of nodes and solid elements of the FEA models associated with TV1, TV2, and TV3. The Young’s modulus (E) and coefficient of thermal expansion (CTE) of the EMC, prepreg, Sn-Ag-Cu (SAC) 305 solder, and solder mask are characterized using a thermomechanical analyzer (TMA) (TA Instruments, New Castle, DE, USA) and a dynamic mechanical analyzer (DMA) (TA Instruments, New Castle, DE, USA), and the results are displayed in Figure 6. Except for the EMC, which is assumed to be a linearly viscoelastic material, these materials are considered to be linearly elastic, isotropic, and temperature-dependent. In addition, the CTE and Young’s modulus are 2.8 ppm/°C and 160 GPa for the silicon die, and 16.3 ppm/°C and 121 GPa for Cu. According to the fabrication process displayed in Figure 3, the process modeling primarily involves the die bonding process (steps 0–3) and the mold cure process (steps 3–6). At step 0, the silicon die, the solder layer of the CPBs, and the EMC are deactivated. At step 1, i.e., heating to the die bonding temperature (260 °C), the solder layer and silicon die are activated to form a mechanical connection between the silicon die and the coreless substrate. At step 4, i.e., heating to the mold cure temperature (175 °C), the EMC is activated to simulate a fully cured EMC.
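The activation sequence described above can be summarized as a small bookkeeping sketch that reports which parts of the model are alive at a given process step. This is plain Python for illustration only, not ANSYS input; the part labels and the function name are hypothetical, and treating the coreless substrate and Cu pillars as active from step 0 is an assumption, since the text only states which parts start deactivated.

```python
# Step at which each initially deactivated part is activated ("born"),
# as stated in the text: solder layer and silicon die at step 1 (heating
# to 260 degC), EMC at step 4 (heating to 175 degC).
BIRTH_STEP = {"solder layer": 1, "silicon die": 1, "EMC": 4}

# Assumption: parts not listed as deactivated are active from step 0.
ALWAYS_ACTIVE = {"coreless substrate", "Cu pillars"}

ALL_PARTS = ALWAYS_ACTIVE | set(BIRTH_STEP)


def active_parts(step):
    """Return the set of parts contributing stiffness at a process step."""
    return {part for part in ALL_PARTS
            if BIRTH_STEP.get(part, 0) <= step}


for step in range(7):  # steps 0-6 of the die bonding and mold cure process
    print(step, sorted(active_parts(step)))
```

Once activated, a part stays alive for all later steps, so the EMC contributes to the warpage only from the mold cure stage onward.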
EMC materials play a significant role in the thermal-mechanical behavior of electronic packaging [9]. Typically, EMC materials exhibit temperature-, time-, and strain-rate-dependent viscoelastic behaviors (see, e.g., [10]), such as creep, stress relaxation, and even hysteresis. The viscoelastic relaxation behavior is generally described by a generalized Maxwell model, comprising multiple Maxwell elements and an independent spring connected in parallel. This generalized Maxwell model is well approximated by a Prony series representation for fitting measured relaxation data,

$$E(t) = E_\infty + \sum_{i=1}^{m} E_i \exp\left(-\frac{t}{\tau_i}\right), \tag{1}$$

wherein $E(t)$ denotes the relaxation modulus of the entire model, $E_i$ the relaxation modulus of the $i$th Maxwell element, $E_\infty$ the long-term fully relaxed modulus, $t$ the time, $\tau_i$ the relaxation time, and $m$ the total number of Maxwell elements. Based on the following relationship between the unrelaxed modulus $E_0$ and $E_\infty$,

$$E_0 = E_\infty + \sum_{i=1}^{m} E_i, \tag{2}$$

the Prony series representation of the generalized Maxwell model (Equation (1)) can be rewritten as

$$E(t) = E_0\left[1 - \sum_{i=1}^{m} \alpha_i\left(1 - \exp\left(-\frac{t}{\tau_i}\right)\right)\right], \tag{3}$$

where $\alpha_i$ represents $E_i/E_0$.
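As a numerical check on the two Prony series forms, the sketch below evaluates the relaxation modulus both from the absolute moduli and from the relative moduli normalized by the unrelaxed modulus, and confirms that the two agree. The three-term data set and the function names are hypothetical and purely illustrative; they are not the 21-term coefficients fitted in this work, and the normalized form assumes that each relative modulus is the ratio of the element modulus to the unrelaxed modulus.

```python
import math


def relaxation_modulus(t, e_inf, moduli, taus):
    """Prony series: E(t) = E_inf + sum_i E_i * exp(-t / tau_i)."""
    return e_inf + sum(e_i * math.exp(-t / tau_i)
                       for e_i, tau_i in zip(moduli, taus))


def relaxation_modulus_normalized(t, e0, alphas, taus):
    """Equivalent form: E(t) = E_0 * (1 - sum_i alpha_i * (1 - exp(-t / tau_i)))."""
    return e0 * (1.0 - sum(a_i * (1.0 - math.exp(-t / tau_i))
                           for a_i, tau_i in zip(alphas, taus)))


# Hypothetical three-term data (illustrative values only).
e_inf = 1.0e3                    # long-term modulus, MPa
moduli = [8.0e3, 6.0e3, 5.0e3]   # Maxwell element moduli E_i, MPa
taus = [1.0e-2, 1.0, 1.0e2]      # relaxation times tau_i, s

e0 = e_inf + sum(moduli)               # unrelaxed modulus E_0
alphas = [e_i / e0 for e_i in moduli]  # relative moduli alpha_i = E_i / E_0

for t in (0.0, 1.0, 1.0e4):
    print(t,
          relaxation_modulus(t, e_inf, moduli, taus),
          relaxation_modulus_normalized(t, e0, alphas, taus))
```

At t = 0 both forms return the unrelaxed modulus, and for times much longer than the largest relaxation time they decay to the long-term modulus.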
The time and temperature dependence of the mechanical properties of a viscoelastic material can be correlated using the time–temperature superposition principle (TTSP) [10]. More specifically, the TTSP suggests that a relaxation curve of a viscoelastic material at a specific temperature can be employed as a reference for characterizing the relaxation curves at other temperatures by translating the reference relaxation curve horizontally in the logarithmic time domain. The temperature translation factor $a_T$ is normally approximated using an empirical relationship, the so-called Williams–Landel–Ferry (WLF) equation,

$$\log a_T = -\frac{C_1\,(T - T_{\mathrm{ref}})}{C_2 + (T - T_{\mathrm{ref}})}. \tag{4}$$

In Equation (4), $C_1$ and $C_2$ are the curve-fit coefficients, and $T_{\mathrm{ref}}$ is the reference temperature.
The master curve of the relaxation modulus at a reference temperature can be constructed by translating the measured frequency-dependent storage moduli at multiple temperatures along the time axis with the temperature translation factors $a_T$. Based on the relaxation moduli at different isothermal temperatures under 1% applied strain [10], the constructed reference master curve at the glass transition temperature of the EMC is shown in Figure 7, and the fitted coefficients ($\alpha_i$, $\tau_i$) of the Prony series model with 21 terms are given in Table 3. Furthermore, the fitted coefficients $C_1$ and $C_2$ of the WLF model for the characterized translation factors as a function of temperature are 74.7 and 313.9, respectively.
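The role of the WLF coefficients can be illustrated with a short sketch that computes the translation factor and the corresponding reduced time on the master curve. It uses the fitted values 74.7 and 313.9 reported above, but several points are assumptions: the base-10 logarithm and the sign convention of the common WLF form, the placeholder reference temperature `T_REF` (the text states only that the master curve is referenced to the glass transition temperature of the EMC, without giving its value here), and the function names.

```python
C1 = 74.7    # fitted WLF coefficient reported in the text
C2 = 313.9   # fitted WLF coefficient reported in the text

T_REF = 150.0  # hypothetical reference temperature, degC (placeholder only)


def log10_shift_factor(temp, t_ref, c1=C1, c2=C2):
    """WLF equation: log10(a_T) = -C1 (T - T_ref) / (C2 + (T - T_ref))."""
    dt = temp - t_ref
    return -c1 * dt / (c2 + dt)


def reduced_time(t, temp, t_ref):
    """Time on the reference master curve equivalent to time t at temp."""
    return t / 10.0 ** log10_shift_factor(temp, t_ref)


for temp in (T_REF - 20.0, T_REF, T_REF + 20.0):
    print(temp, log10_shift_factor(temp, T_REF), reduced_time(1.0, temp, T_REF))
```

Above the reference temperature the shift factor is below one, so one second of physical time corresponds to a longer reduced time on the master curve, i.e., relaxation proceeds faster; below the reference temperature the opposite holds.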