1. Introduction
With lithographic line widths approaching their physical limit, integrated circuit chips are shifting from transistor scaling to three-dimensional integration [1,2]. Silicon wafers must be thinned to meet the demands of three-dimensional integrated circuit packaging, because the as-manufactured wafer is relatively thick in order to retain sufficient stiffness [3,4]. Grinding is the technique commonly used to remove the redundant material because of its high efficiency [5]. Chemo-mechanical polishing is subsequently conducted to eliminate the subsurface damage introduced by grinding and to improve the reliability of the final chips [6].
The subsurface damage introduced in precision machining is often evaluated by destructive methods. The silicon wafers are diced and the microscopic morphology of the cross-section is observed by scanning electron microscopy (SEM) or transmission electron microscopy (TEM) to evaluate the processing quality [7,8]. Gao et al. compared images of the subsurface damage produced by grinding with those produced by chemo-mechanical polishing [9]. In essence, cross-section observation methods acquire information on the microstructure changes of monocrystalline silicon. However, because single-crystal silicon is mechanically anisotropic, the subsurface damage is not uniformly distributed over the surface at the microscale, which limits the repeatability of these methods. Raman spectroscopy is also a common method for evaluating processing quality [10,11,12]. As with the aforementioned methods, the Raman frequency shift reflects microstructure changes, and the measured values fluctuate markedly when the sampling position changes. In addition, the resolution of Raman spectroscopy is limited, so the method cannot be applied when the subsurface damage introduced by precision machining is very small.
In precision machining (e.g., fine grinding and polishing), residual stress is generated when plastic deformation occurs in the subsurface damage layer. For a specific manufacturing process, there is a definite correspondence between the residual stress and the subsurface damage. Knowledge of the residual stress of thinned silicon wafers is therefore desirable, as it is an important indicator of subsurface damage and is very useful in the study of precision machining mechanisms. The residual stress causes wafer deformation, and the two are likewise in definite correspondence, so the residual stress can be obtained inversely from wafer deformation data [13]. The Stoney equation is often used to relate the residual stress to the geometric parameters of the wafer deformation [14,15]. However, it assumes that the residual stress and the deformation curvature are constant across the silicon wafer, which is often contrary to the actual condition of thinned silicon wafers [16].
The authors have previously studied the deformation measurement of silicon wafers and found that the deformation of thinned silicon wafers can be obtained accurately after eliminating the effect of gravity [17,18]. Unlike cross-section observation methods, which focus on microstructure changes, the deformation of a silicon wafer is the overall effect of all the residual stresses across the wafer surface and therefore shows potential for the evaluation of processing quality. The Stoney equation cannot be applied when the residual stress or the wafer deformation curvature is not constant across the wafer surface. However, if the silicon wafer is divided into many subareas, each small enough that the residual stress within it is approximately uniform, the residual stress can be obtained based on the principle of superposition, in which the entire wafer deformation is calculated as the sum of the residual stress-induced deformations of all subareas [19]. This approach suffers from a multicollinearity problem: small measurement errors lead to large changes in the calculated residual stress. In addition, the calculated residual stress distribution is not continuous across the wafer surface, which does not agree with the actual situation [13]. Conventional regularization using an identity matrix as the Tikhonov matrix was used in our previous research; it mitigated the overfitting and reduced the largest calculated residual stress value [18]. However, the continuity of the residual stress distribution across the wafer was not directly addressed, since the positional relationship of the subareas was not taken into consideration.
In this study, the mechanisms for the discontinuity of the residual stress distribution across the wafer and the sensitivity of calculation results to the measurement errors were explored. The influences of the number of subareas and the number of elements in one subarea were investigated. Finally, a regularization method considering continuity constraints in the residual stress calculation process was proposed. The residual stress calculation results with continuity constraints were compared with those without continuity constraints to verify the effectiveness of the proposed method.
2. Principle of Residual Stress Obtainment
When the silicon wafer is divided into a sufficiently large number of small subareas, the residual stress can be assumed to be uniformly distributed within each subarea. The wafer deformation caused solely by a unit stress load on one subarea could be obtained by finite element simulation. The total wafer deformation is the sum of the wafer deformations caused by the residual stresses of all subareas, and a system of linear equations could be obtained from this relationship. In each linear equation, the total wafer deformation at a certain point equals the sum of the wafer deformations at the same point caused by the residual stresses of all subareas. The actual residual stresses of the subareas could then be obtained by solving the equations.
A total of $n$ points were used to characterize the wafer deformation. The total wafer deformation at the $i$th point was denoted as $t_i$. The wafer was divided into $m$ subareas. The deformation at the $i$th point caused by a unit load applied to the $j$th subarea was denoted as $A_{ij}$. If the actual residual stress in the $j$th subarea is $\sigma_j$, the wafer deformation at the $i$th point would be $A_{ij}x_j$, where $x_j$ is the coefficient of proportionality between the actual residual stress and the unit load for the $j$th subarea and could be expressed as follows:
The total wafer deformation at the $i$th point, $t_i$, was the sum of the wafer deformations at the same point caused by the residual stresses of all subareas, based on the principle of linear superposition. Their relationship could be expressed as:
There were $n$ linear equations corresponding to the $n$ points, and there were $m$ variables to be determined in the system of linear equations. The system of linear equations could be transformed into a matrix equation [13], which is shown in Equation (3).
where
The variable $x$ could be obtained as the least-squares solution [20], as shown below.
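For reference, the relations described so far can be collected in one place. The block below is a reconstruction from the definitions in this section rather than a reproduction of the typeset equations: the symbol $\sigma_0$ for the unit load is an assumption introduced here, and Equation (5) is assumed to take the standard normal-equation form of the least-squares solution, which requires $A$ to have full column rank.

```latex
\begin{align}
x_j &= \frac{\sigma_j}{\sigma_0} \tag{1} \\
t_i &= \sum_{j=1}^{m} A_{ij}\, x_j , \qquad i = 1, \dots, n \tag{2} \\
A x &= t \tag{3} \\
A &= (A_{ij}) \in \mathbb{R}^{n \times m}, \quad
x = (x_1, \dots, x_m)^{\mathsf T}, \quad
t = (t_1, \dots, t_n)^{\mathsf T} \tag{4} \\
x &= \left(A^{\mathsf T} A\right)^{-1} A^{\mathsf T} t \tag{5}
\end{align}
```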
In Equation (5), $t$ is the measured wafer deformation. The deformation of a double-sided polished silicon wafer was obtained by the three-point-support method based on the position determination of supports and wafers [17], and is shown in Figure 1. The diameter of the silicon wafer was 200 mm and the thickness was 333 μm. The columns of $A$ were acquired using finite element models, and the vector $x$ could be obtained by solving Equation (5).
Each column of the matrix $A$ corresponded to the wafer deformation when a unit residual stress load was applied to one subarea. The finite element model was established using commercially available software (Ansys 15.0, Ansys Inc., Canonsburg, USA). The silicon wafer consisted of two layers, as shown in Figure 2. One layer was the damage-free silicon wafer with a perfect lattice and no residual stress. On top of the damage-free silicon layer was the subsurface damage layer. Residual stresses might exist on both sides of the silicon wafer. However, the wafer deformation caused by a compressive residual stress on one side equals that caused by a tensile residual stress of the same magnitude on the other side [14]. Therefore, the model could be simplified, and the residual stress in the subsurface damage layer in Figure 2 represented the residual stresses of both sides. The details of the finite element model of silicon wafers can be found in our previous study [18].
The total wafer deformation was simulated using the calculated $x$ and compared with the measurement result to verify the method. As shown in Figure 3, the simulated wafer deformation was almost identical to the measurement result (Figure 1). In most regions the difference was smaller than 1 μm, indicating that the calculated residual stress distribution could reproduce the desired wafer deformation.
The residual stress distribution calculated using Equation (5) is plotted in Figure 4. The residual stresses of adjacent subareas changed significantly across the wafer surface, which was contrary to the character of the polishing process. In the polishing process, the surface of the polishing pad was smooth and the density of abrasive grain trajectories was uniformly distributed over the wafer surface. Therefore, the residual stress obtained from the wafer deformation by solving Equation (5), as adopted in reference [13], was not realistic, and new methods needed to be explored to obtain reasonable solutions.
4. Regularization Method with Continuity Constraints
The solution of the variable $x$ using Equation (5) is the direct solution of the inverse problem. When the errors have zero mean, are uncorrelated, and have equal variances, this least-squares solution is the best linear unbiased estimator of the coefficients, a result known as the Gauss–Markov theorem [21]. It means that the solution $x$ makes the wafer deformation calculated by $Ax$ the closest approximation to the measured result $t$, which could be expressed as:
However, the values $x_j$ ($j$ = 1, 2, …, $m$) were treated as independent of each other during the solution process, which was incompatible with the actual situation. As a result, the solution fluctuated greatly when small perturbations were introduced into the measurement result $t$. Regularization methods, which integrate additional information to solve an ill-posed problem, could be adopted to obtain numerically stable solutions [22]. In this study, a continuity constraint was constructed to obtain a stable solution of the variable $x$.
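The sensitivity described above can be illustrated with a deliberately tiny example. The sketch below is not the wafer model: the 3 × 2 matrix `A`, the value of `eps`, and the perturbation `delta` are hypothetical numbers chosen so that the two columns are nearly collinear. A perturbation of about 1e-3 in the data then shifts each unregularized least-squares coefficient by about 1.

```python
import numpy as np

# Hypothetical 3x2 system whose two columns are almost collinear.
eps = 1e-3
A = np.array([[1.0, 1.0],
              [1.0, 1.0 + eps],
              [1.0, 1.0 - eps]])

x_true = np.array([1.0, 1.0])
t = A @ x_true                      # noise-free "measured" deformation

# Unregularized least-squares solution of A x = t.
x_clean, *_ = np.linalg.lstsq(A, t, rcond=None)

# Small perturbation aligned with the weakly determined direction.
delta = np.array([0.0, 1e-3, -1e-3])
x_noisy, *_ = np.linalg.lstsq(A, t + delta, rcond=None)

cond = np.linalg.cond(A)            # ratio of largest to smallest singular value

print("condition number:", cond)
print("solution without perturbation:", x_clean)
print("solution with perturbation:   ", x_noisy)
```

Here the perturbed data are reproduced exactly by the coefficients (0, 2) instead of (1, 1), so the solution changes by roughly a thousand times the size of the perturbation. This is the behavior that the continuity constraint is meant to suppress.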
Tikhonov regularization is the most commonly used regularization method [23]. It seeks the solution $x$ that minimizes the following weighted sum:
where $\mu$ was the trade-off parameter and $L$ was the Tikhonov matrix. The parameter $\mu$ controlled the weight given to the minimization of the constraint. In this study, the Tikhonov matrix $L$ was constructed to impose the continuity constraint, which was expressed as follows:
The main diagonal elements of the matrix $L$ were all set to 8, which equaled the number of closest subareas selected; the other elements were set to −1 or 0. For each subarea, the $x$ and $y$ coordinates of the centroid were recorded, and the distance between the centroids of every pair of subareas was calculated. For each subarea, the other subareas were sorted by their distance from it. The eight subareas with the smallest distances were recorded, and the corresponding elements of $L$ were set to −1. A different number of closest subareas could also be selected, depending on the number of subareas.
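The construction just described can be sketched in a few lines of NumPy. This is an illustrative sketch only: the function name `continuity_matrix`, the brute-force distance computation, and the 5 × 5 grid of centroids are assumptions made for the example and are not taken from the study, where the subareas cover a circular wafer. Ties between equally distant subareas are broken here by index order, which is also an assumption.

```python
import numpy as np

def continuity_matrix(centroids, k=8):
    """Return L with k on the diagonal and -1 for the k nearest subareas of each row."""
    centroids = np.asarray(centroids, dtype=float)
    m = centroids.shape[0]
    if not 0 < k < m:
        raise ValueError("k must satisfy 0 < k < number of subareas")

    # Pairwise Euclidean distances between subarea centroids.
    diff = centroids[:, None, :] - centroids[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)          # a subarea is not its own neighbor

    L = np.zeros((m, m))
    for i in range(m):
        # Stable sort: equally distant subareas are taken in index order.
        nearest = np.argsort(dist[i], kind="stable")[:k]
        L[i, nearest] = -1.0
        L[i, i] = float(k)
    return L

# Hypothetical layout: centroids on a 5 x 5 square grid with unit spacing.
gx, gy = np.meshgrid(np.arange(5), np.arange(5), indexing="ij")
centroids = np.column_stack([gx.ravel(), gy.ravel()])
L = continuity_matrix(centroids, k=8)

# Every row sums to zero, so a uniform stress field gives L @ x = 0.
print(np.abs(L @ np.ones(len(centroids))).max())
```

Because each row holds 8 on the diagonal and eight entries of −1, a uniform residual stress is not penalized at all; only differences between a subarea and its neighbors contribute to $Lx$. For a large number of subareas, a spatial index would be a more economical way to find the nearest neighbors than the full distance matrix used here.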
For example, suppose there were $m$ subareas numbered from 1 to $m$. For the first subarea, the eight closest subareas were Subareas 2, 3, 4, 6, 7, 8, 9, and 10 in Figure 12. The first row of the matrix $L$ should then be (8, −1, −1, −1, 0, −1, −1, −1, −1, −1, 0, 0, …, 0). For the second subarea, the eight closest subareas were Subareas 1, 3, 4, 5, 8, 10, 11, and 12. The second row of the matrix $L$ should then be (−1, 8, −1, −1, −1, 0, 0, −1, 0, −1, −1, −1, …, 0).
Therefore, the matrix $L$ should be expressed as:
The other rows of the matrix $L$ could be obtained using the same method. Then $Lx$ could be expressed as:
The Euclidean norm of $Lx$ could be expressed as:
It could be inferred from Equation (11) that the Euclidean norm of $Lx$ becomes smaller as the differences between the residual stresses of adjacent subareas decrease. In the extreme case, the residual stress values are the same across the whole wafer surface. Then $x_1, x_2, x_3, \dots, x_m$ are all equal to each other, and the terms $(8x_1 - x_2 - x_3 - x_4 - x_6 - x_7 - x_8 - x_9 - x_{10})^2$, $(8x_2 - x_1 - x_3 - x_4 - x_5 - x_8 - x_{10} - x_{11} - x_{12})^2$, …, and $(8x_m + L_{m1}x_1 + L_{m2}x_2 + L_{m3}x_3 + \dots + L_{m(m-1)}x_{m-1})^2$ are all zero, so the value of $\|Lx\|^2$ is zero.
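Written out for the two example rows quoted above, the product $Lx$ and its squared norm take the following form. This is a sketch assembled from the rows and terms given in the text, not a reproduction of the typeset equations; the last row uses the generic entries $L_{mj}$ because the neighbors of the $m$th subarea are not listed.

```latex
\begin{align}
Lx &=
\begin{pmatrix}
8x_1 - x_2 - x_3 - x_4 - x_6 - x_7 - x_8 - x_9 - x_{10} \\
8x_2 - x_1 - x_3 - x_4 - x_5 - x_8 - x_{10} - x_{11} - x_{12} \\
\vdots \\
8x_m + \sum_{j=1}^{m-1} L_{mj}\, x_j
\end{pmatrix} \\[4pt]
\|Lx\|^2 &=
\left(8x_1 - x_2 - x_3 - x_4 - x_6 - x_7 - x_8 - x_9 - x_{10}\right)^2 \nonumber \\
&\quad + \left(8x_2 - x_1 - x_3 - x_4 - x_5 - x_8 - x_{10} - x_{11} - x_{12}\right)^2 \nonumber \\
&\quad + \dots + \Bigl(8x_m + \sum_{j=1}^{m-1} L_{mj}\, x_j\Bigr)^2
\end{align}
```

Since each row contains one entry of 8 and eight entries of −1, setting $x_1 = x_2 = \dots = x_m$ makes every component of $Lx$ vanish, which is the extreme case discussed above.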
When the matrix $L$ was determined, the variable $x$ could be calculated from the following equation:
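As a hedged illustration of this step, the sketch below assumes that the weighted sum has the form $\|Ax - t\|^2 + \mu\|Lx\|^2$, in which case its minimizer satisfies $(A^{\mathsf T}A + \mu L^{\mathsf T}L)\,x = A^{\mathsf T}t$. The function name `tikhonov_solve` and all numerical data are hypothetical, and a simple first-difference matrix stands in for the neighbor-based $L$.

```python
import numpy as np

def tikhonov_solve(A, t, L, mu):
    """Minimize ||A x - t||^2 + mu * ||L x||^2 through the regularized normal equations."""
    A = np.asarray(A, dtype=float)
    t = np.asarray(t, dtype=float)
    L = np.asarray(L, dtype=float)
    lhs = A.T @ A + mu * (L.T @ L)
    rhs = A.T @ t
    return np.linalg.solve(lhs, rhs)

# Hypothetical data: 20 deformation points, 6 subareas.
rng = np.random.default_rng(0)
A = rng.normal(size=(20, 6))
t = A @ np.linspace(1.0, 2.0, 6) + 0.01 * rng.normal(size=20)

# Stand-in continuity operator: differences between consecutive subareas.
L = np.diff(np.eye(6), axis=0)

mu = 0.5
x_reg = tikhonov_solve(A, t, L, mu)

# With mu = 0 the result reduces to the ordinary least-squares solution.
x_ls, *_ = np.linalg.lstsq(A, t, rcond=None)
x_mu0 = tikhonov_solve(A, t, L, 0.0)

# The gradient of the weighted sum should vanish at the regularized solution.
gradient = A.T @ (A @ x_reg - t) + mu * (L.T @ (L @ x_reg))
print(np.abs(gradient).max())
```

Forming $A^{\mathsf T}A$ squares the condition number of the problem. An equivalent and often numerically safer route is to solve the stacked least-squares problem with $A$ on top of $\sqrt{\mu}\,L$ and $t$ on top of a zero vector.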
The parameter $\mu$ was chosen by balancing the value of $\|Ax - t\|^2$ against that of $\|Lx\|^2$. The term $\|Ax - t\|^2$ was adopted to evaluate the misfit of the solution $x$, which was the measure of data fidelity. When the value of $\|Ax - t\|^2$ was very large, the difference between the simulated and the measured wafer deformation was large, indicating underfitting of the solution. The term $\|Lx\|^2$ was used to evaluate the continuity of the solution $x$. When the value of $\|Lx\|^2$ was very large, the residual stresses of adjacent subareas differed greatly. Different $\mu$ values were chosen, and the corresponding values of $\|Ax - t\|^2$ and $\|Lx\|^2$ were calculated. The residual norm $\|Ax - t\|^2$ of the regularized solution was then plotted against the corresponding $\|Lx\|^2$ to give the trade-off curve shown in Figure 13.
The vertical part of the trade-off curve corresponded to $x$ values for which the simulated wafer deformation differed considerably from the measured wafer deformation. In contrast, the horizontal part of the trade-off curve corresponded to $x$ values for which there were large discontinuities in the residual stress distribution. Therefore, the value of $\mu$ was chosen in the corner region.
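A trade-off curve of this kind can be traced by sweeping $\mu$ and recording the two terms for each regularized solution. The sketch below uses hypothetical data, a hypothetical helper `trade_off_curve`, and a first-difference matrix as a stand-in for the continuity matrix; it again assumes the weighted sum $\|Ax - t\|^2 + \mu\|Lx\|^2$. It only tabulates the curve and does not attempt to locate the corner automatically.

```python
import numpy as np

def trade_off_curve(A, t, L, mus):
    """Return rows (mu, ||A x - t||^2, ||L x||^2) for each regularized solution."""
    rows = []
    for mu in mus:
        x = np.linalg.solve(A.T @ A + mu * (L.T @ L), A.T @ t)
        misfit = np.sum((A @ x - t) ** 2)       # data fidelity term
        roughness = np.sum((L @ x) ** 2)        # continuity term
        rows.append((mu, misfit, roughness))
    return np.array(rows)

# Hypothetical problem: 40 deformation points, 12 subareas, noisy data.
rng = np.random.default_rng(0)
n, m = 40, 12
A = rng.normal(size=(n, m))
x_true = np.linspace(1.0, 2.0, m)
t = A @ x_true + 0.05 * rng.normal(size=n)

L = np.diff(np.eye(m), axis=0)                  # stand-in continuity operator
mus = np.logspace(-4, 4, 17)
curve = trade_off_curve(A, t, L, mus)

for mu, misfit, roughness in curve:
    print(f"mu={mu:9.2e}  misfit={misfit:10.4f}  roughness={roughness:10.6f}")
```

As $\mu$ increases, the misfit term can only grow and the continuity term can only shrink, which is what produces the two branches of the curve. Plotting the second column against the third on logarithmic axes shows the corner region from which $\mu$ is picked.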
When the parameter $\mu$ was determined, the variable $x$ was obtained using Equation (12). The difference between the simulated and the measured wafer deformation was then checked to confirm that it was within the allowable range. As shown in Figure 14, the simulated wafer deformation still reproduced the morphological characteristics of the measurement results. The distribution of the difference is shown in Figure 14b. The absolute deviations of about 90% of the points are less than 0.5 μm. About 9% of the points deviate by more than 0.5 μm but less than 1 μm. Only about 1% of the points deviate by more than 1 μm, and these deviations are still less than 2 μm. The calculated deformation therefore fits the measured deformation well, considering the measurement uncertainty of the wafer deformation [17]. It should be noted that the greatest deviations tend to appear near the edge of the wafer, since edge roll-off occurs in the polishing process and the error of the measured deformation in the vicinity of the wafer edge is somewhat larger.
Compared with the results in Figure 3, obtained without regularization, the difference was slightly larger, but only to a very limited extent. This was because the solution $x$ was no longer an unbiased estimator after the term $\|Lx\|^2$ was introduced in Equation (7). The regularization reduced the variance at the cost of introducing a tolerable amount of bias. As shown in Figure 15, the calculated residual stress distribution was continuous between adjacent subareas. In addition, the solution became more robust to measurement errors. The residual stress distribution was recalculated from the wafer deformation with random errors (Figure 5b), and the result is shown in Figure 16. The change in the solution $x$ was very limited, demonstrating the validity of the regularization method.
The residual stress was also calculated using the regularization method when the division number was 525 (Figure 17). The characteristics of the residual stress distribution were almost the same as those obtained when the division number was 336 (Figure 15), which indicated that the regularization method with continuity constraints was robust to the number of subareas. A larger number of subareas was preferred since it was closer to the actual situation, but it increased the computational workload of obtaining the solution. When increasing the number of subareas no longer improves the solution appreciably, the division of the wafer surface is fine enough to characterize the true residual stress distribution.
The Tikhonov matrix $L$ could also be an identity matrix $I$ to improve the solution [18]. However, the positional relationship between the different $x_i$ is then neglected, and the improvement in the continuity of the residual stress distribution across the wafer is limited. Moreover, imposing the side constraint $Ix$ tends to yield residual stress values smaller than the actual ones, since the constraint term $\|Ix\|^2$ is the squared Euclidean norm of the vector $x$. When an identity matrix $I$ is used as the Tikhonov matrix, the absolute values of the $x_i$ tend to shrink in order to minimize the term $\mu\|x\|^2$ of the total sum, especially when the trade-off parameter $\mu$ is large. In the extreme case where $\mu$ is infinitely large, all $x_i$ would be zero in order to minimize the sum of $\|Ax - t\|^2$ and $\mu\|x\|^2$. The $x_i$ calculated using the method proposed in this study maintain their accuracy for various wafers, since the value of $\|Lx\|^2$ has no direct correlation with the absolute residual stress values.
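The contrast between the two choices of Tikhonov matrix can be checked on a small synthetic case. The sketch below is an illustration under stated assumptions, not the wafer computation: the data are random, the true coefficients are all set to the hypothetical value 5, the data are noise-free, and a first-difference matrix stands in for the continuity matrix. With a deliberately large $\mu$, the identity constraint pulls the solution toward zero, while the continuity constraint leaves the uniform solution unchanged.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 30, 8
A = rng.normal(size=(n, m))

x_true = np.full(m, 5.0)           # uniform, non-zero "residual stress"
t = A @ x_true                     # noise-free deformation data

mu = 1.0e3                         # deliberately large trade-off parameter

# Identity matrix as the Tikhonov matrix: penalizes the size of x itself.
I = np.eye(m)
x_identity = np.linalg.solve(A.T @ A + mu * (I.T @ I), A.T @ t)

# Continuity-type matrix (rows sum to zero): penalizes only differences.
L = np.diff(np.eye(m), axis=0)
x_continuity = np.linalg.solve(A.T @ A + mu * (L.T @ L), A.T @ t)

print("norm of true x:              ", np.linalg.norm(x_true))
print("norm with identity matrix:   ", np.linalg.norm(x_identity))
print("norm with continuity matrix: ", np.linalg.norm(x_continuity))
```

In this example the uniform vector makes both $\|Ax - t\|^2$ and $\|Lx\|^2$ zero, so it remains the minimizer for any $\mu$ under the continuity constraint, whereas the identity constraint shrinks every component regardless of how well the data are fitted.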