3.1. CFD Modeling
From the literature (e.g., [22] and beyond), it is known that typical temperature gradients in the GaAs melt and crystal in industrial VGF processes are in the range of 2–10 K/cm and up to 15 K/cm, respectively. The crystal growth rate is typically in the range of 2–4 mm/h. The maximal temperature in the GaAs should not exceed 15 K above the melting temperature of GaAs (ca. T = 1528 K) in order to avoid a great loss of arsenic [2]. Guided by these facts, 130 process recipes were simulated, and the values of the interface deflection, the interface position and the temperatures at the monitoring points at the GaAs seed bottom, at the end of the cone and at the melt-free surface (Figure 1b) were collected. All data, in the form of 11-dimensional datasets, are shown in parallel coordinates in Figure 3. Each line in the plot corresponds to one dataset (x1…x6, y1…y5). The generated database was used for DM/ML training and analysis.
Examples of axisymmetric quasi-steady-state CFD simulation results for buoyancy-driven flows, in the form of temperature and stream function distributions in the VGF furnace, are shown in Figure 4. Due to the cylindrical geometry of the crucible, the melt flow was always toroidal, varying between multi-vortex (Figure 4b,c) and single-vortex (Figure 4a) velocity distributions. The interface deflection varied between convex (Figure 4a), flat and concave (Figure 4b,c), depending on the growth recipe used.
As expected, our results confirmed that the generally favorable flat and/or slightly convex s/l interface shape (y2 ≤ 0) was easier to obtain at lower growth rates x1 (Figure 3, blue lines). In contrast, with an increase of the crystal growth rate (Figure 3, red lines), more latent heat is generated at the s/l interface and the interface deflection turns toward concavity (y2 > 0).
The search for the optimal VGF process parameters is thus a difficult task that requires advanced statistical methods.
3.2. Data Mining
As mentioned before, a PCA biplot was used to visualize the dispersion of the training data and the correlation of the variables, and to show the feasibility of dimensionality reduction without loss of information. The findings are given in Figure 5 and summarized below.
Since the combined contribution of the first two principal components to the variance is only 60.6%, i.e., not high enough, further discussion of the projections of the original variables into the two-dimensional subspace and of the corresponding correlations was not justified. However, the results showed that the vectors were present in all quadrants of the PCA biplot and were of similar lengths; therefore, the distribution of points can be considered appropriate for further ML/DM analysis.
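The share of variance captured by the leading principal components can be illustrated with a minimal sketch. It assumes that the PCA is carried out on standardized variables (i.e., on the correlation matrix), which is not stated in the text; the function names and the toy rows are hypothetical and are not data from this study. Power iteration with deflation is used here only to keep the example free of external libraries.

```python
import math

def correlation_matrix(rows):
    """Pearson correlation matrix of the columns of `rows`."""
    n = len(rows)
    z_cols = []
    for col in zip(*rows):
        mean = sum(col) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        z_cols.append([(v - mean) / std for v in col])
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z_cols]
            for zi in z_cols]

def leading_eigenvalues(matrix, count, iters=500):
    """Largest `count` eigenvalues of a symmetric matrix
    (power iteration with deflation; fixed start vector, so this is
    a sketch and not a robust eigen-solver)."""
    a = [row[:] for row in matrix]
    dim = len(a)
    found = []
    for _ in range(count):
        v = [1.0 / math.sqrt(dim)] * dim
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(dim)) for i in range(dim)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        av = [sum(a[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        lam = sum(x * y for x, y in zip(v, av))  # Rayleigh quotient
        found.append(lam)
        # Deflation: remove the component that was just found.
        for i in range(dim):
            for j in range(dim):
                a[i][j] -= lam * v[i] * v[j]
    return found

# Hypothetical toy rows (three variables), NOT data from the study.
toy_rows = [
    [-1.0, -2.0, -1.0],
    [-1.0, -2.0, 1.0],
    [1.0, 2.0, -1.0],
    [1.0, 2.0, 1.0],
]
corr = correlation_matrix(toy_rows)
eigenvalues = leading_eigenvalues(corr, count=2)
# The trace of a correlation matrix equals the number of variables.
explained = [lam / len(corr) for lam in eigenvalues]
cumulative = sum(explained)
```

In this constructed example two components capture all of the variance because two of the three columns are perfectly correlated. For the 11-dimensional database described above, the corresponding cumulative value was 60.6%.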
The k-means clustering method was applied to all 11 variables in this study. The value of k was varied from k = 2 to k = 10. The best clustering was observed for k = 2. Selected results for input x1 and output y1 are shown in Figure 6. They show that, for all growth rates x1, there are two clusters of data with respect to the s/l interface position y1, i.e., data corresponding to an interface position y1 in the zone of influence of the upper side heater (z > 0.423 m) behave differently from the data in the zone of influence of the lower side heater (z < 0.423 m). This observation was used in the further correlation analysis.
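The clustering step can be sketched with a plain implementation of Lloyd's k-means algorithm for k = 2. The sketch rests on several assumptions that are not taken from the text: the variables are standardized before clustering, the initial centres are chosen by a deterministic farthest-point rule, and only three hypothetical columns (x1, y1, x5) with invented values are used instead of all 11 variables.

```python
import math

def standardize(rows):
    """Scale every column to zero mean and unit variance (assumed preprocessing)."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / n)
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=100):
    """Lloyd's algorithm with deterministic farthest-point initialisation."""
    centres = [list(points[0])]
    while len(centres) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centres))
        centres.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centre for every point.
        labels = [min(range(k), key=lambda j: sq_dist(p, centres[j]))
                  for p in points]
        # Update step: each centre becomes the mean of its members.
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append([sum(c) / len(members) for c in zip(*members)])
            else:
                updated.append(centres[j])
        if updated == centres:
            break
        centres = updated
    return centres, labels

# Hypothetical toy rows: x1 [mm/h], y1 [m], x5 [W]. NOT data from the study.
toy_rows = [
    [2.0, 0.30, 2000.0],
    [3.0, 0.31, 2100.0],
    [4.0, 0.32, 1900.0],
    [2.0, 0.52, 300.0],
    [3.0, 0.53, 200.0],
    [4.0, 0.54, 400.0],
]
centres, labels = kmeans(standardize(toy_rows), k=2)
```

For these toy rows the two clusters separate along the interface position y1 for every growth rate x1, which mirrors the kind of split described above. How the best k between 2 and 10 was selected is not specified in the text, so the sketch simply fixes k = 2.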
The correlation plots for all data (whole crystallization process), for the “upper” cluster 2 and for the “lower” cluster 1 are shown in Figure 7, Figure 8 and Figure 9, respectively.
From the analysis of the correlation coefficients for all data (Figure 7), it is possible to derive how inputs and outputs are correlated. The most pronounced correlation among the inputs was observed between x4 and x5, i.e., between the powers of the side heaters. They were weakly to moderately strongly negatively correlated, with a maximal magnitude of rx4,x5 = −0.611. This result is in agreement with the nature of the VGF process, where the position of the crystallization front corresponds to a certain amount of heating power and power distribution, and the growth rate determines the interface shape. Consequently, more power in the upper side heater implies less power in the lower side heater and vice versa. The interface deflection y2 is most negatively correlated with the power of the upper side heater x4 (rx4,y2 = −0.556) and most positively correlated with the power of the top inner heater x2 (rx2,y2 = 0.411). Please note that in this study a convex interface deflection has a negative value (y2 < 0), so that a negative correlation with y2 is beneficial for the crystal quality.
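The coefficients r quoted here are read as Pearson correlation coefficients, which is an assumption since the text does not name the coefficient. A minimal sketch of how such a pairwise correlation table can be computed is given below; the function names and all numerical values are hypothetical and are not data from this study.

```python
import math

def pearson_r(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)

def correlation_table(columns):
    """Pairwise coefficients for a dict {variable name: list of values}."""
    names = list(columns)
    return {(p, q): pearson_r(columns[p], columns[q])
            for p in names for q in names}

# Hypothetical toy values, NOT data from the study.
toy = {
    "x4": [1000.0, 1500.0, 2000.0, 2500.0, 3000.0],  # upper side heater [W]
    "x5": [2900.0, 2600.0, 1900.0, 1600.0, 900.0],   # lower side heater [W]
    "y2": [0.004, 0.003, 0.001, 0.000, -0.002],      # interface deflection [m]
}
r = correlation_table(toy)
print(f"r(x4, x5) = {r[('x4', 'x5')]:.3f}")
print(f"r(x4, y2) = {r[('x4', 'y2')]:.3f}")
```

With the sign convention used in this study (convex interface: y2 < 0), a negative r between a heater power and y2, as in the toy column x4, indicates that raising this power pushes the interface toward the favorable convex shape.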
For the GaAs temperature at the melt-free surface y3, the most pronounced negative correlations (weak, with beneficial influence) were found for the power of the upper side heater x4 and the growth rate x1, since they decreased y3 and thereby limited severe As evaporation (rx4,y3 = −0.256, rx1,y3 = −0.232). The most pronounced positive correlations, with a detrimental influence on y3, were found for the power of the bottom heater x6 and of the lower side heater x5 (rx6,y3 = 0.362, rx5,y3 = 0.165). These results can be explained by recalling the typical heat transfer in VGF growth, where heat enters the GaAs via the melt and leaves via the crystal periphery, in addition to the heat generated at the crystallization front. More heat coming from the crystallization front (x1) and the upper side heater x4 means less heat coming from the top heaters x2 and x3, which are closer to the melt-free surface. More heat coming into the system from x5 and x6 retards heat transfer out of the GaAs and consequently raises the temperature y3.
From the analysis of the correlation coefficients for the “upper” data cluster 2 (Figure 8), the interface deflection y2 was most strongly negatively correlated with the power of the upper side heater x4 (rx4,y2 = −0.772) and most strongly positively correlated with the power of the lower side heater x5 (rx5,y2 = 0.743). The first result agrees with that for the whole VGF process, i.e., for all data, where x4 also showed the strongest negative correlation with y2. The second result differs. It can be understood by remembering that the “upper” data cluster is related to the second half of the crystallization process, with the crystallization front positioned above the lower side heater. In this case, x5 was bringing heat to the GaAs crystal, which was harmful for y2.
For the temperature at the melt-free surface y3, the most pronounced negative and positive correlations were found for x4 (rx4,y3 = −0.255) and x6 (rx6,y3 = 0.498), respectively. These findings are similar to those obtained when data for the whole process are considered.
From the analysis of the correlation coefficients for the “lower” data cluster 1 (Figure 9), the interface deflection y2 was influenced detrimentally and with similar strength by x4, x1 and x2 (rx4,y2 = 0.431, rx1,y2 = 0.421, rx2,y2 = 0.411) and was most beneficially influenced by the power of the lower side heater x5 (rx5,y2 = −0.777). These results differ from those for the whole VGF process, i.e., for all data. This can be explained by the fact that the “lower” data cluster is related to the first half of the crystallization process, with the crystallization front positioned alongside the lower side heater and far below the top and upper side heaters.
For the temperature at the melt-free surface y3, the most beneficial and the most detrimental influences were those of x5 (rx5,y3 = −0.470) and x2 (rx2,y3 = 0.532), respectively. The more heat comes from the GaAs side periphery (x5), the less heat comes from the top (x2), so that y3 consequently decreases.
Interestingly, the analysis of all correlation plots showed that the power of the side heaters had a much stronger influence on the interface shape and the maximal GaAs temperature than the crystal growth rate alone.
3.3. Decision Trees
As mentioned before, a successful VGF process is characterized inter alia by a flat crystallization front during the growth and constrained maximal temperatures in the melt to prevent strong arsenic evaporation/loss. The purpose of the DT analysis was to better understand the role of various process parameters and to identify their suitable values for the growth of high-quality crystals.
The most important DT results for both regression (RT) and classification trees (CT) are given in Figure 10, Figure 11 and Figure 12 with errors and summarized results in Table 1, Table 2, Table 3, Table 4 and Table 5. The errors are given in Table 1 and Table 4, in the form of the root mean square error RMSE corresponding to the nodes in the regression tree. The root node in the regression tree stands for the average value of the studied output among all data in the database. The path from the root to leaf states the influence of a certain input on the studied output, with the highest relevance at the top, decreasing downwards.
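How a regression tree arrives at a root value, a first split and node errors can be sketched as follows. The sketch assumes a CART-style criterion (the split that minimizes the summed squared error of the two child nodes); the text does not state which algorithm or software was used, and the function names and toy values are hypothetical.

```python
import math

def sse(values):
    """Sum of squared deviations from the node mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def node_rmse(values):
    """RMSE of a node whose prediction is the mean of its samples."""
    return math.sqrt(sse(values) / len(values))

def best_split(x, y):
    """Threshold on x that minimizes the summed SSE of both children."""
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold fits between identical values
        total = sse([v for _, v in pairs[:i]]) + sse([v for _, v in pairs[i:]])
        if best is None or total < best[1]:
            best = ((lo + hi) / 2, total)
    return best

def most_decisive_input(inputs, y):
    """Input whose best split gives the lowest error: the root split."""
    scored = {name: best_split(col, y) for name, col in inputs.items()}
    scored = {name: s for name, s in scored.items() if s is not None}
    name = min(scored, key=lambda key: scored[key][1])
    return name, scored[name][0]

# Hypothetical toy values, NOT data from the study.
inputs = {
    "x1": [2.0, 4.0, 3.0, 2.5, 3.5, 3.0],              # growth rate [mm/h]
    "x2": [500.0, 600.0, 650.0, 700.0, 800.0, 900.0],  # heater power [W]
}
y2 = [0.001, 0.002, 0.002, 0.004, 0.005, 0.004]        # deflection [m]

name, threshold = most_decisive_input(inputs, y2)
root_mean = sum(y2) / len(y2)
left = [v for xv, v in zip(inputs[name], y2) if xv < threshold]
left_mean = sum(left) / len(left)
print(name, threshold, root_mean, left_mean, node_rmse(left))
```

The root mean corresponds to the average of the studied output over the whole database, and applying the same search recursively to each child node yields the deeper, less relevant splits.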
The resulting RT for the interface deflection y2 is shown in Figure 10. It reveals x2, followed by x4 and x1, as the most decisive inputs for the favorable flat or slightly convex interface shape (interface deflection y2 ≤ 0, the branches marked red in the tree graph). Their significance decreases in the order mentioned above. The most decisive input is x2 (the heating power of the inner top heater), which has a deteriorating effect on the interface flattening, i.e., x2 should be below 678 W to strongly push y2 towards lower values (less concavity), i.e., from an average y2 = 0.00272 m to an average y2 = 0.00188 m. All decisive inputs and the ranges of their optimal values that assure VGF-GaAs growth with a flat or slightly convex s/l interface (y2 ≤ 0 m), as derived from the RT analysis, are given in Table 2. For the fast growth of GaAs crystals (growth rate > 3 mm/h) with a flat or slightly convex interface, the process heat should be provided predominantly by the upper side heater (x4), while the bottom and lower side heaters should be turned off. Moreover, the inner top heater should provide only a very limited amount of heat to the system. The RT results obtained are consistent with the findings from our correlation analysis, bearing in mind that being the most influential input in the RT does not mean that the influence is necessarily beneficial for the variable y2. In the literature and among crystal growers, there is a common opinion that the growth rate x1 has the strongest influence on the interface deflection (i.e., the higher the growth rate, the more latent heat is generated at the crystallization front and, consequently, the more concave the s/l interface shape). The RT results obtained here do not refute the strong influence of x1. They mean only that other inputs were more important than x1 for y2.
The same optimization task for y2 was solved using a classification tree. The target variable y2 still consists of real numbers, and the classification task was performed by associating these numbers with the labels “+” for a concave interface (y2 > 0) and “-” for a convex interface (y2 < 0). The CT results for the interface deflection y2 are given in Figure 11, and the corresponding most influential inputs and their ranges for optimized growth are given in Table 3. Except for x3, all inputs played some role. The most influential input for obtaining a convex interface was x4, followed by x1 and x5. As with the RT, the results showed that during the rapid growth of crystals with a slightly convex interface, the process heat should mainly be provided by the upper side heater (x4), while the lower side heater should be almost switched off.
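The labelling of y2 and the search for a class-separating threshold can be sketched as below. Two points are assumptions and not taken from the text: the Gini impurity is used as split criterion, and a perfectly flat interface (y2 = 0) is assigned to the “-” class. Function names and toy values are hypothetical.

```python
from collections import Counter

def label_interface(y2):
    """'+' for a concave interface (y2 > 0), '-' otherwise.
    Assigning y2 == 0 to '-' is an assumption of this sketch."""
    return "+" if y2 > 0 else "-"

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_gini_split(x, labels):
    """Threshold on x that minimizes the weighted Gini impurity of the children."""
    pairs = sorted(zip(x, labels))
    n = len(pairs)
    best = None
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold fits between identical values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if best is None or impurity < best[1]:
            best = ((lo + hi) / 2, impurity)
    return best

# Hypothetical toy values, NOT data from the study.
x4 = [1000.0, 1500.0, 2000.0, 2500.0, 3000.0, 3500.0]  # upper side heater [W]
y2 = [0.004, 0.003, 0.001, -0.001, -0.002, -0.003]     # deflection [m]

labels = [label_interface(v) for v in y2]
threshold, impurity = best_gini_split(x4, labels)
print(labels, threshold, impurity)
```

In this toy case a single threshold on x4 separates the two classes completely. In the CT of Figure 11 several inputs are needed, which is why their order in the tree expresses their relative influence.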
The resulting RT for the temperature at the melt top rim y3 is shown in Figure 12. The most decisive inputs and the ranges of their optimal values that prevent a great loss of arsenic are given in Table 5. The data ranges correspond to the branches marked red in Figure 12.
The range of suitable parameters varies depending on the chosen maximal allowed temperature, e.g., 1520 K or the even more conservative 1510 K. Still, all inputs x1–x6 played a role. The most influential input is x6, followed by x4. The other inputs are less important and appear at lower positions in the tree. After the first split, an increase of the power of the bottom heater x6 raised the average value of y3 from 1520 K to 1530 K, which shows their positive correlation and the detrimental influence of x6. In contrast, higher values of x4 (x4 > 3070 W) after the second split decreased the average value of y3 from 1520 K to 1510 K and confirmed the negative correlation (with beneficial influence) observed in the correlation plots. As with the RT for y2, the results of the RT analysis for y3 showed again that during the optimized rapid growth of crystals (x1 > 3.28 mm/h) without a great loss of arsenic, the process heat should mainly be provided by the upper side heater (x4), while the lower side and bottom heaters (x5 and x6) should be almost switched off (Table 5).
In summary, our RT and CT analysis revealed the key process parameters, their importance and the ranges of their values for achieving beneficial conditions for the VGF-GaAs growth.
To compare the DM and ML techniques used here, it is important to note that the DM techniques measured the relationship between one pair of variables among the inputs and outputs (x1…x6, y1…y5), while ML/DT measured the relationships between all inputs and one output (x1…x6, yi) and suggested the ranges of their optimal values. For the simultaneous optimization of all outputs in relation to all inputs, artificial neural networks are the best choice. Their use, however, is associated with a loss of interpretability, much higher computing times and the need for a vast amount of training data, which is often difficult to come by.