**1. Introduction**

Dendrite is an important component of solidification structure, whose morphological characteristics directly affect the performance of an alloy. Therefore, studying the growth of dendrite is helpful for researchers to optimize the process parameters and improve the performance of alloys. Experimental methods for obtaining the microstructure during solidification have been proposed, such as the electronic probe [1], synchrotron X-ray [2,3], etc. that have been developed, but cannot be applied to product at a large scale due to the expensive cost and harsh environment. Moreover, these methods are unable to show the dynamic evolution process of dendrite growth.

Over the past few decades, a considerable number of numerical models have grown up around the theme of simulating the evolution process of dendrite morphology during solidification, such as level set [4], monte carlo (MC) [5], cellular automata (CA) [6–8], phase field (PF) [9], etc. Among them, the phase field model (PF) and CA model are the most widely used. Due to complex physical equations and massive calculations, the PF model is only applied to small-scale simulations. Whereas the CA model can perform large-scale simulations and reveal the evolution of micro-mesoscopic dendrites, such as the transformation of columnar crystals into equiaxed crystals (CET) [10], the behavior of dendrite deflection in fluids [6–8], and the formation of silicon facet dendrite [11]. Influenced by grid layout and capture rules, the traditional CA model can only simulate dendrites that grow along the x axis or 45◦ with the x axis. However, during the actual solidification process of the alloy, dendrites grow with different preferred orientations, which is caused by the solute profile and can lead to macro-segregation in the cast product [12]. Therefore, it is very important for the model to be able to simulate the dendrite with different orientations. Researchers have proposed various methods to achieve multi-orient dendrite simulation, such

**Citation:** Wang, J.; Meng, H.; Yang, J.; Xie, Z. GPU-Based Cellular Automata Model for Multi-Orient Dendrite Growth and the Application on Binary Alloy. *Crystals* **2023**, *13*, 105. https://doi.org/10.3390/ cryst13010105

Academic Editor: Yuying Wu

Received: 27 November 2022 Revised: 25 December 2022 Accepted: 31 December 2022 Published: 6 January 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

as defining block cells [13], establishing random ZigZag neighbor cell capture rules [14], etc. These methods can only simulate a dendrite with the same orientation at the same time due to complex algorithms. Hence, Rappaz and Gandin developed the decentered square algorithm (DCSA) [15], which can simulate dendrites with different orientations simultaneously, and it has been widely used to simulate the growth of columnar and equiaxed dendrites [16–19]. However, these models suffered from velocity anisotropy, which means velocities along different orientations are asymmetrical and have multiple interfaces. Then, Luo et al. [20] modified the velocity of the interface cell through the local solute equilibrium and improved the phenomenon of multiple interfaces. Wang et al. [21]. determined the length of the square half diagonal by the preferred orientation to ensure sharpness of the interface and introduced GF to reduce velocity anisotropy.

However, the DCSA rule dynamically determines the capture position and condition by tracking the dendrite tip, which is very complex. Massive calculations of the CA model incorporating the DCSA rule make large-scale simulations very time-consuming. It is, therefore, necessary to speed up the calculation. Some methods have attempted to improve calculation efficiency, such as adaptive meshes [22,23], parallelizing the computer program using MPI technology [24], and parallel computing technology based on serial arithmetic [25]. These accelerators are CPU-based, so the speedup is not good enough for the number of cores, being limited by the hardware architecture. Unlike the CPU, GPU is characterized by integrating thousands of computer cores inside, which makes it the most effective way to achieve acceleration. With the development of GPU in general purpose computing, it has been used in various fields [26,27]. Over the past two decades, the GPUbased acceleration method has become a hotspot in computational and material science [28–30] and has made great progress in dendrite growth simulation [31–34]. However, most of these contributions focused on the PF model, which is still unable to simulate dendrite growth at large scale, even when accelerated by using GPU.

This paper proposes a high-performance CA model incorporating the DCSA capture rule to simulate dendrite growth with different orientations more efficiently. To make full use of the hardware resources and improve the calculation efficiency, parallel algorithms and a task scheduling scheme using multi-stream are proposed. Compared to the traditional CPU-CA model, this model can achieve great acceleration with a speedup of 158×. In addition, the steady-state tip velocity predicted by this work shows great agreement with the analytical value of the LGK model. The simulation results of the binary alloy by this model not only show dendrites with different orientations, but also agree well with the experimental result in situ. This model will be a promising tool for studying the dendrite growth during the large-scale solidification.

The following chapters will cover the establishment, solution, and verification of the model in detail.

#### **2. Materials and Methods**

*2.1. Description of Numerical Model*

#### 2.1.1. Heat Transfer Model

Equation (1) is used to describe heat transfer during alloy solidification.

$$p c\_p \frac{\partial T}{\partial t} = \frac{\partial}{\partial x} \left( \lambda \frac{\partial T}{\partial x} \right) + \frac{\partial}{\partial y} \left( \lambda \frac{\partial T}{\partial y} \right) \tag{1}$$

where *T* is the temperature, *fs* is the solid fraction, *ρ* is the density, *cp* is the equivalent specific heat, and *L* is the latent heat.

#### 2.1.2. Solute Distribution Model

The solute distribution model consists of solute redistribution at the interface and solute diffusion across the entire domain. For one thing, the solute redistribution is determined by Equations (2) and (3) according to local equilibrium.

$$\mathbb{C}\_s = k \mathbb{C}\_l \tag{2}$$

$$d\mathbb{C} = \mathbb{C}\_l (1 - k)\Delta f\_s \tag{3}$$

where *dC* is the residual solute expelled by the interface cell, and *k* is the coefficient of equilibrium solute partition (*k* < 1 in the present work).

First, the expelled solute will be discharged into the remaining liquid of the current cell. Then, the current *Cl* will be compared to the equilibrium solute concentration *Cl* ∗; the part beyond *Cl* <sup>∗</sup> will be discharged to adjacent liquid cells. *Cl* ∗ is calculated by Equation (4).

$$\mathbb{C}\_{l}^{\*} = \mathbb{C}\_{0} + \frac{T - T\_{0}}{m\_{l}} + \frac{\Gamma Ka(\hbar)}{m\_{l}} \tag{4}$$

where *T* is the actual temperature, *T*<sup>0</sup> is the initial temperature, *C*<sup>0</sup> is the initial solute concentration, *ml* is the liquidus slope, and *a*(*n*ˆ) is the function of interface anisotropy.

If the current cell is completely solidified, then all the excluded solute will be discharged to the adjacent liquid cells. How much solute will be discharged into an adjacent liquid cell is designed by weight, as shown in Equation (5).

$$w\_i = \frac{\mathbf{C}\_i^0 - \mathbf{C}\_I}{\sum\_{j=0}^N (\mathbf{C}\_j^0 - \mathbf{C}\_I)} \tag{5}$$

where *Ci* <sup>0</sup> − *Cl* is the solute difference between the interface cell and the adjacent liquid cell *i*, *N* is the number of liquid neighbours.

For another, the solute diffusion coefficient in liquid is three to four orders of magnitude larger than that in solid, so diffusion in solid phase is neglected in this paper. Equation (6) shows the diffusion of a solute in liquid and interface.

$$\frac{\partial \mathbf{C}\_{l}}{\partial t} = \frac{\partial}{\partial \mathbf{x}} \left( D\_{i} \frac{\partial \mathbf{C}\_{l}}{\partial \mathbf{x}} \right) + \frac{\partial}{\partial y} \left( D\_{i} \frac{\partial \mathbf{C}\_{l}}{\partial y} \right) \tag{6}$$

where *Di* is the solute diffusion coefficient, which is governed by Equation (7).

$$D\_l = f\_5 D\_s + (1 - f\_5) D\_l \tag{7}$$

where *Dl* and *Ds* are the diffusion coefficient in liquid and solid, respectively.

#### 2.1.3. CA Model

In the traditional CA model, only fully solidified cells (*fs* = 1) can capture their liquid neighbors. This capture rule is severely affected by the grid layout, so it can only simulate dendrites growing along the x axis and 45◦ with the x axis. However, the growth orientation of dendrite is chaotic in the actual solidification process of the alloy. In order to simulate dendrite growth closer to reality, the decentered square rule for capturing has been adopted by the present CA model. As shown in Figure 1, the key to the improved algorithms is dynamically calculating the nucleation position and nucleation conditions, according to the position where the interface cell (parent cell) reaches the liquid cell. All cells in the entire computing domain are initialized to liquid (*fs* = 0). As the temperature decreases, the one who receives the nucleation conditions will receive the chance to nucleate. The nucleation probability is calculated by Equation (8) combined with random probability.

$$\frac{dn}{d\Delta T} = \frac{n\_{\text{max}}}{\sqrt{2}\exp\left[-\frac{1}{2}\left(\frac{\Delta T - \Delta T\_N}{\Delta T\_V}\right)\right]}\tag{8}$$

where Δ*T* is the undercooling, *nmax* is the maximum density of nuclei given by the integral of undercooling (from 0 to Δ*T*) of Equation (8), Δ*TN* is the average nucleation undercooling, and Δ*T<sup>σ</sup>* is the standard deviation. Once a cell is nucleated, its solid fraction will be changed from 0 to 1, and a decentered square with an angle *θ* between diagonal and x axis will be located at its center. Additionally, four corners of this square penetrate the nearest liquid neighbors, capturing them in the corners and changing them into interface cells. Captured cells will inherit the growth properties of the parent cell and grow along the diagonal to capture its liquid neighbors until it is completely solidified.

**Figure 1.** The diagram of the decentered algorithm (*θ* is the angle between diagonal and x axis).

In order to simulate the dendrite morphology in detail, the local level rule method is used to calculate the increment of the solid fraction [35], wherein Δ*fs* of each time step is determined by Equation (9). The length of the primary arm of the dendrite is updated by Equation (10).

$$
\Delta f\_s = \frac{\left(\mathbf{C}\_l^\* - \mathbf{C}\_l\right)}{\left(\mathbf{C}\_l^\*(1-k)\right)} GF \tag{9}
$$

$$l(t + \Delta t) = l(t) + \Delta f\_s l\_{\max} \tag{10}$$

*lmax* is the maximum half-diagonal length determined by Equation (11).

$$l\_{\max} = l \left(\frac{\Delta x}{\max(\sin \theta, \cos \theta)}\right) \tag{11}$$

Δ*x* is the size of the cell, and *GF* [36] is the geometry factor related to the state of neighbors, which is introduced to avoid multiple interfaces and is limited to no more than 1.

$$GF = \min\left\{1, G\left(\sum\_{m=1}^{4} S\_m^\mathrm{I} + \sqrt{2}\sum\_{m=1}^{4} S\_m^\mathrm{II}\right)\right\} \tag{12}$$

where *G* is an adjustable factor, *sm*<sup>I</sup> and *sm*II are the state of the nearest neighbor and the second nearest neighbor, respectively, and both of them have two values, as shown in Equation (13).

$$\mathbf{S}\_{m'}^{\mathrm{I}} \mathbf{S}\_{m}^{\mathrm{II}} = \begin{cases} \mathbf{0}, \, f\_{\mathrm{s}} < 1 \\ \mathbf{1}, \, f\_{\mathrm{s}} = 1 \end{cases} \tag{13}$$

#### *2.2. Parallel Solver Based on GPUq*

To simulate the morphology of dendrite with clequeqear secondary arms, the cell size is generally divided into 1~2 μm. There will be huge calculations when conducting large-scale simulations, which is time-consuming using CPU. In addition, the purpose of the simulation is to optimize the process parameters by performing many groups of experiments. Therefore, it is necessary to accelerate the calculation. This paper designs a GPU-based parallel solver for acceleration.
