Article

A Parallel Algorithm Based on Regularized Lattice Boltzmann Method for Multi-Layer Grids

1 College of Information Technology, Shanghai Ocean University, Shanghai 201306, China
2 School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China
* Authors to whom correspondence should be addressed.
These authors contributed equally to this work.
Appl. Sci. 2024, 14(16), 6976; https://doi.org/10.3390/app14166976
Submission received: 28 June 2024 / Revised: 30 July 2024 / Accepted: 6 August 2024 / Published: 8 August 2024
(This article belongs to the Special Issue Applied Computational Fluid Dynamics and Thermodynamics)

Abstract

The regularized lattice Boltzmann method (RLBM) is an improvement of the lattice Boltzmann method (LBM): it improves accuracy without increasing the computational overhead. This paper introduces a multi-layer grid method in which grids of different resolutions are used, so that problems in computational fluid dynamics (CFD) can be solved accurately without destroying the parallelism of RLBM. Simulating fluid flow usually requires a very large number of grid cells, so a parallel algorithm for RLBM on multi-layer grids is needed. In this paper, a load-balancing-based grid dividing algorithm and an MPI-based parallel algorithm for RLBM on multi-layer grids are proposed. The grid dividing algorithm distributes the workload evenly across processes, minimizing discrepancies in computational load, while the MPI-based parallel algorithm ensures accurate and efficient numerical simulation. Numerical simulations verify that the proposed algorithms perform well in both 2D and 3D experiments, maintaining high stability and accuracy. The multi-layer grid method is significantly better than a single-layer grid in terms of CPU runtime and the number of grid cells required. A comparison with the OpenMP multi-threading method on the multi-layer grid RLBM shows that the proposed algorithm achieves superior speedup and efficiency.

1. Introduction

The lattice Boltzmann method (LBM), as a mesoscopic-scale computational fluid dynamics method [1], has been widely recognized as an effective means of dealing with complex fluid flows [2], for example, the flow over an infinitely wide wedge [3], coupled conduction and radiative heat transfer in composite materials [4], or indoor airflow [5].
The advantages of LBM are its easy handling of complex boundaries [6] and its high parallelism [7]. However, LBM faces challenges such as numerical instability at high Reynolds numbers [8]. To address this problem, the regularized lattice Boltzmann method (RLBM) was proposed [9]. The principle of RLBM is to perform a pre-collision step before the collision and streaming steps [10]. The pre-collision step restores symmetries that may not be inherently satisfied during numerical simulations [11]. RLBM maintains the high parallelism of the LBM without significantly increasing the amount of computation [12]. Latt and Chopard [9] introduced a regularization model based on single-relaxation-time BGK dynamics and demonstrated that the RLBM reduces the computational cost and improves accuracy and stability over the LBM. Malaspinas et al. [13] proposed regularized boundary conditions for dealing with straight boundaries; the method is applied to standard lattices (D2Q9 and D3Q19). Feng et al. [14] discussed arbitrary surface boundary conditions for compressible flows in the context of regularized models. Otomo et al. [15] applied regularized collision models to phase-field multiphase flow models. Basha and Nor Azwadi [16] showed the similarity between the structures of LBM and RLBM. The RLBM thus retains the easy handling of complex boundary conditions and the high parallelism of the LBM [17].
RLBM typically uses a uniform grid for numerical calculations. The fact that the grid resolution cannot be adjusted flexibly limits the application of RLBM in some fields. In numerical simulations there are regions where flow details need to be finely captured, and refining the entire grid would significantly increase the computational overhead [18]. Therefore, it is necessary to design a method for local grid refinement. Filippova and Hanel [19] proposed the FH method, in which the grid information is stored at the vertices; the coarse and fine grids exchange information in a bi-directionally coupled way. Adaptive mesh refinement (AMR) was proposed to reduce the loss of computational accuracy [20]. Liu et al. [21] combined the AMR method with the immersed boundary lattice Boltzmann method (IB-LBM) and validated its effectiveness in both 2D and 3D problems. Since the distribution functions are exchangeable on different grids, Lin and Lai [22] proposed a single-coupling method, which is subject to errors in some cases. Dupuis and Chopard [23] proposed a bi-directionally coupled local grid refinement method that resolves this error problem and improves the computational efficiency. In addition, a buffer-based interpolation method between coarse and fine grids was proposed by Chen et al. [24], which effectively reduces the computation time. A method for storing the grid information at the cell centers was proposed by Rohde et al. [25], in which the information of grids of different sizes is stored at the grid intersections. Storing the grid information in this way spreads it accurately and uniformly, which improves the accuracy of numerical calculations [26].
The RLBM with multi-layer grids uses a uniform grid within each layer, and the grid spacing of each layer is twice that of the next layer. In this method, the calculation of the equilibrium distribution functions, the collision and streaming steps, and the computation of the macroscopic quantities are all local operations, which provides excellent conditions for the parallelization of RLBM. Hasert et al. [27] proposed a parallel approach suitable for numerical simulations on large-scale clusters. A method of dividing the computational domain in one and two dimensions was proposed by Kandhai et al. [28]; it effectively improves the parallel performance and achieves accurate numerical results. Numerical experiments on multiphase flow were carried out by Pan et al. [29], who analyzed the performance differences between different domain divisions. In engineering problems, parallel computation of LBM is also applied in different flow domains, such as nuclear reactor fuel cooling [30] and the external flow field of a vehicle [31]. A load-balancing method was proposed by Schornbaum et al. [32]; it requires that all grids be distributed to all parallel domains. A parallel algorithm was designed by Abas et al. [33], based on numerical experiments implemented with the Palabos library [34] using RLBM and MPI cross-node parallelism. Good results were achieved; however, the implementation details of the algorithm were not disclosed. Multi-layer grid RLBM is capable of handling large-scale CFD problems, and multi-core servers [35] and supercomputers [36] are common choices for solving such problems.
These methods demonstrate that RLBM is a stable method, that local refinement with multi-layer grids can concentrate the computational effort in the regions where accuracy is most needed, and that RLBM with multi-layer grids is well suited to parallelization. Based on this, a load-balancing-based grid dividing algorithm and an MPI-based parallel algorithm for RLBM on multi-layer grids are proposed in this paper. The grid dividing algorithm ensures that the workload is evenly distributed across processes, minimizing discrepancies in computational load, while the MPI-based parallel algorithm ensures accurate and efficient numerical simulation.
The paper is structured as follows. Section 1 is the introduction. Section 2 introduces the RLBM and gives the RLBM evolution process for multi-layer grids. Section 3 proposes the load-balancing-based grid dividing algorithm and the MPI-based parallel algorithm for RLBM on multi-layer grids. Numerical experiments and parallel performance analysis are given in Section 4. Conclusions are drawn in Section 5.

2. RLBM Based on Multi-Layer Grids

2.1. Governing Equation for RLBM

The advantage of RLBM is that it improves stability and accuracy without materially increasing the computational effort. The idea of the regularized model is to introduce a pre-collision distribution function before the collision and streaming steps.
The evolution equation for the RLBM is as follows:
$$f_i(\mathbf{r} + \mathbf{e}_i \delta_t,\, t + \delta_t) = \left(1 - \frac{1}{\tau}\right) f_i^{1}(\mathbf{r}, t) + f_i^{eq}(\mathbf{r}, t)$$
where $f_i$ is the distribution function [37] at site $\mathbf{r}$ and time $t$, $f_i^{1}$ is the first-order distribution function, $\delta_t$ is the lattice time step, $\tau$ is the dimensionless relaxation time determined by the fluid viscosity, and $\mathbf{e}_i$ is the discrete velocity.
The distribution function satisfies the following relationship:
$$f_i - f_i^{eq} = f_i^{neq}$$
where $f_i^{eq}$ is the equilibrium distribution function and $f_i^{neq}$ is the non-equilibrium distribution function; the explicit formula for $f_i^{neq}$ is given in Appendix A. $f_i^{neq}$ is approximately equal to the first-order distribution function $f_i^{1}$, which satisfies the following relationship:
$$f_i^{neq} \approx f_i^{1} = \frac{\omega_i}{2 c_s^4}\, \Pi_{\alpha\beta}^{neq} Q_{i\alpha\beta}$$
where $\Pi_{\alpha\beta}^{neq}$ is the non-equilibrium stress tensor, which satisfies the following relationship:
$$\Pi_{\alpha\beta}^{neq} = \Pi_{\alpha\beta} - \sum_i f_i^{eq}\, e_{i\alpha} e_{i\beta}$$
The equilibrium distribution function $f_i^{eq}$ can be obtained from the following equation:
$$f_i^{eq} = \rho\, \omega_i \left[ 1 + \frac{\mathbf{e}_i \cdot \mathbf{u}}{c_s^2} + \frac{(\mathbf{e}_i \cdot \mathbf{u})^2}{2 c_s^4} - \frac{\mathbf{u} \cdot \mathbf{u}}{2 c_s^2} \right]$$
where $\rho$ is the macroscopic density, $c_s$ is the lattice sound speed with $c_s = 1/\sqrt{3}$, $\omega_i$ is the weighting factor of the corresponding lattice direction, and $\mathbf{u}$ is the macroscopic velocity.
As shown in Figure 1, this paper uses the D2Q9 model to solve the two-dimensional problem.
The weighting factor $\omega_i$ is represented as follows:
$$\omega_i = \begin{cases} 4/9, & i = 0, \\ 1/9, & i = 1, 2, 3, 4, \\ 1/36, & i = 5, 6, 7, 8. \end{cases}$$
The discrete velocities $\mathbf{e}_i$ are represented as follows:
$$\left( \mathbf{e}_0, \mathbf{e}_1, \ldots, \mathbf{e}_8 \right) = \begin{pmatrix} 0 & 1 & 0 & -1 & 0 & 1 & -1 & -1 & 1 \\ 0 & 0 & 1 & 0 & -1 & 1 & 1 & -1 & -1 \end{pmatrix}$$
As shown in Figure 2, this paper uses the D3Q19 model to solve the three-dimensional problem.
The weighting factor $\omega_i$ is represented as follows:
$$\omega_i = \begin{cases} 1/3, & i = 0, \\ 1/18, & i = 1, 2, \ldots, 6, \\ 1/36, & i = 7, 8, \ldots, 18. \end{cases}$$
The discrete velocities $\mathbf{e}_i$ are represented as follows:
$$\left( \mathbf{e}_0, \mathbf{e}_1, \ldots, \mathbf{e}_{18} \right) = \begin{pmatrix} 0 & 1 & -1 & 0 & 0 & 0 & 0 & 1 & -1 & 1 & -1 & 1 & -1 & 1 & -1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & -1 & 0 & 0 & 1 & -1 & -1 & 1 & 0 & 0 & 0 & 0 & 1 & -1 & 1 & -1 \\ 0 & 0 & 0 & 0 & 0 & 1 & -1 & 0 & 0 & 0 & 0 & 1 & -1 & -1 & 1 & 1 & -1 & -1 & 1 \end{pmatrix}$$
The symmetric tensor $Q_{i\alpha\beta}$ is defined as follows:
$$Q_{i\alpha\beta} = e_{i\alpha} e_{i\beta} - c_s^2 \delta_{\alpha\beta}$$
In the RLBM model, the momentum flux tensor, the macroscopic velocity and the macroscopic density are given by
$$\Pi_{\alpha\beta} = \sum_i f_i\, e_{i\alpha} e_{i\beta}$$
$$\rho \mathbf{u} = \sum_i \mathbf{e}_i f_i = \sum_i \mathbf{e}_i f_i^{eq}$$
$$\rho = \sum_i f_i = \sum_i f_i^{eq}$$
The evolution equation consists of two parts:
Collision:
$$f_i^{+}(\mathbf{r}, t) = \left(1 - \frac{1}{\tau}\right) f_i^{1}(\mathbf{r}, t) + f_i^{eq}(\mathbf{r}, t)$$
Streaming:
$$f_i(\mathbf{r} + \mathbf{e}_i \delta_t,\, t + \delta_t) = f_i^{+}(\mathbf{r}, t)$$
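To make the evolution steps above concrete, the following minimal C++ sketch performs the pre-collision (regularization) and collision of the RLBM at a single D2Q9 node. It is an illustrative sketch only, assuming a simple per-node array layout; streaming, boundary handling, and the authors' actual data structures are not part of the paper and are omitted here.

```cpp
// Minimal single-node D2Q9 RLBM pre-collision + collision sketch.
// Illustrative only: layout and function names are assumptions, not the authors' code.
#include <array>

constexpr int Q = 9;
constexpr double w[Q]  = {4.0/9, 1.0/9, 1.0/9, 1.0/9, 1.0/9,
                          1.0/36, 1.0/36, 1.0/36, 1.0/36};
constexpr int    ex[Q] = {0, 1, 0, -1, 0, 1, -1, -1, 1};
constexpr int    ey[Q] = {0, 0, 1, 0, -1, 1, 1, -1, -1};
constexpr double cs2   = 1.0 / 3.0;          // c_s^2

// Equilibrium distribution f_i^eq(rho, u).
double feq(int i, double rho, double ux, double uy) {
    double eu = ex[i]*ux + ey[i]*uy;
    double uu = ux*ux + uy*uy;
    return rho * w[i] * (1.0 + eu/cs2 + 0.5*eu*eu/(cs2*cs2) - 0.5*uu/cs2);
}

// Regularized collision at one node: f_i <- (1 - 1/tau) f_i^1 + f_i^eq.
void regularized_collision(std::array<double, Q>& f, double tau) {
    // Macroscopic density and velocity.
    double rho = 0.0, ux = 0.0, uy = 0.0;
    for (int i = 0; i < Q; ++i) { rho += f[i]; ux += ex[i]*f[i]; uy += ey[i]*f[i]; }
    ux /= rho;  uy /= rho;

    // Non-equilibrium stress tensor Pi^neq_ab = sum_i (f_i - f_i^eq) e_ia e_ib.
    double pxx = 0.0, pyy = 0.0, pxy = 0.0;
    for (int i = 0; i < Q; ++i) {
        double fneq = f[i] - feq(i, rho, ux, uy);
        pxx += fneq * ex[i]*ex[i];
        pyy += fneq * ey[i]*ey[i];
        pxy += fneq * ex[i]*ey[i];
    }

    // Pre-collision (regularization) f_i^1 = w_i/(2 c_s^4) Q_iab Pi^neq_ab,
    // followed by relaxation towards equilibrium.
    for (int i = 0; i < Q; ++i) {
        double qxx = ex[i]*ex[i] - cs2, qyy = ey[i]*ey[i] - cs2, qxy = ex[i]*ey[i];
        double f1  = w[i] / (2.0*cs2*cs2) * (qxx*pxx + qyy*pyy + 2.0*qxy*pxy);
        f[i] = (1.0 - 1.0/tau) * f1 + feq(i, rho, ux, uy);
    }
}
```

After this local collision, the streaming step simply copies each post-collision value $f_i^{+}$ to the neighboring node in direction $\mathbf{e}_i$, which is what makes the scheme well suited to domain-decomposed parallelism.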
In this paper, two boundary treatments are used: the straight boundary format [38] and the YMS format [39]. In the multi-layer grid RLBM, the innermost grid completely surrounds the object, so the curved boundary can be treated as if it lay on a single-layer grid and the YMS format can be applied directly in the multi-layer grid RLBM program.
The evolution of the RLBM is depicted in Figure 3.

2.2. RLBM Based on Multi-Layer Grids

To facilitate the description of the model, the 2D grid is depicted in Figure 4.
There are three layers of grids in Figure 4. The light blue frame represents the first layer, the brown frame indicates the second layer and the dark blue frame corresponds to the third layer. The two black circles denote the solid boundaries within the flow field. The grid spacing of the first layer is twice that of the second layer, and this pattern continues from layer to layer.
To facilitate grid division and parallel computation, this paper adopts a center-point format for storing grid point information. The resulting storage format is similar to a fork-tree structure [12], which is convenient to process in the program.
In a locally refined multi-layer grid structure, the ratio of the coarse grid size to the fine grid size is set to $n = \delta r_c / \delta r_s = 2$, where the subscript $c$ denotes the coarse grid and $s$ denotes the fine grid. Thus, in a 2D problem, a coarse grid cell can be subdivided into four fine grid cells. To ensure consistent fluid viscosity throughout the flow field, the following relationships need to be satisfied:
$$\delta_{t,s} = \frac{1}{n}\, \delta_{t,c}, \qquad \tau_s = \frac{1}{2} + n \left( \tau_c - \frac{1}{2} \right)$$
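For instance, with the grid ratio $n = 2$ and an illustrative coarse-grid relaxation time $\tau_c = 0.8$ (a value chosen purely for demonstration, not taken from the paper), these relations give
$$\delta_{t,s} = \frac{\delta_{t,c}}{2}, \qquad \tau_s = \frac{1}{2} + 2\left(0.8 - \frac{1}{2}\right) = 1.1 .$$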
Velocity continuity needs to be satisfied at the interface between grids of different sizes. Therefore, the equilibrium distribution functions must take the same values at the grid intersections, which leads to the following relationship:
$$f_{i,c}^{neq} = n\, \frac{\tau_c}{\tau_s}\, f_{i,s}^{neq}$$
A local grid refinement method based on distribution functions and bi-directional coupling was developed by Dupuis and Chopard [23]. Combining it with the evolution of the RLBM, this paper establishes the following relationship between the fine grid and the coarse grid:
$$f_{i,c}^{+} = f_{i,s}^{eq} + n\, \frac{\tau_c - 1}{\tau_s - 1} \left( f_{i,s} - f_{i,s}^{eq} \right)$$
$$f_{i,s}^{+} = f_{i,c}^{eq} + \frac{\tau_s - 1}{n (\tau_c - 1)} \left( f_{i,c} - f_{i,c}^{eq} \right)$$
where $f_{i,c}^{+}$ is the coarse-grid value obtained by interpolation [40] from the fine grid and $f_{i,s}^{+}$ is the fine-grid value obtained by interpolation from the coarse grid. To ensure that the moments of the coarse and fine grids are identical, the coarse grid performs one collision and streaming step while the fine grid performs two.
The interpolation process between grids of different scales is shown in Figure 5. First, the grid information is initialized and a pre-collision step is performed. Subsequently, the coarse grid undergoes one collision and streaming step, while the fine grid undergoes two. Interpolation between the fine and coarse grids is then conducted, and this process is repeated until termination. A code sketch of the coarse–fine transfer is given below.
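The two transfer formulas above translate directly into code. The following C++ sketch shows one possible implementation of the fine-to-coarse and coarse-to-fine transfers at an interface node; the function names, the fixed ratio $n = 2$, and the surrounding data handling are assumptions for illustration, not the authors' implementation.

```cpp
// Illustrative coarse<->fine distribution-function transfer at an interface node.
#include <array>

constexpr int Q = 9;
constexpr double n_ratio = 2.0;   // coarse/fine grid-size ratio n

// Value handed to the coarse grid, built from fine-grid data (f_{i,c}^+).
std::array<double, Q> fine_to_coarse(const std::array<double, Q>& f_s,
                                     const std::array<double, Q>& feq_s,
                                     double tau_c, double tau_s) {
    std::array<double, Q> f_c_plus;
    for (int i = 0; i < Q; ++i)
        f_c_plus[i] = feq_s[i] + n_ratio * (tau_c - 1.0) / (tau_s - 1.0)
                               * (f_s[i] - feq_s[i]);
    return f_c_plus;
}

// Value handed to the fine grid, built from coarse-grid data (f_{i,s}^+).
std::array<double, Q> coarse_to_fine(const std::array<double, Q>& f_c,
                                     const std::array<double, Q>& feq_c,
                                     double tau_c, double tau_s) {
    std::array<double, Q> f_s_plus;
    for (int i = 0; i < Q; ++i)
        f_s_plus[i] = feq_c[i] + (tau_s - 1.0) / (n_ratio * (tau_c - 1.0))
                               * (f_c[i] - feq_c[i]);
    return f_s_plus;
}
```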

3. Parallel Strategy and Algorithm

This paper describes the division of the grid in the two-dimensional case. A load-balancing strategy [34] prevents a process from going idle while waiting for data from other processes: during the simulation, all grid cells are assigned to the available processes so that no process is left idle. The effective assignment of grids to processes is essential for improving efficiency.
In the simulation scenario, each grid layer is stored in a one-dimensional array that records the number of grid cells in each column of the layer taking part in the evolution step. This one-dimensional array is partitioned according to the number of processes.
First, each grid layer is treated separately and divided into multiple subdomains along the same direction. Subdomains with the same process number in different layers are combined into a single computational region. As a result, data exchange between boundary grids essentially occurs only between neighboring processes. The grid division algorithm is given in Algorithm 1, and the corresponding flowchart is shown in Figure 6.
Algorithm 1 Load Balancing-Based Grid Dividing Algorithm
Input: Number of grid layers: N; Number of processes: np.
Output: File after grid division.
 1: int Col[N][R] = 0, Total[N][R] = 0;
 2: // R is the number of columns in the current layer grid.
 3: // Col[N][R] stores how many grid cells there are in each of the columns 1 to R for layers 1 to N.
 4: // Total[N][R] stores how many grid cells there are from column 1 up to the current column.
 5: for int k = 0 to N-1 do
 6:     for int j = 0 to R-1 do
 7:         Col[k][j] = l;   // l is the number of grid cells in the j-th column of the k-th layer.
 8:     end for
 9: end for
10: for int k = 0 to N-1 do
11:     for int j = 0 to R-1 do
12:         Total[k][j] = Total[k][j-1] + Col[k][j];
13:     end for
14: end for
15: for int id = 0 to np-1 do
16:     for int k = 0 to N-1 do
17:         for int j = 0 to R-1 do
18:             if Total[k][j] > Total[k][R-1] * (id+1)/np then
19:                 int a = Total[k][j] - Total[k][R-1] * (id+1)/np;
20:                 int b = Total[k][R-1] * (id+1)/np - Total[k][j-1];
21:                 if a < b then
22:                     Set the process number in this column to id;
23:                     ++id;
24:                 else
25:                     ++id;
26:                     Set the process number in this column to id;
27:                 end if
28:             else
29:                 Set the process number in this column to id;
30:             end if
31:         end for
32:     end for
33: end for
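The core idea of Algorithm 1, prefix sums of the per-column cell counts followed by assigning each column to the process whose cumulative target it falls closest to, can be illustrated for a single layer with the following self-contained C++ sketch. All names and the example column counts are hypothetical, and the tie-breaking details are simplified relative to Algorithm 1.

```cpp
// Column-based load-balancing sketch for one grid layer (illustrative only).
#include <cstdio>
#include <vector>

// Assign each column to a process so that every process receives roughly
// total/np grid cells, splitting at whichever side of the target is closer.
std::vector<int> divide_columns(const std::vector<long long>& col, int np) {
    int R = static_cast<int>(col.size());
    std::vector<long long> total(R);
    long long sum = 0;
    for (int j = 0; j < R; ++j) { sum += col[j]; total[j] = sum; }   // prefix sums

    std::vector<int> owner(R);
    int id = 0;
    for (int j = 0; j < R; ++j) {
        long long target = total[R - 1] * (id + 1) / np;             // cumulative target for process id
        if (id < np - 1 && total[j] > target) {
            long long over  = total[j] - target;                     // overshoot if column stays with id
            long long under = target - (j > 0 ? total[j - 1] : 0);   // shortfall if column moves to id+1
            if (over < under) { owner[j] = id; ++id; }               // keep column, then advance
            else              { ++id; owner[j] = id; }               // advance, give column to next process
        } else {
            owner[j] = id;
        }
    }
    return owner;
}

int main() {
    // Hypothetical per-column cell counts for one layer, divided among 4 processes.
    std::vector<long long> col = {8, 8, 8, 12, 12, 12, 12, 8, 8, 8, 8, 8};
    std::vector<int> owner = divide_columns(col, 4);
    for (int j = 0; j < (int)col.size(); ++j)
        std::printf("column %d -> process %d\n", j, owner[j]);
    return 0;
}
```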
The division of the multi-layer grid is shown schematically in Figure 7.
The three-layer grid structure of the two-dimensional flow over two circular cylinders is taken as an example. In the multi-layer grid RLBM, all grid points within the same layer have similar computational demands, and a coarse grid point undergoes half as many collision and streaming steps as a fine grid point. The grid area in Figure 7 consists of three distinct grid sizes. Following the idea of Algorithm 1, a parallel grid division was performed: grids of varying resolutions are partitioned into four equal segments, and each MPI process manages one segment of every grid layer. The resulting division is illustrated in Figure 7, where red denotes process 0, yellow denotes process 1, purple denotes process 2, and blue denotes process 3.
The data-transfer method of the multi-layer grid is shown schematically in Figure 8.
Under this parallel strategy, each process exchanges data solely with its two neighboring processes. The parallel algorithm proposed in this paper relies on the grid division algorithm within the computational domain. After the grid has been divided into sections, the data at the grid junctions consist of data used for evolution and data used for interpolation. The data used for evolution come from junctions within the same grid layer, i.e., the outermost column of grid points on each side of a subdomain. The data used for interpolation come from the intersections between different grid layers and are used to complete the interpolation between coarse and fine grids. Both kinds of transferred data contain information about the neighboring grid points and the values of the distribution functions. Once the interpolation and evolution data have been exchanged, the processes can proceed in parallel. Algorithm 2 gives the MPI-based parallel algorithm for RLBM on multi-layer grids.
To prevent deadlocks, each process performs the collisions, streaming, and computation of macroscopic quantities independently within its own computational domain. Neighboring processes first exchange the boundary-grid information at the grid junctions and then compute, which optimizes communication efficiency. Each process thus performs local computations first, and finally the results of all regions are assembled into the solution of the complete original problem.
Algorithm 2 MPI-based parallel algorithm for RLBM on multi-layer grids
Input: Grid files after division; total number of MPI processes np; number of grid layers N; initial information of the flow field.
Output: Evolution results.
1. Calculate the number of grid points in each layer of the multi-layer grids.
2. Divide the multi-layer grids uniformly into np grid regions according to Algorithm 1.
3. Allocate one grid region to each MPI process.
4. From layer N to layer 1, each process executes the RLBM based on multi-layer grids.
5. Once the grid calculations of each layer are complete, perform a synchronization operation.
6. If the exit condition has not been met, continue with steps 4 and 5.
7. Output the calculation results.
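The communication pattern behind Algorithm 2 can be sketched as follows: each MPI process exchanges only its boundary columns with its two neighbors and then evolves its own subdomain. The code below is a minimal illustration under assumed buffer sizes and with the local RLBM evolution left as a stub; it is not the authors' implementation.

```cpp
// Minimal MPI halo-exchange skeleton for the column-wise domain decomposition.
#include <mpi.h>
#include <vector>

constexpr int Q = 9;

// Exchange one boundary column (Q distribution values per grid point) with
// the left and right neighboring processes; MPI_Sendrecv avoids deadlocks.
void exchange_boundaries(std::vector<double>& left_send,  std::vector<double>& left_recv,
                         std::vector<double>& right_send, std::vector<double>& right_recv,
                         int rank, int np, MPI_Comm comm) {
    int left  = (rank > 0)      ? rank - 1 : MPI_PROC_NULL;
    int right = (rank < np - 1) ? rank + 1 : MPI_PROC_NULL;
    MPI_Sendrecv(left_send.data(),  (int)left_send.size(),  MPI_DOUBLE, left,  0,
                 right_recv.data(), (int)right_recv.size(), MPI_DOUBLE, right, 0,
                 comm, MPI_STATUS_IGNORE);
    MPI_Sendrecv(right_send.data(), (int)right_send.size(), MPI_DOUBLE, right, 1,
                 left_recv.data(),  (int)left_recv.size(),  MPI_DOUBLE, left,  1,
                 comm, MPI_STATUS_IGNORE);
}

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, np;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &np);

    int ny = 64;                                    // hypothetical column height
    std::vector<double> ls(ny * Q), lr(ny * Q), rs(ny * Q), rr(ny * Q);

    for (int step = 0; step < 100; ++step) {
        // 1. Pack boundary-column distributions into ls / rs (omitted).
        // 2. Exchange halo data with neighboring processes only.
        exchange_boundaries(ls, lr, rs, rr, rank, np, MPI_COMM_WORLD);
        // 3. Local RLBM evolution on every layer of this subdomain (omitted),
        //    fine layers performing two collision/streaming steps per coarse step.
        // 4. Synchronize before the next step.
        MPI_Barrier(MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}
```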

4. Numerical Simulations

This section provides two types of numerical validation: flow over two circular cylinders in the 2D case and flow around a sphere in the 3D case. The parallel performance of 3D numerical experiments is evaluated and analyzed.

4.1. Flow over Two Circular Cylinders in a 2D Case

To demonstrate the precision of the parallel algorithm of RLBM based on multi-layer grids, the flow over two circular cylinders at different spacing ratios was simulated. Two cylinders of the same size are placed in the channel, and the cylinder diameter is defined as D. The Reynolds number was set to Re = 200 in all scenarios of this experiment. The length of the channel is 80D and its width is 60D. The channel inlet velocity is U (U = 0.1). The front cylinder is located 20D from the left boundary, and the back cylinder is positioned at a distance L from the front cylinder. This experiment investigates the impact on the flow field as the position of the rear cylinder changes; the distance L between the two cylinders is taken as 1.5D, 2D, 3D, and 4D. A schematic of the simulation scenario is shown in Figure 9.
The fluid region grid is set up as follows: the two refined grid regions have sizes of 50D × 35D and 20D × 15D, respectively, and each layer's grid spacing is half that of the preceding layer. The grid around the two cylinders was further refined to capture the flow details. The three-layer grid structure is shown in Figure 10.
The flow field is analyzed for different spacing ratios as follows. As the number of evolution steps increases, the vortices between the cylinders gradually stabilize, and the streamlines of the upstream cylinder closely follow the downstream cylinder. A symmetrical vortex pair is created between the two cylinders, while the vortex behind the downstream cylinder cannot form a stable pair because of the interference of the upstream cylinder, as shown in Figure 11, Figure 12, Figure 13 and Figure 14. The different colors of the velocity contours represent different ranges of velocity values.
When the spacing ratio is L/D = 3, the vortex pair between the two cylinders can no longer remain stable and symmetrical, which confirms the existence of a critical spacing [41].
In this paper, the lift coefficient $C_L$ and the drag coefficient $C_D$ are used to verify the accuracy of the algorithm when performing numerical calculations:
$$C_D = \frac{F_D}{\frac{1}{2} \rho U^2 D}$$
$$C_L = \frac{F_L}{\frac{1}{2} \rho U^2 D}$$
Table 1 and Table 2 demonstrate the results obtained using the algorithm of this paper in comparison with the published literature [41,42,43].
Figure 15, Figure 16, Figure 17 and Figure 18 show, for the different cylinder spacings, the periodic variation of the lift coefficient $C_L$ and the drag coefficient $C_D$ with the time iteration steps.
In Figure 19, the number of grid cells and the CPU runtime are compared for a single-layer grid and a three-layer grid. The CPU time for the three-layer grid is only about 1/8 of that for the single-layer grid, while the number of grid cells to be computed is about 1/4. The results demonstrate that the multi-layer grid RLBM exhibits high accuracy and stability.

4.2. Flow around a Sphere in a 3D Case

To demonstrate the precision of the parallel algorithm of RLBM based on multi-layer grids, the flow around a sphere was simulated. A sphere is placed in a closed box; the radius of the sphere is defined as R and the box size is 16R × 16R × 25R. The inlet flow velocity is U (U = 0.1). The setup of the simulation scene is shown in Figure 20. A three-layer grid was used for the numerical calculations: the grid spacing of the first layer is R/6, the second-layer grid region has a size of 10R × 10R × 15R, and the third-layer grid region has a size of 6R × 6R × 10R.
In this section, experiments were conducted for four Reynolds numbers: 50, 100, 150, and 200. The length $X_S$ of the recirculation region differs under the different conditions. The flow diagrams at the different Reynolds numbers are shown in Figure 21, Figure 22, Figure 23 and Figure 24, and the length $X_S$ of the recirculation region is compared with the published literature [44,45,46] in Table 3.
In this paper, the drag coefficient $C_D$ and the length $X_S$ of the recirculation region are used to verify the accuracy of the algorithm when performing numerical calculations:
$$C_D = \frac{8 F_x}{\rho \pi U^2 D^2}$$
Table 3 demonstrates the results obtained using the algorithm of this paper in comparison with the published literature [44,45,46].

4.3. Performance Evaluation

In this section, the algorithm proposed in this paper is evaluated in terms of speedup ratio and efficiency. The scenario configuration for the simulation is the same as in Section 4.2. The speedup ratio is defined as follows:
$$\mathrm{Speedup} = \frac{T_{serial}}{T_{parallel}(p)}$$
where $T_{serial}$ is the serial execution time and $T_{parallel}(p)$ is the parallel execution time using $p$ cores.
The efficiency is defined as follows:
$$\mathrm{Efficiency} = \frac{\mathrm{Speedup}}{p}$$
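For example, taking the 16-core measurement from Table 4, the definitions give
$$\mathrm{Efficiency}(16) = \frac{\mathrm{Speedup}(16)}{16} = \frac{9.05}{16} \approx 56.6\%,$$
which matches the efficiency reported in the table.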
The speedup ratio and efficiency of the parallel algorithm proposed in this paper at different numbers of cores are shown in Table 4.
To facilitate the comparison of performance, parallel computation was also performed using OpenMP under the same experimental conditions and scenario. The speedup ratio and efficiency of OpenMP at different numbers of cores are shown in Table 5.
The performance of the multi-layer grids RLBM algorithm using the parallel algorithm proposed in this paper is compared with the multi-layer grids RLBM algorithm using OpenMP multi-threaded computation. The comparison of speed-up ratios is shown schematically in Figure 25. The comparison of efficiency is shown schematically in Figure 26.
It is clear that the parallel algorithm proposed in this paper exhibits a significant speedup ratio for all numbers of cores. For example, with 16 cores the algorithm achieves a speedup of 9.05, while the speedup using OpenMP is only 4.62. This shows that the proposed algorithm makes better use of parallel computing resources and achieves higher computational efficiency when scaling up to more processor cores. As the number of processor cores increases, the speedup of MPI grows significantly faster than that of OpenMP, demonstrating excellent scalability.
The algorithm proposed in this paper is also significantly more efficient than OpenMP for all core counts. In the two-core case, the MPI implementation is about 87% efficient compared with 73% for OpenMP, and this gap widens as the number of cores increases. This shows that the proposed algorithm is better able to maintain load balance and low communication overhead as the number of cores grows. Although the efficiency of the MPI algorithm decreases as the number of cores increases, the decrease is significantly smaller than that of the OpenMP algorithm, and the efficiency remains at a reasonable level in larger-scale parallel computations, indicating the effectiveness of the load-balancing strategy.
These results demonstrate the soundness and efficiency of the proposed algorithm: it greatly reduces the computational bottleneck of individual processes and improves the overall computational efficiency. The algorithm only needs to communicate between neighboring processes and only exchanges information about the boundary grid. This communication mode greatly reduces the communication volume and allows the parallel algorithm to remain highly efficient at larger scales.

5. Conclusions

The regularized lattice Boltzmann method is an improvement of the lattice Boltzmann method that can improve accuracy without significantly increasing the computational overhead. The multi-layer grid method can be effectively combined with RLBM, concentrating the fine grid on the parts that require more accurate calculation in order to improve the computational accuracy and stability. Since a large number of grid cells is usually required for numerical calculations and RLBM is naturally parallel, designing a parallel algorithm is necessary. In this paper, a load-balancing grid division algorithm is designed, which balances the computational load among the processes and minimizes the communication overhead. Based on this algorithm, an MPI-based parallel algorithm for RLBM on multi-layer grids is proposed, which obtains the numerical results more quickly. The accuracy and feasibility of the proposed method are verified by numerical experiments on the flow over two circular cylinders in the 2D case and the flow around a sphere in the 3D case. The experimental results show that the computation time is significantly reduced as the number of processes increases, and the parallel algorithm has excellent speedup performance. The proposed algorithm is also compared with OpenMP in the 3D flow-around-a-sphere scenario: with 16 cores the parallel efficiency is close to 60 percent, about 28 percentage points higher than that of the OpenMP method. The algorithm performs well in large-scale computing tasks, and the load-balancing strategy effectively reduces communication overhead, maintaining high parallel efficiency. In conclusion, the parallel algorithm based on the regularized lattice Boltzmann method for multi-layer grids proposed in this paper offers an effective solution for massively parallel computing. The accuracy and stability of the algorithm will be tested in more CFD scenarios in the future, such as multiple-cylinder arrangements and moving-boundary cases.

Author Contributions

Conceptualization, W.Z. and Y.W.; Data curation, Z.L. and Y.Z.; Formal analysis, Z.L. and Y.Z.; Methodology, Z.L. and Y.Z.; Resources, Z.L.; Software, Z.L. and Y.Z.; Supervision, Z.L., W.Z. and Y.W.; Validation, Z.L. and Y.Z.; Writing—original draft, Z.L. and Y.Z.; Writing—review and editing, Z.L., W.Z. and Y.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (Grant No. 42376194).

Data Availability Statement

Experimental data related to this paper can be requested from the authors by email ([email protected]).

Acknowledgments

The authors would like to express their gratitude for the support of Fishery Engineering and Equipment Innovation Team of Shanghai High-level Local University.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

In the D2Q9 model, the x and y directions are considered. Combined with Equation (7) and Figure 1b, the explicit formula for the non-equilibrium distribution function $f_i^{neq}$ can be expressed as
$$f_i^{neq} = \frac{\omega_i}{2 c_s^4} \left( \frac{u_x^2}{c_s^2}\, e_{ix}^2 + \frac{2 u_x u_y}{c_s^2}\, e_{ix} e_{iy} + \frac{u_y^2}{c_s^2}\, e_{iy}^2 - \omega_i e_{ix}^2 - \omega_i e_{iy}^2 \right)$$
where $e_{ix}$ and $e_{iy}$ are the components of the discrete velocity $\mathbf{e}_i$ in the x and y directions.
In the D3Q19 model, the x, y and z directions are considered. Combined with Equation (9) and Figure 2b, the explicit formula for the non-equilibrium distribution function $f_i^{neq}$ can be expressed as
$$f_i^{neq} = \frac{\omega_i}{2 c_s^4} \left( \frac{u_x^2}{c_s^2}\, e_{ix}^2 + \frac{2 u_x u_y}{c_s^2}\, e_{ix} e_{iy} + \frac{2 u_x u_z}{c_s^2}\, e_{ix} e_{iz} + \frac{u_y^2}{c_s^2}\, e_{iy}^2 + \frac{2 u_y u_z}{c_s^2}\, e_{iy} e_{iz} + \frac{u_z^2}{c_s^2}\, e_{iz}^2 - \omega_i \left( e_{ix}^2 + e_{iy}^2 + e_{iz}^2 \right) \right)$$
where $e_{ix}$, $e_{iy}$ and $e_{iz}$ are the components of the discrete velocity $\mathbf{e}_i$ in the x, y and z directions.

References

  1. Xue, X.; Yao, H.D.; Davidson, L. Synthetic turbulence generator for lattice Boltzmann method at the interface between RANS and LES. Phys. Fluids 2022, 34, 055118. [Google Scholar] [CrossRef]
  2. Lai, H.; Lin, C.; Gan, Y.; Li, D.; Chen, L. The influences of acceleration on compressible Rayleigh-Taylor instability with non-equilibrium effects. Comput. Fluids 2023, 266, 106037. [Google Scholar] [CrossRef]
  3. Jiao, C.; Leng, Z.; Zou, D.; Ta, N.; Rao, Z. Numerical research of the infinitely wide wedge flow based on the lattice Boltzmann method. Proc. Inst. Mech. Eng. J J. Eng. Tribol. 2021, 235, 343–350. [Google Scholar] [CrossRef]
  4. Tong, Z.; Li, M.; Xie, T.; Gu, Z. Lattice Boltzmann method for conduction and radiation heat transfer in composite materials. J. Therm. Sci. 2022, 31, 777–789. [Google Scholar] [CrossRef]
  5. Li, C.; Zhao, Y.; He, Y.; Luo, K.H.; Li, Y. Simulation of indoor harmful gas dispersion and airflow using three-dimensional lattice Boltzmann method based large-eddy simulation. AIP Adv. 2021, 11, 035235. [Google Scholar] [CrossRef]
  6. Lai, H.; Li, D.; Lin, C.; Chen, L.; Ye, H.; Zhu, J. Investigation of effects of initial interface conditions on the two-dimensional single-mode compressible Rayleigh–Taylor instability: Based on the discrete Boltzmann method. Comput. Fluids 2024, 277, 106289. [Google Scholar] [CrossRef]
  7. Liu, Z.; Ruan, J.; Song, W.; Zhou, L.; Guo, W.; Xu, J. Parallel Scheme for Multi-Layer Refinement Non-Uniform Grid Lattice Boltzmann Method Based on Load Balancing. Energies 2022, 15, 7884. [Google Scholar] [CrossRef]
  8. Shu, C.; Niu, X.D.; Chew, Y.T.; Cai, Q.D. A fractional step lattice Boltzmann method for simulating high Reynolds number flows. Math. Comput. Simul. 2006, 72, 201–205. [Google Scholar] [CrossRef]
  9. Latt, J.; Chopard, B. Lattice Boltzmann method with regularized pre-collision distribution functions. Math. Comput. Simul. 2006, 72, 165–168. [Google Scholar] [CrossRef]
  10. Liu, Z.; Li, Y.; Song, W. Regularized lattice Boltzmann method parallel model on heterogeneous platforms. Concurr. Comput. Pract. Exp. 2022, 34, e6875. [Google Scholar] [CrossRef]
  11. Liu, Z.; Chen, Y.; Xiao, W.; Song, W.; Li, Y. Large-Scale Cluster Parallel Strategy for Regularized Lattice Boltzmann Method with Sub-Grid Scale Model in Large Eddy Simulation. Appl. Sci. 2023, 13, 11078. [Google Scholar] [CrossRef]
  12. Liu, Z.; Zhao, Y.; Shi, S.; Wang, Y. A regularized lattice Boltzmann method based on the multi-layer grid using a buffer scheme. Int. J. Modern Phys. C 2024, in press. [CrossRef]
  13. Malaspinas, O.; Chopard, B.; Latt, J. General regularized boundary condition for multi-speed lattice Boltzmann models. Comput. Fluids 2011, 49, 29–35. [Google Scholar] [CrossRef]
  14. Feng, Y.; Guo, S.; Jacob, J.; Sagaut, P. Solid wall and open boundary conditions in hybrid recursive regularized lattice Boltzmann method for compressible flows. Phys. Fluids 2019, 31, 126103. [Google Scholar] [CrossRef]
  15. Otomo, H.; Zhang, R.; Chen, H. Improved phase-field-based lattice Boltzmann models with a filtered collision operator. Int. J. Modern Phys. C 2019, 30, 1941009. [Google Scholar] [CrossRef]
  16. Basha, M.; Nor Azwadi, C.S. Regularized lattice Boltzmann simulation of laminar mixed convection in the entrance region of 2-D channels. Heat Transf. A Appl. 2013, 63, 867–878. [Google Scholar] [CrossRef]
  17. Agarwal, A.; Gupta, S.; Prakash, A. A comparative study of three-dimensional discrete velocity set in LBM for turbulent flow over bluff body. J. Braz. Soc. Mech. Sci. Eng. 2021, 43, 39. [Google Scholar] [CrossRef]
  18. Liu, Z.; Li, S.; Ruan, J.; Zhang, W.; Zhou, L.; Huang, D.; Xu, J. A new multi-level grid multiple-relaxation-time lattice Boltzmann method with spatial interpolation. Mathematics 2023, 11, 1089. [Google Scholar] [CrossRef]
  19. Filippova, O.; Hanel, D. Grid refinement for lattice-BGK models. J. Comput. Phys. 1998, 147, 219–228. [Google Scholar] [CrossRef]
  20. Guzik, S.M.; Weisgraber, T.H.; Colella, P.; Alder, B.J. Interpolation methods and the accuracy of lattice-Boltzmann mesh refinement. J. Comput. Phys. 2014, 259, 461–487. [Google Scholar] [CrossRef]
  21. Liu, Z.; Tian, F.B.; Feng, X. An efficient geometry-adaptive mesh refinement framework and its application in the immersed boundary lattice Boltzmann method. Methods Appl. Mech. Eng. 2022, 392, 114662. [Google Scholar] [CrossRef]
  22. Lin, C.L.; Lai, Y.G. Lattice Boltzmann method on composite grids. Phys. Rev. E 2000, 62, 2219. [Google Scholar] [CrossRef] [PubMed]
  23. Dupuis, A.; Chopard, B. Theory and applications of an alternative lattice Boltzmann grid refinement algorithm. Phys. Rev. E 2003, 67, 066707. [Google Scholar] [CrossRef] [PubMed]
  24. Chen, S.; Peng, C.; Teng, Y.; Wang, L.P.; Zhang, K. Improving lattice Boltzmann simulation of moving particles in a viscous flow using local grid refinement. Comput. Fluids 2016, 136, 228–246. [Google Scholar] [CrossRef]
  25. Rohde, M.; Kandhai, D.; Derksen, J.; Van den Akker, H.E. A generic, mass conservative local grid refinement technique for lattice-Boltzmann schemes. Int. J. Numer. Methods Fluids 2006, 51, 439–468. [Google Scholar] [CrossRef]
  26. Eitel-Amor, G.; Meinke, M.; Schröder, W. A lattice-Boltzmann method with hierarchically refined meshes. Comput. Fluids 2013, 75, 127–139. [Google Scholar] [CrossRef]
  27. Hasert, M.; Masilamani, K.; Zimny, S.; Klimach, H.; Qi, J.; Bernsdorf, J.; Roller, S. Complex fluid simulations with the parallel tree-based lattice Boltzmann solver Musubi. J. Comput. Sci. 2014, 5, 784–794. [Google Scholar] [CrossRef]
  28. Kandhai, D.; Koponen, A.; Hoekstra, A.G.; Kataja, M.; Timonen, J.; Sloot, P.M. Lattice-Boltzmann hydrodynamics on parallel systems. Comput. Phys. Commun. 1998, 111, 14–26. [Google Scholar] [CrossRef]
  29. Pan, C.; Prins, J.F.; Miller, C.T. A high-performance lattice Boltzmann implementation to model flow in porous media. Comput. Phys. Commun. 2004, 158, 89–105. [Google Scholar] [CrossRef]
  30. Onodera, N.; Idomura, Y.; Uesawa, S.; Yamashita, S.; Yoshida, H. Locally mesh-refined lattice Boltzmann method for fuel debris air cooling analysis on GPU supercomputer. Mech. Eng. J. 2020, 7, 19-00531. [Google Scholar] [CrossRef]
  31. Pasquali, A.; Schönherr, M.; Geier, M.; Krafczyk, M. Simulation of external aerodynamics of the DrivAer model with the LBM on GPGPUs. In Parallel Computing: On the Road to Exascale; IOS Press: Clifton, VA, USA, 2016; Volume 27, pp. 391–400. [Google Scholar]
  32. Schornbaum, F.; Rude, U. Massively parallel algorithms for the lattice Boltzmann method on nonuniform grids. SIAM J. Sci. Comput. 2016, 38, C96–C126. [Google Scholar] [CrossRef]
  33. Abas, A.; Mokhtar, N.H.; Ishak, M.H.H.; Abdullah, M.Z.; Ho Tian, A. Lattice Boltzmann model of 3D multiphase flow in artery bifurcation aneurysm problem. Comput. Math. Methods Med. 2016, 2016, 6143126. [Google Scholar] [CrossRef] [PubMed]
  34. Tan, J.; Sinno, T.R.; Diamond, S.L. A parallel fluid–solid coupling model using LAMMPS and Palabos based on the immersed boundary method. J. Comput. Sci. 2018, 25, 89–100. [Google Scholar] [CrossRef] [PubMed]
  35. Mekala, M.; Dhiman, G.; Srivastava, G.; Nain, Z.; Zhang, H.; Viriyasitavat, W.; Varma, G. A DRL-based service offloading approach using DAG for edge computational orchestration. IEEE Trans. Comput. Soc. Syst. 2022. early access. [Google Scholar] [CrossRef]
  36. Shukla, S.K.; Gupta, V.K.; Joshi, K.; Gupta, A.; Singh, M.K. Self-aware execution environment model (SAE2) for the performance improvement of multicore systems. Int. J. Mod. Res. 2022, 2, 17–27. [Google Scholar]
  37. Safdari Shadloo, M. Numerical simulation of compressible flows by lattice Boltzmann method. Numer. Heat Transf. Part A Appl. 2019, 75, 167–182. [Google Scholar] [CrossRef]
  38. Guo, Z.; Zheng, C.; Shi, B. An extrapolation method for boundary conditions in lattice Boltzmann method. Phys. Fluids 2002, 14, 2007–2010. [Google Scholar] [CrossRef]
  39. Yu, D.; Mei, R.; Shyy, W. A unified boundary treatment in lattice boltzmann method. In Proceedings of the 41st Aerospace Sciences Meeting and Exhibit, Reno, NV, USA, 6–9 January 2003; p. 953. [Google Scholar]
  40. Fakhari, A.; Lee, T. Numerics of the lattice Boltzmann method on nonuniform grids: Standard LBM and finite-difference LBM. Comput. Fluids 2015, 107, 205–213. [Google Scholar] [CrossRef]
  41. Alam, M.M. Lift forces induced by phase lag between the vortex sheddings from two tandem bluff bodies. Fluids Struct. 2016, 65, 217–237. [Google Scholar] [CrossRef]
  42. Koda, Y.; Lien, F.S. Aerodynamic effects of the early three-dimensional instabilities in the flow over one and two circular cylinders in tandem predicted by the lattice Boltzmann method. Comput. Fluids 2013, 74, 32–43. [Google Scholar] [CrossRef]
  43. Meneghini, J.R.; Saltara, F.; Siqueira, C.D.; Ferrari, J.A., Jr. Numerical simulation of flow interference between two circular cylinders in tandem and side-by-side arrangements. J. Fluids Struct. 2001, 15, 327–350. [Google Scholar] [CrossRef]
  44. Xu, L.; Li, J.; Chen, R. A scalable parallel unstructured finite volume lattice Boltzmann method for three-dimensional incompressible flow simulations. Int. J. Numer. Methods Fluids 2021, 93, 2744–2762. [Google Scholar] [CrossRef]
  45. Cheng, C.; Galindo-Torres, S.A.; Zhang, X.; Zhang, P.; Scheuermann, A.; Li, L. An improved immersed moving boundary for the coupled discrete element lattice Boltzmann method. Comput. Fluids 2018, 177, 12–19. [Google Scholar] [CrossRef]
  46. Wu, J.; Shu, C. An improved immersed boundary-lattice Boltzmann method for simulating three-dimensional incompressible flows. J. Comput. Phys. 2010, 229, 5022–5042. [Google Scholar] [CrossRef]
Figure 1. (a) The D2Q9 velocity discrete model. (b) The D2Q9 distribution function model.
Figure 2. (a) The D3Q19 velocity discrete model. (b) The D3Q19 distribution function model.
Figure 3. Flow chart of RLBM.
Figure 4. Schematic diagram for multi-layer grids.
Figure 5. Schematic diagram of the interpolation process.
Figure 6. Flow chart for grid division.
Figure 7. Schematic diagram of the multi-layer grid division.
Figure 8. Schematic diagram of MPI data transfer method.
Figure 9. Schematic diagram of two-dimensional flow over two circular cylinders.
Figure 10. Three-layer grid structure diagram.
Figure 11. Velocity contours and streamlines diagram at L/D = 1.5.
Figure 12. Velocity contours and streamlines diagram at L/D = 2.
Figure 13. Velocity contours and streamlines diagram at L/D = 3.
Figure 14. Velocity contours and streamlines diagram at L/D = 4.
Figure 15. Cylindrical bypass: $C_D$ and $C_L$ when Re = 200 and L/D = 1.5. (a) Front cylinder; (b) rear cylinder.
Figure 16. Cylindrical bypass: $C_D$ and $C_L$ when Re = 200 and L/D = 2. (a) Front cylinder; (b) rear cylinder.
Figure 17. Cylindrical bypass: $C_D$ and $C_L$ when Re = 200 and L/D = 3. (a) Front cylinder; (b) rear cylinder.
Figure 18. Cylindrical bypass: $C_D$ and $C_L$ when Re = 200 and L/D = 4. (a) Front cylinder; (b) rear cylinder.
Figure 19. Comparison of the running overhead of single-layer grid and multi-layer grid.
Figure 20. Schematic diagram of the simulation scene for flow around a sphere.
Figure 21. Streamlines of the flow field of a three-dimensional sphere at Re = 50.
Figure 22. Streamlines of the flow field of a three-dimensional sphere at Re = 100.
Figure 23. Streamlines of the flow field of a three-dimensional sphere at Re = 150.
Figure 24. Streamlines of the flow field of a three-dimensional sphere at Re = 200.
Figure 25. Comparison of the speed-up between the parallel algorithm in this paper and the parallel algorithm using OpenMP.
Figure 26. Comparison of the efficiency between the parallel algorithm in this paper and the parallel algorithm using OpenMP.
Table 1. $C_L$ and $\overline{C}_D$ of the front cylinder when Re = 200.

L/D | Koda et al. [42] $\overline{C}_{D1}$ / $C_{L1}$ | Alam et al. [41] $\overline{C}_{D1}$ / $C_{L1}$ | Meneghini et al. [43] $\overline{C}_{D1}$ | Present Work $\overline{C}_{D1}$ / $C_{L1}$
1.5 | 1.107 / 0.016 | - | 1.06 | 1.070 / 0.020
2 | - | - | 1.03 | 1.052 / 0.047
3 | 1.043 / 0.023 | 1.018 / 0.022 | 1.00 | 1.043 / 0.022
4 | 1.267 / 0.532 | 1.255 / 0.551 | 1.18 | 1.256 / 0.547
Table 2. $C_L$ and $\overline{C}_D$ of the rear cylinder when Re = 200.

L/D | Koda et al. [42] $\overline{C}_{D2}$ / $C_{L2}$ | Alam et al. [41] $\overline{C}_{D2}$ / $C_{L2}$ | Meneghini et al. [43] $\overline{C}_{D2}$ | Present Work $\overline{C}_{D2}$ / $C_{L2}$
1.5 | −0.201 / 0.041 | - | −0.08 | −0.211 / 0.056
2 | - | - | −0.17 | −0.231 / 0.189
3 | −0.130 / 0.209 | −0.130 / 0.212 | −0.08 | −0.132 / 0.267
4 | 0.482 / 1.210 | 0.360 / 1.117 | 0.380 | 0.380 / 1.117
Table 3. Drag coefficient $\overline{C}_D$ and the length $X_S$ of the recirculation region.

Re | Cheng et al. [45] $\overline{C}_D$ / $X_S$ | Wu et al. [46] $\overline{C}_D$ / $X_S$ | Xu et al. [44] $\overline{C}_D$ / $X_S$ | Present Work $\overline{C}_D$ / $X_S$
50 | - | - | 1.56 / 0.43 | 1.575 / 0.44
100 | 1.099 / - | 1.128 / - | 1.12 / 0.90 | 1.08 / 0.925
150 | - | - | 0.85 / 1.26 | 0.881 / 1.25
200 | 0.800 / - | 0.800 / - | 0.81 / 1.42 | 0.783 / 1.40
Table 4. The speedup ratio and efficiency of the algorithm proposed in this paper.

Cores | 1 | 2 | 4 | 8 | 16
Speedup | 1 | 1.73 | 3.37 | 5.77 | 9.05
Efficiency | 100% | 86.51% | 84.25% | 72.13% | 56.56%
Table 5. The speedup ratio and efficiency of the OpenMP method.

Cores | 1 | 2 | 4 | 8 | 16
Speedup | 1 | 1.46 | 2.35 | 3.05 | 4.62
Efficiency | 100% | 73.00% | 58.75% | 38.18% | 28.88%
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

