Article

GPU-Accelerated Laplace Equation Model Development Based on CUDA Fortran

Korea Institute of Civil Engineering and Building Technology, 283 Goyangdae-ro, Ilsanseo-gu, Goyang 10223, Korea
*
Author to whom correspondence should be addressed.
Water 2021, 13(23), 3435; https://doi.org/10.3390/w13233435
Submission received: 5 November 2021 / Revised: 29 November 2021 / Accepted: 2 December 2021 / Published: 4 December 2021

Abstract

In this study, a CUDA Fortran-based GPU-accelerated Laplace equation model was developed and applied to several cases. The Laplace equation is one of the equations that physically describes groundwater flow, and analytical solutions are available for it. Numerical models of this kind require a large amount of data to reproduce the flow physically with high accuracy and therefore require considerable computational time. To shorten the computation time, CUDA technology was applied: large-scale parallel computations were performed on the GPU, and the program was written to reduce the number of data transfers between the CPU and GPU. A GPU consists of many ALUs specialized for graphics processing and, by using these ALUs concurrently, can perform more simultaneous computations than a CPU. The computation results of the GPU-accelerated model were compared with the analytical solution of the Laplace equation to verify its accuracy, and the two were in good agreement. As the number of grids increased, the computational time of the GPU-accelerated model decreased progressively relative to that of the CPU-based Laplace equation model; overall, the computational time of the GPU-accelerated Laplace equation model was reduced by up to about 50 times.

1. Introduction

In the field of Computational Fluid Dynamics (CFD), research that exploits improvements in computer performance to represent flow results realistically is being actively conducted. Such studies require large amounts of data to reproduce the flow physically with high accuracy and thus require significant time to process the data; computational execution time has therefore become an important consideration. In the past, only a Central Processing Unit (CPU) was used to perform computations on large amounts of data, but over roughly the last 20 years techniques for using a Graphics Processing Unit (GPU) have been developed and applied [1]. GPUs have gradually emerged as a viable, inexpensive, and highly portable alternative to large and expensive high-performance computing clusters [2].
A GPU is a device that rapidly processes graphics work under the command of a CPU, converting digital signals into images and transmitting them to a display device. The hardware of a CPU consists of Arithmetic Logic Units (ALUs) with a complex physical structure that enables a wide variety of tasks, whereas the ALUs of a GPU are relatively simpler. However, because a GPU contains many ALUs, it can perform more computations at once than a CPU. Owing to this difference in hardware, GPUs must be programmed differently from CPUs when performing data computations. NVIDIA's CUDA (Compute Unified Device Architecture) [3] and the Khronos Group's OpenCL (Open Computing Language) [4] are representative frameworks for programming GPUs [5]. GPU manufacturer NVIDIA developed compilers that can use the GPU for data computation in collaboration with the compiler supplier, the Portland Group Incorporated (PGI). One of the results, CUDA Fortran, is a compiler that incorporates CUDA into PGI Fortran.
CUDA is a General-Purpose computing on Graphics Processing Units (GPGPU) technology that allows parallel data-processing algorithms for the GPU to be written in ordinary programming languages. GPGPU applies the GPU, which previously handled only graphics under CPU commands, to computational tasks that the CPU used to perform; the GPU performs better in large-scale parallel processing because it uses more cores than the CPU [6]. In 2007, NVIDIA announced the CUDA architecture, which enables GPU-based parallel programming, and began introducing hardware such as accelerators equipped with CUDA processors [3].
Over the past few years, research on massively parallel computations using GPUs has been conducted in the CFD field. In 2011, an open-source software package called SPHysics (www.sphysics.org, accessed on 4 November 2021) was developed using the GPU-based Smoothed Particle Hydrodynamics (SPH) method for free-surface fluid analysis [7,8]. Chow et al. [2] developed a GPU-based three-dimensional Incompressible Smoothed Particle Hydrodynamics (ISPH) model, achieving up to 18 times faster simulation execution time. Afrasiabi et al. [9] developed a further optimized GPU-based 2D SPH model and achieved execution times up to about 264 times faster. Vanderbauwhede and Takemi [5] developed an OpenCL-based Weather Research and Forecasting (WRF) model and showed that the computational execution time using the GPU was reduced by a factor of two compared with the CPU. Kim et al. [10] developed a CUDA Fortran-based WRF model and reported that the GPU improved computational efficiency by a factor of five compared with the CPU. Chang [11] applied CUDA to compressible flow cases based on high-order-accuracy numerical techniques; the computational execution time using the GPU was reduced by about two times compared with Open Multi-Processing (OpenMP) parallel computation using 12 CPU cores.
In this study, a CUDA Fortran-based GPU-accelerated Laplace equation model was developed and applied. The Laplace equation is one of the equations that physically describes groundwater flow, and analytical solutions can be obtained for it. The computation results of the GPU-accelerated Laplace equation model were compared with the analytical solution to verify its accuracy. Performance in terms of computational execution time was compared by measuring the overall program execution time of the GPU-based and CPU-based Laplace equation models. In addition, by configuring various numbers of grids, the improvement in computational execution time as a function of the number of grids was investigated.

2. Numerical Method

2.1. Governing Equation

In two-dimensional irrotational flow, the velocity components can be expressed in terms of a scalar function $\phi(x, y, t)$, and the flow is governed by Equation (1):

$$\frac{\partial^2 \phi}{\partial x^2} + \frac{\partial^2 \phi}{\partial y^2} = 0, \qquad (1)$$

where $x$ and $y$ are the horizontal and vertical axes, $t$ is the time, and $\phi$ is the velocity potential. Inviscid, incompressible, irrotational flow fields are governed by the Laplace equation, and such flows are called potential flows.

2.2. Finite Volume Method

In this study, the Finite Volume Method [12] was adopted to discretize the Laplace equation. The finite volume method is convenient to apply to irregular curvilinear grids such as that shown in Figure 1. Applying it to the Laplace equation over the computation element $ABCD$ in Figure 1 gives Equation (2):

$$\iint_{ABCD} \left( \frac{\partial^2 \phi}{\partial x^2} + \frac{\partial^2 \phi}{\partial y^2} \right) dx\,dy = \oint_{ABCD} \mathbf{H} \cdot \mathbf{n}\, ds = 0, \qquad (2)$$

where $\mathbf{H} \cdot \mathbf{n}\, ds = (\partial \phi / \partial x)\, dy - (\partial \phi / \partial y)\, dx$.
Equation (3) is a discretization of Equation (2):

$$\left(\frac{\partial \phi}{\partial x}\right)_{i,j-1/2} \Delta y_{AB} - \left(\frac{\partial \phi}{\partial y}\right)_{i,j-1/2} \Delta x_{AB} + \left(\frac{\partial \phi}{\partial x}\right)_{i+1/2,j} \Delta y_{BC} - \left(\frac{\partial \phi}{\partial y}\right)_{i+1/2,j} \Delta x_{BC} + \left(\frac{\partial \phi}{\partial x}\right)_{i,j+1/2} \Delta y_{CD} - \left(\frac{\partial \phi}{\partial y}\right)_{i,j+1/2} \Delta x_{CD} + \left(\frac{\partial \phi}{\partial x}\right)_{i-1/2,j} \Delta y_{DA} - \left(\frac{\partial \phi}{\partial y}\right)_{i-1/2,j} \Delta x_{DA} = 0. \qquad (3)$$

Here $(i, j)$ is the index of a grid point, $\Delta x$ is the $x$-direction grid interval, and $\Delta y$ is the $y$-direction grid interval. $(\partial \phi / \partial x)_{i,j-1/2}$ is evaluated over the computation element $ABCD$ in Figure 1, as in Equation (4a), and $(\partial \phi / \partial y)_{i,j-1/2}$ is expressed as Equation (4b):

$$\left(\frac{\partial \phi}{\partial x}\right)_{i,j-1/2} = \frac{1}{S_{ABCD}} \iint_{ABCD} \frac{\partial \phi}{\partial x}\, dx\, dy = \frac{1}{S_{ABCD}} \oint_{ABCD} \phi\, dy \qquad (4a)$$

$$\left(\frac{\partial \phi}{\partial y}\right)_{i,j-1/2} = \frac{1}{S_{ABCD}} \iint_{ABCD} \frac{\partial \phi}{\partial y}\, dx\, dy = -\frac{1}{S_{ABCD}} \oint_{ABCD} \phi\, dx, \qquad (4b)$$

where $S_{ABCD}$ is the area of the computation element $ABCD$. The contour integral is approximated by Equation (5), and if the grid is not overly distorted, the increments appearing in Equation (5) can be approximated as in Equations (6a)–(6c):
$$\oint_{ABCD} \phi\, dy \approx \phi_{i,j-1}\, \Delta y_{AB} + \phi_B\, \Delta y_{BC} + \phi_{i,j}\, \Delta y_{CD} + \phi_A\, \Delta y_{DA} \qquad (5)$$

$$\Delta y_{AB} \approx -\Delta y_{CD} \qquad (6a)$$

$$\Delta y_{BC} \approx -\Delta y_{DA} \equiv \Delta y_{j-1,j} \qquad (6b)$$

$$S_{ABCD} = \Delta x_{AB}\, \Delta y_{j-1,j} - \Delta y_{AB}\, \Delta x_{j-1,j}. \qquad (6c)$$

Therefore, using Equation (5), Equations (4a) and (4b) become Equations (7a) and (7b), respectively:

$$\left(\frac{\partial \phi}{\partial x}\right)_{i,j-1/2} = \frac{\Delta y_{AB}\,(\phi_{i,j-1} - \phi_{i,j}) + \Delta y_{j-1,j}\,(\phi_B - \phi_A)}{S_{ABCD}} \qquad (7a)$$

$$\left(\frac{\partial \phi}{\partial y}\right)_{i,j-1/2} = -\frac{\Delta x_{AB}\,(\phi_{i,j-1} - \phi_{i,j}) + \Delta x_{j-1,j}\,(\phi_B - \phi_A)}{S_{ABCD}}. \qquad (7b)$$

$(\partial \phi / \partial x)_{i+1/2,j}$ and the remaining face gradients can be expressed in the same way. Using Equations (4a)–(7b), Equation (3) can be written as Equation (8):
$$Q_{AB}(\phi_{i,j-1} - \phi_{i,j}) + P_{AB}(\phi_B - \phi_A) + Q_{BC}(\phi_{i+1,j} - \phi_{i,j}) + P_{BC}(\phi_C - \phi_B) + Q_{CD}(\phi_{i,j+1} - \phi_{i,j}) + P_{CD}(\phi_D - \phi_C) + Q_{DA}(\phi_{i-1,j} - \phi_{i,j}) + P_{DA}(\phi_A - \phi_D) = 0, \qquad (8)$$

where $Q_{AB}$, $P_{AB}$, and $\phi_A$ in Equation (8) are given by Equations (9a)–(9c); $\phi_A$ represents the average value of the four adjacent nodes, as in Equation (9c):

$$Q_{AB} = \frac{\Delta x_{AB}^2 + \Delta y_{AB}^2}{S_{ABCD}} \qquad (9a)$$

$$P_{AB} = \frac{\Delta x_{AB}\, \Delta x_{j-1,j} + \Delta y_{AB}\, \Delta y_{j-1,j}}{S_{ABCD}} \qquad (9b)$$

$$\phi_A = 0.25\,(\phi_{i,j} + \phi_{i-1,j} + \phi_{i-1,j-1} + \phi_{i,j-1}). \qquad (9c)$$

Therefore, Equation (8) becomes Equation (10); a more detailed description of Equation (9) is given by Fletcher [12]:

$$0.25(P_{CD} - P_{DA})\,\phi_{i-1,j+1} + [Q_{CD} + 0.25(P_{BC} - P_{DA})]\,\phi_{i,j+1} + 0.25(P_{BC} - P_{CD})\,\phi_{i+1,j+1} + [Q_{DA} + 0.25(P_{CD} - P_{AB})]\,\phi_{i-1,j} - (Q_{AB} + Q_{BC} + Q_{CD} + Q_{DA})\,\phi_{i,j} + [Q_{BC} + 0.25(P_{AB} - P_{CD})]\,\phi_{i+1,j} + 0.25(P_{DA} - P_{AB})\,\phi_{i-1,j-1} + [Q_{AB} + 0.25(P_{DA} - P_{BC})]\,\phi_{i,j-1} + 0.25(P_{AB} - P_{BC})\,\phi_{i+1,j-1} = 0. \qquad (10)$$
The solution of Equation (10) was obtained using the Successive Over-Relaxation (SOR) technique, Equations (11) and (12):

$$\phi_{i,j}^{*} = \big\{ 0.25(P_{CD} - P_{DA})\,\phi_{i-1,j+1} + [Q_{CD} + 0.25(P_{BC} - P_{DA})]\,\phi_{i,j+1} + 0.25(P_{BC} - P_{CD})\,\phi_{i+1,j+1} + [Q_{DA} + 0.25(P_{CD} - P_{AB})]\,\phi_{i-1,j} + [Q_{BC} + 0.25(P_{AB} - P_{CD})]\,\phi_{i+1,j} + 0.25(P_{DA} - P_{AB})\,\phi_{i-1,j-1} + [Q_{AB} + 0.25(P_{DA} - P_{BC})]\,\phi_{i,j-1} + 0.25(P_{AB} - P_{BC})\,\phi_{i+1,j-1} \big\}^{k} \Big/ (Q_{AB} + Q_{BC} + Q_{CD} + Q_{DA}) \qquad (11)$$

$$\phi_{i,j}^{k+1} = \phi_{i,j}^{k} + \lambda\,(\phi_{i,j}^{*} - \phi_{i,j}^{k}), \qquad (12)$$

where $\phi_{i,j}^{k+1}$ is the solution at the current iteration, $\phi_{i,j}^{k}$ is the solution at the previous iteration, and $\lambda$ is the relaxation parameter; $\lambda = 1.5$ was used in this study. The iteration is terminated when $|\phi_{i,j}^{*} - \phi_{i,j}^{k}| < 1.0 \times 10^{-5}$.
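For illustration, one sweep of the SOR update, Equations (11) and (12), can be written as the following serial Fortran sketch. This is a minimal sketch rather than the code of the developed model: the array and variable names (phi, qab, ..., pda, lambda, maxdiff) are hypothetical, and the geometric coefficients of Equations (9a) and (9b) are assumed to be precomputed.

```fortran
! Minimal serial sketch of one SOR sweep for Equation (10); the geometric
! coefficients Q and P of Equations (9a)-(9b) are assumed precomputed.
! All names (phi, qab, ..., pda, lambda, maxdiff) are illustrative.
subroutine sor_sweep(phi, qab, qbc, qcd, qda, pab, pbc, pcd, pda, &
                     ni, nj, lambda, maxdiff)
  implicit none
  integer, intent(in)    :: ni, nj
  real(8), intent(inout) :: phi(ni, nj)
  real(8), intent(in)    :: qab(ni, nj), qbc(ni, nj), qcd(ni, nj), qda(ni, nj)
  real(8), intent(in)    :: pab(ni, nj), pbc(ni, nj), pcd(ni, nj), pda(ni, nj)
  real(8), intent(in)    :: lambda        ! relaxation parameter (1.5 in this study)
  real(8), intent(out)   :: maxdiff       ! max |phi* - phi^k| for the convergence test
  integer :: i, j
  real(8) :: phistar, denom

  maxdiff = 0.0d0
  do j = 2, nj - 1
    do i = 2, ni - 1
      denom = qab(i,j) + qbc(i,j) + qcd(i,j) + qda(i,j)
      ! Right-hand side of Equation (11): the eight neighbours of (i,j)
      phistar = ( 0.25d0*(pcd(i,j) - pda(i,j))*phi(i-1,j+1)                &
                + (qcd(i,j) + 0.25d0*(pbc(i,j) - pda(i,j)))*phi(i,j+1)     &
                + 0.25d0*(pbc(i,j) - pcd(i,j))*phi(i+1,j+1)                &
                + (qda(i,j) + 0.25d0*(pcd(i,j) - pab(i,j)))*phi(i-1,j)     &
                + (qbc(i,j) + 0.25d0*(pab(i,j) - pcd(i,j)))*phi(i+1,j)     &
                + 0.25d0*(pda(i,j) - pab(i,j))*phi(i-1,j-1)                &
                + (qab(i,j) + 0.25d0*(pda(i,j) - pbc(i,j)))*phi(i,j-1)     &
                + 0.25d0*(pab(i,j) - pbc(i,j))*phi(i+1,j-1) ) / denom
      maxdiff  = max(maxdiff, abs(phistar - phi(i,j)))
      phi(i,j) = phi(i,j) + lambda*(phistar - phi(i,j))    ! Equation (12)
    end do
  end do
end subroutine sor_sweep
```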

3. GPU-Accelerated Computing

Figure 2 shows a flowchart of the GPU-accelerated Laplace equation model. The solid lines indicate the program sequence, and the dotted lines indicate the program sequence accompanied by data transfer between the CPU and GPU.

3.1. CUDA Parallel Programming

CUDA parallel programming was applied to reduce the computation time of the GPU-accelerated Laplace equation model; the existing CPU-based sequential program was converted into a parallel program suitable for GPU execution. CUDA organizes the threads covering the computational grid into blocks and a grid of blocks: a grid is a group of blocks, a block is a group of threads, and a thread is the unit of work executed on a single GPU core. In CUDA parallel programming, one instruction stream is assigned to each thread, and the threads within one block perform their operations in parallel at the same time. The number of threads per block is held in blockDim and each thread's index in threadIdx; the number of blocks per grid is held in gridDim and each block's index in blockIdx [3]. A minimal indexing example is given below.
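As an illustration of this block/thread indexing, the following CUDA Fortran sketch assigns one GPU thread to each grid point (i, j). The module and kernel names are hypothetical, and a simple Jacobi-type update on a uniform grid is shown for brevity instead of the full scheme of Equation (10).

```fortran
! Illustrative CUDA Fortran kernel (module and names are hypothetical):
! each thread derives its own (i, j) grid index from blockIdx, blockDim
! and threadIdx and updates a single grid point.
module laplace_kernels
  use cudafor
  implicit none
contains
  attributes(global) subroutine jacobi_step(phi_new, phi_old, ni, nj)
    integer, value :: ni, nj
    real(8) :: phi_new(ni, nj), phi_old(ni, nj)
    integer :: i, j

    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    j = (blockIdx%y - 1) * blockDim%y + threadIdx%y

    if (i > 1 .and. i < ni .and. j > 1 .and. j < nj) then
      ! A simple Jacobi step on a uniform grid is shown for brevity;
      ! the actual model uses the SOR scheme of Equations (10)-(12).
      phi_new(i, j) = 0.25d0 * ( phi_old(i-1, j) + phi_old(i+1, j)   &
                               + phi_old(i, j-1) + phi_old(i, j+1) )
    end if
  end subroutine jacobi_step
end module laplace_kernels
```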

3.2. GPU Hardware Structure and Features

Figure 3 shows simplified CPU and GPU hardware structures. A control unit is a device that directs the operation of a processor and controls the communication and coordination between input/output devices, and an ALU is a digital circuit that performs arithmetic and logical operations. As shown in Figure 3, the CPU consists of a small number of ALUs, whereas the GPU consists of many ALUs. For the hardware used in this study, the CPU was an Intel Xeon E5-2620 v2 @ 2.10 GHz with six cores, and the GPU was an NVIDIA GeForce GTX TITAN Z with 5760 CUDA cores (Table 1).
In CUDA programming, there is an additional process, data transfer between the GPU and CPU, that does not exist in conventional CPU programming; it arises from the difference in hardware structure between the CPU and GPU. Figure 4 shows in detail the CUDA data-processing flow corresponding to the dotted lines in Figure 2. The CPU and GPU each have independent memory that only they can control. When the GPU processes data, the data are first transmitted from CPU memory to GPU memory, and the CPU issues the data-processing command to the GPU. The GPU processes the data in parallel on each core and transfers the processed result from GPU memory back to CPU memory. A sketch of this flow in CUDA Fortran is shown below.
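In CUDA Fortran, this flow can be sketched as follows: assigning a host array to a device array (and vice versa) performs the CPU-to-GPU (or GPU-to-CPU) memory transfer, and the kernel launch issues the data-processing command to the GPU. The module laplace_kernels refers to the kernel sketch in Section 3.1; the array names and grid dimensions are illustrative assumptions, not the code of the developed model.

```fortran
program data_flow_sketch
  use cudafor
  use laplace_kernels            ! hypothetical module from the sketch in Section 3.1
  implicit none
  integer, parameter :: ni = 1024, nj = 1024
  real(8)            :: phi(ni, nj)                          ! host (CPU) memory
  real(8), device    :: phi_d(ni, nj), phi_new_d(ni, nj)     ! device (GPU) memory
  type(dim3) :: grid, block
  integer :: istat

  phi = 0.0d0                    ! initial and boundary values are set on the CPU
  phi_d = phi                    ! 1) CPU -> GPU transfer

  block = dim3(16, 16, 1)                       ! threads per block
  grid  = dim3((ni + 15)/16, (nj + 15)/16, 1)   ! blocks per grid

  call jacobi_step<<<grid, block>>>(phi_new_d, phi_d, ni, nj)   ! 2) GPU computation
  istat = cudaDeviceSynchronize()

  phi = phi_new_d                ! 3) GPU -> CPU transfer of the result
end program data_flow_sketch
```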

3.3. Data Transfer between the CPU and GPU

The objective of GPU-accelerated models is to shorten the time required for numerical computation. Data transfer occurs between the CPU and GPU because of the difference in hardware, and the program should be written so that such transfers do not occur frequently. Therefore, the number of data transfers between the CPU and GPU should be reduced to a minimum, and the program should be organized so that consecutive operations are performed on the GPU. Figure 5 summarizes the program structure of the GPU-accelerated model developed in this study, and Figure 6 shows part of the CUDA Fortran code for the numerical computation part of Figure 2. Data transfer between the CPU and GPU is performed only in rows 1 and 11 to minimize the number of transfers, and lines 3 to 7 are instructions for continuous operation on the GPU. However, the GPU-accelerated Laplace equation model does not run entirely on the GPU: as shown in lines 2 and 8 to 10 of Figure 5 and Figure 6, GPU operation is temporarily suspended for a CPU operation and resumes after the CPU operation is completed. This is attributed to limitations of the functions provided by the CUDA program. The overall structure is sketched below.
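The structure described above can be sketched as a host-side driver: a single transfer of the field to the GPU, repeated kernel launches, a small CPU-side convergence check each iteration, and a single transfer of the result back. This is a sketch only and does not reproduce the code of Figure 6; sor_kernel (assumed to apply Equations (11) and (12) and to store the local change in res_d) and the other names are hypothetical.

```fortran
! Host-side driver sketch: one transfer in, repeated kernel launches,
! a CPU-side convergence check each iteration, one transfer out.
! sor_kernel (not shown) is assumed to apply Equations (11)-(12) and
! to write |phi* - phi^k| into res_d; all names are illustrative.
subroutine solve_on_gpu(phi, ni, nj, lambda, max_iter)
  use cudafor
  use laplace_kernels                       ! hypothetical module holding sor_kernel
  implicit none
  integer, intent(in)    :: ni, nj, max_iter
  real(8), intent(in)    :: lambda
  real(8), intent(inout) :: phi(ni, nj)
  real(8), device, allocatable :: phi_d(:,:), res_d(:,:)
  real(8), allocatable   :: res(:,:)
  type(dim3) :: grid, block
  integer :: iter, istat

  allocate(phi_d(ni, nj), res_d(ni, nj), res(ni, nj))
  block = dim3(16, 16, 1)
  grid  = dim3((ni + 15)/16, (nj + 15)/16, 1)

  phi_d = phi                               ! single CPU -> GPU transfer of the field
  do iter = 1, max_iter
    call sor_kernel<<<grid, block>>>(phi_d, res_d, ni, nj, lambda)   ! GPU computation
    istat = cudaDeviceSynchronize()
    res = res_d                             ! residuals copied back for the CPU-side check
    if (maxval(abs(res)) < 1.0d-5) exit     ! convergence criterion of Section 2.2
  end do
  phi = phi_d                               ! single GPU -> CPU transfer of the result
  deallocate(phi_d, res_d, res)
end subroutine solve_on_gpu
```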

4. Numerical Cases and Results

4.1. Comparison of the Numerical Result and the Analysis Solution

The accuracy of the GPU-accelerated Laplace equation model was analyzed by comparing the numerical results with the known analytical solution of Equations (13a) and (13b):
$$\frac{\partial^2 \phi}{\partial x^2} + \frac{\partial^2 \phi}{\partial y^2} = 0, \quad 0 < x < a, \quad 0 < y < b \qquad (13a)$$

$$\phi(0, y) = 0, \quad \phi(a, y) = 0, \quad \phi(x, 0) = 0, \quad \phi(x, b) = f(x), \qquad (13b)$$

where $x$ and $y$ are the horizontal and vertical axes, $\phi$ is the dimensionless temperature, and $a$ and $b$ are arbitrary constants. The analytical solution of Equations (13a) and (13b) is given by Equation (14) [14]:

$$\phi(x, y) = \sum_{n=1}^{\infty} A_n \sinh\!\left(\frac{n\pi}{a} y\right) \sin\!\left(\frac{n\pi}{a} x\right), \qquad (14)$$

where $A_n$ is given by Equation (15); for $f(x) = 100$, $a = 1$, $b = 1$, $A_n$ becomes Equation (16):

$$A_n = \frac{2}{a \sinh\!\left(\dfrac{n\pi b}{a}\right)} \int_0^a f(x) \sin\!\left(\frac{n\pi}{a} x\right) dx \qquad (15)$$

$$A_n = \frac{200\,\left[\,1 - (-1)^n\,\right]}{n\pi \sinh(n\pi)}. \qquad (16)$$
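For reference, the analytical solution of Equations (14) and (16) can be evaluated with a short Fortran function such as the following sketch; the function name and the series truncation level nmax are arbitrary choices for illustration, not part of the original model.

```fortran
! Evaluate the analytical solution of Equations (14) and (16) at (x, y)
! for f(x) = 100, a = b = 1, truncating the infinite series at nmax terms.
! Keep nmax below about 200 so that sinh(n*pi) does not overflow in real(8).
function phi_exact(x, y, nmax) result(phi)
  implicit none
  real(8), intent(in) :: x, y
  integer, intent(in) :: nmax
  real(8) :: phi, an, pi
  integer :: n

  pi  = 4.0d0 * atan(1.0d0)
  phi = 0.0d0
  do n = 1, nmax
    ! Equation (16): A_n = 200 [1 - (-1)^n] / (n * pi * sinh(n * pi))
    an  = 200.0d0 * (1.0d0 - (-1.0d0)**n) / (n * pi * sinh(n * pi))
    phi = phi + an * sinh(n * pi * y) * sin(n * pi * x)   ! Equation (14) with a = 1
  end do
end function phi_exact
```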
In the numerical tests, a grid size of 0.05 × 0.05 was used. Figure 7 shows the numerical model result and the analytical solution, and the two agree well. The Normalized Root Mean Square Error (NRMSE) between the numerical and analytical results was calculated using Equation (17):
$$\mathrm{NRMSE} = \sqrt{\frac{1}{m} \sum_{k=1}^{m} \left( \frac{A_k - F_k}{A_k} \right)^{2}}, \qquad (17)$$

where $m$ is the number of numerical result values, $A_k$ is the analytical solution value, and $F_k$ is the numerical result value. The NRMSE for Figure 7 is $3.9 \times 10^{-4}$, which indicates that the numerical result of the GPU-accelerated Laplace equation model and the analytical solution agree well.
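The error measure of Equation (17) can be computed directly from the two sets of values, as in the following sketch; the function and array names are illustrative, and the formula assumes that no A_k is zero.

```fortran
! Normalized root mean square error of Equation (17):
! a(:) holds the analytical values A_k, f(:) the numerical values F_k.
! Assumes a(k) /= 0 for all k.
function nrmse(a, f, m) result(err)
  implicit none
  integer, intent(in) :: m
  real(8), intent(in) :: a(m), f(m)
  real(8) :: err

  err = sqrt( sum( ((a - f) / a)**2 ) / m )
end function nrmse
```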

4.2. Groundwater Flow around Sheet Pile Dam

The velocity potential and stream function were obtained by applying the GPU-accelerated Laplace equation model to a sheet pile dam, which is often used as a benchmark for groundwater flow. Figure 8 is a cross-sectional view of the sheet pile dam. The upstream water depth is 2 m, the downstream water depth is 0.5 m, and the sheet piles are located up to 15 m deep in the soil layer. The impermeable layer is just below the soil layer.
Figure 9a,b shows the boundary conditions used to obtain the velocity potential and the stream function; Dirichlet and Neumann boundary conditions were applied. Sections 1 to 9 are Dirichlet boundary conditions: Section 1 is ϕ = 2.0 m, Section 2 is ϕ = 0.5 m, Section 3 is ϕ = 0 m, Sections 4 to 6 are ψ = 0.5 m²/s, and Sections 7 to 9 are ψ = 2.0 m²/s.
Figure 10 shows the numerical results of the GPU-accelerated Laplace equation model. Figure 10a shows the mesh configuration for the computational domain, Figure 10b shows the equipotential lines obtained from the velocity potential, and Figure 10c shows the streamlines obtained from the stream function. As a result of applying the GPU-accelerated Laplace equation model to the sheet pile dam, the equipotential lines and streamlines were confirmed to be approximately perpendicular to each other, as shown in Figure 10d. This indicates that the GPU-accelerated Laplace equation model represents the groundwater flow around the sheet pile dam relatively well.

4.3. Irregular Calculation Area

In order to apply the GPU-accelerated Laplace equation model to an irregular calculation area, a sector-shaped computational domain was set up as shown in Figure 11, where $\overline{RW} = \overline{RZ} = 0.1$ m, $\overline{RX} = \overline{RY} = 1$ m, and the angle between $RX$ and $RY$ is 90 degrees. Dirichlet boundary conditions were applied on the boundaries of the computational domain, with the values given by Equation (18):

$$\phi = 0 \ \ \text{at } WX, \qquad \phi = \frac{\sin\theta}{r_{XY}} \ \ \text{at } XY, \qquad \phi = \frac{1}{r_{YZ}} \ \ \text{at } YZ, \qquad \phi = \frac{\sin\theta}{r_{ZW}} \ \ \text{at } ZW. \qquad (18)$$
The numerical results of the GPU-accelerated Laplace equation model were verified by comparison with the CPU-based numerical results previously obtained by Fletcher [12]. Figure 12 shows the velocity potential calculated with the GPU-accelerated Laplace equation model for various numbers of grids. The NRMSE for Figure 12a indicates that the numerical results of the GPU-accelerated Laplace equation model and the CPU-based results of Fletcher [12] agree well. The velocity potential calculated with increasing numbers of grids, Figure 12b–d, showed a trend similar to that of Figure 12a.

4.4. Performance of GPU-Accelerated Laplace Equation Models

The computational execution time performance of the GPU-accelerated Laplace equation model was evaluated by measuring the overall program execution time of the GPU-based and CPU-based Laplace equation models for the cases in Figure 13. The GPU-accelerated numerical model and the CPU-based numerical model used the same algorithm. Speedup, a measure of the performance improvement of parallel computers, was used to quantify the computation-time performance and is defined in Equation (19):
$$\mathrm{Speedup} = \frac{\mathrm{CPU\ time}}{\mathrm{GPU\ time}}, \qquad (19)$$

where GPU time is the overall program execution time of the GPU-accelerated numerical model, and CPU time is the overall program execution time of the CPU-based numerical model. The relationship between the number of grids and the speedup was examined by configuring various numbers of grids in the calculation area. As shown in Figure 13, the number of grids was increased from $1.0 \times 10^4$ to $4.0 \times 10^6$, and as the number of grids increased, the speedup also increased. An increase in the number of grids corresponds to an increase in the amount of data to be processed; as the amount of data increases, the numerical simulation time increases, but the efficiency of GPU parallel processing also increases with the number of grids. As a result, the GPU-accelerated Laplace equation model shortened the computation time by up to 50 times compared with the CPU-based Laplace equation model. It is judged that, when a large-scale computation is performed, the use of a GPU can shorten the computation time compared with using only a single CPU.
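As an illustration, the speedup of Equation (19) can be obtained by timing the two runs with the standard system_clock intrinsic, as in the sketch below; run_cpu_model and run_gpu_model are placeholders for the CPU-based and GPU-accelerated solvers, not routines from the developed model.

```fortran
! Measure wall-clock time of the CPU-based and GPU-accelerated runs and
! report the speedup of Equation (19). run_cpu_model and run_gpu_model
! are hypothetical placeholders for the two solvers.
program measure_speedup
  implicit none
  integer(8) :: c0, c1, c2, rate
  real(8)    :: cpu_time_s, gpu_time_s

  call system_clock(count_rate=rate)

  call system_clock(c0)
  call run_cpu_model()                 ! CPU-based Laplace equation model
  call system_clock(c1)
  call run_gpu_model()                 ! GPU-accelerated Laplace equation model
  call system_clock(c2)

  cpu_time_s = real(c1 - c0, 8) / real(rate, 8)
  gpu_time_s = real(c2 - c1, 8) / real(rate, 8)
  print *, 'Speedup = CPU time / GPU time =', cpu_time_s / gpu_time_s
end program measure_speedup
```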
In the developed model, some data operations are being performed on the CPU due to the limitations of the functions provided by CUDA Fortran. To obtain higher acceleration results, the model must be programmed so that most data operations are performed on the GPU.

5. Conclusions

In this study, in order to shorten the computational execution time of the numerical analysis of the Laplace equation, a GPU-accelerated Laplace equation model was implemented, its accuracy was verified, and its computational execution time performance was evaluated. The model applies CUDA technology and is implemented in the Fortran language. To shorten the computation time, large-scale parallel computations were performed on the GPU, and the program was written to reduce the number of data transfers between the CPU and GPU.
The results of the GPU-accelerated Laplace equation model were in good agreement with the analytical solution and with the results of previous studies. The computational execution time performance of the model increased as the number of grids increased; when the number of grids was $4.0 \times 10^6$, the execution time was shortened by up to about 50 times. For large-scale calculations, the GPU-accelerated Laplace equation model therefore outperforms the existing CPU-based numerical model in terms of execution time. We conclude from these results that the GPU-accelerated Laplace equation model can be applied to research on groundwater flows over large areas.
Although this study represents an early stage of the GPU-based Laplace equation model, several areas still require development. Additional algorithms will be developed using the CUDA Fortran library, and if most of the data processing is performed on the GPU, there is room for a further reduction in simulation execution time. The developed model should also be studied with respect to optimization. As the performance of CUDA improves, the performance of the CUDA Fortran-based model is expected to improve further as well.

Author Contributions

Conceptualization, B.K., K.S.Y. and H.-J.K.; methodology, B.K., K.S.Y. and H.-J.K.; software and validation, B.K.; writing—original draft preparation, B.K.; writing—review and editing, H.-J.K. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by a grant (Grant 127568) from the Water Management Research Program funded by the Ministry of Environment of the Korean government.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

This study was supported by a grant (Grant 127568) from the Water Management Research Program funded by the Ministry of Environment of the Korean government.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
ALU: Arithmetic Logic Unit
CFD: Computational Fluid Dynamics
CPU: Central Processing Unit
CUDA: Compute Unified Device Architecture
GPGPU: General-Purpose Computing on Graphics Processing Units
GPU: Graphics Processing Unit
ISPH: Incompressible Smoothed Particle Hydrodynamics
NRMSE: Normalized Root Mean Square Error
OpenCL: Open Computing Language
OpenMP: Open Multi-Processing
PGI: Portland Group Incorporated
SPH: Smoothed Particle Hydrodynamics
WRF: Weather Research and Forecasting

References

  1. Harju, A.; Siro, T.; Canova, F.F.; Hakala, S.; Rantalaiho, T. Computational physics on graphics processing units. In International Workshop on Applied Parallel Computing; Springer: Berlin/Heidelberg, Germany, 2012; pp. 3–26. [Google Scholar]
  2. Chow, A.D.; Rogers, B.D.; Lind, S.J.; Stansby, P.K. Incompressible SPH (ISPH) with fast Poisson solver on a GPU. Comput. Phys. Commun. 2018, 226, 81–103. [Google Scholar] [CrossRef]
  3. NVIDIA. CUDA C Programming Guide; Version 4.0; NVIDIA Corporation: Santa Clara, CA, USA, 2011. [Google Scholar]
  4. Munshi, A.; Gaster, B.; Mattson, T.G.; Ginsburg, D. OpenCL Programming Guide; Pearson Education: London, UK, 2011. [Google Scholar]
  5. Vanderbauwhede, W.; Takemi, T. An investigation into the feasibility and benefits of gpu/multicore acceleration of the weather research and forecasting model. In Proceedings of the 2013 International Conference on High Performance Computing & Simulation (HPCS), Helsinki, Finland, 1–5 July 2013; pp. 482–489. [Google Scholar]
  6. Bae, S.K. Acceleration of Word2vec Using GPUs. Master’s Thesis, University of Seoul, Seoul, Korea, 2017. [Google Scholar]
  7. Gomez Gesteira, M.; Crespo, A.J.; Rogers, B.D.; Dalrymple, R.A.; Dominguez, J.M.; Barreiro, A. SPHysics—Development of a free-surface fluid solver—Part 2: Efficiency and test cases. Comput. Geosci. 2012, 48, 300–307. [Google Scholar] [CrossRef]
  8. Gomez Gesteira, M.; Rogers, B.D.; Crespo, A.J.; Dalrymple, R.A.; Narayanaswamy, M.; Dominguez, J.M. SPHysics—Development of a free-surface fluid solver—Part 1: Theory and formulations. Comput. Geosci. 2012, 48, 289–299. [Google Scholar] [CrossRef]
  9. Afrasiabi, M.; Klippel, H.; Röthlin, M.; Wegener, K. An improved thermal model for SPH metal cutting simulations on GPU. Appl. Math. Model. 2021, 100, 728–750. [Google Scholar] [CrossRef]
  10. Kim, Y.T.; Lee, Y.L.; Chung, K.Y. WRF Physics Models Using GP-GPUs with CUDA Fortran. Korean Meteorol. Soc. 2013, 23, 231–235. [Google Scholar] [CrossRef] [Green Version]
  11. Chang, T.K. Efficient Computation of Compressible flow by Higher-Order Method Accelerated Using GPU. Master’s Thesis, Seoul National University, Seoul, Korea, 2014. [Google Scholar]
  12. Fletcher, C. Computational Techniques for Fluid Dynamics 1; Springer: New York, NY, USA, 1988; pp. 98–116. [Google Scholar]
  13. Kim, B. Development of GPU-Accelerated Numerical Model for Surface and Ground Water Flow. Ph.D. Thesis, University of Seoul, Seoul, Korea, 2019. [Google Scholar]
  14. Zill, D.; Wright, W.S.; Cullen, M.R. Advanced Engineering Mathematics; Jones & Bartlett Learning: Burlington, MA, USA, 2011; pp. 564–566. [Google Scholar]
Figure 1. Finite volume for a distorted grid [12].
Figure 2. Flowchart of the numerical scheme.
Figure 3. Hardware architecture of the CPU and GPU [13].
Figure 4. CUDA data processing flow [13].
Figure 5. Program structure of the GPU-accelerated Laplace equation model; solid line: data transfer, dashed line: task performance.
Figure 6. CUDA program code.
Figure 7. Comparison of analytical and numerical results for the mesh configuration.
Figure 8. Cross-section of the sheet pile dam.
Figure 9. Boundary conditions; solid line: Dirichlet boundary conditions, dashed line: Neumann boundary conditions. (a) Velocity potential, (b) Stream function.
Figure 10. Computed results of velocity potential and stream function for the mesh configuration. (a) Mesh configuration, (b) Equipotential lines, (c) Streamlines, (d) Flow net.
Figure 11. Computational domain for the solution of the Laplace equation; number of grids: 24 × 24.
Figure 12. Computed velocity potential for various numbers of grids. (a) 6 × 6, (b) 12 × 12, (c) 18 × 18, (d) 24 × 24.
Figure 13. Performance measurement of computation time of the GPU-accelerated Laplace equation model.
Table 1. System specifications.
CPU: Intel Xeon E5-2620 v2 (2.10 GHz)
GPU: NVIDIA GeForce GTX TITAN Z
CUDA cores: 5760
Peak GPU clock/boost: 876 MHz
Peak performance: 10,091 GFLOPS (single precision)
Combined memory bandwidth: 675 GB/s