**1. Introduction**

Rigorous modeling and solution of large-scale electromagnetic compatibility (EMC) problems often require prohibitive computational resources. Fast algorithms and techniques, as well as hardware platforms with high parallel-processing capability, are usually used to circumvent this problem (e.g., [1–15]). Graphics processing units (GPUs) are characterized by a high degree of parallelism. Figure 1 schematically compares the CPU and GPU architectures. As can be seen in Figure 1, the number of threads in a GPU is dramatically higher than in a CPU. A CPU consists of a few large arithmetic logic unit (ALU) cores backed by a large cache memory and a substantial control module, and it can manage only a few threads at a time. A GPU, on the other hand, includes many small ALUs, small control modules, and a small cache; moreover, because it is optimized for parallel operations, it can execute thousands of threads concurrently [16]. A particular example of a large-scale EMC problem is the evaluation of lightning electromagnetic fields and their coupling to structures. It is a large-scale problem because of (i) the extent of the solution domain, which can be on the order of several tens of kilometers, and (ii) the complexity of the propagation media when the inhomogeneity and roughness of the soil are considered (e.g., [17–24]). In most studies focused on the evaluation of lightning-radiated electromagnetic fields and their induced disturbances on nearby structures, such as overhead transmission lines and buried cables (e.g., [25–31]), the computations have been carried out on CPU-based systems.

**Figure 1.** CPU and graphics processing unit (GPU) architectures.

However, there have been several attempts in which different hardware platforms were used for the same purpose (e.g., [32–36]). One example is the use of GPU-based compute unified device architecture (CUDA) programming for the finite-difference time-domain (FDTD) evaluation of lightning electromagnetic fields. Owing to its high computation speed, this approach facilitates three-dimensional modeling of the problem, taking into account features such as an irregular lightning channel and surface roughness [34]. This approach has been reported to be up to 100 times faster than serial processing on a CPU [33]. Although GPU-based CUDA programming is highly efficient and gives the programmer the flexibility to exploit various memories such as cache memory, it requires the programmer to specify many low-level programming details [37]. OpenACC, proposed by NVIDIA, CAPS, Cray, and the Portland Group, is a general, user-driven, directive-based parallel programming model developed for engineers and scientists. The programmer incorporates compiler directives and library routines into FORTRAN, C, or C++ source code to mark the regions that should be accelerated in parallel on the GPU. The programmer is thus not preoccupied with the details of parallelization and instead leaves these tasks to the compiler, which launches the kernels and parallelizes the code on the GPU efficiently. OpenACC was first proposed in 2011 as a high-level programming approach offering high performance and portability across almost all types of processors. This programming model allows code to be created and executed on both current and future accelerator hardware [38], because OpenACC handles the transfer of computations and data from the host to the accelerator device.
The host and the accelerator device have different architectures, and their memories may be either shared or separate. The OpenACC accelerated computing model is presented in Figure 2. As shown in Figure 2, the OpenACC compiler generates the accelerated code and manages the data transferred between the host and the accelerator device. For portability, OpenACC relies on high-level compiler directives to parallelize different sections of the code, and parallel, optimizing compilers are used to build and execute the corresponding codes. Several directives can be defined in OpenACC for the parallel execution of code fragments. Recently, GPU processing with OpenACC has attracted the attention of many researchers owing to its relative simplicity and high performance (e.g., [39,40]). One particular example is the calculation of the vector potential by a finite-difference evaluation of the time-domain Green's functions of layered media, where the computational speed was 45.97 times that of a CPU implementation in MATLAB [40].

**Figure 2.** Open accelerator (OpenACC) model [38].

In this paper, OpenACC-based GPU processing is used to tackle the long computation times of the FDTD method in the evaluation of the electromagnetic fields generated by a lightning channel. It is worth noting that OpenACC combines relatively simple programming with a dramatic increase in computational speed. The rest of the paper is organized as follows. Section 2 presents the steps required to execute the FDTD computational code on the GPU using OpenACC. Section 3 describes the adopted models and computational methods, as well as the FDTD parameters. Section 4 presents the processing speed of the proposed method and compares the results with the serial CPU processing speed. Concluding remarks are provided in Section 5.
