**4. Performance Analysis**

This section compares the GPU execution time obtained with OpenACC for the 3D-FDTD problem considered in this study (volume dimensions of 1500 × 1500 × 1500 m) with the execution time of the same algorithm implemented as serial C code on the CPU. Table 4 presents this comparison. As can be seen in Table 4, in its efficient configuration the OpenACC implementation ran 21.22 times faster than the serial CPU implementation without meaningfully compromising precision. As can be seen in Table 3, the GPU computation time was drastically reduced by storing the variables as float (single precision). Although single-precision variables reduced the accuracy of the computation, this reduction was negligible. Figure 10 depicts the difference (computing error) between the CPU and GPU results at point A, where the mountain height was zero (Hm = 0). The error is smaller than 0.008% and can be considered insignificant. In addition, Figure 11 shows the gain in computation speed as a function of the size of the workspace. The speedup of the GPU over the CPU increases with larger simulation domains.
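To illustrate the single-precision storage discussed above, the following is a minimal sketch and not the kernel used in this study. It assumes a one-dimensional Yee-type update instead of the full 3D-FDTD scheme. The names `real_t`, `USE_DOUBLE`, `update_e`, and `coeff` are hypothetical and introduced here only for illustration. The precision is selected by a single typedef, so the same loop can be built in float or double and the two results compared.

```c
#include <assert.h>

/* Hypothetical precision switch (not from the study):
 * compile with -DUSE_DOUBLE to store the fields in double precision. */
#ifdef USE_DOUBLE
typedef double real_t;
#else
typedef float real_t;
#endif

/* One electric-field update of a simplified 1D scheme:
 *     e[i] += coeff * (h[i] - h[i-1])   for i = 1 .. n-1
 * The OpenACC directive asks the compiler to offload the loop; a compiler
 * without OpenACC support ignores the pragma and runs the loop serially. */
void update_e(real_t *e, const real_t *h, real_t coeff, int n)
{
    assert(n >= 2);
    #pragma acc parallel loop copy(e[0:n]) copyin(h[0:n])
    for (int i = 1; i < n; ++i) {
        e[i] += coeff * (h[i] - h[i - 1]);
    }
}
```

On typical platforms a float occupies 4 bytes and a double 8 bytes, so the float build moves half as much field data between host and device. That is one plausible contributor to a shorter GPU runtime, although the sketch itself makes no claim about the measured figures.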



**Figure 10.** (**a**) The blue solid line and the pink line represent the CPU and GPU processing, respectively; (**b**) numerical error between the CPU and GPU processing.

**Figure 11.** Gain in computation speed of the GPU with respect to the CPU for various numbers of nodes.
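The two quantities reported in this section, the speed gain and the CPU/GPU error in percent, could be computed along the following lines. This is a sketch under stated assumptions: the source does not specify how the error is normalized, so dividing by the peak magnitude of the CPU waveform is an assumption. The function names `speedup` and `max_error_percent` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Absolute value helper, so the sketch needs no math library. */
static double abs_d(double x)
{
    return x < 0.0 ? -x : x;
}

/* Speed gain of the accelerated run: serial time divided by accelerated
 * time. A value of 21.22 means the accelerated run is 21.22 times faster. */
double speedup(double t_serial_s, double t_accel_s)
{
    assert(t_serial_s > 0.0 && t_accel_s > 0.0);
    return t_serial_s / t_accel_s;
}

/* Largest pointwise difference between a double-precision reference
 * waveform and a single-precision waveform, in percent of the peak
 * magnitude of the reference (ASSUMED normalization, not from the study). */
double max_error_percent(const double *ref, const float *test, size_t n)
{
    assert(n > 0);
    double peak = 0.0;
    double worst = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double a = abs_d(ref[i]);
        double d = abs_d(ref[i] - (double)test[i]);
        if (a > peak) peak = a;
        if (d > worst) worst = d;
    }
    assert(peak > 0.0); /* an all-zero reference cannot be normalized */
    return 100.0 * worst / peak;
}
```

Under this normalization, the waveforms would satisfy the bound stated for point A if `max_error_percent` returned a value below 0.008. Evaluating `speedup` for each domain size would give one point per node count, as plotted in Figure 11.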
