#### *2.1. GPU*

GPUs have been widely adopted to accelerate image processing and computer graphics algorithms, which handle large amounts of data, by exploiting their highly parallel architecture [12–16]. In particular, modern GPUs are used for general-purpose computing on graphics processing units (GPGPU) as a substitute for CPUs in HPC, running various scientific algorithms such as deep learning, genome mapping, and power flow analysis. This is because GPUs significantly outperform CPUs in terms of the cost-effectiveness of floating-point computational throughput.

Figure 1 briefly outlines the overall hardware architecture of a GPU and its programming model. The GPU is specialized to dramatically accelerate computation-intensive tasks thanks to its massively parallel architecture, which employs a large number of streaming multiprocessors (SMs), each consisting of 32 scalar processors (SPs) that operate in lockstep. The GPU supports parallel execution of a computational kernel consisting of multiple thread blocks, and each thread block is divided into warps, i.e., groups of 32 threads. All threads in a warp are executed simultaneously on a single SM. A warp scheduler selects one eligible warp at a time and concurrently executes all threads of the warp on the processing cores, i.e., the SPs, of the SM using lockstep synchronization. Therefore, we can exploit extremely high thread-level parallelism (TLP) by interleaving a large number of threads across the processing cores. Programmers can implement parallel programs on GPUs using the Open Computing Language (OpenCL) and the Compute Unified Device Architecture (CUDA).

**Figure 1.** GPU hardware architecture and programming model.
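The division of a thread block into warps can be illustrated with a little arithmetic. The following is a minimal sketch in plain Python, not GPU code; the function names `warps_per_block` and `warp_and_lane` are hypothetical, and we assume that the threads of a block are numbered by a single linear index.

```python
import math

WARP_SIZE = 32  # threads per warp, as stated in the text


def warps_per_block(threads_per_block):
    """Number of warps needed to cover one thread block."""
    return math.ceil(threads_per_block / WARP_SIZE)


def warp_and_lane(thread_index):
    """Map a linear thread index within a block to (warp id, lane id)."""
    return divmod(thread_index, WARP_SIZE)


print(warps_per_block(256))  # 8
print(warps_per_block(100))  # 4, the last warp is only partly filled
print(warp_and_lane(70))     # (2, 6)
```

A block whose size is not a multiple of 32 still occupies a whole number of warps, so the remaining lanes of the last warp stay idle in this simplified picture.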

#### *2.2. Numerical Analysis of Power Systems*

## 2.2.1. Power Flow Analysis

A power flow analysis is a representative numerical method that estimates the electric power flow of an interconnected power system. It is designed to repeatedly perform power flow computations to obtain the voltage magnitudes and angles of all the buses in a power system and calculate the real and reactive powers of peripheral equipment connected to the buses. Because the power flow computation mainly consists of nonlinear algebraic equations, various approaches, including Gauss–Seidel (GS), Newton–Raphson (NR) [17], and fast-decoupled power flow (FDPF) [18] methods, have been developed to accelerate the solution of the equations.

The GS method is well known because it was the first practical approach proposed for estimating power flow in large-scale power systems. Because the admittance matrix of a power system contains many zero coefficients, a single iteration of the GS method is faster than a single iteration of the NR method. However, the GS method has poor convergence characteristics, which limits its adoption in power flow analysis. Therefore, although the GS method is well known, it is mentioned here only briefly, because only a few of its components can benefit from parallel processing with GPUs.
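For orientation, a GS iteration can be sketched on a very small example. The code below is a minimal sketch and not taken from the studies reviewed here: it assumes a hypothetical two-bus system in per unit (a slack bus and one load bus) with made-up line and load values, and it uses the standard GS voltage update derived from the complex power balance at the load bus.

```python
# Hypothetical 2-bus system in per unit: bus 0 is the slack bus, bus 1 is a load bus.
y_line = 1 / (0.01 + 0.1j)      # assumed series admittance of the only line
Y10, Y11 = -y_line, y_line      # admittance-matrix row of bus 1
V0 = 1.0 + 0j                   # slack-bus voltage (fixed)
S_SPEC = complex(-0.5, -0.2)    # assumed net injection P + jQ at bus 1


def gauss_seidel(tol=1e-10, max_iter=200):
    """Iterate the bus-1 voltage until two successive values agree within tol."""
    v1 = 1.0 + 0j               # flat start
    for it in range(1, max_iter + 1):
        # V_1 <- (conj(S_1) / conj(V_1) - Y_10 * V_0) / Y_11
        v1_new = (S_SPEC.conjugate() / v1.conjugate() - Y10 * V0) / Y11
        if abs(v1_new - v1) < tol:
            return v1_new, it
        v1 = v1_new
    raise RuntimeError("Gauss-Seidel did not converge")


v1, iterations = gauss_seidel()
# Complex power actually injected at bus 1 for the converged voltage.
s1 = v1 * (Y10 * V0 + Y11 * v1).conjugate()
print(abs(v1), iterations, s1)
```

Each update touches one bus at a time and reuses the newest voltages, which is part of why the method offers limited room for parallel execution.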

The NR method is currently the most prevalent for power flow analysis. A number of studies have discussed acceleration of the NR method by applying GPU-based parallel processing techniques. In general, we can write a nodal equation of a power system network as follows:

$$I = Y_{bus} \cdot V \tag{1}$$

where $Y_{bus}$, $I$, and $V$ denote the admittance matrix of the network, the current vector, and the voltage vector, respectively. $I_i$ is the net current injected at the *i*-th bus, as given in Equation (2).

$$I_i = \sum_{k=1}^{n} Y_{ik} V_k \tag{2}$$
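Equations (1) and (2) can be checked numerically on a small network. The following is a minimal sketch under stated assumptions: the three-bus network, its line impedances, and the bus voltages are hypothetical values chosen for illustration, shunt elements are neglected, and the helper names `build_ybus` and `nodal_currents` are our own.

```python
# Hypothetical 3-bus network: (from_bus, to_bus, series admittance in per unit).
lines = [
    (0, 1, 1 / (0.02 + 0.06j)),
    (0, 2, 1 / (0.08 + 0.24j)),
    (1, 2, 1 / (0.06 + 0.18j)),
]
n = 3


def build_ybus(n, lines):
    """Assemble the bus admittance matrix from series line admittances."""
    y = [[0j] * n for _ in range(n)]
    for i, k, y_ik in lines:
        y[i][i] += y_ik   # diagonal: sum of admittances connected to the bus
        y[k][k] += y_ik
        y[i][k] -= y_ik   # off-diagonal: negative of the line admittance
        y[k][i] -= y_ik
    return y


def nodal_currents(ybus, v):
    """Equation (2): I_i = sum_k Y_ik * V_k for every bus i."""
    return [sum(ybus[i][k] * v[k] for k in range(len(v))) for i in range(len(v))]


ybus = build_ybus(n, lines)
v = [1.05 + 0j, 1.0 - 0.05j, 0.98 - 0.08j]   # assumed bus voltages
i_inj = nodal_currents(ybus, v)
print(i_inj)
```

Because this sketch has no shunt elements, the injected currents sum to zero, which gives a quick sanity check on the assembled matrix.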

A complex power equation can be expressed separately with real and imaginary parts, *P* and *Q*, as follows:

$$S_i = V_i \left( \sum_{k=1}^{n} Y_{ik} V_k \right)^{*} = V_i \sum_{k=1}^{n} Y_{ik}^{*} V_k^{*} = P_i + jQ_i \tag{3}$$

$$P_i = \sum_{k=1}^{n} |V_i| \cdot |V_k| \cdot \left( G_{ik} \cos\theta_{ik} + B_{ik} \sin\theta_{ik} \right) \tag{4}$$

$$Q_i = \sum_{k=1}^{n} |V_i| \cdot |V_k| \cdot \left( G_{ik} \sin\theta_{ik} - B_{ik} \cos\theta_{ik} \right) \tag{5}$$
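As a consistency check, the polar expressions in Equations (4) and (5) should reproduce the real and imaginary parts of the complex power in Equation (3). The following minimal sketch verifies this numerically; the two-bus admittance matrix and the voltages are hypothetical, and we assume that G and B are the real and imaginary parts of the admittance entries and that the angle difference is taken as the angle of bus i minus the angle of bus k.

```python
import cmath
import math


def complex_power(ybus, v):
    """Equation (3): S_i = V_i * conj(sum_k Y_ik * V_k)."""
    n = len(v)
    return [v[i] * sum(ybus[i][k] * v[k] for k in range(n)).conjugate()
            for i in range(n)]


def polar_pq(ybus, v):
    """Equations (4) and (5), using theta_ik = theta_i - theta_k."""
    n = len(v)
    mag = [abs(x) for x in v]
    ang = [cmath.phase(x) for x in v]
    p, q = [], []
    for i in range(n):
        p_i = q_i = 0.0
        for k in range(n):
            g, b = ybus[i][k].real, ybus[i][k].imag
            t = ang[i] - ang[k]
            p_i += mag[i] * mag[k] * (g * math.cos(t) + b * math.sin(t))
            q_i += mag[i] * mag[k] * (g * math.sin(t) - b * math.cos(t))
        p.append(p_i)
        q.append(q_i)
    return p, q


y_line = 1 / (0.01 + 0.1j)                   # hypothetical line admittance (p.u.)
ybus = [[y_line, -y_line], [-y_line, y_line]]
v = [1.0 + 0j, cmath.rect(0.95, -0.1)]       # assumed bus voltages

s = complex_power(ybus, v)
p, q = polar_pq(ybus, v)
print(s)
print(p, q)
```

Both routes give the same injections up to rounding error, so either form can be used to evaluate the bus powers.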

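Equations (4) and (5) are the power balance equations of the *i*-th bus in polar form, where $G_{ik}$ and $B_{ik}$ are the real and imaginary parts of $Y_{ik}$ and $\theta_{ik} = \theta_i - \theta_k$. The NR method solves these two nonlinear equations iteratively by linearizing them around the current estimate at each step. The resulting set of linear equations can be formulated in matrix form as follows: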

$$
\Delta f = J \cdot \Delta x \tag{6}
$$

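where *J* is the Jacobian coefficient matrix, Δ*f* is the vector of power mismatches, and Δ*x* is the vector of corrections to the unknowns. Equation (7) gives the complete formulation, which relates the power mismatches Δ*P* and Δ*Q*, evaluated through Equations (4) and (5), to the corrections Δ*θ* and Δ*V*.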

$$
\begin{bmatrix}
\Delta P \\ \Delta Q
\end{bmatrix} = \underbrace{\begin{bmatrix}
\frac{\partial P}{\partial \theta} & \frac{\partial P}{\partial V} \\
\frac{\partial Q}{\partial \theta} & \frac{\partial Q}{\partial V}
\end{bmatrix}}_{\text{Jacobian}} \begin{bmatrix}
\Delta \theta \\ \Delta V
\end{bmatrix} \tag{7}
$$

For the sake of clarity, Figure 2 illustrates the iterative process of calculating a power flow solution using the NR method. The method starts with arbitrary initial values of *V* and *θ* and calculates the power mismatch, i.e., Δ*P* and Δ*Q*, from Equations (4) and (5). It then checks whether the mismatch has converged, in which case the iteration terminates. Otherwise, the method first computes the Jacobian matrix of the power flow system from the updated *P* and *Q* and then derives the corrections to the voltage magnitudes and phase angles, i.e., Δ*V* and Δ*θ*, from Equation (7). Finally, the method updates the voltage magnitudes and phase angles with these corrections and repeats the overall procedure.

**Figure 2.** Brief algorithm of the Newton–Raphson (NR) method.
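The loop in Figure 2 can be sketched on a very small example. The code below is a minimal sketch, not a reference implementation: it assumes a hypothetical two-bus system in per unit (slack bus 0 fixed at 1.0 with zero angle, load bus 1 with made-up injections), so the Jacobian of Equation (7) reduces to a 2-by-2 matrix whose entries are the analytic derivatives of Equations (4) and (5). The small linear system is solved with Cramer's rule here; a large system would use a sparse LU decomposition instead.

```python
import math

# Hypothetical 2-bus system (per unit): slack bus 0 at 1.0 with zero angle, load bus 1.
y_line = 1 / (0.01 + 0.1j)
G = [[y_line.real, -y_line.real], [-y_line.real, y_line.real]]
B = [[y_line.imag, -y_line.imag], [-y_line.imag, y_line.imag]]
P_SPEC, Q_SPEC = -0.5, -0.2   # assumed net injections at bus 1 (a load)


def calc_pq(theta1, v1):
    """P_1 and Q_1 from Equations (4) and (5)."""
    mags, angs = [1.0, v1], [0.0, theta1]
    p = q = 0.0
    for k in range(2):
        t = angs[1] - angs[k]
        p += mags[1] * mags[k] * (G[1][k] * math.cos(t) + B[1][k] * math.sin(t))
        q += mags[1] * mags[k] * (G[1][k] * math.sin(t) - B[1][k] * math.cos(t))
    return p, q


def newton_raphson(tol=1e-10, max_iter=20):
    theta1, v1 = 0.0, 1.0                    # flat start
    for it in range(max_iter):
        p, q = calc_pq(theta1, v1)
        dp, dq = P_SPEC - p, Q_SPEC - q      # power mismatch
        if max(abs(dp), abs(dq)) < tol:
            return theta1, v1, it
        # Jacobian entries of Equation (7) for bus 1 (analytic derivatives).
        j11 = -q - B[1][1] * v1 * v1         # dP/dtheta
        j12 = p / v1 + G[1][1] * v1          # dP/dV
        j21 = p - G[1][1] * v1 * v1          # dQ/dtheta
        j22 = q / v1 - B[1][1] * v1          # dQ/dV
        det = j11 * j22 - j12 * j21
        theta1 += (dp * j22 - j12 * dq) / det   # Cramer's rule for delta theta
        v1 += (j11 * dq - j21 * dp) / det       # Cramer's rule for delta V
    raise RuntimeError("Newton-Raphson did not converge")


theta1, v1, iterations = newton_raphson()
print(theta1, v1, iterations)
```

In a realistic network, forming the Jacobian and solving the linear system dominate the cost of every iteration, which is where GPU acceleration is typically directed.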

As previously mentioned, the NR method requires heavy matrix calculations, such as updating the Jacobian coefficients and performing an LU decomposition to obtain the voltage and phase angle updates. Nevertheless, the NR method performs the power flow computation much faster than the GS method because it converges to the solution in fewer iterations; thus, the method is more suitable for large-scale power systems [19]. In addition, the Jacobian matrix produced by the NR method can serve as an index for sensitivity analysis or other control problems. Hence, in this paper, we discuss in detail several previous studies that focus on accelerating the NR method using GPU-based parallel computing.

Finally, the FDPF method approximates the NR method and requires significantly less computational effort [17]. Although the NR method is the most popular method for power flow studies, it is slow because the Jacobian matrix must be inverted at every iteration. To resolve this drawback, several methods, including FDPF, replace the full Jacobian coefficient matrix with an approximation. In particular, the FDPF method exploits P-Q decoupling: the active power is not significantly affected by changes in the bus voltage magnitude, and the reactive power is not significantly affected by changes in the bus voltage phase angle. To reduce the overhead of numerical computations, the method does not update its Jacobian coefficient matrix during the iterations, so the matrix is inverted only once during the calculation of the solution. As a result, the FDPF method is a simpler formulation of the NR method; however, it requires more iterations to reach the solution and fails to converge more often. Hence, the method is usually adopted in specific situations where fast power flow calculation is compulsory.
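The decoupled iteration can be sketched on a very small example. The code below is a minimal sketch under stated assumptions: it uses a hypothetical two-bus system in per unit with made-up line and load values, and it takes both constant coefficients (named `b_prime` and `b_dprime` here) as the negated imaginary part of the load-bus diagonal admittance, which is one simple choice among several used in practice. With a single unknown bus, the two constant matrices reduce to scalars, so the "inversion done once" becomes a division by a fixed number.

```python
import math

# Hypothetical 2-bus system (per unit): slack bus 0 at 1.0 with zero angle, load bus 1.
y_line = 1 / (0.01 + 0.1j)
G11, B11 = y_line.real, y_line.imag       # diagonal admittance entry of bus 1
G10, B10 = -y_line.real, -y_line.imag     # off-diagonal entry (bus 1 to bus 0)
P_SPEC, Q_SPEC = -0.5, -0.2               # assumed net injections at bus 1


def calc_pq(theta1, v1):
    """P_1 and Q_1 from the polar power balance equations (slack voltage is 1.0)."""
    p = v1 * (G10 * math.cos(theta1) + B10 * math.sin(theta1)) + v1 * v1 * G11
    q = v1 * (G10 * math.sin(theta1) - B10 * math.cos(theta1)) - v1 * v1 * B11
    return p, q


def fast_decoupled(tol=1e-8, max_iter=100):
    b_prime = -B11     # constant stand-in for dP/dtheta, never updated
    b_dprime = -B11    # constant stand-in for dQ/dV, never updated
    theta1, v1 = 0.0, 1.0                 # flat start
    for it in range(max_iter):
        p, q = calc_pq(theta1, v1)
        dp, dq = P_SPEC - p, Q_SPEC - q
        if max(abs(dp), abs(dq)) < tol:
            return theta1, v1, it
        theta1 += (dp / v1) / b_prime     # P-theta half step
        _, q = calc_pq(theta1, v1)        # refresh Q with the new angle
        v1 += ((Q_SPEC - q) / v1) / b_dprime   # Q-V half step
    raise RuntimeError("fast-decoupled iteration did not converge")


theta1, v1, iterations = fast_decoupled()
print(theta1, v1, iterations)
```

For a larger network, the two constant matrices would be factorized once before the loop and reused in every half step, which is the source of the savings described above.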
