**1. Introduction**

Ocean general circulation models (OGCMs) have become increasingly important for understanding oceanic dynamic processes and ocean environment forecasting. In recent years, OGCMs have been developed with finer resolution (10–100 km for eddy-resolving ocean models) and more physical procedures due to increasing scientific requirements. Accuracy is one of the important goals of ocean simulation and forecasting. In general, OGCMs with finer horizontal resolution can provide more accurate results [1]. However, the computational cost will be almost three orders of magnitude if the horizontal resolution is fined by one order of magnitude [2,3]. For an operational forecasting system, the computation should be finished within 1∼2 h. So, the parallel efficiency becomes a grea<sup>t</sup> challenge for OGCMs. Due to the low parallel efficiency, current global operational forecasting systems are still in 0.1◦ (∼10 km) to 1◦ (∼100 km) resolution, which falls far short of expectations. Therefore, with the resolution finer, the demand for improving computational performance is more and more urgent.

**Citation:** Yang, X.; Zhou, S.; Zhou, S.; Song, Z.; Liu, W. A Barotropic Solver for High-Resolution Ocean General Circulation Models. *J. Mar. Sci. Eng.* **2021**, *9*, 421. https://doi.org/ 10.3390/jmse9040421

Academic Editor: Marcos G. Sotillo

Received: 8 March 2021 Accepted: 9 April 2021 Published: 14 April 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

To increase the computational efficiency of OGCMs, the hydrostatic primitive equations for momentum can be solved using a split-explicit, time-stepping scheme that requires special treatment and coupling between barotropic and baroclinic modes. The baroclinic mode is used for three-dimensional prognostic variables, which are the slow processes, such as the advective term, while the barotropic mode solves the free-surface and associated barotropic velocity equations, such as surface inertial gravity wave, which are the fast processes. To improve the simulation efficiency, a finite number of barotropic time steps were carried out within each baroclinic step (details in Appendix A). In barotropic mode, the implicit free-surface method is common because it allows a large time step to efficiently calculate the fast gravity mode [4,5]. But this method requires solving elliptic equations (details are discussed in Section 2). One of the commonly used methods of solving an elliptic equation is the conjugate gradient (CG) method [6]. To reduce the execution time of the CG method, one way is to reduce the communication costs by reducing the number of iterations to convergence. Thus, preconditioning is commonly used for the CG method to reduce the number of iterations, assuming the cost of preconditioning is reasonable [7], which has been widely applied in OGCMs [8–10]. As the Preconditioned CG method (PCG) method used in OGCMs requires the calculation of global variables, which also expects a large number of global reductions, it becomes the bottleneck when the model is run in the large-scale parallel computation. To further reduce the associated communication cost and improve scalability, [11] tried to apply a preconditioned Classical Stiefel Iteration method instead of the PCG method in the Parallel Ocean Programme to solve the elliptic system of equations in the barotropic mode, which requires no global reduction except for checking convergence. However, the PCG method is still popular and widely used due to less requirement in memory, better stability, and faster and clearer representation. Thus, the other way to reduce the number of global reductions to improve the large-scale efficiency of the PCG method is necessary and rarely done before.

The Nucleus for European Modelling of the Ocean (NEMO) is a state-of-the-art OGCM that has been widely used by ocean and climate communities for oceanographic, forecasting, and climate studies [9]. The barotropic solver recommended in NEMO is also the PCG method because it can effectively solve elliptic equations with vector computers. When using thousands of processes, the PCG method in NEMO scales poorly (Figure 1), as do other OGCMs. After analyzing, we found that the poor scaling performance of the PCG method was caused by the operation of global reductions (i.e., function MPI\_Allreduce combines values from all processes and distributes the result back to all processes), which could account for more than 96% of the total execution time when using more than 8000 processes (Figure 2).

**Figure 1.** Speedup of the 5-km Nucleus for European Modelling of the Ocean (NEMO) experiment using the preconditioned conjugate gradient (PCG) solver (blue line) and the percentage of the barotropic time of the total execution time (red line). The baseline of the speedup is the running time corresponding to 400 processes.

**Figure 2.** Percentage of the global reduction (MPI\_Allreduce) time of the total execution time in the 5-km NEMO experiment with the PCG solver.

In this study, we developed a new method, namely, a communication-avoiding Krylov subspace method with PCG (CA-PCG), to reduce the number of global reductions and improve the PCG method in the barotropic mode in NEMO. The results showed a high decrease in the total execution time for the high-resolution simulation with NEMO and provided a reference for the OGCMs using PCG solver for developing effective measures to improve the scalability on large scales.

#### **2. Materials and Methods**

## *2.1. NEMO Model*

The Nucleus for European Modelling of the Ocean (NEMO) is a framework of oceanrelated engines, namely, OPA (O'cean PArall´elis´e) for the ocean dynamics and thermodynamics, LIM2 (Louvain-la-Neuve Ice Model version 2) for the sea–ice dynamics and thermodynamics, and TOP3 (Tracer in the Ocean Paradigm version 3) for the biogeochemistry **[9]**. The ocean component of NEMO was developed from the OPA model, as described in [12]. This model has been used for a wide range of applications, both regional and global, as a forced ocean model and as a model coupled with the sea-ice and/or the atmosphere. The barotropic mode optimized in this study is a procedure that solves the free-surface, and associated barotropic velocity equations belong to the OPA model.
