#### *2.5. Experimental Description*

The NEMO model optimized in this study was version 3.6. To assess the performance of the NEMO model and the validity of the model results after optimization with the CA-PCG method, two experiments were designed: a low-resolution experiment, named ORCA1, with a horizontal grid resolution of nearly 100 km (grid dimensions: 362 × 292), and a high-resolution experiment, named GYRE, with a horizontal grid resolution of nearly 5 km (grid dimensions: 7502 × 5002). Optimizing the NEMO model with the CA-PCG solver mainly aims to reduce communication costs, so the performance improvement should be more pronounced for large-scale simulations than for small-scale ones. We therefore used the small-scale experiment to validate the accuracy of the results and the large-scale experiment to test the scalability of the approach. The flow chart of the experiments is shown in Figure 3. First, a high-resolution experiment was designed to evaluate the performance of the NEMO model with the PCG solver. Second, a new solver, the CA-PCG solver, was developed and applied to the NEMO model. The next step was to validate the accuracy of the model results after optimization by comparing them with those from ensembles using the PCG solver in low-resolution experiments. Last, the performance improvement of the NEMO model with the CA-PCG solver was tested using a high-resolution experiment. A description of the parallel platform used in this study is provided in Appendix D.

**Figure 3.** Flow chart of the experiments in this study. The experiments were carried out in the order of step 1, step 2, step 3, and step 4.

## 2.5.1. Low-Resolution Simulation

ORCA1 is one of the most frequently used horizontal grid resolutions in NEMO. The grid was derived from https://forge.ipsl.jussieu.fr/nemo/ (accessed on 10 April 2021), while the bathymetry file was generated by referring to http://www.noc.ac.uk/ (accessed on 10 April 2021). The initial temperature and salinity conditions were taken from a combination of the World Ocean Atlas (WOA) 2009 [14,15] and the Polar Science Center Hydrographic Climatology (PHC) version 3 (updated from [16]) data. We also used the WOA 2009 and PHC v3.0 temperature data as observations to validate the model results (details in Section 2.6). The atmospheric forcing data were from the climatological Coordinated Ocean-ice Reference Experiments phase II (CORE-II) forcing data set [17,18]. The time step used in this experiment was 3600 s (i.e., the rn\_rdt parameter in NEMO was set to 3600). Experiments using the PCG and CA-PCG solvers were designed, referred to as ORCA1\_PCG and ORCA1\_CAPCG, respectively. In addition, to validate the model results, an ensemble of 101 ocean simulations using the PCG solver was also created.

## 2.5.2. High-Resolution Simulation

The idealized GYRE configuration in NEMO was used to test and analyze the scalability of high-resolution simulation at large scale. There was no file input/output other than the reading of the workload definition during initialization. This experiment used a time step of 7200 s (rn\_rdt = 7200). To guarantee that the total execution time of every experiment was no less than 3 min, the experiments were run for 1000 steps. Corresponding experiments using the PCG and CA-PCG solvers were designed, referred to as GYRE\_PCG and GYRE\_CAPCG, respectively. Notably, each experiment was repeated three times, and the mean execution times are discussed.
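The repeat-and-average timing procedure can be sketched as follows. This is a minimal Python illustration, not the actual batch scripts used on the parallel platform; the placeholder workload stands in for a full GYRE run.

```python
import time
import statistics

def mean_runtime(run, repeats=3):
    """Time run() `repeats` times and return the mean wall-clock seconds,
    mirroring how each GYRE experiment was repeated three times and the
    mean execution time reported."""
    elapsed = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        elapsed.append(time.perf_counter() - start)
    return statistics.mean(elapsed)

# Placeholder workload standing in for a 1000-step model integration.
mean_seconds = mean_runtime(lambda: sum(range(10**6)))
```

Averaging over repeated runs reduces the influence of transient system noise (I/O contention, network jitter) on the reported execution time.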

#### *2.6. Community Earth System Model (CESM) Port-Verification Tool*

The methodology we used to validate the results was the Community Earth System Model (CESM) port-verification tool (CESM-PVT), which was developed to determine whether a change in CESM would produce biases distinguishable from the natural variability of the system [19]. First, in [19], an ensemble *E* = {*X*1, *X*2, ... , *Xm*} consisting of 101 simulations, which differed only by a random perturbation of the initial temperature condition on the order of 10<sup>−14</sup>, was run. At a given point *j*, each variable *X* thus has a series of possible results from the ensemble, {*X*1(*j*), *X*2(*j*), ... , *Xm*(*j*)}. The mean and standard deviation of this series at point *j* are denoted *μ*(*j*) and *δ*(*j*), respectively. The root mean square z-score (RMSZ) of the ensemble data was calculated as follows.

$$RMSZ(X) = \sqrt{\frac{1}{n} \sum_{j=1}^{n} \left(\frac{X(j) - \mu(j)}{\delta(j)}\right)^2} \tag{2}$$

where *n* is the total number of grid points in *X*. The *RMSZ* score of the new case was calculated in the same way. If this *RMSZ* fell within the distribution of the ensemble's *RMSZ* scores, the result was considered to have passed the accuracy test. Further details can be found in [11,19].
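The RMSZ test of Equation (2) can be sketched in a few lines of Python. This is an illustrative toy with synthetic data, not the CESM-PVT implementation itself; the field sizes and the min/max acceptance check are assumptions for the example.

```python
import numpy as np

def rmsz(x, mu, delta):
    """Root mean square z-score (Eq. 2) of a field x over n grid points,
    given the per-point ensemble mean mu and standard deviation delta."""
    z = (x - mu) / delta
    return np.sqrt(np.mean(z ** 2))

# Toy ensemble: m members, each a field of n grid points.
rng = np.random.default_rng(0)
m, n = 101, 1000
ensemble = rng.normal(loc=15.0, scale=0.5, size=(m, n))

mu = ensemble.mean(axis=0)              # ensemble mean at each point j
delta = ensemble.std(axis=0, ddof=1)    # ensemble std at each point j

scores = np.array([rmsz(member, mu, delta) for member in ensemble])

# A new case passes if its RMSZ falls within the ensemble's RMSZ distribution.
new_case = rng.normal(loc=15.0, scale=0.5, size=n)
passed = scores.min() <= rmsz(new_case, mu, delta) <= scores.max()
```

In practice the acceptance region would be taken from the ensemble's RMSZ distribution (e.g., its range or quantiles), as in [19].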

According to [19], this methodology is not limited to CESM model data, but is also applicable to evaluating data from other simulations. To verify the accuracy of the NEMO model using the CA-PCG solver, we likewise ran 101 simulations, identical to the ORCA1\_PCG experiment except for a random perturbation of the initial ocean temperature on the order of 10<sup>−14</sup>. The ensemble runs were 10 years in length, which is long enough for an ocean model to become stable again after a disturbance.
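Generating the perturbed members can be sketched as below. This is a hedged illustration, not NEMO's Fortran initialization code; the field shape and the use of Gaussian noise scaled to 10⁻¹⁴ are assumptions for the example.

```python
import numpy as np

def perturb_initial_temperature(t0, magnitude=1e-14, seed=None):
    """Return a copy of the initial temperature field t0 with a random
    perturbation of order `magnitude` added at every grid point."""
    rng = np.random.default_rng(seed)
    return t0 + magnitude * rng.standard_normal(t0.shape)

# Placeholder ORCA1-sized surface field (y, x); one member per seed.
t0 = np.full((292, 362), 15.0)
members = [perturb_initial_temperature(t0, seed=k) for k in range(101)]
```

A perturbation this small is far below observational accuracy, so the spread it induces over a 10-year run samples the model's internal variability rather than a physically different state.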

To verify the results, we made two revisions to the CESM-PVT method. First, to increase reliability, we used observations (the WOA + PHC data) rather than the mean of the ORCA1\_PCG ensemble members as the "reference value", because all that matters here is whether the model results are consistent with observations, both before and after optimization. Second, for each ensemble member, the global mean value was calculated first, and the mean biases and standard deviations with respect to the observed global mean value were then obtained at each time point. The global mean value and mean bias of the CA-PCG experiment were calculated in the same way. Finally, the mean biases of each ensemble member and of the CA-PCG experiment were divided by the corresponding standard deviations at each time point to obtain the z-score biases. We revised the CESM-PVT method to compare global mean values instead of point-to-point values because biases may be magnified at specific points: when the ORCA1\_PCG ensemble results are very close to the observations, any destabilization in the ORCA1\_CAPCG results may lead to a large deviation after division by these extremely small standard deviations. Verification using the global mean value avoids this issue effectively. Moreover, the global mean value is an effective indicator for evaluating ocean simulations. The equation is revised as follows.

$$Z(X) = \frac{\frac{1}{n}\sum_{j=1}^{n} X(j) - \mu}{\delta} \tag{3}$$

where $\frac{1}{n}\sum_{j=1}^{n} X(j)$ is the global mean value over the *n* grid points at a given time point, *μ* is the corresponding global mean value from the observations, and *δ* is the standard deviation of the ensemble time series at that time point.
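The revised global-mean verification of Equation (3) can be sketched as follows. This is an illustrative toy with synthetic numbers, not the actual validation script; the member count, time-series length, and the min/max acceptance check are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
m, T = 101, 120                     # ensemble members, monthly time points
obs = 15.0                          # observed global-mean temperature (toy value)

# Global-mean time series of each PCG ensemble member and of the CA-PCG run.
ens = rng.normal(obs, 0.02, size=(m, T))
capcg = rng.normal(obs, 0.02, size=T)

delta = ens.std(axis=0, ddof=1)     # ensemble std at each time point (Eq. 3)
z_ens = (ens - obs) / delta         # z-score biases of the ensemble members
z_new = (capcg - obs) / delta       # z-score bias of the CA-PCG experiment

# The CA-PCG run is acceptable if its z-score biases stay within the
# spread of the PCG ensemble at every time point.
within = np.all((z_new >= z_ens.min(axis=0)) & (z_new <= z_ens.max(axis=0)))
```

Because *δ* here is the spread of ensemble global means rather than a per-point standard deviation, it cannot collapse to the near-zero values that made the point-to-point test unstable.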
