#### **2. Methodology**

#### *2.1. Framework of J-STREAM Phase I*

A model inter-comparison project in Japan, J-STREAM, was initiated in 2016. One aim of J-STREAM is to investigate how differences between model frameworks and model settings, including boundary and input conditions and physical and chemical mechanisms, lead to differences in simulated concentrations of secondary atmospheric pollutants, such as PM2.5 components and ozone, over urban areas in Japan. Detailed model settings are described below. Further details, including an introduction to J-STREAM, can be found in previous reports on the project overview [5] and on model performance for ozone [13].

The main target of the first phase of J-STREAM (J-STREAM Phase I) is to evaluate the general performance of the participant models for secondary atmospheric pollutant concentrations over urban areas in Japan. Among other outputs, daily concentrations of PM2.5 components in each season are the subject of evaluation in this paper. The enhanced simulation periods of J-STREAM Phase I were the spring of 2013 (27 April–26 May 2013), the summer of 2013 (12 July–10 August 2013), the autumn of 2013 (11 October–9 November 2013), and the winter of 2014 (10 January–8 February 2014), which corresponded to the seasonal periods of the national observation framework for PM2.5 components (Table 1). The detailed evaluations and additional experiments for individual participant models can be found in [14].
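As a quick consistency check on the dates listed above, the following minimal sketch computes the length of each enhanced simulation period, treating the start and end dates as inclusive. The dictionary and function names are illustrative only and are not part of the J-STREAM tooling.

```python
from datetime import date

# Enhanced simulation periods as quoted in the text (both ends inclusive).
PERIODS = {
    "spring 2013": (date(2013, 4, 27), date(2013, 5, 26)),
    "summer 2013": (date(2013, 7, 12), date(2013, 8, 10)),
    "autumn 2013": (date(2013, 10, 11), date(2013, 11, 9)),
    "winter 2014": (date(2014, 1, 10), date(2014, 2, 8)),
}


def period_length_days(start, end):
    """Return the number of calendar days in an inclusive date range."""
    return (end - start).days + 1


lengths = {name: period_length_days(s, e) for name, (s, e) in PERIODS.items()}
for name, n_days in lengths.items():
    print(f"{name}: {n_days} days")
```

Under this inclusive reading, each of the four periods spans 30 days.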

**Table 1.** Dates of enhanced simulation periods for model evaluations including a simulation spin-up. Updated from the overview of Japan's study for reference air quality modeling (J-STREAM) [5].


Four nested model domains, d01, d02, d03, and d04, on a Lambert conformal map projection were employed in the J-STREAM project [5]. The finest domains, d03 and d04, with a 5 × 5 km grid, cover the major city clusters in western Japan (including Osaka, Kobe, Kyoto, and Nagoya) and the Tokyo metropolitan area, respectively. Simulated concentrations in the d03 and d04 domains were used for model evaluations, and the results are discussed in the following sections. Figure 1 shows the d03 and d04 domains, including the locations of the ambient air pollution monitoring stations (AAPMSs) at which simulated concentrations were evaluated via comparisons with observations.

**Figure 1.** Finest model domains for J-STREAM, d03 (**a**) and d04 (**b**). d03 covers major city clusters located in western Japan including Osaka, Kobe, Kyoto, and Nagoya. d04 covers the Tokyo metropolitan area. The red circles indicate the locations of ambient air pollution monitoring stations (AAPMSs). The black circles indicate the locations of the meteorological observation stations of the Japan Meteorological Agency (JMA).

#### *2.2. Baseline Meteorological Model Configurations*

The baseline meteorological simulation for J-STREAM Phase I was performed with the Advanced Research WRF (ARW) core of the Weather Research and Forecasting (WRF) model, Version 3.7.1 [15]. The WRF input data for the initial and boundary conditions were acquired from the National Centers for Environmental Prediction Final Operational Model Global Tropospheric Analyses (ds083.2) with a 1 × 1 degree resolution [16] and the Real-Time, Global Sea Surface Temperature High-Resolution (RTG\_SST\_HR) analysis with a 1/12 × 1/12 degree resolution [17], at a temporal resolution of 6 h. The horizontal configurations of the one-way nested model domains, d01, d02, d03, and d04, are 220 × 170 grids with a 45-km horizontal resolution, 154 × 160 grids with a 15-km resolution, 82 × 61 grids with a 5-km resolution, and 64 × 70 grids with a 5-km resolution, respectively. The vertical grid structure consists of 31 layers from the surface to the model top (100 hPa). Five grids were trimmed off each of the four lateral boundaries for the offline CTMs. The physics parameterizations applied in this model included the WRF Single-Moment 5-class scheme [18], the Rapid Radiative Transfer Model (RRTM) [19] for longwave radiation, the Dudhia scheme [20] for shortwave radiation, the Noah Land Surface Model [21], the Mellor-Yamada Nakanishi and Niino surface layer scheme level 2.5 [22], and the Kain-Fritsch convective parameterization [23] for d01 and d02. No convection parameterization was used for the 5-km domains. The grid-nudging four-dimensional data assimilation technique was employed for wind, temperature, and water vapor from level 11 (approximately 2 km) to the top of the model at 100 hPa, with nudging coefficients of 1.0 × 10<sup>−4</sup> and 0.5 × 10<sup>−5</sup> s<sup>−1</sup> for d01 and d02, respectively. Most of the participant CTMs employed the baseline meteorological fields, while others employed meteorology based on different model settings. The differences in the model settings of some participant models are described in Section 2.3.
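To illustrate the effect of trimming five grids from each lateral boundary, the sketch below derives the grid counts and approximate extents that would remain for the offline CTMs. It assumes that the first number of each grid count is the west–east dimension and the second the south–north dimension, and that trimming removes ten cells in total along each axis; both are our reading of the text rather than a documented specification, and the names used are hypothetical.

```python
# Baseline WRF grid counts and horizontal resolutions quoted in the text.
# Assumption: (nx, ny, dx_km) with nx = west-east and ny = south-north.
BASELINE_DOMAINS = {
    "d01": (220, 170, 45.0),
    "d02": (154, 160, 15.0),
    "d03": (82, 61, 5.0),
    "d04": (64, 70, 5.0),
}

TRIM = 5  # cells removed from each of the four lateral boundaries


def ctm_grid(nx, ny, dx_km, trim=TRIM):
    """Return the grid size and extent left after trimming the boundaries."""
    nx_ctm = nx - 2 * trim
    ny_ctm = ny - 2 * trim
    return {
        "nx": nx_ctm,
        "ny": ny_ctm,
        "width_km": nx_ctm * dx_km,
        "height_km": ny_ctm * dx_km,
    }


ctm_domains = {name: ctm_grid(*cfg) for name, cfg in BASELINE_DOMAINS.items()}
for name, grid in ctm_domains.items():
    print(name, grid)
```

Under these assumptions, d03 would shrink from 82 × 61 to 72 × 51 cells (about 360 km × 255 km) and d04 from 64 × 70 to 54 × 60 cells.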

The baseline meteorological fields were compared with hourly observations of the Japan Meteorological Agency (JMA) for the observation stations within d03 and d04 (Figure 1) over four seasons: the spring of 2013 (11–26 May 2013), the summer of 2013 (27 July–10 August 2013), the autumn of 2013 (25 October–9 November 2013), and the winter of 2014 (24 January–7 February 2014) (Figures 2 and 3). The hourly observed and modeled meteorological variables were averaged for all meteorological observatories for each domain.

**Figure 2.** Spatially averaged precipitation, temperature, wind direction, and wind speed over the four seasons. These results are based on hourly observed and simulated values at all JMA stations for d03.

**Figure 3.** Spatially averaged precipitation, temperature, wind direction, and wind speed over the four seasons. These results are based on hourly observed and simulated values at all JMA stations for d04.

WRF with the baseline setting generally simulated the observed meteorological conditions well, although it tended to overestimate the observed wind speeds. This overestimation was likely caused by the coarse horizontal resolution and coarse land information. The simulation performance for wind patterns was slightly better for d04 than for d03 (Figures 2 and 3). The simulated timing and amounts of precipitation were consistent with the observations (Figures 2 and 3).

#### *2.3. Chemical Transport Model Configurations*

A total of 32 simulations were performed using three types of regional CTMs in J-STREAM Phase I: Community Multiscale Air Quality (CMAQ) [24], Comprehensive Air quality Model with eXtensions (CAMx) [25], and Weather Research and Forecasting-Chemistry (WRF-Chem) [26]. Table 2 presents the configurations of the employed models. All participants conducted the J-STREAM simulations under their own usual simulation conditions. The CMAQ group (M01–M28) included several model versions and chemical mechanisms: Statewide Air Pollution Research Center mechanism (SAPRC) 99 [27], SAPRC07 [28], Carbon Bond (CB) 05 [29], and Regional Atmospheric Chemistry Mechanism (RACM) 2 [30], together with three types of CMAQ aerosol calculation techniques [31]: aero5, aero6, and aero6 with the volatility basis set (VBS) approach [32]. These CMAQ aerosol calculation techniques employed ISORROPIA Version 1 [33,34] as the aerosol thermodynamic model, whereas versions after 5.0 employed the second version of ISORROPIA (ISORROPIA Version 2), with updates to the crustal species thermodynamics, the speciation schemes, and the SO4<sup>2−</sup> formation pathway [35]. The basic techniques of aero5 and aero6 include secondary organic aerosol (SOA) formation processes based on empirical parameters for SOA yields [36]. Major or minor updates were reflected in the chemical and aerosol mechanisms in the later versions. The one CAMx model (M29) applied in J-STREAM used the SAPRC07 chemistry and the coarse and fine aerosol scheme, which treats both static coarse and fine mode aerosols [37]. The WRF-Chem group (M30–M32) included two versions (3.8.1 and 3.7.1) that employed the RADM2 gas-phase mechanism together with the aerosol module of the Modal Aerosol Dynamics Model for Europe (MADE) [38] and the SOA Model (SORGAM) [39].
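The grouping of model identifiers described above can be expressed as a small lookup. The sketch below is purely illustrative: the function name and the assumption that identifiers follow the form "M" plus a two-digit number are ours, and only the ranges M01–M28 (CMAQ), M29 (CAMx), and M30–M32 (WRF-Chem) come from the text.

```python
from collections import Counter


def model_group(model_id):
    """Map a participant model ID such as 'M07' to its CTM family."""
    number = int(model_id[1:])
    if 1 <= number <= 28:
        return "CMAQ"
    if number == 29:
        return "CAMx"
    if 30 <= number <= 32:
        return "WRF-Chem"
    raise ValueError(f"unknown model ID: {model_id}")


model_ids = [f"M{n:02d}" for n in range(1, 33)]
group_counts = Counter(model_group(mid) for mid in model_ids)
print(group_counts)
```

The counts add up to the 32 simulations stated in the text: 28 CMAQ, 1 CAMx, and 3 WRF-Chem settings.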

As described in detail in an overview article on J-STREAM [5], participants were requested to run CTM simulations during the enhanced target periods of four seasons for d03 or d04. As shown in Table 2, the simulations for some participant models began at d01 (M02, M03, M07–M15, M20, and M30–M32), but others began from the inner domains. Fifteen participant models (M01–M07, M14, M15, M21–M24, M26, M29, and M30) submitted their results for both domains for all four seasons, whereas the other participants submitted results for only selected seasons, with the highest number of model results for the summer of 2013.

Initial concentrations on the first day of each season and boundary concentrations throughout the entire target period were generated in the simulation using the M15 setting via CMAQ Version 5.0.2 with the SAPRC07–aero6 mechanisms for d01 and d02. Boundary concentrations for d01 of M15 were obtained from results of CHASER [40], a chemical atmospheric general circulation model designed for studying atmospheric environment and radiative forcing, for the Hemispheric Transport of Air Pollution (HTAP) Version 2 [41]. In J-STREAM Phase I, model-ready mosaic emission data corresponding to all participant chemical–aerosol mechanisms were provided. These data combined multiple emission inventories and results from an emission model for biogenic volatile organic compounds: HTAP Version 2.2 [42] and Global Fire Emissions Database Version 4.1 [43] for Asian anthropogenic emissions; the Japan Auto–Oil Program (JATOP) emission inventory database (JEI-DB) [44], the updated JEI–DB [5], and Sasakawa Peace Foundation emissions for ships for Japanese anthropogenic emissions; volcanic emission data from Aerosol Comparisons between Observations and Models (AeroCom) [45] and JMA [46]; and estimations obtained by using the Model of Emissions of Gases and Aerosols from Nature Version 2.1 [47]. Most participant CTMs used the model-ready input data; however, some participant CTMs performed simulations with their own emission frameworks. M03 used EAGrid2010-JAPAN [48], and M20 and M27 used EAGrid2000-JAPAN [49] for anthropogenic emissions in Japan. For the Asian-scale anthropogenic emissions, M20 employed NASA INTEX-B [50] instead of HTAP Version 2.2. Additionally, some CTMs employed different emission injection heights. Some model settings also used different boundary conditions; for instance, the Model for Ozone and Related chemical Tracers Version 4 (MOZART-4) [51] was used in some cases.

As mentioned in Section 2.2, most of the participants employed the baseline meteorological fields; however, other CTMs (M07, M20) used WRF-ARW outputs based on their own conditions, including physical options, parameterizations, and finer input meteorological analysis data, namely the grid point value data derived from the mesoscale model (GPV MSM) of the JMA.


**Table 2.** Configurations of participant chemical transport models (CTMs) submitted for J-STREAM Phase I, updated from an overview of J-STREAM [5].



<sup>1</sup> "o" indicates the domains that participants used to conduct their simulations.

<sup>2</sup> Input emissions. "o" indicates that the baseline model-ready emission is used. "E1" uses EAGrid2010-JAPAN [48] and HTAP Version 2.2 [42]. "E2" uses EAGrid2000-JAPAN [49] and NASA INTEX-B [50]. "E3" uses EAGrid2000-JAPAN [49].

<sup>3</sup> Boundary concentration. "o" indicates that the baseline boundary concentration is used. "M" uses MOZART-4 [51]. "D" uses CMAQ defaults.

<sup>4</sup> Meteorological condition. "o" indicates that the baseline meteorological condition is used. "W" uses the meteorology simulated using WRF-ARW with own conditions, including physical options, parameterizations, and meteorological reanalysis. "WC" indicates the meteorology simulated using WRF-Chem with own conditions, including physical options and parameterizations.

<sup>5</sup> "o" indicates data submitted for d03 and d04 in each season. "su" indicates data submitted for only summer.

<sup>6</sup> NH4<sup>+</sup> and total PM2.5 were not submitted.

#### **3. Observational Data for Model Evaluation**

A monitoring framework for ambient PM2.5 components was initiated in the fiscal year 2011 under a Japanese government initiative [4]. Over a period of at least two weeks set for each season, 1-day accumulated concentrations of PM2.5 components, including ions (e.g., SO4<sup>2−</sup>, NO3<sup>−</sup>, and NH4<sup>+</sup>), inorganic elements (e.g., Na, Al, K, and Ca), and carbonaceous aerosols (EC and OC), were monitored using the filter pack method at selected stations from three types of air pollution monitoring stations (APMSs): AAPMSs, roadside APMSs (RAPMSs), and background monitoring stations (BGMSs). PM2.5 mass concentrations determined gravimetrically by weighing the filters were employed as the PM2.5 mass concentration in this paper. Monitoring data from valid AAPMSs, i.e., stations that obtained data for each PM2.5 component on at least eight days (53%) of each target period (up to 15 days), were used to evaluate the performances of the participant CTMs. The number of valid AAPMSs was 16–22 for each domain and season. The data acquisition rate was highest in the summer, while a poor data acquisition rate was found for NO3<sup>−</sup> in autumn. Observed gaseous pollutants at these AAPMSs were also used to evaluate the simulated nitric oxide (NO), nitrogen dioxide (NO2), and sulfur dioxide (SO2).
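The station validity rule described above (at least eight days of data within a target period of up to 15 days) can be sketched as a simple filter. The data layout used here, a dictionary mapping a station identifier to a list of daily values with `None` for missing days, is a hypothetical choice made for illustration and is not the format of the monitoring data.

```python
def valid_stations(daily_obs, min_days=8):
    """Return the IDs of stations with at least `min_days` non-missing days.

    daily_obs maps a station ID to a list of daily concentrations, where
    None marks a missing day (hypothetical layout for this sketch).
    """
    valid = []
    for station, values in daily_obs.items():
        n_valid = sum(1 for v in values if v is not None)
        if n_valid >= min_days:
            valid.append(station)
    return sorted(valid)


# Synthetic 15-day example: stations A and B qualify, C does not.
example = {
    "A": [10.0] * 15,
    "B": [10.0] * 8 + [None] * 7,
    "C": [10.0] * 7 + [None] * 8,
}
selected = valid_stations(example)
print(selected)
```

With a 15-day period, the eight-day minimum corresponds to 8/15, or about 53%, which matches the percentage quoted in the text.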

Figures 4 and 5 present observed daily concentrations of PM2.5 components, i.e., SO4<sup>2−</sup>, NO3<sup>−</sup>, NH4<sup>+</sup>, EC, and OC, and total PM2.5 mass for each 12- to 15-day seasonal period at the AAPMSs for d03 and d04, respectively. The box-and-whisker plots and black dots (outliers) show the differences between the AAPMSs in each domain.

In general, the concentrations of total PM2.5 and its components within a single domain exhibit similar day-to-day variabilities for each season. However, the spread of daily concentrations among the AAPMSs in each domain widened, particularly when concentrations were elevated (Figures 4 and 5). Therefore, the spatially averaged concentrations obtained from daily monitoring data for the different AAPMSs within each domain were used for time series analysis hereafter.
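A minimal sketch of the spatial averaging step is given below: for each day, the concentrations from all stations with a valid observation are averaged. The dictionary layout and the handling of missing values (skipped, with `None` returned when no station reports) are assumptions made for illustration; the text does not specify how gaps were treated.

```python
def spatial_daily_mean(daily_by_station):
    """Average daily values across stations, skipping missing entries.

    daily_by_station maps a station ID to a list of daily concentrations
    with None for missing days (hypothetical layout for this sketch).
    """
    series = list(daily_by_station.values())
    n_days = max(len(s) for s in series)
    means = []
    for day in range(n_days):
        values = [s[day] for s in series if day < len(s) and s[day] is not None]
        means.append(sum(values) / len(values) if values else None)
    return means


example = {
    "A": [10.0, 20.0, None],
    "B": [30.0, None, None],
}
domain_mean = spatial_daily_mean(example)
print(domain_mean)
```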

For d03, i.e., western Japan, the seasonal-average total PM2.5 concentrations were 17.3, 23.1, 20.3, and 19.6 μg/m<sup>3</sup>, with maximum daily concentrations of 31.9, 37.6, 36.3, and 34.3 μg/m<sup>3</sup> for spring, summer, autumn, and winter, respectively. The summer PM2.5 concentration was slightly higher than those for the other seasons; however, the seasonal characteristics of the PM2.5 concentration are unclear. SO4<sup>2−</sup> was a dominant PM2.5 component, accounting for approximately 40% (9.1 μg/m<sup>3</sup>) of the total PM2.5 mass concentration in the summer. Meanwhile, from autumn to winter, the ratios of NO3<sup>−</sup> and OC to total PM2.5 mass increased. The ratios of the five major PM2.5 components were similar, with values of 12%–19% in the winter. On the dates when PM2.5 was elevated, the differences in PM2.5 concentration levels between AAPMSs increased, and considerably higher PM2.5 was found at AAPMSs located in the major cities of Osaka and Nagoya than at those in rural areas.
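The share of a component in the total PM2.5 mass is a simple ratio. As an illustrative check using the d03 summer values quoted above (9.1 μg/m<sup>3</sup> of SO4<sup>2−</sup> against a seasonal-average total of 23.1 μg/m<sup>3</sup>), the sketch below reproduces the figure of approximately 40%. The function name is hypothetical.

```python
def component_share(component, total):
    """Return the percentage contribution of a component to the total mass."""
    return 100.0 * component / total


# d03 summer values quoted in the text (in micrograms per cubic meter).
so4_share = component_share(9.1, 23.1)
print(f"SO4 share of PM2.5 in d03 summer: {so4_share:.1f}%")
```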

**Figure 4.** Box-plots of observed daily concentrations of total particulate matter with a diameter of 2.5 μm or less (PM2.5) and its components: (**a**) sulfates (SO4<sup>2−</sup>), (**b**) nitrates (NO3<sup>−</sup>), (**c**) ammonium (NH4<sup>+</sup>), (**d**) elemental carbon (EC), (**e**) organic carbon (OC), and (**f**) total PM2.5 mass, at AAPMSs within d03 for the four seasons. The open circles indicate spatial averages obtained from daily concentrations observed at AAPMSs within d03. The black dots indicate the outliers. The box-and-whisker plots and outliers represent the frequency distributions of daily concentrations observed at AAPMSs in d03. D denotes the number of days with available observations, and N denotes the number of AAPMSs.

**Figure 5.** Box-plots of observed daily concentrations of total PM2.5 and its components: (**a**) SO4<sup>2−</sup>, (**b**) NO3<sup>−</sup>, (**c**) NH4<sup>+</sup>, (**d**) EC, (**e**) OC, and (**f**) total PM2.5 mass, at AAPMSs within d04 for the four seasons. The open circles indicate spatial averages obtained from daily concentrations observed at AAPMSs within d04. The black dots indicate the outliers. The box-and-whisker plots and outliers represent the frequency distributions of daily concentrations observed at AAPMSs in d04. D denotes the number of days with available observations, and N denotes the number of AAPMSs.

For d04, the Tokyo metropolitan area, which is several hundred kilometers east of d03, the day-to-day changes in the concentrations of total PM2.5 and its components were similar to those for d03. The seasonal-average concentrations (15.2, 18.6, 17.9, and 19.3 μg/m<sup>3</sup> for spring, summer, autumn, and winter, respectively) were slightly lower than those for d03, whereas the maximum daily concentrations were 26.3, 35.4, 41.9, and 46.8 μg/m<sup>3</sup>. The elevated daily concentrations were clearly higher than those for d03 in the autumn and winter. Wintertime PM2.5 concentrations were slightly higher than those in the other seasons, with increased daily concentrations; however, the seasonal characteristics of the PM2.5 concentration were unclear for d04. The daily variabilities of the total PM2.5 mass were characterized by SO4<sup>2−</sup> in spring and summer, when the ratios of SO4<sup>2−</sup> to total PM2.5 mass were 32% (4.8 μg/m<sup>3</sup>) and 39% (4.8 μg/m<sup>3</sup>), respectively. In autumn, the ratios of the other PM2.5 components, including OC, NO3<sup>−</sup>, and NH4<sup>+</sup>, to the total PM2.5 mass increased. The OC and NO3<sup>−</sup> concentrations were both higher than the SO4<sup>2−</sup> concentration in winter. In particular, for the first pollutant peak on 25 January, OC and NO3<sup>−</sup> were dominant, accounting for 22% (9.2 μg/m<sup>3</sup>) and 23% (9.6 μg/m<sup>3</sup>) of the total PM2.5 mass concentration, respectively. For the peak on 2 February, NO3<sup>−</sup> was dominant, accounting for 22% (9.2 μg/m<sup>3</sup>). SO4<sup>2−</sup> was the dominant PM2.5 component throughout the year for d03, but for d04, OC and NO3<sup>−</sup> levels were higher than SO4<sup>2−</sup> levels in the winter. On the dates when PM2.5 was elevated, the differences in PM2.5 concentration levels between AAPMSs increased, and considerably high PM2.5 was found at the AAPMSs located in the central area of d04, i.e., the Tokyo metropolitan area.

#### **4. Results and Discussion**

#### *4.1. Hourly Concentrations of Primary Pollutants*

Major gaseous pollutants were also monitored at the AAPMSs. Figures 6 and 7 present the spatial averages of observed and simulated hourly concentrations of NO, NO2, and SO2 from different AAPMSs for d03 and d04, respectively. Table 3 summarizes the ensemble performances of the participant CTMs at each AAPMS for each season.

**Figure 6.** Spatial averages of observed and simulated hourly concentrations for (**a**) NO, (**b**) nitrogen dioxide (NO2), and (**c**) sulfur dioxide (SO2) from different AAPMSs within d03 over the four seasons.

**Figure 7.** Spatial averages of observed and simulated hourly concentrations for (**a**) NO, (**b**) NO2, and (**c**) SO2 from different AAPMSs within d04 over the four seasons.

**Table 3.** Observed and simulated averaged concentrations <sup>1</sup> and ensemble performances <sup>2</sup> of the participant CTMs for hourly concentrations of NO, NO2, and SO2 at each AAPMS in each season.


<sup>1</sup> Observation MEAN calculated from the hourly values at each AAPMS in d03 and d04 for each season. Model MEAN calculated from seasonal averages of the hourly values in each CTM corresponding to available observations at each AAPMS in d03 and d04.

<sup>2</sup> Ensemble means of NMB (normalized mean bias), Correl (correlation coefficient), IoA (index of agreement), and N (the number of available observation stations) calculated from all pairs of observations and simulations for each AAPMS and CTM in d03 and d04, respectively.
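One possible reading of the ensemble statistics described in the table notes is that a statistic is first computed for every station and CTM pair and then averaged over all pairs. The sketch below illustrates this reading for the NMB; the nested dictionary layout, the function names, and the equal weighting of all pairs are assumptions made for illustration and may differ from the procedure actually used.

```python
def nmb(model, obs):
    """Normalized mean bias in percent over paired, non-missing values."""
    pairs = [(m, o) for m, o in zip(model, obs) if m is not None and o is not None]
    return 100.0 * sum(m - o for m, o in pairs) / sum(o for _, o in pairs)


def ensemble_mean_nmb(sim, obs):
    """Average the NMB over every (CTM, station) pair.

    sim maps CTM ID -> station ID -> list of simulated values;
    obs maps station ID -> list of observed values (hypothetical layout).
    """
    values = [
        nmb(series, obs[station])
        for by_station in sim.values()
        for station, series in by_station.items()
        if station in obs
    ]
    return sum(values) / len(values)


# Synthetic example with two stations and two model settings.
obs = {"S1": [10.0, 10.0], "S2": [20.0, 20.0]}
sim = {
    "M01": {"S1": [12.0, 12.0], "S2": [20.0, 20.0]},
    "M02": {"S1": [8.0, 8.0], "S2": [30.0, 30.0]},
}
ensemble_nmb = ensemble_mean_nmb(sim, obs)
print(ensemble_nmb)
```

In this synthetic case the four pairwise NMB values are 20%, 0%, −20%, and 50%, so the ensemble mean is 12.5%.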

In general, most CTMs showed good agreement with the observed concentration levels of NO and NO2 for each season, including the regular diurnal patterns of NO in the warmer seasons. However, none of the models fully reproduced the elevated concentrations, e.g., on 19–20 May and 3 November (NO and NO2) and on 30 January (NO), with differences of 50%–200% between the observations and models. In addition, the models tended to overestimate the observed daily maximums of NO (around 10–20 ppbv in spring and around 10–30 ppbv in summer) by a factor of 2.

At midnight on 30 January, none of the participant CTMs could simulate the considerably increased level of NO before the rapid NO decrease associated with airmass changes, although all participants reproduced the NO decrease well. This suggests that the CTMs successfully simulated the concentration change caused by synoptic-scale meteorological changes but failed to simulate the increase caused by local-scale meteorological conditions such as strong atmospheric stability, especially during colder seasons. All models tended to overestimate the daytime NO reduction. In particular, two WRF-Chem settings (M30 and M31) and M05 produced strikingly low constant values, 0.001 or 0.000 ppbv, during the daylight hours in summer and autumn. The normalized mean bias (NMB) for both domains indicated a strong underestimation of NO (approximately −40% to −50%), except during the spring. Underestimates of NO at remote stations in Japan have also been observed for regional CTMs, as reported in the MICS–Asia III results [9]. The correlation and index of agreement (IoA) values ranged from 0.18 to 0.43 and 0.41 to 0.51, respectively. The performance levels of each model exhibited substantial differences between domains and between seasons. The differences between seasons are likely related to the meteorological simulation abilities, but the reasons for the differences between domains are unclear at this stage.

The differences in NO2 among the models were large. Among these models, M30, M31, and M32 tended to overestimate elevated NO2 levels. The lower levels of NO2 obtained by M30 were often comparable to the NO2 concentration obtained by M03, which provided considerably lower NO2 concentrations compared to other models. These results suggest that the differences in meteorological conditions and NOx chemistry in each model produced the NO2 discrepancy between the models. Most of the models produced better results for NO2 than for NO, with ensemble averages of seasonal statistics, e.g., correlation values, of 0.56 (d03) and 0.55 (d04), 0.72 (d03), and 0.71 (d04), particularly in the winter.

Over the year, most models clearly overestimated the observed SO2, with an ensemble bias of 1.7–4.2 ppbv (NMB: 120%–350%) for d03 and 1.5–2.5 ppbv (NMB: 160%–470%) for d04. In addition, relatively high SO2 levels were found for M30, M31, and M32. Meanwhile, M03 and M20 tended to produce lower concentrations than the other models, with negative biases of −1.3 ppbv (M03) and −0.2 ppbv (M20), especially in the spring, and exhibited better performance (IoA: 0.58–0.59) than the other models (IoA: 0.30–0.39), especially in the winter. The input SO2 emissions of these two CMAQ simulations (M03 and M20) differed from the SO2 emissions of J-STREAM. For example, for d03, the SO2 emissions of J-STREAM in both the total and the bottom layers were more than twice those of M03. Meanwhile, for d04, which includes active volcanoes, although the total SO2 emissions of J-STREAM were half those of M03, the bottom-layer SO2 emissions of J-STREAM were 1.3 times those of M03. The differences in the SO2 emission amounts allocated to the lower layers possibly affected the simulated atmospheric SO2 concentrations. The second-best model setting, M03, performed slightly better (IoA: 0.41) than other models, which suggests that atmospheric SO2 concentrations were considerably affected by the input emission conditions, including the injection heights. Although modifications of emission conditions help to produce better SO2 simulations, using such modifications alone to resolve the overestimation of SO2 (up to 470%) is not realistic.

The differences among models with respect to emissions, chemistries, and meteorological conditions led to major differences in simulated primary pollutant concentrations; moreover, the simulated differences between similar model settings increased in the winter.

#### *4.2. Simulated Daily Concentrations of PM2.5 Components and Total PM2.5 Mass*

Figures 8 and 9 present spatial averages obtained from observed and simulated daily concentrations of PM2.5 components (SO4<sup>2−</sup>, NO3<sup>−</sup>, NH4<sup>+</sup>, EC, and OC) and total PM2.5 mass for different AAPMSs in d03 and d04, respectively. The seasonal ensemble performances of the participant CTMs at each AAPMS are also summarized as statistics in Tables 4 and 5 for each domain. The goal and criteria levels for the CTM performance statistics NMB, normalized mean error (NME), and correlation were recommended by Emery et al. [52], and those for the fractional bias (FB) and fractional error (FE) were recommended by Boylan and Russell [53]; these levels are listed in Table A1. Individual model performance reports for each CTM are shown in Tables A2 and A3.
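For readers who wish to reproduce statistics of this kind, the sketch below computes them for a single pair of simulated and observed series. It uses the commonly cited definitions of these metrics (for example, IoA in the form proposed by Willmott, and FB and FE normalized by the mean of the model and observation); we assume, but cannot confirm from this section, that these match the formulas applied in the tables. The function name and input layout are illustrative.

```python
import math


def evaluate(model, obs):
    """Compute common model performance statistics for paired series.

    Pairs containing None in either series are dropped. NMB, NME, FB, and
    FE are returned in percent.
    """
    pairs = [(m, o) for m, o in zip(model, obs) if m is not None and o is not None]
    n = len(pairs)
    diffs = [m - o for m, o in pairs]
    sum_obs = sum(o for _, o in pairs)
    m_bar = sum(m for m, _ in pairs) / n
    o_bar = sum_obs / n
    cov = sum((m - m_bar) * (o - o_bar) for m, o in pairs)
    var_m = sum((m - m_bar) ** 2 for m, _ in pairs)
    var_o = sum((o - o_bar) ** 2 for _, o in pairs)
    ioa_den = sum((abs(m - o_bar) + abs(o - o_bar)) ** 2 for m, o in pairs)
    return {
        "MB": sum(diffs) / n,
        "ERROR": sum(abs(d) for d in diffs) / n,
        "RMSE": math.sqrt(sum(d * d for d in diffs) / n),
        "NMB": 100.0 * sum(diffs) / sum_obs,
        "NME": 100.0 * sum(abs(d) for d in diffs) / sum_obs,
        "FB": 100.0 * (2.0 / n) * sum((m - o) / (m + o) for m, o in pairs),
        "FE": 100.0 * (2.0 / n) * sum(abs(m - o) / (m + o) for m, o in pairs),
        "Correl": cov / math.sqrt(var_m * var_o),
        "IoA": 1.0 - sum(d * d for d in diffs) / ioa_den,
    }


# Synthetic example: the model is exactly twice the observation.
stats = evaluate([2.0, 4.0, 6.0], [1.0, 2.0, 3.0])
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

The synthetic example shows why several statistics are reported together: a model that doubles every observation has a perfect correlation but an NMB of 100%.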

**Figure 8.** Spatially averaged concentrations of PM2.5 components and total PM2.5: (**a**) SO4<sup>2−</sup>, (**b**) NO3<sup>−</sup>, (**c**) NH4<sup>+</sup>, (**d**) EC, (**e**) OC, and (**f**) total PM2.5 mass over the four seasons. These results are based on daily concentrations observed and simulated for AAPMSs in d03. The thick solid lines with open circles present observations, and the colored lines present model results.

**Figure 9.** Spatially averaged concentrations of PM2.5 components and total PM2.5: (**a**) SO4<sup>2−</sup>, (**b**) NO3<sup>−</sup>, (**c**) NH4<sup>+</sup>, (**d**) EC, (**e**) OC, and (**f**) total PM2.5 mass over the four seasons. These results are based on daily concentrations observed and simulated for AAPMSs in d04. The thick solid lines with open circles present observations, and the colored lines present model results.


**Table 4.** Observed and simulated averaged concentrations <sup>1</sup> and ensemble performances <sup>2</sup> of the participant CTMs for daily concentrations of total PM2.5 and PM2.5 components at each AAPMS within d03.

<sup>1</sup> Observation MEAN calculated from the daily values at each AAPMS in d03 for each season. Model MEAN calculated from seasonal averages of the daily values in each CTM corresponding to available observations at each AAPMS in d03.

<sup>2</sup> Ensemble means of MB (mean bias), ERROR (mean error), RMSE (root mean square error), NMB (normalized mean bias), NME (normalized mean error), FB (fractional bias), FE (fractional error), Correl (correlation coefficient), IoA (index of agreement), and N (the number of available observation stations) calculated from all pairs of observations and simulations for each AAPMS and CTM in d03. Observation data from valid AAPMSs, i.e., stations that obtained data for each PM2.5 component on at least eight days (53%) of each target period (up to 15 days), were used to evaluate the performances (MB, ERROR, RMSE, NMB, NME, FB, FE, Correl, and IoA) of the participant CTMs.

**Table 5.** Observed and simulated averaged concentrations <sup>1</sup> and ensemble performances <sup>2</sup> of the participant CTMs for daily concentrations of total PM2.5 and PM2.5 components at each AAPMS within d04.




<sup>1</sup> Observation MEAN calculated from the daily values at each AAPMS in d04 for each season. Model MEAN calculated from seasonal averages of the daily values in each CTM corresponding to available observations at each AAPMS in d04.

<sup>2</sup> Ensemble means of MB (mean bias), ERROR (mean error), RMSE (root mean square error), NMB (normalized mean bias), NME (normalized mean error), FB (fractional bias), FE (fractional error), Correl (correlation coefficient), IoA (index of agreement), and N (the number of available observation stations) calculated from all pairs of observations and simulations for each AAPMS and CTM in d04. Observation data from valid AAPMSs, i.e., stations that obtained data for each PM2.5 component on at least eight days (53%) of each target period (up to 15 days), were used to evaluate the performances (MB, ERROR, RMSE, NMB, NME, FB, FE, Correl, and IoA) of the participant CTMs.

With SO4<sup>2−</sup> as a dominant PM2.5 component, most CTMs showed good agreement with daily concentration levels and day-to-day changes in both domains for each season, with the exception of a few model settings. Overall, the ensemble statistics, including the NMB (−0.85%, 1.65%), NME (30.34%, 29.11%), FB (3.66%, −13.77%), FE (30.41%, 34.28%), and correlation (0.74, 0.86), passed the goal level in d03 for summer and autumn. For d04, the NMB (−7.5%), NME (30.34%), FB (−13.04%), FE (32.83%), and correlation (0.84) passed the goal level for summer. With the exception of d03 in winter and d04 in summer, the correlation and IoA indicated excellent performance, with values of 0.74–0.88 and 0.79–0.87, respectively, peaking for d04 in winter. Most CTMs underestimated the observed SO4<sup>2−</sup> in d04 on 29–30 July, with relatively low values for the correlation (0.36) and IoA (0.52). This result may lead to underestimations of the total PM2.5 mass in connection with the NH4<sup>+</sup> concentrations. WRF-Chem (M30 and M31) clearly overestimated SO4<sup>2−</sup> concentrations in PM2.5 owing to the SO4<sup>2−</sup> mass build-up problem associated with the nucleation calculation in MADE/SORGAM [54]. In addition, the WRF-Chem group employed their own physical parameterizations, such as cumulus convection and microphysics, for their meteorological simulations. Additional sensitivity simulations for meteorological fields are required to quantitatively evaluate the inter-model differences in SO4<sup>2−</sup> and total PM2.5 mass concentrations caused by the differences in meteorological simulations. We will perform these in the next phase. The largest positive biases were found in M31, with 3.0–9.7 μg/m<sup>3</sup> (NMB: 52%–177%) for d03 and 3.8–10.2 μg/m<sup>3</sup> (NMB: 131%–240%) for d04. Slight overestimations were also found for CMAQ Version 4.7.1 (M27 and M28), particularly for d04 in spring. This trend indicates that the updated sulfur chemistries in CMAQ Version 5.0 [35,55–58] enhanced the performance of this model compared to the previous versions. In winter, CAMx (M29) performed better, with biases of −0.32 μg/m<sup>3</sup> (NMB: −7.3%) for d03 and 0.39 μg/m<sup>3</sup> (NMB: 3.4%) for d04 under the same emission condition. This result is attributed to an underestimation of SO4<sup>2−</sup> by the dominant participant model, CMAQ, which may be caused by inadequate aqueous-phase SO4<sup>2−</sup> production by Fe- and Mn-catalyzed O2 oxidation [14].
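As a worked illustration of what passing the goal level means, the sketch below compares the SO4<sup>2−</sup> ensemble statistics quoted above with a set of thresholds. The threshold values in `ASSUMED_GOALS` are placeholders that we assume for this example; the authoritative goal and criteria levels are those listed in Table A1. We also assume that the first value of each pair quoted in the text refers to summer in d03.

```python
# Assumed, illustrative goal thresholds (see Table A1 for the actual levels).
ASSUMED_GOALS = {"NMB": 10.0, "NME": 35.0, "FB": 30.0, "FE": 50.0, "Correl": 0.70}


def check_goals(stats, goals=ASSUMED_GOALS):
    """Return, per statistic, whether the value meets the assumed goal."""
    return {
        "NMB": abs(stats["NMB"]) < goals["NMB"],     # bias within +/- limit
        "NME": stats["NME"] < goals["NME"],          # error below limit
        "FB": abs(stats["FB"]) <= goals["FB"],       # bias within +/- limit
        "FE": stats["FE"] <= goals["FE"],            # error below limit
        "Correl": stats["Correl"] > goals["Correl"], # correlation above limit
    }


# Ensemble statistics for SO4 in d03 quoted in the text (assumed: summer).
d03_summer_so4 = {"NMB": -0.85, "NME": 30.34, "FB": 3.66, "FE": 30.41, "Correl": 0.74}
result = check_goals(d03_summer_so4)
print(result)
```

Passing the thresholds as an argument keeps the check reusable for species with different goal levels, which is why they are not hard-coded inside the function.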

All participant CTMs overestimated NO3<sup>−</sup> levels in warmer seasons, with ensemble biases of 1.22–1.55 μg/m<sup>3</sup> (NMB: 194%–651%) for d03 and 0.85–1.99 μg/m<sup>3</sup> (NMB: 145%–588%) for d04. The largest positive biases were found in summer. Above all, M20, M30, and M31 strongly overestimated elevated NO3<sup>−</sup> levels. Only M11 showed relatively good agreement with observations for d03 in summer, with a minimum bias of 0.12 μg/m<sup>3</sup> (NMB: 91%) and improved values for the correlation (0.46) and IoA (0.54). However, M11 also produced low concentrations for SO4<sup>2−</sup> and NH4<sup>+</sup>. For d04 in autumn, all models exhibited better performance for the daily concentration levels and day-to-day changes in NO3<sup>−</sup>. For example, M30 had a minimum bias of 0.14 μg/m<sup>3</sup> (NMB: 11%), which passed the goal NMB level for 24-h NO3<sup>−</sup>. Some deviations in NO3<sup>−</sup> between observations and the models were attributed to NH4<sup>+</sup> and potentially NH4NO3. In winter, most models reproduced day-to-day changes in both domains but tended to underestimate elevated NO3<sup>−</sup> levels, with ensemble mean biases of −0.89 μg/m<sup>3</sup> (NMB: −18.9%) and −2.36 μg/m<sup>3</sup> (NMB: −42.8%). A previous model inter-comparison study for the Tokyo metropolitan area, UMICS, concluded that the participant models overestimated NO3<sup>−</sup> levels in both summer and winter [11,12], although the available observations included only one winter and three summer stations. In our validations, most models produced higher NO3<sup>−</sup> levels in spring and summer, lower NO3<sup>−</sup> levels in winter, and moderate NO3<sup>−</sup> levels in autumn, compared with accumulated observation data for d03 and d04. This result is expected to be more accurate than previous reports because a greater number of observations (for 18–22 stations) were included.

As mentioned above, the day-to-day variations in NH4<sup>+</sup> were consistent with those of SO4<sup>2−</sup> and NO3<sup>−</sup>. Therefore, most CTMs showed good agreement with the daily concentration levels and day-to-day changes in both domains for each season, with the exception of some elevated peaks. In particular, the ensemble performance indicators FB and FE were −27.9% to 8.9% and 34.3%–41.1%, respectively, thus passing the goal level in both domains for all seasons except winter. Notably, the differences among models increased in summer. Two WRF-Chem models (M32, M31) predicted higher NH4<sup>+</sup> levels, with biases of 1.96–3.03 μg/m<sup>3</sup> (NMB: 84%–130%) and 1.72–2.71 μg/m<sup>3</sup> (NMB: 85%–61%) for d03 and d04, respectively. The M20 model, which employed EAGrid for emissions and an original configuration for meteorology, also produced relatively high NH4<sup>+</sup> levels in d03, with a bias of 1.68 μg/m<sup>3</sup> (NMB: 51%). These overpredictions were likely associated with those of SO4<sup>2−</sup> and NO3<sup>−</sup> in summer. Meanwhile, relatively large negative biases were found for M11, at −1.18 μg/m<sup>3</sup> (NMB: −35%) for d03 and −0.81 μg/m<sup>3</sup> (NMB: −33%) for d04.

The EC levels simulated by most CTMs were considerably lower than the observations in both domains for all seasons. The model ensemble biases were −0.90 to −0.20 μg/m<sup>3</sup> (NMB: −46% to −22%) and −2.77 to −0.39 μg/m<sup>3</sup> (NMB: −58% to −40%) for d03 and d04, respectively, with larger values for Tokyo. Both models employing EAGrid2000-JAPAN (M20 (d03) and M27 (d04)) produced higher EC values than the other CTMs with different emission settings, and relatively better NMB values were obtained, at −20% to −3% and −35% to 42%, respectively. This trend suggests that the EC emissions of J-STREAM might be underestimated.

The CTMs reproduced some of the elevated OC levels in the warmer seasons but clearly underestimated the observed OC levels in autumn and winter, with model ensemble biases of −1.78 to −0.01 μg/m<sup>3</sup> (NMB: −42% to 7%) and −2.77 to −0.81 μg/m<sup>3</sup> (NMB: −59% to −39%) for d03 and d04, respectively, which are similar to the EC results. Additionally, as observed for EC, the negative biases of OC for the Tokyo area were larger than those for western Japan. However, the negative biases of all participant CTMs were clearly smaller than those in the UMICS cases [11,12]. Among the models, M02, M03, and M11 predicted relatively higher OC levels and overestimated the summer OC concentrations. Full-domain nesting simulations were performed by M02 and M03 using a relatively recent CMAQ model (Version 5.1), which includes updates for some chemical and aerosol mechanisms, such as POA aging, SOA mass yields with new pathways from isoprene, alkanes, and PAHs, and SOA formation reactions in the aqueous-phase chemistry. Continual nesting simulations from the Asian scale (d01) performed with CMAQ Version 5.1 exhibited higher regional-scale OC levels, leading to higher OC levels in urban areas in Japan compared with previous versions. Thus, an empirical SOA yield model can predict the same OC concentration level as the VBS model (M11). It should be noted that the effect of the updated SOA yield mechanisms was not clear at the urban scale when using CMAQ Version 5.1 or higher (e.g., M01, M04–05). Additionally, more observational data are needed to evaluate the simulated OC concentrations.

Overall, most CTMs showed good agreement with the observed concentration levels of total PM2.5 mass in both domains for each season. These results are likely associated with the reproducibility of some dominant components, e.g., SO4<sup>2−</sup> and NH4<sup>+</sup>. However, the CTMs tended to fail to reproduce some heavily polluted situations and underestimated the considerably high PM2.5 concentrations (approximately 40–50 μg/m<sup>3</sup>). A considerable underestimation (≈30 μg/m<sup>3</sup>) of total PM2.5, associated with all PM2.5 components except SO4<sup>2−</sup>, was observed for d04 in winter on 25 January and 2 February; during that time, the simulated nighttime surface temperature was clearly higher than the observed temperature (Figure 3). This implies that the overestimated surface temperature weakened the simulated atmospheric stability, which in turn weakened the nighttime accumulation of particulate pollutants, especially during colder seasons. The model ensemble biases were −8.66 to −0.99 μg/m<sup>3</sup> (NMB: −43% to −5%) for d03 and −11.98 to −2.91 μg/m<sup>3</sup> (NMB: −55% to −19%) for d04. The largest negative biases were found in winter due to underestimations of NH4NO3, particularly for d04. M31 and M32 tended to overpredict the total PM2.5 due to overestimates of inorganic compounds. Of the model ensemble statistics for d03, the NMB (−5%, 13%), NME (22%, 26%), FB (−9%, −17%), FE (26%, 29%), and correlation (0.81, 0.78) passed the goal level for 24-h total PM2.5 mass in spring and summer, respectively. In addition, the majority of the other statistical indicators passed the criteria levels as well.
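
For readers who wish to reproduce indicators of this kind, the following Python sketch computes the mean bias, NMB, NME, FB, FE, correlation, and IoA from paired daily model and observation values. It assumes the commonly used definitions (fractional statistics normalized by the mean of model and observation, and Willmott's index of agreement), which may differ in detail from those applied in this study. The function name and sample values are hypothetical, and no goal or criteria thresholds are encoded.

```python
from math import sqrt


def evaluation_stats(model, obs):
    """Return common evaluation statistics for paired daily concentrations.

    Assumes positive concentrations and non-constant series
    (otherwise the correlation is undefined).
    """
    if len(model) != len(obs) or not obs:
        raise ValueError("model and obs must be non-empty and of equal length")
    n = len(obs)
    diff = [m - o for m, o in zip(model, obs)]
    sum_obs = sum(obs)
    mean_m = sum(model) / n
    mean_o = sum_obs / n

    # Bias-type indicators; MB and NMB always share the same sign.
    mb = sum(diff) / n
    nmb = 100.0 * sum(diff) / sum_obs
    nme = 100.0 * sum(abs(d) for d in diff) / sum_obs

    # Fractional bias and error, normalized by (model + obs) / 2.
    fb = 100.0 * sum(2.0 * (m - o) / (m + o) for m, o in zip(model, obs)) / n
    fe = 100.0 * sum(2.0 * abs(m - o) / (m + o) for m, o in zip(model, obs)) / n

    # Pearson correlation coefficient.
    cov = sum((m - mean_m) * (o - mean_o) for m, o in zip(model, obs))
    var_m = sum((m - mean_m) ** 2 for m in model)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    r = cov / sqrt(var_m * var_o)

    # Index of agreement (Willmott form).
    denom = sum((abs(m - mean_o) + abs(o - mean_o)) ** 2
                for m, o in zip(model, obs))
    ioa = 1.0 - sum(d * d for d in diff) / denom

    return {"MB": mb, "NMB": nmb, "NME": nme, "FB": fb, "FE": fe,
            "R": r, "IoA": ioa}


# Hypothetical daily values (ug/m3), for illustration only.
stats = evaluation_stats([1.0, 2.0, 3.0], [2.0, 2.0, 4.0])
```

With these definitions, FE is never negative and the NMB has the same sign as the mean bias, which offers a quick consistency check on tabulated values.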
