Article

Rapid PM2.5-Induced Health Impact Assessment: A Novel Approach Using Conditional U-Net CMAQ Surrogate Model

1
EH Research & Consulting Co., Ltd., E TechHive, 410 Jeongseojin-ro, Seo-gu, Incheon 22689, Republic of Korea
2
Civil and Environmental Engineering, College of Engineering, Konkuk University, 120 Neungdong-ro, Gwangjin-gu, Seoul 05029, Republic of Korea
3
Environmental Planning Institute, Seoul National University, Rm 303, Bldg 82, 1, Gwanak-ro, Gwanak-gu, Seoul 08826, Republic of Korea
*
Author to whom correspondence should be addressed.
Atmosphere 2024, 15(10), 1186; https://doi.org/10.3390/atmos15101186
Submission received: 2 September 2024 / Revised: 25 September 2024 / Accepted: 30 September 2024 / Published: 2 October 2024
(This article belongs to the Special Issue Air Pollution: Health Risks and Mitigation Strategies)

Abstract

There is a pressing need for tools that can rapidly predict PM 2.5 concentrations and assess health impacts under various emission scenarios, aiding in the selection of optimal mitigation strategies. Traditional chemical transport models (CTMs) like CMAQ are accurate but computationally intensive, limiting practical scenario analysis. To address this, we propose a novel method integrating a conditional U-Net surrogate model with health impact assessments, enabling swift estimation of PM 2.5 concentrations and related health effects. The U-Net model was trained with 2019 South Korean PM 2.5 data, including precursor emissions and boundary conditions. Our model showed high accuracy and significant efficiency, reducing processing times while maintaining reliability. By combining this surrogate model with the EPA’s BenMAP-CE tool, we estimated potential premature deaths under various emission reduction scenarios in South Korea, extending projections to 2050 to account for demographic changes. Additionally, we assessed the required PM 2.5 emission reductions needed to counteract the increase in premature deaths due to an aging population. This integrated framework offers an efficient, user-friendly tool that bridges complex air quality modeling with practical policy evaluation, supporting the development of effective strategies to reduce PM 2.5 -related health risks and estimate economic benefits.

1. Introduction

Although air pollution levels, including those of particulate matter ( PM 2.5 ), have shown some improvement over time, the aging population poses a significant challenge. If current emission levels are maintained, the number of premature deaths attributed to PM 2.5 is expected to continue rising due to the increasing vulnerability of older adults [1,2,3]. Therefore, additional efforts to reduce emissions are necessary to counteract the rise in premature deaths associated with population aging. This study aims to address this issue by establishing a method for predicting PM 2.5 concentrations in response to emission reductions and estimating premature deaths based on these predicted concentrations. Specifically, we develop a model that incorporates the aging population’s impact on premature death rates and determines the additional emission reductions required to offset the increase in premature deaths.
The complex nature of PM 2.5 pollution in South Korea presents a significant challenge for air quality management [4,5]. PM 2.5 levels in South Korea are influenced by a combination of domestic emissions and transboundary pollution, primarily from China. While domestic emissions are a crucial target for mitigation efforts, the impact of foreign sources, which is beyond the country’s direct control, must also be considered [4,6,7,8,9]. To address this, the South Korean government is implementing a comprehensive strategy that includes regional assessments, advanced PM 2.5 concentration modeling, and dynamic emission reduction policies [10]. Emergency measures are also in place for severe pollution events [11]. Effective management requires the ability to rapidly simulate and evaluate multiple emission scenarios under varying conditions, including the impact of transboundary pollution. Therefore, there is a need for a PM 2.5 concentration prediction model that can account for the influence of boundary conditions from neighboring countries, such as China. Developing such models is essential for establishing rational, responsive, and effective policies for managing fine particulate matter and safeguarding public health in South Korea.
One of the most well-known air pollutant concentration simulation tools is the Community Multiscale Air Quality Modeling System (CMAQ), developed by the U.S. Environmental Protection Agency (EPA) [12]. CMAQ accurately simulates pollutant dispersion based on physical and chemical processes and is widely used by governments and organizations, including those in South Korea, for air quality management. While CMAQ offers the advantage of incorporating complex chemical reactions and providing sophisticated simulations, its high data management complexity, substantial computational demands, and lengthy processing times for each emission scenario pose significant challenges for efficient policy evaluation across multiple air quality scenarios [13]. To address these limitations, research has focused on developing surrogate models using statistical techniques like polynomial chaos and response surface modeling (pf-RSM) and deep response surface modeling (deepRSM) [14,15,16,17], as well as deep learning approaches such as Convolutional Neural Networks (CNN) and U-Net models [18,19]. These surrogate models have demonstrated the ability to emulate CMAQ with high performance. However, they still face challenges in terms of data handling complexity, which can limit their accessibility for non-experts in air pollution simulation—such as those focused on evaluating health impacts and formulating air pollution policies.
In this study, we present an innovative methodology that bridges the gap between complex air quality modeling and practical policy evaluation (Figure 1). Our approach integrates a high-speed CMAQ surrogate model based on a conditional U-Net architecture with the widely used Environmental Benefits Mapping and Analysis Program (BenMAP) health impact assessment tool [20,21]. This combined framework allows policymakers to swiftly assess the health impacts of various precursor emission strategies. The surrogate model, designed to emulate annual PM 2.5 concentrations in South Korea, uses adjustable precursor emission rates and boundary conditions as inputs. It demonstrates high performance and scientific validity, as confirmed by input–output relationship analysis. Notably, the model significantly reduces computational time, enhancing efficiency for specific applications. By leveraging this surrogate model and BenMAP pipeline, we can evaluate how human health impacts shift under different annual emission scenarios and boundary conditions, providing decision-makers with a more efficient and accurate tool for policy development.
The paper is organized as follows: Section 2 (Materials and Methods) provides details on data preparation, explains the conditional U-Net architecture, describes the training and evaluation processes, and outlines the health impact assessment approach using BenMAP. Section 3 (Results) presents the performance metrics of the surrogate model and the outcomes of the health impact assessments. Finally, Section 4 (Discussion) explores the implications of our findings, acknowledges the study’s limitations, and proposes directions for future research. This work represents a significant advancement toward more responsive and effective air quality management, especially in regions dealing with complex PM 2.5 pollution challenges.

2. Materials and Methods

2.1. Preparing Datasets

2.1.1. Community Multiscale Air Quality

In this study, we used the Community Multiscale Air Quality (CMAQ) modeling system, version 4.7, to simulate annual average air pollutant concentrations over South Korea for the year 2019. CMAQ, developed by the U.S. Environmental Protection Agency, is a state-of-the-art tool that integrates emissions, transport, chemical transformation, and deposition processes. Our model setup incorporated the SAPRC-99 chemical mechanism and included two nested domains, of which the Korean domain has a resolution of 9 km (82 × 67 grid cells).
To capture the vertical structure of the atmosphere, we used a 30-layer configuration in our simulations. The SMOKE-Asia system was used to process precursor emissions, using the CREATE v3.0 emissions inventory as the basis. Importantly, our study went beyond standard CMAQ simulations by incorporating varying boundary conditions. This approach allowed us to evaluate the effects of varying regional background pollutant levels on local PM 2.5 concentrations. We conducted multiple year-long simulations, each with different boundary conditions, to generate a comprehensive dataset of annual average PM 2.5 concentrations. This dataset not only captured the spatial variability of PM 2.5 across South Korea, but also reflected the influence of changing regional pollution patterns, providing a robust basis for training and evaluating our conditional U-Net model.

2.1.2. Control Matrix

In the original CMAQ simulation, precursor emissions are represented as a two-dimensional map in which each grid cell is assigned an emission value. Such data are cumbersome to prepare and manipulate, which limits their usefulness for scenario analysis. We therefore created a one-dimensional control matrix that expresses the precursor emission activity of each region relative to the reference scenario. By systematically varying the activity levels of five precursors (NOx, SO2, NH3, VOC, PM2.5) in 17 regions of Korea, we created 238 unique scenarios. Half of the 238 scenarios were simulated with boundary conditions set to 0.5× the baseline. This approach allowed for a broad investigation of potential emission patterns and air quality impacts. To efficiently sample the vast parameter space of emission variations, we used Latin Hypercube Sampling (LHS). This statistical technique allowed us to generate a nearly random but well-distributed set of control factors for each precursor and region. The control factors were varied between 0.5 and 1.5 times the baseline values, allowing for a comprehensive exploration of potential emissions changes while minimizing the number of simulations required. Of the 238 scenarios, 100 were used to train our model and the remaining 138 were used as test data for the evaluation reported in this paper. The training and test sets were drawn at random, and we verified that no data were omitted. Table 1 shows an example of a control matrix.
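The sampling step described above can be sketched in a few lines. This is a minimal pure-Python illustration of Latin Hypercube Sampling, not the authors' code: the function name `latin_hypercube`, the random seed, and the choice to give the first 119 rows a boundary activity of 1.0 and the rest 0.5 are all assumptions made for the example.

```python
import random

N_REGIONS = 17
PRECURSORS = ("NOx", "SO2", "NH3", "VOC", "PM2.5")
N_FACTORS = N_REGIONS * len(PRECURSORS)  # 85 control factors per scenario


def latin_hypercube(n_samples, n_dims, low, high, rng):
    """Return n_samples rows in [low, high]^n_dims, one sample per stratum per dimension."""
    columns = []
    for _ in range(n_dims):
        # One uniform draw inside each of the n_samples equal-width strata of [0, 1).
        strata = [(k + rng.random()) / n_samples for k in range(n_samples)]
        rng.shuffle(strata)  # decouple the stratum order between dimensions
        columns.append([low + (high - low) * u for u in strata])
    # Transpose so that each row is one scenario.
    return [list(row) for row in zip(*columns)]


rng = random.Random(0)  # fixed seed, chosen only for reproducibility of the sketch
factors = latin_hypercube(238, N_FACTORS, 0.5, 1.5, rng)

# Assumed layout: 85 precursor activities followed by one boundary activity,
# with half of the scenarios at 1.0 and half at 0.5.
control_matrix = [row + [1.0 if i < 119 else 0.5] for i, row in enumerate(factors)]
```

Each column of `factors` contains exactly one value from each of the 238 equal-width strata of the 0.5 to 1.5 range, which is what gives LHS its even coverage with few samples.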

2.2. Emulating CMAQ Simulation

2.2.1. Conditional U-Net Architecture

The conditional U-Net is an architecture used in diffusion models, which have been widely adopted for image generation in recent years, most notably in Stable Diffusion [22,23]. In Stable Diffusion, the conditional U-Net predicts the noise in the latent tensor of a VAE-encoded image at a given time step. Its inputs are the latent tensor, an integer representing the time step, and the prompt information encoded by the text encoder. In this study, we adapted this architecture, whose inputs are “the latent tensor of a certain image” and “the condition of a given time and prompt”, into a model that takes as input “the emission information from a particular scenario” and “the activity of the boundary condition”. We apply this modified structure to our emulator to capture the relationship between emissions/boundary activities and PM2.5 concentration.
Figure 2 shows a schematic of the conditional U-Net architecture for emulating the annual PM 2.5 concentration calculated by the CMAQ simulation. This customized architecture processes two primary inputs: precursor activities (85-dimensional vector) and a boundary condition (scalar). The precursor activities are gridded to produce an initial 82 × 67 × 5 spatial representation, which is then resized to 96 × 64 × 5 to facilitate efficient CNN computation. This resized input passes through a series of CNN blocks that progressively reduce the spatial dimensions while increasing the feature depth. Simultaneously, the boundary condition is processed through Multi-Layer Perceptrons (MLPs), increasing its dimensionality to interact with the spatially encoded precursor information at multiple scales via element-wise addition. The network includes a bottleneck layer ( 12 × 8 × 40 ) for maximum feature compression, followed by a decoder path that upsamples features back to the original dimensions. Skip connections between the corresponding encoder and decoder layers preserve fine-grained spatial information. The final output is an 82 × 67 × 1 tensor representing the predicted annual PM 2.5 concentration, matching the spatial resolution of the CMAQ simulation. This architecture efficiently captures the complex relationships between precursor emissions, boundary conditions, and PM 2.5 concentrations, providing a computationally efficient alternative to full CMAQ simulations. It is worth noting that the hyperparameters shown in the figure, such as layer dimensions and feature depths, can be adjusted to meet specific problem requirements or computational constraints, providing flexibility in model customization.
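One way to read the stated dimensions is as three stride-2 downsampling stages with the channel count doubling at each stage, which takes a 96 × 64 × 5 input to the 12 × 8 × 40 bottleneck. The sketch below only tracks tensor shapes under that assumption. The number of stages and the channel doubling are inferred from the two endpoints, not taken from the paper, and the function names are hypothetical.

```python
def halves_cleanly(size, n_down):
    """True if `size` can be halved n_down times without a remainder."""
    return size % (2 ** n_down) == 0


def encoder_shapes(height=96, width=64, channels=5, n_down=3):
    """List the (H, W, C) shape after each assumed stride-2 encoder stage."""
    shapes = [(height, width, channels)]
    for _ in range(n_down):
        height, width, channels = height // 2, width // 2, channels * 2
        shapes.append((height, width, channels))
    return shapes


for shape in encoder_shapes():
    print(shape)
```

Under the same assumption, `halves_cleanly(82, 3)` and `halves_cleanly(67, 3)` are both false while 96 and 64 pass, which offers one plausible reason for resizing the 82 × 67 grid before the convolutional blocks.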

2.2.2. Gridding Strategy

Our model uses an 86-length vector (a control matrix with eighty-five precursor activities and one boundary activity) as its input. To take advantage of the spatial feature extraction capabilities of convolutional neural networks (CNNs), we developed a preprocessing technique that transforms the input from a simple one-dimensional vector to a structured three-dimensional matrix. This approach involves mapping the data onto an 82 × 67 grid, reflecting the geographical characteristics of Korea, with each grid cell assigned the emission activities of the five precursors. The resulting 82 × 67 × 5 matrix serves as the input to our conditional U-Net, which effectively preserves the spatial relationships between regions while capturing the emission patterns of each precursor. The boundary activity input is fed into the conditional layer without building a 2D map. See Figure 2.
This preprocessing is performed automatically by a custom-designed layer that holds index information for each local government as fixed variables and transforms the input one-dimensional control matrix into the appropriate two-dimensional structure. Users of the model can simply provide one-dimensional input data without having to understand the complexities of this transformation process. The administrative divisions and grid index information for Korea used in this study were referenced from [24]. By adopting this spatial structuring approach, our model can accurately capture the relationship between precursor emissions and PM 2.5 concentrations, which is calculated by CMAQ simulation, thus contributing to improved air quality predictions across South Korea.
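The gridding step can be illustrated on a toy domain. The sketch below assumes a precomputed map of region indices per grid cell (with -1 for cells outside any region), a precursor-major ordering of the control vector, and a fill value of 0.0 for cells without a region. None of these details are specified in the text, and `grid_activities` is a hypothetical helper rather than the custom layer itself.

```python
def grid_activities(control, region_index, n_regions, n_precursors, fill=0.0):
    """Broadcast per-region activities onto a 2D map of region indices.

    control      : flat list, assumed ordered precursor-major
                   (all regions for precursor 0, then precursor 1, ...).
    region_index : 2D list of region ids, -1 for cells outside every region.
    Returns grid[row][col] = list of n_precursors activities.
    """
    if len(control) != n_regions * n_precursors:
        raise ValueError("control vector length does not match regions x precursors")
    grid = []
    for row in region_index:
        grid_row = []
        for region in row:
            if region < 0:
                grid_row.append([fill] * n_precursors)
            else:
                grid_row.append(
                    [control[p * n_regions + region] for p in range(n_precursors)]
                )
        grid.append(grid_row)
    return grid


# Toy example: 2 regions, 2 precursors, a 2 x 4 grid whose last column is sea.
toy_control = [0.8, 1.2, 1.0, 0.5]
toy_regions = [[0, 0, 1, -1],
               [0, 1, 1, -1]]
toy_grid = grid_activities(toy_control, toy_regions, n_regions=2, n_precursors=2)
```

With the real inputs the same idea would use 17 regions, 5 precursors, and an 82 × 67 index map, giving the 82 × 67 × 5 tensor described above.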

2.3. Evaluation Methods

In order to increase the level of confidence in the emulation model presented in this study, and to show that there are no problems in using it for health impact assessment and policy making, quantitative and qualitative evaluations were performed simultaneously. The quantitative evaluation was conducted using several indices used to evaluate the predictive results of deep learning models. The qualitative evaluation analyzed the causal relationships between the inputs and outputs of the model and the contribution of the input parameters to show that the model successfully learned the physical and chemical processes between PM 2.5 and its precursors reported in previous studies.

2.3.1. Performance Metrics

To quantitatively assess the performance of our Conditional U-Net model in emulating CMAQ simulations, we used three widely used statistical metrics: Mean Absolute Error (MAE), Normalized Mean Absolute Error (NMAE), and Coefficient of Determination ($r^2$). These metrics provide complementary insight into the model’s accuracy, scale-independent error, and explanatory power, respectively. The Mean Absolute Error (MAE) measures the average size of the errors in the predictions without regard to their direction. It is calculated as
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$$
where $y_i$ is the true value (CMAQ simulation result), $\hat{y}_i$ is the predicted value (Conditional U-Net output), and $n$ is the number of samples. The MAE is expressed in the same units as the PM2.5 concentrations (μg/m³), providing an intuitive measure of the average prediction error. Normalized Mean Absolute Error (NMAE) expresses the MAE as a percentage of the observed mean, providing a scale-independent measure of error:
$$\mathrm{NMAE} = \frac{\mathrm{MAE}}{\bar{y}} \times 100$$
where $\bar{y}$ is the mean of the observations. NMAE allows for the comparison of model performance over different scales or data sets. The coefficient of determination ($r^2$) quantifies the proportion of variance in the dependent variable that can be predicted by the independent variable(s). It is calculated as
$$r^2 = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2}$$
As defined here, $r^2$ is at most 1, with 1 indicating perfect prediction and 0 corresponding to a model that does no better than predicting the mean of the observations; negative values are possible for poorer fits. It provides insight into how well the model captures the variability in the CMAQ simulations. These metrics were calculated for both the training and test datasets to assess the performance and generalizability of the model. By using these complementary metrics, we aim to provide a comprehensive evaluation of our model’s ability to accurately emulate CMAQ simulations of PM2.5 concentrations.
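The three metrics are straightforward to compute from paired values. The following is a minimal sketch using plain Python lists; the sample numbers are invented for illustration and are not taken from the study.

```python
def mae(y_true, y_pred):
    """Mean absolute error, in the units of the inputs."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def nmae(y_true, y_pred):
    """MAE expressed as a percentage of the mean of the reference values."""
    mean_true = sum(y_true) / len(y_true)
    return mae(y_true, y_pred) / mean_true * 100.0


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Invented example values standing in for CMAQ and surrogate outputs.
cmaq = [10.0, 12.0, 14.0, 16.0]
surrogate = [10.5, 11.5, 14.5, 15.5]
print(mae(cmaq, surrogate), nmae(cmaq, surrogate), r_squared(cmaq, surrogate))
```

In the study these quantities would be evaluated over all grid cells and scenarios of the training and test sets rather than over four numbers.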

2.3.2. Contribution Analysis Using SHAP Value

To evaluate the relative importance of input variables in the Conditional U-Net CMAQ surrogate model, we used SHapley Additive exPlanations (SHAP) values. SHAP values provide a unified measure of feature importance based on coalitional game theory, providing consistent and locally accurate attributions that satisfy desirable properties such as local accuracy, missingness, and consistency [25]. This approach allows for the decomposition of a prediction into the additive contribution of each input feature, enabling both global feature importance analysis and detailed local explanations for individual predictions. The SHAP value for a feature i can be expressed as
$$\phi_i = \sum_{S \subseteq F \setminus \{i\}} \frac{|S|!\,\left(|F| - |S| - 1\right)!}{|F|!} \left[ f_{S \cup \{i\}}\left(x_{S \cup \{i\}}\right) - f_S\left(x_S\right) \right]$$
where $F$ is the set of all features, $S$ is a subset of features that excludes $i$, and $f_S(x_S)$ is the model output when only the features in $S$ are known. This formula captures the weighted average marginal contribution of feature $i$ across all possible feature subsets. SHAP values are particularly useful in complex models like our conditional U-Net, where the interactions between input variables can be intricate and non-linear.
We used the PermutationExplainer algorithm to compute SHAP values for our conditional U-Net model. This model-independent method estimates SHAP values by permuting feature values and observing the effect on model output. By aggregating SHAP values across multiple predictions, we generated SHAP summary plots to visualize the magnitude, frequency, and direction of each input variable’s effect on model output. This analysis provided insight into the key drivers of PM 2.5 concentrations in our surrogate model, allowing for a nuanced understanding of the relative contributions of different input parameters to predicted air quality impacts.
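For intuition, the Shapley formula can be evaluated exactly on a tiny model by enumerating every subset. The sketch below is not the PermutationExplainer used in the study, which approximates these values for larger inputs. It also assumes that "missing" features are replaced by a fixed baseline value, which is one common convention but not the only one. The toy model and all names are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values by subset enumeration (feasible only for few features)."""
    n = len(x)

    def value(subset):
        # Features in `subset` take their actual value, the rest take the baseline.
        z = [x[j] if j in subset else baseline[j] for j in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(z):
    """One additive term plus one interaction term."""
    return 2.0 * z[0] + z[1] * z[2]


phi = shapley_values(toy_model, x=[1.0, 2.0, 3.0], baseline=[0.0, 0.0, 0.0])
```

The additive term is attributed entirely to the first feature, the interaction term is split equally between the other two, and the attributions sum to the difference between the model output at `x` and at the baseline.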
SHAP values have been successfully used in several recent studies to interpret machine learning models for air pollution prediction and source attribution. For example, SHAP values have been used to explain predictions of NO2 concentrations in Madrid [26] and to quantify the contributions of meteorological factors to PM2.5 concentrations in Paris [27]. SHAP analysis has also been used to determine the relative importance of different variables in predicting PM2.5 levels in Zhejiang Province, China [28]. More recently, SHAP values were used to analyze the contributions of different sources of air pollution events in Zibo, China [29] and to estimate the source contributions of NH3 in Taichung City, Taiwan [30]. These studies demonstrate the growing use of SHAP values in air pollution research for providing interpretable insights from complex machine learning models. The successful use of SHAP analysis in different geographical regions and for different air pollutants supports its validity as an analytical tool in this field.

2.4. Health Impact Assessment

2.4.1. BenMAP-CE Methodology

The health benefits associated with changes in air pollution concentrations were evaluated using the BenMAP-CE tool, which employs the following equation to quantify the relationship between air quality improvements and corresponding health impacts on the population [20].
$$\text{Health Benefit} = Y_0 \cdot \text{Population} \cdot \left( 1 - e^{-\beta \cdot \Delta C} \right)$$
In this equation (see Supplementary Material S5), the Health Benefit represents the estimated reduction in adverse health outcomes (e.g., premature mortality) due to changes in pollutant concentrations, calculated separately for different regions and age groups. The variable $Y_0$ denotes the baseline incidence rate of the health outcome per unit population, which varies by region and age group. Population refers to the total number of individuals within each specified region and age group. The parameter $\beta$ is the concentration-response function coefficient, reflecting the sensitivity of the health outcome to changes in pollution levels. $\Delta C$ represents the change in PM2.5 concentration, specific to each region.
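A small numerical sketch may help make the health impact function concrete. It assumes the log-linear form Health Benefit = Y0 · Population · (1 − exp(−β · ΔC)). The baseline rate, population, and concentration change below are invented for illustration. Only the coefficient 0.0104 appears in this paper (Section 2.4.2), and treating it as a per-µg/m³ value is an assumption.

```python
from math import exp


def avoided_cases(baseline_rate, population, beta, delta_c):
    """Log-linear health impact function for one region and age group.

    baseline_rate : baseline incidence per person per year (Y0)
    population    : number of people in the group
    beta          : concentration-response coefficient (assumed per ug/m3)
    delta_c       : reduction in annual PM2.5 concentration (ug/m3)
    """
    return baseline_rate * population * (1.0 - exp(-beta * delta_c))


# Hypothetical inputs: 0.2% baseline mortality, one million people, 5 ug/m3 reduction.
example = avoided_cases(baseline_rate=0.002, population=1_000_000,
                        beta=0.0104, delta_c=5.0)
print(round(example, 1))
```

A full assessment would evaluate this function for each region and age group and then sum the results, since the baseline rate, population, and concentration change all vary across them.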
The BenMAP-CE framework integrates the following:
  • Air quality data (monitored or modeled)
  • Detailed population demographics
  • Baseline health incidence rates
  • Concentration-response functions from the epidemiological literature
  • Economic valuation methods
The tool applies this function at fine spatial resolutions, typically grid cells, to account for local variations in air quality, population, and health status. It can assess impacts for different pollutants, primarily PM 2.5 and ozone, at different geographic scales. Also, the analysis process includes defining the study area, specifying air quality changes, estimating population exposure, selecting health impact functions, calculating health outcomes, aggregating results, applying economic valuation, and characterizing uncertainty. Finally, the open source nature of BenMAP-CE ensures transparency and allows for community-driven improvements. By providing a standardized, scientifically rigorous approach to health impact assessment, it plays a critical role in informing air quality management decisions and supporting evidence-based environmental health policies.

2.4.2. Input Parameters of Health Impact Assessment

Regional PM 2.5 concentration forecast data by scenario: Regional PM 2.5 concentrations were estimated using the Conditional U-Net CMAQ surrogate model, with the year 2019 serving as the baseline. Various emission reduction scenarios were modeled by adjusting boundary conditions to 100% and 50% of baseline values and varying precursor emissions to 100%, 90%, 75%, 50%, 25%, and 10% of the baseline. The annual average PM 2.5 concentrations were then predicted for each of the 17 provinces in South Korea.
2019 Population Counts and Estimates: Population data for 2019 and projections through 2050 were obtained from the Korean Statistical Information Service (KOSIS). The population data included age groups from 30 to 80+ years, segmented into five-year intervals across the 17 provinces [31].
Mortality rates by disease: Mortality rates for cardiovascular diseases (CVD) and respiratory diseases (RD) were derived from the 2019 “Deaths and Death Rates by Cause” statistics provided by KOSIS, segmented similarly by age and region [32]. For future projections up to 2050, it was assumed that these mortality rates would remain constant at the 2019 levels.
Beta Coefficient: Beta coefficients for long-term exposure to PM2.5, specific to CVD and RD, were applied using default values provided by the World Health Organization (WHO). These coefficients, as outlined in the WHO’s AirQ+ tool, are based on the findings of [33] and reflect the increased risk per 10 µg/m³ increase in annual PM2.5 concentrations. For individuals aged 30 and above, the beta coefficient for CVD is 0.0104 (95% CI: 0.0086–0.013) and for RD, it is 0.0095 (95% CI: 0.0030–0.017) [33,34].
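To connect the beta coefficients to the "per 10 µg/m³" statement, the sketch below converts each coefficient to a relative risk. It assumes a log-linear concentration-response function, RR = exp(β · ΔC), with β expressed per 1 µg/m³. That reading is an interpretation of the quoted values, not something the text states explicitly.

```python
from math import exp

# Central estimates quoted in the text for ages 30 and above.
BETA = {"CVD": 0.0104, "RD": 0.0095}


def relative_risk(beta, delta_c):
    """Relative risk for a concentration increase delta_c (ug/m3), log-linear form."""
    return exp(beta * delta_c)


rr_per_10 = {cause: relative_risk(beta, 10.0) for cause, beta in BETA.items()}
for cause, rr in rr_per_10.items():
    print(cause, round(rr, 3))
```

Under this assumption, a 10 µg/m³ increase corresponds to a relative risk of roughly 1.11 for CVD and 1.10 for RD.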

3. Results

The results of this study can be divided into two main categories. One is the confirmation of the emulation performance of the CMAQ emulator, and the other is the results of the health effect assessment using this emulator and its analysis. In this section, we discuss these results.

3.1. Emulation Performance Evaluation

We first verify how well the conditional U-Net used in our methodology emulates the CMAQ simulation results. This is an important step in determining whether the health effects assessment using this conditional U-Net is reliable. The performance is assessed from four perspectives: (1) for quantitative evaluation, simulation and emulation results are compared in scatter plots over all grid cells; (2) the mean absolute error (MAE), normalized mean absolute error (NMAE), and coefficient of determination (r²) are computed; (3) for qualitative evaluation, the importance of each model input to the output is analyzed using SHAP values to see how well the model mimics the physical and chemical processes of CMAQ; and (4) the computation time is evaluated.

3.1.1. Emulation Results

Figure 3 compares the Conditional U-Net model’s inference results against CMAQ simulations, using both training and test data sets. The scatter plots are arranged side-by-side, with the left panel showing the model’s performance on the training data and the right panel showing its performance on the unseen test data. Both cases show close agreement with the CMAQ simulations, with an r² value of 0.99. This correlation indicates that the model emulates CMAQ accurately over a wide range of PM2.5 concentration values. The consistently high performance across both training and test data sets suggests that the model learned robustly and generalizes effectively to new scenarios. This consistency is particularly noteworthy because it alleviates concerns about overfitting, a common challenge for machine learning models. The tight clustering of points along the diagonal in both plots further illustrates the model’s precision in capturing the nuances of CMAQ simulations. This visual and statistical evidence supports the reliability of the Conditional U-Net as an efficient and accurate surrogate for CMAQ simulations, which could substantially increase the speed and scale at which air quality scenarios can be evaluated for policy-making purposes.
Table 2 presents a quantitative evaluation of the performance of the conditional U-Net model, which emulates the CMAQ simulation of PM2.5 concentrations in South Korea in 2019. The accuracy of the model was evaluated using three main metrics: mean absolute error (MAE), normalized mean absolute error (NMAE), and coefficient of determination (r²). The results showed high accuracy of the conditional U-Net in reproducing CMAQ results, with low MAE values (training: 0.222 μg/m³, test: 0.221 μg/m³) and NMAE values (training: 1.788%, test: 1.762%). The r² value of 0.996 for both the training and test sets indicates that the conditional U-Net explains 99.6% of the variance in the CMAQ predictions, a very high level of agreement between the surrogate model and CMAQ.
The consistency of the performance metrics between the training and test sets provides strong evidence that the model effectively learned the underlying patterns without overfitting. Overall, these results confirm the ability of the conditional U-Net to serve as a highly accurate and efficient surrogate model for CMAQ in predicting annual average PM 2.5 concentrations across Korea, and may provide important computational advantages for rapid policy scenario evaluation.

3.1.2. Input Contribution Analysis

The effect of input parameters on the output PM2.5 concentration was analyzed using SHAP values to evaluate whether the emulator used in this study reproduces the physical and chemical processes of CMAQ, beyond its raw emulation performance. Our conditional U-Net has two major inputs: domestic emissions, expressed as the activities of the five precursors, and international emissions, expressed as the activity at the boundary. We therefore first evaluated the domestic and international impacts on changes in PM2.5 concentrations.
Figure 4 shows the percentage of domestic and international SHAP values for each region. The six cases shown are the national average, Seoul, Daejeon, Gwangju, Ulsan, and Jeju. The domestic impact was obtained by summing the SHAP values of all input parameters representing domestic precursor emissions. Figure 4 shows that international influence is greater than or nearly equal to domestic influence in every region. For the national average, the boundary is rated as more influential, whereas for the inland regions, domestic and boundary contributions are rated similarly. Jeju Island, located south of the Korean peninsula and far from the mainland, is known to be strongly influenced by foreign sources, with weak influence from land-based precursor emissions and low emission activity of its own, and the emulation model reflects this well [35]. These results are generally consistent with the analysis of factors affecting PM2.5 concentrations in Korea reported by N. Kumar et al. [8], despite differences in the target time period, suggesting that the model captures the behavior of CMAQ.
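The domestic versus boundary split can be sketched as a simple aggregation over the 86 SHAP values of one region. The snippet assumes that magnitudes (absolute values) are summed and that the 85 domestic inputs precede the single boundary input. Both choices are assumptions, and the SHAP values used here are invented.

```python
def influence_shares(shap_values, n_domestic=85):
    """Return (domestic %, boundary %) from summed absolute SHAP values."""
    domestic = sum(abs(v) for v in shap_values[:n_domestic])
    boundary = sum(abs(v) for v in shap_values[n_domestic:])
    total = domestic + boundary
    if total == 0.0:
        return 0.0, 0.0
    return 100.0 * domestic / total, 100.0 * boundary / total


# Invented values: 85 small domestic contributions and one larger boundary term.
toy_shap = [0.01] * 85 + [1.15]
domestic_pct, boundary_pct = influence_shares(toy_shap)
print(round(domestic_pct, 1), round(boundary_pct, 1))
```

Summing signed values instead of magnitudes would let positive and negative contributions cancel, so the choice of aggregation affects the reported percentages.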
The second qualitative assessment examined the nonlinear response between PM2.5 and precursor emissions. Since nonlinear behavior of PM2.5 and its precursors, NH3 and NOx, has been reported in previous studies based on simulations and actual measurements [36,37], it is necessary to confirm whether the emulator model we developed behaves in a similar nonlinear manner. Figure 5 presents a comprehensive visualization of the nonlinear response of PM2.5 concentrations to variations in NH3 and NOx emission activity ratios over South Korea and five major cities, as in the previous evaluation. The contour maps show different patterns for different regions, highlighting the spatial heterogeneity in PM2.5 formation dynamics. The local trend shows that PM2.5 concentrations generally increase with higher emission ratios of both NH3 and NOx, ranging from about 14 to 17 μg/m³. Seoul shows the highest PM2.5 levels (19–25 μg/m³) with strong sensitivity to both precursors, while Daejeon and Gwangju show intermediate levels with a more pronounced response to NOx emissions. Ulsan shows a relatively uniform PM2.5 distribution (13–16 μg/m³), and Jeju Island shows significantly lower concentrations (7–9 μg/m³) with minimal variation across emission scenarios.
These contour maps underscore the complexity of PM 2.5 chemistry and the importance of considering both NH 3 and NO x emissions in air quality management plans. The observed non-linear responses and regional variations highlight the need for tailored emission control strategies in different areas of South Korea. This detailed analysis provides valuable insights for policy makers and environmental scientists, enabling more targeted and effective approaches to reduce PM 2.5 concentrations and improve air quality in different urban and rural areas across the country. Importantly, these results are consistent with trends reported in previous studies, confirming the validity of our findings and reinforcing the established understanding of PM 2.5 formation dynamics in relation to precursor emissions in different geographical contexts in South Korea.

3.1.3. Computational Performance

Table 3 compares the computational time required for CMAQ simulations and the Conditional U-Net model, highlighting the efficiency gains offered by the latter. The CMAQ v4.7 [38] simulation running on one CPU requires approximately 24 h to complete a single scenario. In contrast, the Conditional U-Net is significantly faster in both the training and prediction phases. The Conditional U-Net model was implemented using custom-developed code based on TensorFlow v2.14 [39], running on Python 3.10.13 in an Ubuntu 22.04 environment.
During the training phase, the Conditional U-Net processes an epoch in about 10 s using a CPU, and this time is further reduced to about 1 s using a GPU, both with a batch size of 256. The most striking efficiency gain is observed in the prediction phase, where the Conditional U-Net can produce results for a single scenario in as little as 10 ms on a CPU and an impressive 1 ms on a GPU, with a batch size of 32.
It is important to note that while the Conditional U-Net offers significant time savings per scenario, it requires initial training on CMAQ simulation data. However, once trained, the model’s ability to generate predictions in milliseconds represents a transformative improvement in computational efficiency compared to the 24-h runtime of CMAQ simulations. This dramatic reduction in processing time could significantly increase the capacity for rapid scenario analysis and policy evaluation in air quality management.
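As a rough worked example, the per-scenario speedup implied by the approximate timings quoted above can be computed directly. The figures are the rounded values from the text, so the result is an order-of-magnitude estimate only, and it deliberately excludes the one-time cost of generating the CMAQ training data and training the model.

```python
CMAQ_SECONDS = 24 * 3600          # about 24 h per scenario on one CPU
EMULATOR_CPU_SECONDS = 10e-3      # about 10 ms per scenario on a CPU
EMULATOR_GPU_SECONDS = 1e-3       # about 1 ms per scenario on a GPU

def speedup(reference_seconds, fast_seconds):
    """Ratio of the reference runtime to the faster runtime."""
    return reference_seconds / fast_seconds

cpu_speedup = speedup(CMAQ_SECONDS, EMULATOR_CPU_SECONDS)
gpu_speedup = speedup(CMAQ_SECONDS, EMULATOR_GPU_SECONDS)
print(f"CPU: {cpu_speedup:.2e}x, GPU: {gpu_speedup:.2e}x")
```

Under these rounded inputs the per-scenario speedup is on the order of millions on a CPU and tens of millions on a GPU.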

3.2. Health Impact Assessment

Health benefits were evaluated separately for cardiovascular diseases (CVD) and respiratory diseases (RD), with premature mortality estimated under two boundary conditions: 100% and 50%. Figure 6 illustrates the projected premature mortality due to CVD and RD from 2019 to 2050. Assuming that the predicted PM 2.5 concentrations from the 2019 scenario remain constant over time, a continuous increase in premature mortality is observed.
Under the 100% boundary condition, the combined premature mortality for CVD and RD was estimated at 15,934 (95% CI: 10,460–21,855) in 2019, rising to 53,146 (95% CI: 34,216–73,392) by 2050. Under the 50% boundary condition, the combined premature mortality was 11,106 (95% CI: 7241–15,418) in 2019, increasing to 37,043 (95% CI: 23,679–51,803) by 2050. In both cases, premature mortality in 2050 is approximately 3.3 times the 2019 value (about 330% of the 2019 level, or an increase of roughly 230%), regardless of the boundary condition.
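The relative change can be checked from the central estimates quoted above. The short sketch below uses only those four numbers; the function name is introduced here for illustration.

```python
def fold_and_percent_change(start, end):
    """Return (end/start ratio, percent change relative to start)."""
    ratio = end / start
    return ratio, (ratio - 1.0) * 100.0

# Central estimates of combined CVD + RD premature mortality (2019, 2050).
estimates = {
    "boundary 100%": (15934, 53146),
    "boundary 50%": (11106, 37043),
}

results = {}
for label, (deaths_2019, deaths_2050) in estimates.items():
    results[label] = fold_and_percent_change(deaths_2019, deaths_2050)
    ratio, pct = results[label]
    print(f"{label}: 2050/2019 = {ratio:.2f}, change = {pct:.0f}%")
```

Both boundary conditions give nearly the same ratio, which is expected when the same demographic projection drives the growth in both cases.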
The substantial rise in premature mortality is largely attributed to the projected growth in the population segments with higher mortality rates, particularly among those aged 65 and older. Specifically, the population in the 65–69 age group is expected to increase by 153%, and the 80+ age group by 434% between 2019 and 2050.
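To illustrate why demographic change alone raises the estimate when concentrations are held constant, the sketch below uses the log-linear health impact function that is commonly applied in BenMAP-style analyses. This form is an assumption here; the study's own equation is given in its supplementary material and may differ. The coefficient, concentration increment, baseline rate, and 2019 population are hypothetical placeholders; only the two growth percentages come from the text.

```python
import math

def attributable_deaths(population, baseline_rate, beta, delta_pm):
    """Log-linear health impact function (assumed form, see lead-in).

    population: number of people in the age group
    baseline_rate: baseline annual deaths per person for the cause of interest
    beta: concentration-response coefficient per ug/m3
    delta_pm: PM2.5 increment in ug/m3
    """
    return population * baseline_rate * (1.0 - math.exp(-beta * delta_pm))

# Hypothetical placeholders (NOT values from the study).
BETA = math.log(1.06) / 10.0   # hypothetical relative risk of 1.06 per 10 ug/m3
DELTA_PM = 15.0                # hypothetical PM2.5 increment, held constant
BASELINE_RATE = 0.01           # hypothetical baseline mortality rate
POP_2019 = 100_000             # hypothetical 2019 population of one age group

# Population growth between 2019 and 2050 quoted in the text (+153%, +434%).
GROWTH = {"65-69": 1.53, "80+": 4.34}

deaths_2019 = attributable_deaths(POP_2019, BASELINE_RATE, BETA, DELTA_PM)
scaling = {}
for group, increase in GROWTH.items():
    pop_2050 = POP_2019 * (1.0 + increase)
    deaths_2050 = attributable_deaths(pop_2050, BASELINE_RATE, BETA, DELTA_PM)
    scaling[group] = deaths_2050 / deaths_2019
    print(f"{group}: deaths scale by {scaling[group]:.2f}x at constant PM2.5")
```

Because the function is linear in population, the attributable deaths in each age group scale by the same factor as that group's population when the concentration and baseline rate are fixed.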
In Figure 7, the projected differences in premature mortality from 2020 to 2050 are compared at five-year intervals, using the scenario with 100% pollutant emissions in 2019 as the baseline. Under scenarios where emissions are reduced to 10% of 2019 levels, premature mortality is projected to remain below the baseline under both the 100% and 50% boundary conditions until 2050. Excluding the 10% emission scenario, premature mortality under the 100% boundary condition surpasses the baseline in all other scenarios from 2030 onward. Under the 50% boundary condition, the 25% emission scenario shows fewer premature deaths than the baseline until 2040, after which mortality begins to exceed the baseline. These findings indicate that reducing emissions to 10–25% of 2019 levels is necessary to counteract the effects of population aging and maintain current premature mortality rates.
PM 2.5 reduction strategies are typically developed with five-year targets. Figure 8 outlines the required reduction targets for each region from 2025 to 2050, segmented into five-year intervals, to prevent an increase in premature mortality due to an aging population. To maintain baseline levels of premature mortality from 2019 to 2025, a reduction of 22–42%, averaging 31%, is necessary across 17 provinces. If the 2025 reduction target is met, an additional reduction of 11–35%, averaging 22%, will be required from 2025 to 2030. Similar reductions are needed for the subsequent periods: 24% from 2030 to 2035, 26% from 2035 to 2040, 19% from 2040 to 2045, and 11% from 2045 to 2050. These five-year reduction percentages are consistent under both the 100% and 50% boundary conditions.
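Because each five-year target is expressed relative to the emission level at the start of its own period, the targets compound. The sketch below chains the average reductions quoted above to estimate the emission level remaining relative to 2019. Compounding province-averaged percentages is a simplification made for illustration; the actual regional trajectories will differ.

```python
def remaining_fraction(stepwise_reductions):
    """Chain successive fractional reductions, each relative to the level
    at the start of its own period, and return the remaining fraction
    (relative to the initial level) after each step."""
    remaining = 1.0
    trajectory = []
    for reduction in stepwise_reductions:
        remaining *= (1.0 - reduction)
        trajectory.append(remaining)
    return trajectory

# Average five-year reductions quoted in the text.
years = [2025, 2030, 2035, 2040, 2045, 2050]
avg_reductions = [0.31, 0.22, 0.24, 0.26, 0.19, 0.11]

trajectory = remaining_fraction(avg_reductions)
for year, remaining in zip(years, trajectory):
    print(f"{year}: {100 * remaining:.1f}% of 2019 emissions remain")
```

Under this simplification, the chained averages leave roughly 22% of 2019 emissions by 2050, which falls inside the 10–25% range discussed for Figure 7.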

4. Discussion and Conclusions

4.1. Development of a CMAQ Emulator Using a Conditional U-Net

In this study, we developed an emulation model that accurately and rapidly simulates the physicochemical processes of CMAQ using deep learning techniques with a conditional U-Net architecture. We also established and validated a series of pipelines that leverage this emulation model for health impact assessments and policy-making applications related to PM 2.5 precursor emissions. While CMAQ is one of the most reliable and widely used air pollution simulation tools, it requires extensive background knowledge and suffers from limitations due to its long computational times. These limitations hinder the assessment of air pollution across multiple scenarios and the formulation of effective policies for mitigating respiratory and other health impacts associated with air pollution. The emulation methodology presented here addresses these challenges by linking CMAQ with health impact assessments, thereby facilitating simpler, faster, and accurate air quality evaluations. This advancement supports more efficient policy-making and enhances the ability to assess health impacts and develop strategies to improve public health outcomes.
The emulation model predicts the annual average PM 2.5 concentration in Korea for a given precursor emission activity. Its training data were the PM 2.5 concentrations calculated by CMAQ simulations that used the emissions, climate, and weather conditions of 2019 as the baseline, and the model is a deep learning model with the conditional U-Net structure described in the text, specialized in calculating annual average pollutant concentrations. On the test data, the emulation model predicts the annual average PM 2.5 concentration with an MAE of 0.221 μg/m³, an NMAE of 1.762%, and an r² of 0.996. In addition, as shown in Table 3, the trained model predicts a scenario in about 10 ms on a CPU and about 1 ms on a GPU on the machine we used, which is a significant improvement even when the time required to precompute the CMAQ training data is taken into account. These results demonstrate that the emulation model and methodology we developed significantly contribute to overcoming the limitations of CMAQ and are well suited for the applications presented in this study.
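For readers who want to reproduce such scores on their own emulator output, the three metrics can be sketched as below. The definitions are assumptions: NMAE is taken here as the summed absolute error divided by the summed reference values, and r² as the coefficient of determination; the study may use different conventions (for example, a squared Pearson correlation). The sample arrays are hypothetical.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error, in the units of the inputs."""
    return float(np.mean(np.abs(y_pred - y_true)))

def nmae_percent(y_true, y_pred):
    """Normalized MAE in percent (assumed normalization: sum of reference values)."""
    return float(100.0 * np.sum(np.abs(y_pred - y_true)) / np.sum(y_true))

def r_squared(y_true, y_pred):
    """Coefficient of determination (assumed definition of r^2)."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Hypothetical reference (CMAQ-like) and emulated concentrations in ug/m3.
y_true = np.array([10.0, 12.0, 14.0, 16.0])
y_pred = np.array([10.5, 11.5, 14.5, 15.5])

print(mae(y_true, y_pred), nmae_percent(y_true, y_pred), r_squared(y_true, y_pred))
```

Under this NMAE definition, the reported MAE and NMAE would together imply a mean reference concentration of roughly 12.5 μg/m³ (0.221 / 0.01762); this is an inference from the assumed definition, not a figure stated in the study.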
Additional contribution analyses performed using SHAP values to increase the reliability of the emulation model, as well as an analysis of the relationship between input and output values, also show that the methodology does a good job of simulating the physicochemical computational processes behind CMAQ and their associated properties, giving policy makers more confidence in using the methodology we have developed. It is important to note that the SHAP value analysis is qualitative rather than quantitative. The conditional U-Net model uniquely accommodates both domestic precursor emissions and transboundary emissions as inputs, enabling the evaluation of their combined effects on the model’s annual PM 2.5 concentration output. Figure 4 illustrates the impacts of domestic and boundary emissions on annual PM 2.5 concentrations for both national and provincial averages. The evaluated domestic and boundary influences align well with findings from previous studies, validating that the emulation model’s computational processes are consistent with those observed in previous simulations and CMAQ results. Furthermore, the emulation model accurately reflects the nonlinear relationship between PM 2.5 and precursor emissions. Figure 5 highlights the nonlinear interactions between NH 3 and NO x emissions and PM 2.5 concentrations, which corroborates the nonlinear behavior reported in previous research. This validation supports the use of the conditional U-Net as an effective emulation model.

4.2. Health Impact Assessment and Policy Implications

The health impact assessment results, made possible by the CMAQ emulator's rapid prediction of annual PM 2.5 concentrations, highlight the critical role of emission reductions in reducing premature mortality in South Korea. While premature deaths are expected to increase significantly from 2019 to 2050 due to an aging population, reducing emissions to 10–25% of current levels could counteract this increase. The study recommends an average reduction of 31% by 2025, followed by further reductions of 11% to 26% every five years until 2050. This approach allows for the rapid and accurate evaluation of multiple emissions scenarios, providing policymakers with a powerful tool for real-time decision making. By integrating rapid CMAQ emulation with comprehensive health impact assessments, the methodology significantly advances environmental health research and policy planning. It also benefits society by facilitating more informed, timely, and effective air quality management strategies, potentially saving thousands of lives and improving the quality of life for millions. This multidisciplinary approach can be applied to air quality challenges around the world and represents a major step forward in the integration of environmental science, public health, and policy making.

4.3. Limitations and Future Research

The CMAQ emulator developed in this study successfully mimics the annual average PM 2.5 concentrations in Korea, but it has several challenges and limitations. First, when evaluating PM 2.5 concentrations and their impact on human health, it is important to consider not only annual but also monthly and daily average concentrations. While CMAQ can simulate these time resolutions, our model is structurally limited to inferring annual average concentrations. Second, because our model was designed simply to predict the annual average concentration of PM 2.5 from given precursor emissions and boundary conditions, it does not use the meteorological and geographic information that is necessary to simulate actual PM 2.5 concentrations. Third, the SHAP values are a post hoc explanation of the model's predictions and do not themselves drive those predictions. Consequently, although the SHAP analysis supports the plausibility of the relationship between the model's inputs and its predictions, the possibility that the model makes its inferences through a process that differs from the actual behavior of CMAQ cannot be ruled out.
Based on the above limitations, future research directions are as follows. First, the model could be improved to accommodate data with different temporal resolutions. Health impact assessments based on PM 2.5 concentrations are generally conducted at different time units, and this improvement is essential to enable more diverse assessments in the future. Further analysis of the input–output variables of the model is also needed. In particular, as mentioned above, unlike the original simulation of CMAQ, which is the object to be emulated, our model has a structure that does not take into account any meteorological or geographic information. A deeper investigation of whether the emulation results from this structure accurately mimic the CMAQ process would be essential to increase the credibility of our methodology. In addition, having meteorological and geographic information as input means that we will be able to comprehensively explore different scenarios, time periods, and years, which will allow us to develop a more general surrogate model for CMAQ. Finally, since our model can in principle predict concentrations of air pollutants other than PM 2.5 , such as O 3 and NO 2 , we will confirm the performance of the emulation and various case studies for these substances in future studies.

4.4. Overall Summary and Conclusions

This study introduces a novel methodology that bridges the gap between complex air quality modeling and practical policy evaluation. We developed a high-speed surrogate model based on a conditional U-Net architecture to predict PM 2.5 concentrations and integrated it with the BenMAP health impact assessment tool to evaluate health impacts efficiently. The surrogate model, trained using annual PM 2.5 data from South Korea, demonstrated impressive performance with an MAE of 0.221 μg/m³, an NMAE of 1.762%, and an r² of 0.996, significantly reducing computational time compared to traditional CMAQ simulations. This model allows rapid predictions and assessments of health impacts under various emission scenarios.
Furthermore, this research established a method for predicting the number of premature deaths reflecting population aging levels, and developed a technique to calculate the additional emission reductions needed to offset the health impacts of increased premature mortality due to aging. However, the study’s scope is limited to PM 2.5 , and future research should extend this approach to other pollutants such as NO 2 , O 3 , and SO 2 . Developing models to predict various metrics like daily averages and incorporating advanced dose-response models that consider age-specific sensitivities and threshold values will offer a more comprehensive evaluation of health impacts. Addressing these areas will enhance air quality management and public health protection, providing more robust tools for policymakers.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/atmos15101186/s1, S1. Visualization 2D PM2.5 annual concentration. S2. Relation between domestic activities and boundary conditions. S3. Contribution analysis of input variables with various SHAP value analyses. S4. Total premature mortality. S5. Health Benefits Equation. S6. PM2.5 concentration reduction compared to 2019. S7. 5-year precursor emission reduction.

Author Contributions

Y.L. conceived the study, developed the methodology, performed formal analysis, wrote the original draft, and created the visualizations. J.P. contributed to the methodology, conducted formal analysis, and reviewed and edited the manuscript. J.K. developed the software, collected resources, and performed validation. J.-H.W. conducted the investigation, collected resources, and administered the project. J.-H.L. performed formal analysis, reviewed and edited the manuscript, and supervised the study. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Environmental Technology Development Project (R&D) for the New Climate Regime and Development of a global integrated assessment modelling system for climate-air pollutants management focused on Northeast Asia projects of the Korea Institute of Environment and Industrial Technology. (Ministry of Environment R&D project number: RS-2022-KE002096).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data and source code that support the findings of this study are available from the corresponding authors upon reasonable request.

Conflicts of Interest

Jong-Hyeon Lee, Yohan Lee, and Junghyun Park are employees of EH R&C Co., Ltd. The paper reflects the views of the scientists and not the company. Other authors declare no competing interests.

Abbreviations

The following abbreviations are used in this manuscript:
BenMAP: Environmental Benefits Mapping and Analysis Program
CMAQ: Community Multiscale Air Quality Modeling System
CNN: Convolutional Neural Networks
CPU: Central Processing Unit
CVD: Cardiovascular diseases
EPA: Environmental Protection Agency
GPU: Graphics Processing Unit
KOSIS: Korean Statistical Information Service
LHS: Latin Hypercube Sampling
MAE: Mean Absolute Error
NH 3: Ammonia
NMAE: Normalized Mean Absolute Error
NO x: Nitrogen oxides
PM 2.5: Particulate matter with diameter less than 2.5 μm
RD: Respiratory diseases
SHAP: SHapley Additive exPlanations
SO 2: Sulfur dioxide
VOC: Volatile Organic Compounds
WHO: World Health Organization

References

  1. World Health Organization. WHO Global Air Quality Guidelines: Particulate Matter (PM2.5 and PM10), Ozone, Nitrogen Dioxide, Sulfur Dioxide and Carbon Monoxide. 2021. Available online: https://apps.who.int/iris/handle/10665/345329 (accessed on 1 September 2024).
  2. European Environment Agency. Air Quality in Europe 2022. 2022. Available online: https://www.eea.europa.eu/publications/air-quality-in-europe-2022 (accessed on 1 September 2024).
  3. Yu, W.; Xu, R.; Ye, T.; Abramson, M.J.; Morawska, L.; Jalaludin, B.; Johnston, F.H.; Henderson, S.B.; Knibbs, L.D.; Morgan, G.G.; et al. Estimates of global mortality burden associated with short-term exposure to fine particulate matter (PM2.5). Lancet Planet. Health 2024, 8, e146–e155.
  4. Han, S.; Park, Y.; Noh, N.; Kim, J.H.; Kim, J.J.; Kim, B.M.; Choi, W. Spatiotemporal variability of the PM2.5 distribution and weather anomalies during severe pollution events: Observations from 462 air quality monitoring stations across South Korea. Atmos. Pollut. Res. 2023, 14, 101676.
  5. Byun, G.; Kim, S.; Choi, Y.; Kim, A.; AiMS-CREATE Team; Lee, J.T.; Bell, M.L. Long-term exposure to (PM2.5) and mortality in a national cohort in South Korea: Effect modification by community deprivation, medical infrastructure, and greenness. BMC Public Health 2024, 24, 1266.
  6. Kumar, N.; Park, R.J.; Jeong, J.I.; Woo, J.H.; Kim, Y.; Johnson, J.; Yarwood, G.; Kang, S.; Chun, S.; Knipping, E. Contributions of international sources to PM2.5 in South Korea. Atmos. Environ. 2021, 261, 118542.
  7. Allabakash, S.; Lim, S.; Chong, K.S.; Yamada, T.J. Particulate Matter Concentrations over South Korea: Impact of Meteorology and Other Pollutants. Remote Sens. 2022, 14, 4849.
  8. Kumar, N.; Johnson, J.; Yarwood, G.; Woo, J.H.; Kim, Y.; Park, R.J.; Jeong, J.I.; Kang, S.; Chun, S.; Knipping, E. Contributions of domestic sources to PM2.5 in South Korea. Atmos. Environ. 2022, 287, 119273.
  9. Cha, Y.; Song, C.K.; Jeon, K.H.; Yi, S.M. Factors affecting recent PM2.5 concentrations in China and South Korea from 2016 to 2020. Sci. Total Environ. 2023, 881, 163524.
  10. National Assembly Budget Office. Analysis of Air Quality and Budget Allocation in South Korea. 2019. Available online: https://korea.nabo.go.kr/ (accessed on 23 August 2024).
  11. Special Act on the Reduction and Management of Fine Dust Act of 2018. §33491. 2018. Available online: https://elaw.klri.re.kr/eng_mobile/viewer.do?hseq=50524&type=sogan&key=16 (accessed on 23 August 2024).
  12. US EPA Office of Research and Development. CMAQ: The Community Multiscale Air Quality Modeling System; US EPA Office of Research and Development: Research Triangle Park, NC, USA, 2022. Available online: https://www.epa.gov/cmaq (accessed on 1 September 2024).
  13. Gao, Z.; Zhou, X. A review of the CAMx, CMAQ, WRF-Chem and NAQPMS models: Application, evaluation and uncertainty factors. Environ. Pollut. 2024, 343, 123183.
  14. Xing, J.; Wang, S.X.; Jang, C.; Zhu, Y.; Hao, J.M. Nonlinear response of ozone to precursor emission changes in China: A modeling study using response surface methodology. Atmos. Chem. Phys. 2011, 11, 5027–5044.
  15. Wang, S.; Xing, J.; Jang, C.; Zhu, Y.; Fu, J.S.; Hao, J. Impact Assessment of Ammonia Emissions on Inorganic Aerosols in East China Using Response Surface Modeling Technique. Environ. Sci. Technol. 2011, 45, 9293–9300.
  16. Xing, J.; Ding, D.; Wang, S.; Zhao, B.; Jang, C.; Wu, W.; Zhang, F.; Zhu, Y.; Hao, J. Quantification of the enhanced effectiveness of NOx control from simultaneous reductions of VOC and NH3 for reducing air pollution in the Beijing–Tianjin–Hebei region, China. Atmos. Chem. Phys. 2018, 18, 7799–7814.
  17. Xing, J.; Zheng, S.; Ding, D.; Kelly, J.T.; Wang, S.; Li, S.; Qin, T.; Ma, M.; Dong, Z.; Jang, C.; et al. Deep Learning for Prediction of the Air Quality Response to Emission Changes. Environ. Sci. Technol. 2020, 54, 8589–8600.
  18. Xu, J.Z.; Zhang, H.R.; Cheng, Z.; Liu, J.Y.; Xu, Y.Y.; Wang, Y.C. Approximating Three-Dimensional (3-D) Transport of Atmospheric Pollutants via Deep Learning. Earth Space Sci. 2022, 9, e2022EA002338.
  19. Salman, A.K.; Choi, Y.; Park, J.; Mousavinezhad, S.; Payami, M.; Momeni, M.; Ghahremanloo, M. Deep learning based emulator for simulating CMAQ surface NO2 levels over the CONUS. Atmos. Environ. 2024, 316, 120192.
  20. Sacks, J.D.; Lloyd, J.M.; Zhu, Y.; Anderton, J.; Jang, C.J.; Hubbell, B.; Fann, N. The Environmental Benefits Mapping and Analysis Program - Community Edition (BenMAP-CE): A tool to estimate the health and economic benefits of reducing air pollution. Environ. Model. Softw. 2018, 104, 118–129.
  21. Sacks, J.; Coffman, E.; Rappold, A.G.; Anderton, J.; Amend, M.; Baker, K.; Fann, N. A Proof-Of-Concept Approach for Quantifying Multipollutant Health Impacts Using Joint Effects Models within the Open-Source BenMAP-CE Software Program. ISEE Conf. Abstr. 2018, 2018.
  22. Ho, J.; Jain, A.; Abbeel, P. Denoising Diffusion Probabilistic Models. arXiv 2020, arXiv:2006.11239.
  23. Rombach, R.; Blattmann, A.; Lorenz, D.; Esser, P.; Ommer, B. High-Resolution Image Synthesis with Latent Diffusion Models. arXiv 2021, arXiv:2112.10752.
  24. Ministry of the Interior and Safety, Korea. Ministry of Land, Infrastructure, and Transport Continental Map. 2017. Available online: https://www.data.go.kr/data/15056910/openapi.do (accessed on 30 April 2024).
  25. Lundberg, S.; Lee, S.I. A Unified Approach to Interpreting Model Predictions. arXiv 2017, arXiv:1705.07874.
  26. Vega García, M.; Aznarte, J.L. Shapley additive explanations for NO2 forecasting. Ecol. Inform. 2020, 56, 101039.
  27. Stirnberg, R.; Cermak, J.; Kotthaus, S.; Haeffelin, M.; Andersen, H.; Fuchs, J.; Kim, M.; Petit, J.E.; Favez, O. Meteorology-driven variability of air pollution (PM1) revealed with explainable machine learning. Atmos. Chem. Phys. 2021, 21, 3919–3948.
  28. Li, X.; Wu, C.; Meadows, M.E.; Zhang, Z.; Lin, X.; Zhang, Z.; Chi, Y.; Feng, M.; Li, E.; Hu, Y. Factors Underlying Spatiotemporal Variations in Atmospheric PM2.5 Concentrations in Zhejiang Province, China. Remote Sens. 2021, 13, 3011.
  29. Li, T.; Zhang, Q.; Peng, Y.; Guan, X.; Li, L.; Mu, J.; Wang, X.; Yin, X.; Wang, Q. Contributions of various driving factors to air pollution events: Interpretability analysis from Machine learning perspective. Environ. Int. 2023, 173, 107861.
  30. Wu, C.D.; Zhu, J.J.; Hsu, C.Y.; Shie, R.H. Quantifying source contributions to ambient NH3 using Geo-AI with time lag and parcel tracking functions. Environ. Int. 2024, 185, 108520.
  31. Statistics Korea. Population Projections for Korea, Projected Population by Age (Province). 2024. Available online: https://kosis.kr/statHtml/statHtml.do?orgId=101&tblId=DT_1BPB001&conn_path=I2 (accessed on 12 August 2024).
  32. Statistics Korea. Causes of Death Statistics, Deaths and Death rates by Cause (104 Item), Sex, and Age (Five-Year Age): Province. 2022. Available online: https://kosis.kr/statHtml/statHtml.do?orgId=101&tblId=DT_1B34E11&conn_path=I2 (accessed on 12 August 2024).
  33. Hoek, G.; Krishnan, R.M.; Beelen, R.; Peters, A.; Ostro, B.; Brunekreef, B.; Kaufman, J.D. Long-term air pollution exposure and cardio-respiratory mortality: A review. Environ. Health 2013, 12, 43.
  34. WHO. AirQ+: Software Tool for Health Risk Assessment of Air Pollution. 2018. Available online: https://www.who.int/europe/tools-and-toolkits/airq---software-tool-for-health-risk-assessment-of-air-pollution (accessed on 2 October 2018).
  35. Baek, J.Y. Contributions of Domestic and Foreign Emissions to High PM2.5 Concentrations in Urban and Background Areas in Korea. Master’s Thesis, Jeju National University, Jeju City, Republic of Korea, 2023. Available online: https://oak.jejunu.ac.kr/handle/2020.oak/24548 (accessed on 1 September 2024).
  36. Thunis, P.; Clappier, A.; Beekmann, M.; Putaud, J.P.; Cuvelier, C.; Madrazo, J.; de Meij, A. Non-linear response of PM2.5 to changes in NOx and NH3 emissions in the Po basin (Italy): Consequences for air quality plans. Atmos. Chem. Phys. 2021, 21, 9309–9327.
  37. Kelly, J.T.; Jang, C.; Zhu, Y.; Long, S.; Xing, J.; Wang, S.; Murphy, B.N.; Pye, H.O.T. Predicting the Nonlinear Response of PM2.5 and Ozone to Precursor Emission Changes with a Response Surface Model. Atmosphere 2021, 12, 1044.
  38. US EPA Office of Research and Development. CMAQv4.7.1; US EPA Office of Research and Development: Research Triangle Park, NC, USA, 2017.
  39. Abadi, M.; Agarwal, A.; Barham, P.; Brevdo, E.; Chen, Z.; Citro, C.; Corrado, G.S.; Davis, A.; Dean, J.; Devin, M.; et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. 2015. Available online: https://www.tensorflow.org/ (accessed on 1 September 2024).
Figure 1. A graphical abstract of the pipeline that uses the model we developed. From top to bottom, the pipeline we present in this paper is as follows: (1) Compute the minimum data required to train the model via CMAQ simulation, (2) Train an emulator that mimics CMAQ by training a conditional U-Net model using the computed data. (3) Finally, we use this emulator to conduct and utilize various studies. The details of the model’s operation are explained in the Methods section.
Figure 2. Schematic diagram of conditional U-Net architecture used in this study. Each set of parameters listed next to each block in the figure represents the block dimensions set in this study, and these are hyperparameters that can be changed as appropriate, depending on the target region, other parameters to be entered, and the number of data used in the study.
Figure 3. Scatter plots for emulation performance evaluation. (left): plot for training data, (right): plot for test data. The training and test data show similar performance, indicating that no overfitting has occurred.
Figure 4. Results of the graphical representation of the contribution of input variables to the concentration of PM 2.5 in two categories, national and international, using SHAP values. The results show that the impact of precursor emissions from outside the country is equal to or greater than the domestic impact, indicating that the simulation results and the actual situation on the Korean peninsula are properly mimicked. From top left to bottom right, the panels show the contribution of boundary and domestic emissions to annual PM 2.5 concentrations in the whole country, Seoul, Daejeon, Gwangju, Ulsan, and Jeju Island, respectively.
Figure 5. Contour map showing the non-linear response of PM 2.5 to NH 3 and NO x emissions, showing, from top left, the response of PM 2.5 concentrations for the entire country, Seoul, Daejeon, Gwangju, Ulsan, and Jeju Island.
Figure 6. Annual premature mortality due to various diseases and total diseases (cardiovascular diseases + respiratory diseases) based on PM 2.5 concentrations. Panel (a) illustrates the results under a boundary condition of 100%, while panel (b) shows the results under a boundary condition of 50%. Scenarios were applied where precursor emissions were reduced from 100% to 10%. Premature mortality for the entire population of South Korea was calculated annually from 2019 to 2050.
Figure 7. Differences in premature mortality relative to the 2019 baseline with precursor emissions at 100%. Panel (a) shows results with a boundary condition of 100%, and panel (b) shows results with a boundary condition of 50%. The scenarios reduce precursor emissions from 100% to 10%. Differences are shown for the entire population of South Korea from 2020 to 2050, in five-year intervals.
Figure 8. Reduction percentages of precursor emissions across 17 provinces, over five-year intervals, required to hold premature mortality at its 2019 level. The x-axis represents the average reduction percentages across the 17 provinces. The y-axis denotes the percentage reduction in precursor emissions required from the start year to the end year of each interval, relative to the emission level at the start year. This assumes that the emission reduction targets for the previous year were achieved.
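The percentages on the y-axis of Figure 8 are relative to the emission level at the start of each interval, not to 2019. The conversion can be sketched as follows. This is a minimal illustration only: the function name `interval_reductions` and the activity values are hypothetical and are not taken from the study.

```python
def interval_reductions(activity_by_year):
    """Convert emission activity relative to a common baseline (1.0 = baseline)
    into percentage reductions per interval, each expressed relative to the
    activity at the interval's start year."""
    years = sorted(activity_by_year)
    reductions = {}
    for start, end in zip(years, years[1:]):
        a_start = activity_by_year[start]
        a_end = activity_by_year[end]
        # Reduction over the interval, relative to the start-year level.
        reductions[(start, end)] = 100.0 * (1.0 - a_end / a_start)
    return reductions


# Hypothetical activity levels, for illustration only.
example = {2020: 1.00, 2025: 0.90, 2030: 0.72}
for (start, end), pct in interval_reductions(example).items():
    print(f"{start}-{end}: {pct:.1f}% reduction relative to {start}")
```

With these made-up values the two intervals call for cuts of 10% and 20%, while the cumulative reduction relative to 2020 is 28%. Interval-wise percentages therefore cannot simply be added.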
Table 1. Example of a random control matrix scenario. Each scenario is sampled with the Latin hypercube sampling method. Each column gives the emission activity of one precursor, and each row gives the activity for one region of Korea. The last column and row give the boundary activity.
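| Region Name | NOx | SO2 | VOC | NH3 | PM2.5 | Activity |
|---|---|---|---|---|---|---|
| Seoul | 0.514 | 0.927 | 0.945 | 0.692 | 1.109 | - |
| Incheon | 0.611 | 1.087 | 0.949 | 0.546 | 0.778 | - |
| Busan | 0.504 | 1.364 | 1.046 | 1.192 | 1.249 | - |
| Daegu | 1.274 | 0.951 | 0.708 | 1.247 | 0.786 | - |
| Gwangju | 0.872 | 1.069 | 0.621 | 0.840 | 1.497 | - |
| Gyeonggi-do | 0.574 | 1.379 | 1.436 | 0.842 | 1.177 | - |
| Gangwon-do | 1.479 | 1.167 | 0.540 | 1.098 | 1.173 | - |
| Chungbuk-do | 1.134 | 0.710 | 0.725 | 1.410 | 0.503 | - |
| Chungnam-do | 0.520 | 0.562 | 0.812 | 1.021 | 0.994 | - |
| Gyeongbuk-do | 1.063 | 1.073 | 1.192 | 1.343 | 1.045 | - |
| Gyeongnam-do | 0.581 | 1.337 | 1.057 | 0.811 | 0.671 | - |
| Jeonbuk-do | 0.702 | 1.286 | 0.580 | 1.150 | 1.063 | - |
| Jeonnam-do | 1.311 | 1.187 | 1.001 | 1.281 | 0.805 | - |
| Jeju-do | 1.034 | 1.224 | 1.395 | 0.520 | 0.752 | - |
| Daejeon | 1.173 | 0.718 | 1.386 | 0.762 | 0.864 | - |
| Ulsan | 1.197 | 0.919 | 0.528 | 1.284 | 0.534 | - |
| Sejong | 0.996 | 1.394 | 0.549 | 0.787 | 1.343 | - |
| Boundary | - | - | - | - | - | 1.000 |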
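A control matrix like the one in Table 1 can be drawn with Latin hypercube sampling, which splits each dimension into as many equal strata as there are scenarios and places exactly one sample in each stratum. The sketch below uses only the Python standard library. The activity range of 0.5 to 1.5 is an assumption inferred from the example values in Table 1, the boundary activity is held at 1.0 as in that example, and the function names are hypothetical. It does not reproduce the authors' sampling code.

```python
import random

REGIONS = [
    "Seoul", "Incheon", "Busan", "Daegu", "Gwangju", "Gyeonggi-do",
    "Gangwon-do", "Chungbuk-do", "Chungnam-do", "Gyeongbuk-do",
    "Gyeongnam-do", "Jeonbuk-do", "Jeonnam-do", "Jeju-do", "Daejeon",
    "Ulsan", "Sejong",
]
PRECURSORS = ["NOx", "SO2", "VOC", "NH3", "PM2.5"]


def latin_hypercube(n_samples, n_dims, rng):
    """Return n_samples points in [0, 1)^n_dims such that, in every
    dimension, each of the n_samples equal-width strata holds one point."""
    columns = []
    for _ in range(n_dims):
        # One uniform draw inside each stratum, then shuffle the strata so
        # the pairing across dimensions is random.
        column = [(i + rng.random()) / n_samples for i in range(n_samples)]
        rng.shuffle(column)
        columns.append(column)
    # Transpose: one row per sample.
    return [[columns[d][s] for d in range(n_dims)] for s in range(n_samples)]


def sample_control_matrices(n_scenarios, low=0.5, high=1.5, seed=0):
    """Draw n_scenarios control matrices (region x precursor activities).
    The range [low, high] is an assumption, not a value from the paper."""
    rng = random.Random(seed)
    n_prec = len(PRECURSORS)
    unit = latin_hypercube(n_scenarios, len(REGIONS) * n_prec, rng)
    scenarios = []
    for point in unit:
        scaled = [low + (high - low) * u for u in point]
        matrix = {
            region: dict(zip(PRECURSORS, scaled[i * n_prec:(i + 1) * n_prec]))
            for i, region in enumerate(REGIONS)
        }
        matrix["Boundary"] = 1.0  # assumption: boundary held at 100%
        scenarios.append(matrix)
    return scenarios


scenarios = sample_control_matrices(n_scenarios=4)
print(scenarios[0]["Seoul"])
```

Where SciPy is available, `scipy.stats.qmc.LatinHypercube` offers a maintained implementation of the same idea.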
Table 2. Quantitative performance of the conditional U-Net emulation, evaluated with the MAE, NMAE, and r² metrics. The results indicate that the conditional U-Net model emulates CMAQ with very high performance, and the near-identical scores on the training and test data indicate that the model learns without overfitting.
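| Target Year | Target Region | Metric [Unit] | Data Category | Score |
|---|---|---|---|---|
| 2019 | South Korea | MAE [μg/m³] | Training set | 0.222 |
| 2019 | South Korea | MAE [μg/m³] | Test set | 0.221 |
| 2019 | South Korea | NMAE [%] | Training set | 1.788 |
| 2019 | South Korea | NMAE [%] | Test set | 1.762 |
| 2019 | South Korea | r² [-] | Training set | 0.996 |
| 2019 | South Korea | r² [-] | Test set | 0.996 |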
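For reference, the three metrics in Table 2 can be computed as in the sketch below. The definitions are assumptions, since the table does not state them: NMAE is taken as the summed absolute error divided by the summed reference concentration, and r² as the squared Pearson correlation. The sample values are made up for illustration.

```python
def mae(pred, ref):
    """Mean absolute error, in the units of the inputs (e.g. ug/m3)."""
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(ref)


def nmae_percent(pred, ref):
    """Normalized mean absolute error in percent.
    Assumed definition: sum of absolute errors / sum of reference values."""
    return 100.0 * sum(abs(p - r) for p, r in zip(pred, ref)) / sum(ref)


def r_squared(pred, ref):
    """Squared Pearson correlation between prediction and reference
    (assumed meaning of r2 here)."""
    n = len(ref)
    mean_p = sum(pred) / n
    mean_r = sum(ref) / n
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(pred, ref))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_r = sum((r - mean_r) ** 2 for r in ref)
    return cov * cov / (var_p * var_r)


# Made-up grid-cell concentrations: reference (CMAQ-like) vs. surrogate.
reference = [10.0, 12.0, 14.0, 16.0]
predicted = [10.5, 11.5, 14.5, 15.5]
print(f"MAE  = {mae(predicted, reference):.3f}")
print(f"NMAE = {nmae_percent(predicted, reference):.3f} %")
print(f"r2   = {r_squared(predicted, reference):.3f}")
```

Evaluating the same functions on the training and test sets separately gives the paired scores that Table 2 uses as its overfitting check.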
Table 3. Computation time for each method. Compared with the time required to simulate each scenario in CMAQ, the time required by the conditional U-Net method is far shorter, regardless of the processor type. Note, however, that the CMAQ simulation data must be computed in advance to train the conditional U-Net model.
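| Method | Processor | Batch Size | Time Consumed |
|---|---|---|---|
| CMAQ v4.7 | CPU (simulation) | - | ~24 h/scenario |
| Conditional U-Net | CPU (training) | 256 | ~10 s/epoch |
| Conditional U-Net | GPU (training) | 256 | ~1 s/epoch |
| Conditional U-Net | CPU (prediction) | 32 | ~10 ms/scenario |
| Conditional U-Net | GPU (prediction) | 32 | ~1 ms/scenario |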
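The per-scenario figures in Table 3 translate into a speed-up factor with simple arithmetic. The sketch below uses the approximate times from the table. The sweep size of 1000 scenarios is a hypothetical example, and the calculation leaves out both the training time and the CMAQ runs needed to build the training set.

```python
# Approximate per-scenario times from Table 3, in milliseconds.
CMAQ_MS = 24 * 60 * 60 * 1000  # ~24 h/scenario
UNET_CPU_MS = 10               # ~10 ms/scenario (CPU prediction)
UNET_GPU_MS = 1                # ~1 ms/scenario (GPU prediction)

MS_PER_HOUR = 3_600_000


def speedup(reference_ms, surrogate_ms):
    """How many times faster the surrogate is per scenario."""
    return reference_ms / surrogate_ms


def sweep_hours(n_scenarios, per_scenario_ms):
    """Wall-clock hours to evaluate n_scenarios one after another."""
    return n_scenarios * per_scenario_ms / MS_PER_HOUR


print(f"CPU speed-up: {speedup(CMAQ_MS, UNET_CPU_MS):.2e}x")
print(f"GPU speed-up: {speedup(CMAQ_MS, UNET_GPU_MS):.2e}x")

n = 1000  # hypothetical number of scenarios in a policy sweep
print(f"CMAQ, {n} scenarios: {sweep_hours(n, CMAQ_MS):.0f} h")
print(f"U-Net (CPU), {n} scenarios: {n * UNET_CPU_MS / 1000:.0f} s")
```

On these rough numbers, prediction is about 8.6 million times faster on CPU and about 86 million times faster on GPU than a single CMAQ run.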
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Lee, Y.; Park, J.; Kim, J.; Woo, J.-H.; Lee, J.-H. Rapid PM2.5-Induced Health Impact Assessment: A Novel Approach Using Conditional U-Net CMAQ Surrogate Model. Atmosphere 2024, 15, 1186. https://doi.org/10.3390/atmos15101186
