**Assessing the Impact of Ozone and Particulate Matter on Mortality Rate from Respiratory Disease in Seoul, Korea**

#### **Sun Kyoung Park**

School of ICT-Integrated Studies, Pyeongtaek University, Pyeongtaek 17869, Korea; skpark@ptu.ac.kr

Received: 13 October 2019; Accepted: 5 November 2019; Published: 7 November 2019

**Abstract:** The evidence linking ozone and particulate matter with adverse health impacts is increasing. The goal of this study was to assess the impact of air pollution on the mortality rate from respiratory disease in Seoul, Korea, between 2008 and 2017. The analysis was conducted using a decision tree model in two ways: using 24-h average concentrations and using 1-h maximum values to compare any health impacts from the different times of exposure to pollution. Results show that in spring an elevated level of ozone is one of the most important factors, but in summer temperature has a greater impact than air pollution. Nitrogen dioxide is one of the most important factors in fall, while high levels of particles less than 2.5 μm (PM2.5) and 10 μm in size (PM10) and cooler temperatures are key factors in winter. We checked the accuracy of our results through a 10-fold cross validation method. Error rates using 24-h average and 1-h maximum concentrations were in the ranges of 24.9–42% and 27.6–42%, respectively, indicating that 24-h average concentrations are slightly more directly related with mortality rate. These results could be useful for policy makers in determining the temporal scale of predicted pollutant concentrations for an air quality warning system to help minimize the adverse impacts of air pollution.

**Keywords:** ozone; PM2.5; PM10; nitrogen dioxide; respiratory disease; decision tree model

#### **1. Introduction**

Air pollution reduces visibility and has an adverse impact on human health [1]. Relatively high pollutant levels are often observed in industrialized cities. As a representative example of air pollutants, tropospheric ozone (O3) is a secondary pollutant produced naturally by photochemical reactions. O3 maintains an equilibrium concentration between production and removal, but anthropogenic emissions of nitrogen dioxide (NO2) and volatile organic compounds (VOCs) accelerate the production of O3 through photochemical reactions. Accordingly, high O3 concentrations, which damage ecosystems, are common in cities [2–4]. Prolonged exposure to high O3 concentrations causes or exacerbates cardiovascular disease and respiratory diseases, such as pneumonia, chronic obstructive pulmonary disease, asthma, and allergic rhinitis. Cases leading to death have also been reported [5–7].

Particulate matter is also a representative air pollutant with direct effects on human health. Particles less than 2.5 μm in size are referred to as PM2.5, and particles less than 10 μm in size are referred to as PM10. Because inhaled fine particles can penetrate deep into the capillary vessels, particulate matter is known as a direct cause of cardiovascular disease. There have been multiple reports showing that exposure to high particle concentrations leads to increased fetal mortality. According to the World Health Organization (2014), by 2012, the global toll of premature deaths related to air pollution had reached 7 million people annually. As such, numerous research results showing air pollution's direct and indirect impacts on human health have been published [8].

Air pollution affects health in many ways [9–11]. The impact of air pollution on school students was studied for 3.5 years in Barcelona, Spain [12]. Results showed that an increase in ambient NO2 and particulate matter concentrations by one interquartile range deteriorated memory development in students by around 20%. Ljungman et al. (2018) analyzed the relationship between air pollution and arterial stiffness. Although the result showed no linkage between arterial stiffness and PM2.5, a higher probability of arterial stiffness was found in roadside residents. In other words, although the impact of a single type of air pollution was not clear, the results demonstrated that several air pollutants have complex effects on health [13]. Therefore, the influence of multiple air pollutants on the human body should be analyzed to accurately identify the effect of air pollution.

The impact of air pollution on human health also varies according to the analytical method used. Son et al. (2010) analyzed the impact of air pollution on pulmonary function by using PM10, O3, NO2, SO2, and CO obtained from 13 observatories in Ulsan, Korea, from 2003 to 2007 [11]. Four methods were used to calculate representative pollutant concentrations: simple averaging, nearest distance, inverse distance weighting, and kriging. Subsequent tests of the accuracy of the analysis determined that kriging was superior. As such, differences in research results may occur depending on the method selected. On the other hand, results may vary depending on the timescale of the observed air pollution. Lee et al. (2018) analyzed the impact of short-term exposure to air pollution (fewer than 8 days) and long-term exposure to air pollution (annual) on key inflammatory markers by using linear mixed effects models [14]. The results showed that although short-term exposure was related to increased fibrinogen and ferritin levels, long-term exposure was related to fibrinogen and white blood cell counts. Likewise, the impact of air pollution may vary depending on the temporal scale of data, so it is meaningful to compare epidemiological study results using data of different timescales.

The purpose of this study is two-fold: the first goal is to find pollutant levels that determine a high probability of mortality from respiratory disease, and the second is to compare the accuracy of the results obtained using 1-h maximum pollutant concentrations with that obtained using 24-h average pollutant concentrations. To achieve these goals, we classified the dependent variable as days with high and low probability of mortality. The effect of temperature was also analyzed because temperature is known to have a direct influence on health [6,10]. The statistical model used was a decision tree algorithm. Among various statistical models, the decision tree algorithm is especially useful for finding factors with which to classify the dependent variable [15]. A brief description of the model is provided in the next section.

#### **2. Research Methods**

#### *2.1. Analytical Data*

Hourly air pollutant concentrations, temperature, and daily number of deaths in Seoul from 2008 to 2017 were used for this study. Air pollution data for SO2, CO, O3, NO2, PM10, and PM2.5 were measured from 25 monitoring stations operated by the Korea Environment Corporation (KECO) (Figure 1) [16]. Temperatures were collected from the Korea Meteorological Administration's (KMA) National Climate Data Center (NCDC) [17]. The number of deaths was based on the public microdata of the National Statistical Office (NSO) [18]. The number of deaths from respiratory disease was in the "J00–J99" category of the 10th International Classification of Diseases (ICD-10).

**Figure 1.** Air pollution monitoring stations in Seoul, Korea.

#### *2.2. Decision Tree Model*

Decision tree models can efficiently accommodate any data formats that are non-normal, a mix of continuous, discrete, and categorical formats, and cross- or auto-correlated formats. Moreover, the decision trees facilitate the interpretation of the final model because their output is a hierarchical structure that consists of a series of "if–then" rules to predict the outcome of the dependent variable [19]. This cannot be easily achieved using other time series regression models, such as distributed lag non-linear function. A decision tree model expresses rules appropriate for classifying or predicting dependent variables (i.e., number of deaths caused from respiratory disease) based on independent variables (i.e., air pollutant concentration, etc.). Here, independent variables were the 1-h maximum and 24-h average air pollutant concentrations and 24-h average temperatures. Dependent variables were the categorical values of days with high numbers of deaths (H) or days with low numbers of deaths (L) and were classified based on the median number of deaths from respiratory disease (Table 1).

A classification and regression tree (CART) was used to apply the decision tree model. Because detailed descriptions of CART models can be found in other literature, only a brief description is outlined in this paper [15]. First, CART makes classifications based on the most important independent variable (Figure 2). For example, let us assume independent variables are X, Y, and Z. Here, the first basis for classifying dependent variables as A or B is "X ≤ x". The second basis is "Y ≤ y" or "Z ≤ z". Under "X ≤ x", the dependent variable is classified as A if "Y ≤ y", and it is classified as B if "Y > y". Under "X > x", the dependent variable is classified as B if "Z ≤ z", and it is classified as A if "Z > z". Among the independent variables, only appropriate variables are used for the classification.

The classification criteria in CART maximize similarity within groups and dissimilarity among groups based on the Gini index [15]. Because the branch divides repeatedly based on this method, a tree-shaped structure is hierarchically constructed. The visual expression is helpful in understanding and interpreting the results. The model has been widely used in many areas, including environmental sciences and epidemiological studies linking air pollution and human health [20–26].
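As an illustration of the Gini-based criterion, the following is a minimal sketch in plain Python of how one split threshold could be chosen for a single independent variable. The function names `gini` and `best_threshold` and the toy O3 values are hypothetical and are not taken from the study; a full CART implementation repeats this search over all variables and all branches.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_threshold(values, labels):
    """Return (threshold, weighted Gini) of the best 'value <= threshold' split."""
    pairs = list(zip(values, labels))
    distinct = sorted(set(values))
    best = (None, gini(labels))  # start from the impurity without any split
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [lab for v, lab in pairs if v <= t]
        right = [lab for v, lab in pairs if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (t, score)
    return best

# Made-up toy data (not from the study): 1-h maximum O3 in ppb and H/L labels.
o3 = [40, 55, 60, 70, 85, 90, 95, 110]
day = ["L", "L", "L", "L", "H", "H", "H", "H"]
threshold, impurity = best_threshold(o3, day)
print(threshold, impurity)
```

A lower weighted Gini value means that the two groups created by the split are each more homogeneous, which is the sense in which the split maximizes similarity within groups.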

**Figure 2.** Schematic diagram of classification. Independent variables are X, Y, and Z, and dependent variables are classified as A and B.

The dependent variable in the model was categorized as days with relatively high (H) and low (L) numbers of daily deaths compared with the median number of daily deaths from respiratory disease. The median number of daily deaths during the analysis period was seven deaths. Thus, "H" indicates days where the number of deaths from respiratory disease is higher than seven, and "L" indicates days where the number of deaths is equal to or lower than seven.
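The labeling rule above can be sketched as follows. This is a minimal illustration, not the author's code; the function name `label_days` and the daily counts are made up, and only the rule itself (H if the count is above the median, L otherwise) comes from the text.

```python
from statistics import median

def label_days(daily_deaths):
    """Label each day 'H' if its count exceeds the median, otherwise 'L'."""
    m = median(daily_deaths)
    return m, ["H" if d > m else "L" for d in daily_deaths]

# Made-up daily counts whose median happens to be seven, as in the study.
deaths = [5, 9, 7, 3, 8, 7, 10]
m, labels = label_days(deaths)
print(m, labels)
```

Days with exactly the median count fall into the "L" class, matching the "equal to or lower than seven" wording.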

Independent variables included the 1-h maximum pollutant concentrations (SO2, CO, O3, PM10, PM2.5, and NO2) and the 24-h average temperature. Air pollutant concentrations one to three days before the deaths were included in the independent variables to observe the influence from pollution prior to death. Temperature is known to have a relatively long-term effect compared with air pollution [10]. Therefore, temperatures from four to 20 days before the deaths were also included in the independent variables. To decrease the number of independent variables, average temperatures from four to 10 days and from 11 to 20 days were used instead of using temperatures on each day. A CART model's accuracy does not increase when independent variables are added if those variables are correlated with one another [15]. The 1-h maximum air pollutant concentration n days before the deaths is abbreviated as "Pollutant name"\_max(n d) hereafter. For example, "O3\_max(1 d)" indicates the 1-h maximum O3 concentration, one day before death. The average temperatures from 4 to 10 days and from 11 to 20 days are represented as "T(4–10 d)" and "T(11–20 d)", respectively.
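The construction of the lagged independent variables can be sketched as below. This is an assumed layout rather than the study's actual data pipeline: `lag_features` is a hypothetical helper, each series is taken to be a plain list with one value per calendar day, and `day` is the index of the day of death.

```python
from statistics import mean

def lag_features(pollutants, temps, day):
    """Build lagged features for one day.

    pollutants: dict mapping a pollutant name to its daily 1-h maximum series.
    temps: daily 24-h average temperatures.
    day: index of the day of death (needs at least 20 earlier days).
    """
    if day < 20:
        raise ValueError("need at least 20 days of history")
    feats = {}
    for name, series in pollutants.items():
        for n in (1, 2, 3):  # one to three days before death
            feats[f"{name}_max({n} d)"] = series[day - n]
    # Days 4..10 before death are indices day-10 .. day-4 (7 values).
    feats["T(4–10 d)"] = mean(temps[day - 10:day - 3])
    # Days 11..20 before death are indices day-20 .. day-11 (10 values).
    feats["T(11–20 d)"] = mean(temps[day - 20:day - 10])
    return feats

# Made-up series: the value on day i is simply a function of i.
temps = list(range(30))
o3 = [2 * i for i in range(30)]
feats = lag_features({"O3": o3}, temps, day=25)
print(feats)
```

Averaging the temperature over two windows keeps the number of independent variables small while still representing the longer-term temperature effect.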

Taking into account the seasonal changes in air pollutant concentrations, an analysis was conducted for each month. For example, O3 concentration was especially high in May and June, and prolonged exposures to high ozone concentrations could be a direct cause of respiratory and eye diseases. On the other hand, particle concentrations were relatively high in winter. As such, it was difficult to accurately identify the factors causing mortality without adding seasonal distinctions into the analysis. Previous studies that used decision tree models to investigate air pollution also limited the analysis periods to several weeks or conducted separate analyses for each season. For example, Chu et al. (2012) limited the analysis period to spring (from 28 April to 13 May 2009) to find factors that influenced ozone concentrations [24]. Park (2018) constructed independent models for each season to assess factors linked with cardiovascular disease [27].

#### **3. Status of Air Pollution, Temperature, and the Number of Deaths Caused by Air Pollutants**

#### *3.1. Air Pollution and Temperature*

Basic statistics of hourly SO2, CO, O3, NO2, PM10, and PM2.5 concentrations in Seoul, Korea, from 2008 to 2017 are illustrated as box plots in Appendix A (Figure A1). The plots show that SO2 and CO met the air quality standard at all times. Unsurprisingly, O3 exceeded the air quality standard from April to September: strong sunlight accelerates O3 generation, so O3 generally increases in spring and summer [28–30]. The 24-h average NO2 exceeded the air quality standard from January to May and from October to December. Moreover, the 1-h average NO2 exceeded the standard regardless of the season. Much like NO2, the 24-h PM10 also exceeded the standard during the relatively cold seasons. One of the important causes of the higher PM10 concentrations during the cold seasons was the relatively low atmospheric mixing height because of a low ground temperature [31]. The year-round exceedance of PM2.5 shows how imperative it is to reduce PM2.5. Daily average temperatures showed clear seasonal changes; the highest temperature was 33.7 °C in August and the lowest value was −14.8 °C in January.

#### *3.2. Number of Deaths Caused by Respiratory Disease*

The number of deaths in Korea from 2008 to 2017 was around 2.6 million, so the annual average number of deaths was approximately 260,000. The number of deaths varied by up to 15% from month to month. The mortality rate was relatively high in summer and winter, and relatively low in the spring and fall. Causes of death were cancers (28%), cardiovascular disease (22%), traffic accidents and suicides (11%), diabetes and liver disease (9%), respiratory disease (8%), and others (22%). DeLeon and Thruston (2003) found that the influence of air pollution on deaths was clear for the elderly, but less clear for others [32]. Accordingly, this study also focuses on persons aged 65 or older at time of death from respiratory disease (Table 1).

**Table 1.** Basic statistics of the daily number of deaths from respiratory disease for persons aged 65 or older at time of death from 2008 to 2017 in Seoul, Korea.


#### **4. Results and Discussion**

#### *4.1. Linkage of Air Pollution and Temperature with Mortality from Respiratory Disease*

Monthly average pollutant concentrations on days with a higher probability of deaths from respiratory disease were compared with those with lower probability of deaths to find the linkage between air pollution and deaths (Figures 3–7). Differences in O3 concentrations were relatively large from May to August when high O3 concentrations were observed (Figure 3). Studies have shown that the adverse health effects from O3 are often found in industrialized cities, in which the production of O3 is accelerated by NOx and VOCs emissions [4–7]. Because strong sunlight is crucial for O3 production, high O3 levels are often found in spring, as was true in this case (Figure 3).

The health impact of air pollution may appear within a few hours after exposure to pollution or several days afterward [5]. Consequently, a time delay may exist between the occurrence of high air pollution and death. To take this possibility into account, O3 concentrations up to five days before the day of death were analyzed; data in February, May, August, and November are presented as an example (Figure 3).

In May, O3 concentrations up to five days before days with high numbers of deaths were clearly higher than those before days with low numbers of deaths (Figure 3). The 1-h maximum O3 values in May on the days with high and low numbers of deaths were 86 ppb and 78.3 ppb, respectively, with a difference of 7.7 ppb (Figure 3a). Differences at 1, 2, 3, 4, and 5 days before death were 8.2 ppb, 8.9 ppb, 4.3 ppb, 6.8 ppb, and 5.6 ppb, respectively, indicating that the difference at 1–2 days before death was greater than that on the day of death. The 8-h maximum O3 values in May on days with high and low numbers of deaths were 72.9 ppb and 66.4 ppb, respectively, with a difference of 6.4 ppb (Figure 3b). Differences at 1, 2, 3, 4, and 5 days before death were 6.5 ppb, 6.6 ppb, 3.1 ppb, 5.0 ppb, and 4.1 ppb, respectively. The results indicated that O3 concentrations 1–2 days before death had a direct association with death, whereas O3 concentrations 3–5 days before death had relatively less impact.
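The lag comparison described above amounts to a difference of two means for each lag. A minimal sketch follows, assuming a daily concentration series aligned with daily H/L labels; the function name `lag_difference` and the numbers are illustrative only and are not the study's data.

```python
from statistics import mean

def lag_difference(conc, labels, lag):
    """Mean concentration `lag` days before H days minus that before L days."""
    high = [conc[i - lag] for i, lab in enumerate(labels) if lab == "H" and i >= lag]
    low = [conc[i - lag] for i, lab in enumerate(labels) if lab == "L" and i >= lag]
    return mean(high) - mean(low)

# Made-up series: concentration on each day and the H/L label of that day.
conc = [10, 20, 30, 40, 50, 60]
labels = ["L", "H", "L", "H", "L", "H"]
for lag in (0, 1):
    print(lag, lag_difference(conc, labels, lag))
```

Days that do not have `lag` earlier observations are skipped, so the first days of a record contribute only to the small lags.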

**Figure 3.** Ozone (O3) concentrations on days with a higher or lower probability of deaths from respiratory disease, and those from zero to 5 days before death in February, May, August, and November between 2008 and 2017: (**a**) 1-h and (**b**) 8-h maximum O3 concentrations.

PM2.5 concentrations did not show clear differences between the days with high and low numbers of deaths (Figure 4). Differences in PM2.5 concentrations in August did not consistently increase or decrease on days before death. However, differences 1 day and 2 days before death increased in February, May, and November, while differences decreased 3, 4, and 5 days before death. These results imply that high PM2.5 1 day or 2 days before death is associated with deaths from respiratory disease, which is consistent with an existing study showing that deaths from cardiovascular disease occur a few days after high PM2.5 concentrations [33].

**Figure 4.** Concentrations of particles less than 2.5 μm in size (PM2.5) on days with a higher or lower probability of deaths from respiratory disease, and those from zero to 5 days before deaths in February, May, August, and November between 2008 and 2017: (**a**) 1-h maximum and (**b**) 24-h average PM2.5 concentrations.

PM10 concentrations were relatively higher in winter and in spring (Figure 5). The high PM10 concentrations during relatively cold seasons are also related to the relatively low mixing height during the cold season, because pollutants can accumulate in the lower troposphere if the mixing height is low [34]. The period in which maximum PM10 concentrations were observed was consistent with days with an inflow of yellow dust from the west of Korea [35]. Differences in PM10 concentrations 1 day and 2 days before death were obvious in February, May, and November, which implied that PM10 concentrations one or two days before death had a large impact.

**Figure 5.** PM10 concentrations on days with a higher or lower probability of deaths from respiratory disease, and those from zero to 5 days before deaths in February, May, August, and November between 2008 and 2017: (**a**) 1-h maximum and (**b**) 24-h average PM10 concentrations.

Monthly average NO2 concentrations on days with high numbers of deaths differed little from those on days with low numbers of deaths (Figure 6). NO2 concentrations from one to five days before days with high numbers of deaths also differed little from those before days with low numbers of deaths. Such results do not necessarily mean that NO2 concentration was not a direct cause of death from respiratory disease, because the recorded monthly average concentration alone is insufficient for a thorough analysis of its effects. Accordingly, a decision tree model was introduced to permit closer observations of the health impacts of air pollution.

**Figure 6.** Nitrogen dioxide (NO2) concentrations on days with a higher or lower probability of death from respiratory disease and those from zero to 5 days before deaths in February, May, August, and November between 2008 and 2017: (**a**) 1-h maximum and (**b**) 24-h average NO2 concentrations.

The average temperature on days from February to April with high numbers of deaths was lower than on days with a low number of deaths (Figure 7). This result is consistent with a previous study showing that high numbers of deaths by respiratory disease occurred on days with low temperatures in winter [36,37]. However, average temperatures on the days with high numbers of deaths were slightly higher in most cases than on the days with low numbers of deaths. This was partly because of the relatively low particle concentrations observed on cold days.

Air pollutant concentrations, especially those of secondary pollutants such as O3, are affected by meteorological conditions [29]. Moreover, when cold air masses move in from the relatively clean air of the northern polar area, particle concentrations tend to be low. Similar phenomena are observed in Vietnam as well. Hien et al. (2011) observed a reduced PM10 concentration from October to February in Hanoi, Vietnam, immediately before a cold surge occurred [38]. Thus, temperature and air pollution are related. As a result, it is necessary to simultaneously analyze the effect of both air pollution and temperature on health to isolate the factors exacerbating respiratory disease.

**Figure 7.** Temperatures on days with a higher or lower probability of deaths from respiratory disease, and temperatures from zero to 5 days before deaths in May between 2008 and 2017.

#### *4.2. Factors that Impact the Number of Deaths Caused by Respiratory Disease*

#### 4.2.1. Influence of 1-h Maximum Pollutant Concentrations on the Number of Deaths from Respiratory Disease

The decision tree model was used to observe the effect of pollutant concentrations on the number of deaths caused by respiratory disease. Originally, the model was constructed using all data, and the accuracy of the model was checked by the 10-fold cross validation error. Results showed that the error rate was 45%, indicating that the model did not accurately classify factors affecting high and low probability of death from respiratory disease, partly due to seasonal variations of pollutant concentrations. Thus, the analyses were conducted separately for each month.

The impact of air pollution and temperature on respiratory disease as analyzed with a CART model can be interpreted as follows (Figure 8). "T(11–20 d)" was the basis for the first branch in January, which signified that it was the most important factor affecting the risk of death. The risk of death was relatively low when "T(11–20 d)" was higher than 2.4 °C. When "T(11–20 d)" was 2.4 °C or lower, "PM2.5\_max(1 d)", the basis of the next branch, was checked to determine the risk of death. When "T(11–20 d)" was 2.4 °C or lower and "PM2.5\_max(1 d)" was higher than 95.5 μg·m<sup>−3</sup>, the risk of death was relatively high. If "PM2.5\_max(1 d)" was 95.5 μg·m<sup>−3</sup> or less, "PM10\_max(1 d)" was checked. If "PM10\_max(1 d)" exceeded 125.5 μg·m<sup>−3</sup>, the risk of death was relatively high. If "PM10\_max(1 d)" was 125.5 μg·m<sup>−3</sup> or less, the risk of death was relatively high when "NO2\_max(1 d)" was higher than 83 ppb and relatively low when "NO2\_max(1 d)" was 83 ppb or less. Based on this, it was possible to analyze the linkage between temperature, PM10, PM2.5, and NO2 concentrations and the risk of death.
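Because a decision tree is a series of "if–then" rules, the January branches can be written directly as nested conditions. The sketch below is one reading of the description in the text, using the thresholds quoted there; the function name `january_risk_1h` and its argument names are hypothetical, and the actual tree in Figure 8 may contain details not captured here.

```python
def january_risk_1h(t_11_20, pm25_max_1d, pm10_max_1d, no2_max_1d):
    """Classify a January day as 'H' or 'L' from the 1-h maximum rules.

    t_11_20: average temperature 11-20 days before (degrees C).
    pm25_max_1d, pm10_max_1d: 1-h maxima one day before (ug/m3).
    no2_max_1d: 1-h maximum NO2 one day before (ppb).
    """
    if t_11_20 > 2.4:          # first branch: relatively warm period
        return "L"
    if pm25_max_1d > 95.5:     # second branch: PM2.5
        return "H"
    if pm10_max_1d > 125.5:    # third branch: PM10
        return "H"
    return "H" if no2_max_1d > 83 else "L"  # last branch: NO2

# Made-up inputs chosen to exercise each branch.
print(january_risk_1h(5.0, 120.0, 150.0, 90))   # warm period
print(january_risk_1h(-1.0, 120.0, 80.0, 40))   # cold, high PM2.5
print(january_risk_1h(-1.0, 60.0, 80.0, 40))    # cold, all pollutants low
```

Reading the tree this way shows why the first branch is the most important factor: the pollutant thresholds are consulted only when the temperature condition has already been met.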

"T(4–10 d)" was the most important factor in February. When "T(4–10 d)" exceeded 3.8 °C, the risk of death was relatively low. On the other hand, when "T(4–10 d)" was 3.8 °C or less, "T(3 d)" was checked to ascertain the risk of death. Even if "T(4–10 d)" was 3.8 °C or less, the risk of death was relatively low when "T(3 d)" was higher than 5.7 °C. However, when "T(4–10 d)" was 3.8 °C or less and "T(3 d)" was 5.7 °C or less, the risk of death differed depending on "PM10\_max(1 d)". The risk of death was relatively high when "PM10\_max(1 d)" was higher than 143 μg·m<sup>−3</sup>, and it was relatively low when "PM10\_max(1 d)" was 143 μg·m<sup>−3</sup> or below. Such results showed that "T(4–10 d)" was the most direct factor and that "PM10\_max(1 d)" was also important in February.

High PM10 concentrations were frequently observed in March, partly because of yellow dust (Figure A1e) [39,40]. "T(2 d)", "PM10\_max(1 d)", and "NO2\_max(1 d)" were among the important factors in March. "NO2\_max(1 d)", "PM2.5\_max(1 d)", and "PM10\_max(1 d)" were related to the risk of death in April. Those results were consistent with previous studies that showed NO2 and particulate matter were directly related to deaths from respiratory disease. Dong et al. (2012) confirmed through a study in Shenyang, China, that the risk of death from respiratory disease increased when PM10 and NO2 concentrations were relatively high [41].

The 1-h maximum O3 concentrations before the deaths under study were the most important factors in May and in June. Burnet et al. (1997) illustrated the association between O3 concentrations one day before hospitalization and the number of hospitalized patients by analyzing patients in 16 cities in Canada from April 1981 to December 1991 [42]. Although "T(2 d)" was the most important factor in July, "O3\_max(1 d)" was also closely related with the deaths. Reid et al. (2012) found that the risk of death increased when both O3 concentrations and temperatures were high [6].

The temperatures 1, 2, and 3 days before the deaths were linked with the risk of death in August (Figure 8). This result was consistent with a previous study that showed that the risk of death increased with higher temperatures in August [43]. "NO2\_max(1 d)" was the most important factor relating to risk of death in September and in October. High PM10 and PM2.5 concentrations were frequently observed in November and in December, and "T(11–20 d)", "PM10\_max(1 d)", and "PM2.5\_max(1 d)" were important factors that determined the risk of death from respiratory disease.

The accuracy of the results was ascertained through 10-fold cross-validation errors [44,45] (Table 2). Error rates were 27.6–42%, with the highest value in August. Errors were also higher than those in a similar study that analyzed factors influencing the number of deaths caused by cardiovascular disease [27]. In August, pollutant levels were relatively low (Figure A1). One of the causes of the low pollutant concentrations in summer was the dilution of air pollution by the elevated mixing height [31]. In addition, the relatively high precipitation in summer inhibited the photochemical formation of O3 and accelerated the wet deposition of particles [46,47]. Accordingly, the risk of death from respiratory disease in August was determined only by the temperature before deaths, so the accuracy of the decision tree model may have been reduced.
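The 10-fold cross-validation error is the share of days misclassified when each tenth of the data is held out in turn and predicted by a model fitted to the remaining nine tenths. The sketch below shows this bookkeeping in plain Python. It is not the study's code: `k_fold_error` and `fit_stump` are hypothetical names, the one-threshold learner is a simple stand-in for CART, and the data are made up.

```python
import random

def k_fold_error(rows, labels, fit, k=10, seed=0):
    """Fraction of rows misclassified across k held-out folds."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k nearly equal folds
    wrong = 0
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        predict = fit([rows[i] for i in train], [labels[i] for i in train])
        wrong += sum(predict(rows[i]) != labels[i] for i in held_out)
    return wrong / len(rows)

def fit_stump(train_rows, train_labels):
    """Stand-in learner (not CART): one threshold on the first feature."""
    values = sorted({r[0] for r in train_rows})
    best_t, best_err = values[0], len(train_rows) + 1
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        err = sum(("H" if r[0] > t else "L") != y
                  for r, y in zip(train_rows, train_labels))
        if err < best_err:
            best_t, best_err = t, err
    return lambda row: "H" if row[0] > best_t else "L"

# Made-up, cleanly separable data: low values are "L", high values are "H".
rows = [[v] for v in list(range(20)) + list(range(100, 120))]
labels = ["L"] * 20 + ["H"] * 20
error = k_fold_error(rows, labels, fit_stump)
print(error)
```

Because every day is predicted by a model that never saw it during fitting, the resulting error rate is a fairer estimate of accuracy than the error on the fitting data.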

**Figure 8.** Hourly maximum pollutant concentrations and temperature affecting the high (H) and low (L) probability of death from respiratory disease.

#### 4.2.2. Influence of 24-h Average Pollutant Concentrations on the Number of Deaths from Respiratory Disease

The health impact of air pollution varies depending on how long people are exposed [13,14]. Zanobett et al. (2003) analyzed the influence of PM10 up to 40 days after exposure [48]. Results showed that when PM10 concentrations increased by 10 μg·m<sup>−3</sup>, the risk of death from respiratory disease increased by 0.74%. The risk of death increased by five times if the exposure to air pollution lasted for more than a month. Therefore, to compare the effect of different exposure times, we analyzed the influence of 24-h average pollutant concentrations on deaths and compared it with that of the 1-h maximum pollutant concentrations analyzed in the previous section (Figure 9). The 24-h average pollutant concentration n days before death is abbreviated as "Pollutant name"\_avg(n d) hereafter. For example, "O3\_avg(1 d)" indicates the 24-h average O3 concentration one day before death.

The first branch that classified the high and low risk of death in January was "T(11–20 d)". The second branch was classified based on "PM10\_avg(2 d)". The risk of death was relatively low when "T(11–20 d)" was higher than 2.4 °C. When "T(11–20 d)" was 2.4 °C or lower, the risk of death was high if "PM10\_avg(2 d)" was higher than 36.5 μg·m<sup>−3</sup>. When "PM10\_avg(2 d)" was 36.5 μg·m<sup>−3</sup> or less, the risk of death was relatively high if "PM2.5\_avg(2 d)" exceeded 29.4 μg·m<sup>−3</sup>. When "PM2.5\_avg(2 d)" was 29.4 μg·m<sup>−3</sup> or less, the risk of death was relatively high if "NO2\_avg(2 d)" was higher than 52 ppb.

"T(4–10 d)" was directly related in February with the risk of death from respiratory disease. When "T(4–10 d)" exceeded 3.8 °C, the risk of death was relatively low. When "T(4–10 d)" was 3.8 °C or less, the risk of death was relatively high if "PM10\_avg(2 d)" exceeded 52.8 μg·m<sup>−3</sup>. When "PM10\_avg(2 d)" was 52.8 μg·m<sup>−3</sup> or less, the risk of death was classified as relatively high if "PM2.5\_avg(2 d)" exceeded 25 μg·m<sup>−3</sup>, but the risk was relatively low if "PM2.5\_avg(2 d)" was 25 μg·m<sup>−3</sup> or lower. Thus, "T(4–10 d)", "PM10\_avg(2 d)", and "PM2.5\_avg(2 d)" were major factors in February.

"T(2 d)", "PM10\_avg(2 d)", and "PM2.5\_avg(2 d)" were factors in March that influenced the number of deaths from respiratory disease. Factors associated with the risk of death in April included "NO2\_avg(2 d)", "PM10\_avg(1 d)", and "PM2.5\_avg(1 d)". High O3 concentrations were often observed in May. "O3\_avg(1 d)" and temperatures before death were among the important factors. "O3\_avg(2 d)" and "T(1 d)" were important in determining the risk of death.

Although "T(2 d)" was the most important factor in July, "T(1 d)" and "O3\_avg(1 d)" were also associated with deaths. The risk of death increased along with the higher temperatures in August. "NO2\_avg(2 d)" was the most important factor related to the risk of death in September and in October. However, "T(11–20 d)" and "PM10\_avg(1 d)" were the important risk factors in November and in December.

The 10-fold cross-validation errors were 24.9–42%, depending on the month (Table 2). The errors were greater than those in a similar study that analyzed the relation between air pollution and cardiovascular disease [27]. We also conducted the analysis using 3-day cumulative pollutant concentrations. Factors linked with death from respiratory disease using 3-day cumulative pollutant concentrations were similar to those using separate data on each day. However, the accuracy of the model using 3-day cumulative pollutant concentrations was slightly worse than that using daily pollutant concentrations.

Errors obtained using the 24-h average concentrations were slightly lower than those obtained using the 1-h maximum concentrations. The result indicated that the 24-h average concentrations were more directly related to the risk of death than the 1-h maximum concentrations. Jerrett et al. (2004) confirmed through research conducted in Hamilton, Canada, that differences in degrees of exposure lead to differences in mortality [49]. If the subject of the analysis did not engage in outdoor activities during the period in which the 1-h maximum concentration was observed, a direct association between high pollutant concentrations and the risk of death would not appear in the data, even if the 1-h maximum concentration was directly associated with health. Therefore, the results of this study alone should not be interpreted as proving that short-term exposure to extreme levels of pollution is less hazardous to health than long-term exposure to elevated levels of air pollution.


**Figure 9.** Daily average pollutant concentrations and temperatures affecting the high (H) and low (L) probability of deaths from respiratory disease.


**Table 2.** Ten-fold cross validation errors from using a decision tree model to predict higher or lower probability of death from respiratory disease.

<sup>1</sup> Values in parenthesis are errors using 3-day cumulative data instead of using data on each day.

#### **5. Conclusions**

The impact of air pollution on the risk of death from respiratory disease was analyzed. The analysis was conducted separately for each month to take into account the seasonal variability of air pollutant concentrations. The independent variables were 1-h maximum and 24-h average O3, PM2.5, PM10, NO2, SO2, and CO concentrations and temperatures. The dependent variables were classified into days with high (H) and low (L) numbers of deaths caused by respiratory disease. The results showed that a higher risk of mortality was observed on days from November to March, in which PM10/PM2.5 concentrations were relatively high and temperatures were low. NO2 was a crucial factor that influenced deaths from April to October. O3 was the most important factor in May and in June. The risk of death increased with the high temperatures in July and August.

The accuracy of the results was validated through 10-fold cross-validation errors. Error rates using the 1-h maximum pollutant concentrations were 27.6–42%, whereas those using the 24-h average pollutant concentrations were slightly lower, at 24.9–42%. Thus, the 24-h average pollutant concentrations were found to be more directly related to mortality from respiratory disease.

The results obtained from this study may be used to establish policies to minimize the adverse health effects of air pollution. For example, when pollutant concentrations are forecast and are communicated to the public, daily average concentrations should be emphasized more than the hourly maximum concentrations. In addition, the results could be used to guide the public to refrain from outdoor activities when pollutant levels are elevated.

**Conflicts of Interest:** The author declares no conflict of interest.

#### **Appendix A**

Basic statistics of hourly SO2, CO, O3, NO2, PM10, and PM2.5 concentrations in Seoul, Korea, from 2008 to 2017 are illustrated as box plots (Figure A1). The top and bottom of each box represent the 75th percentile (Q3) and 25th percentile (Q1), respectively. The uppermost value of the whisker is the smaller of the maximum value and Q3 + 1.5 × (Q3 − Q1). The lowermost value of the whisker is the larger of the minimum value and Q1 − 1.5 × (Q3 − Q1).
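The box and whisker limits defined above can be computed as in the following sketch. The helper name `box_limits` and the sample values are made up, and the quartile convention (the "inclusive" method of Python's `statistics.quantiles`) is an assumption, since the text does not state which convention was used.

```python
from statistics import quantiles

def box_limits(values):
    """Return (lower whisker, Q1, Q3, upper whisker) as defined in the text."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    upper = min(max(values), q3 + 1.5 * iqr)  # smaller of max and Q3 + 1.5 IQR
    lower = max(min(values), q1 - 1.5 * iqr)  # larger of min and Q1 - 1.5 IQR
    return lower, q1, q3, upper

# Made-up concentrations with one extreme value.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
lower, q1, q3, upper = box_limits(sample)
print(lower, q1, q3, upper)
```

With this definition an extreme value such as 100 lies above the upper whisker and would be drawn as an outlier.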

**Figure A1.** Box plot of 1-h maximum, 8-h maximum, and 24-h average pollutant concentrations, and 24-h average temperatures between 2008 and 2017. (**a**) SO2, (**b**) CO, (**c**) O3, (**d**) NO2, (**e**) PM10 and (**f**) PM2.5, and (**g**) temperature.

#### **References**


© 2019 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

MDPI St. Alban-Anlage 66 4052 Basel Switzerland Tel. +41 61 683 77 34 Fax +41 61 302 89 18 www.mdpi.com

*Atmosphere* Editorial Office E-mail: atmosphere@mdpi.com www.mdpi.com/journal/atmosphere
