Daily PM2.5 and Seasonal-Trend Decomposition to Identify Extreme Air Pollution Events from 2001 to 2020 for Continental Australia Using a Random Forest Model

Borchers-Arriagada, Nicolas; Morgan, Geoffrey G.; Van Buskirk, Joseph; Gopi, Karthik; Yuen, Cassandra; Johnston, Fay H.; Guo, Yuming; Cope, Martin; Hanigan, Ivan C.

doi:10.3390/atmos15111341

by Nicolas Borchers-Arriagada ^1,2,*, Geoffrey G. Morgan ^2,3,4,5, Joseph Van Buskirk ^3,6, Karthik Gopi ^2,3,4,5, Cassandra Yuen ^2,3, Fay H. Johnston ^1,2,4, Yuming Guo ^2,7, Martin Cope ^2,8 and Ivan C. Hanigan ^2,4,9,*

¹ Menzies Institute for Medical Research, University of Tasmania, Hobart, TAS 7000, Australia
² Centre for Safe Air, NHMRC Centre for Research Excellence, 17 Liverpool Street, Hobart, TAS 7000, Australia
³ Sydney School of Public Health, Faculty of Medicine and Health, University of Sydney, Sydney, NSW 2006, Australia
⁴ Healthy Environments and Lives (HEAL) National Research Network, Canberra, ACT 2601, Australia
⁵ University Centre for Rural Health, University of Sydney, Lismore, NSW 2480, Australia
⁶ Public Health Unit, Sydney Local Health District, Sydney, NSW 2050, Australia
⁷ Climate, Air Quality Research Unit, School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC 3004, Australia
⁸ CSIRO Oceans and Atmosphere, Aspendale, VIC 3195, Australia
⁹ WHO Collaborating Centre for Climate Change and Health Impact Assessment, School of Population Health, Faculty of Health Science, Curtin University, Perth, WA 6102, Australia
* Authors to whom correspondence should be addressed.

Atmosphere 2024, 15(11), 1341; https://doi.org/10.3390/atmos15111341

Submission received: 7 October 2024 / Revised: 29 October 2024 / Accepted: 3 November 2024 / Published: 8 November 2024

(This article belongs to the Section Air Quality)

Abstract:

Robust high spatiotemporal resolution daily PM_2.5 exposure estimates are limited in Australia. Estimates of daily PM_2.5 and the PM_2.5 component from extreme pollution events (e.g., bushfires and dust storms) are needed for epidemiological studies and health burden assessments attributable to these events. We sought to: (1) estimate daily PM_2.5 at a 5 km × 5 km spatial resolution across the Australian continent between 1 January 2001 and 30 June 2020 using a Random Forest (RF) algorithm, and (2) implement a seasonal-trend decomposition using loess (STL) methodology combined with selected statistical flags to identify extreme events and estimate the extreme pollution PM_2.5 component. We developed an RF model that achieved an out-of-bag R-squared of 71.5% and a root-mean-square error (RMSE) of 4.5 µg/m³. We predicted daily PM_2.5 across Australia, adequately capturing spatial and temporal variations. We showed how the STL method in combination with statistical flags can identify and quantify PM_2.5 attributable to extreme pollution events in different locations across the country.

Keywords: air pollution; smoke; bushfire; wildfire; wood heaters; machine learning; particulate matter

1. Introduction

Outdoor fine particulate matter air pollution (PM_2.5, particulate matter with an aerodynamic diameter equal to or less than 2.5 µm) is one of the largest environmental risk factors affecting people’s health around the globe, with an estimated 4.2 (95% CI, 3.7–4.8) million premature deaths per year attributable to this hazard [1]. Exposure to PM_2.5 has been associated with increases in cardiovascular and respiratory morbidity and mortality, as well as a wide range of other health conditions [2], even at low concentrations [3].

While efforts have been made to decrease PM_2.5 exposure around the world, more than half the world’s population still experienced increases in annual PM_2.5 exposure in the last decade [4,5], and some sources, such as PM_2.5 from fire smoke, are increasing in places like the Western US [6] and Southeastern Australia [5]. Exposure to fire smoke PM_2.5 has been linked to various health outcomes [7], and recent large epidemiological studies have found positive associations between fire smoke PM_2.5 and all-cause, cardiovascular, and respiratory mortality [8] and hospital admissions [9], along with other health outcomes.

Temperature extremes, drought, and fire weather conditions are expected to increase with climate change [10,11]. This will likely increase the contribution of bushfires (also known as wildfires or landscape fires) to total PM_2.5 exposure affecting populations worldwide [12]. This is particularly relevant for Australia, the driest inhabited continent on the planet, which was hit by catastrophic bushfires during the 2019–2020 bushfire season (~October 2019 to March 2020) [13]. Controlled landscape vegetation fires (also called prescribed burns, controlled burns, or hazard-reduction burns) are conducted in Australia to reduce the risk of uncontrolled bushfires. These controlled burns can produce large amounts of smoke [14] and are also categorized as landscape fires.

Extreme air pollution events (i.e., high pollution events that deviate from normal expected values) in Australia can be related to smoke from bushfires, smoke from controlled burns, smoke from domestic wood heaters, and dust storms [15]. Robust high spatiotemporal resolution daily PM_2.5 estimates, as well as estimates of the PM_2.5 component due to extreme pollution events, are necessary for epidemiological studies of the health effects of total PM_2.5 and of the fire smoke-related PM_2.5 component, and for assessments of the health burden attributable to total PM_2.5 and to the extreme pollution event component of PM_2.5.

High-quality air pollution monitoring is costly and often limited to a sparse network of monitors at locations considered representative of population-level air pollution [16,17]. Air pollution can vary across space and time, and different methods have been employed to predict air pollution concentrations at finer spatial scales and across large spatial extents in locations where air pollution monitoring stations do not exist. These methods include, but are not limited to, (1) spatial interpolation such as inverse distance weighting or kriging using only available monitoring data [18,19], (2) geographically weighted regression [20], (3) generalized linear regression models that include other covariates such as satellite-derived aerosol optical depth (AOD) and other land use and geographical variables [21,22], (4) chemical transport models that rely on an accurate estimate of source emissions [23], and more recently, (5) machine learning methods using a broad range of spatial and spatiotemporal predictors [17,24,25].

One regularly used machine learning algorithm is the Random Forest [26,27]. This method has been widely applied in multiple disciplines, including environmental health applications. Reid et al. [24], Schneider et al. [25], and Stafoggia et al. [17] have implemented the Random Forest algorithm to estimate daily PM_2.5 across the United States (US), Great Britain, and Sweden, respectively. Chen et al. [28] compared a series of models (linear regression, regularization, and machine learning) to predict annual fine particles and nitrogen oxides, and found that two machine learning algorithms based on the classification and regression tree framework (Random Forest and boosted machine) performed best compared to other models (18 different individual or ensemble algorithms). Enebish et al. [29] compared the performance of different machine learning methods to predict ambient PM_2.5 in Ulaanbaatar, Mongolia, and found that the Random Forest algorithm performed best out of six different machine learning models. These previous studies have shown that the Random Forest algorithm offers several strengths compared to other approaches, some of which are its natural resistance to overfitting, reduced sensitivity to outliers in the training data, and efficient parallel implementation capabilities that enable faster training on large data sets.

Various methods have been applied to estimate the component of total PM_2.5 attributable to extreme pollution events such as bushfires. Some studies have used machine learning to estimate the fire smoke-related PM_2.5 component directly [30]. Others have relied on dispersion or chemical transport models [8,9,31] to define the PM_2.5 component due to fire smoke events. Other analytical methods have combined satellite imagery to identify smoke days, positive PM_2.5 anomalies from ground-based observations to estimate fire smoke PM_2.5 at monitor locations, and different spatiotemporal variables (meteorology, fire variables, counts of HYSPLIT trajectory points, aerosol measurements, and AOD predictions, among others) to predict wildfire smoke [32,33]. Other studies have taken a two-step approach that first identifies days affected by an extreme air pollution episode (e.g., days exceeding the 95th percentile of the daily PM_2.5 distribution), often related to a specific emissions source such as bushfire smoke, and then estimates the portion of total PM_2.5 attributable to the episode [34,35]. Unusual air pollution events due to smoke plumes from landscape fires have been verified using independent reports or satellite imagery [15], including the National Oceanic and Atmospheric Administration’s Hazard Mapping System [36] or the Navy Aerosol Analysis and Prediction System (NAAPS) [37,38]. Recent studies have relied on a method known as seasonal-trend decomposition with loess (STL) [39], which decomposes a daily PM_2.5 time series into three components (seasonal, trend, and remainder) that are then used to estimate the daily background (i.e., the expected concentration given ordinary seasonal and trend conditions) and non-background (i.e., the component attributed to extreme events) PM_2.5 components [35,40].

In this analysis, we connected STL with Random Forest modelling to achieve two objectives: (1) use a Random Forest algorithm to estimate daily PM_2.5 at a 5 km spatial resolution across Australia between 1 January 2001 and 30 June 2020, and (2) implement an STL methodology combined with statistical flags to identify extreme event days and estimate the PM_2.5 component of total PM_2.5 related to the extreme air pollution episode. For the purposes of this study, we define extreme PM_2.5 events as any exceptionally high levels which may be due to different non-seasonal causes, including fires, dust storms, industrial accidents, or short-term meteorological changes, among others.

2. Materials and Methods

2.1. Scope of Study

Continental Australia includes mainland Australia, the island state of Tasmania, and various smaller islands. It has a land area of 7,633,565 square kilometers and comprises six states (New South Wales, Victoria, Queensland, Western Australia, South Australia, and Tasmania) and two territories (the Northern Territory and the Australian Capital Territory). Weather varies greatly across the country. The north is characterized by a tropical monsoonal climate, with a dry season that runs between May and October, usually with high landscape fire activity and air pollution levels. Eastern Australia has a temperate climate, which varies along the coast from Northern Queensland to the southern states. The climate is characterized by a hot humid summer in the north, shifting into a warm humid summer, a warm summer with a cool winter, and a mild warm summer with a cold winter in some parts of the south. The southern island of Tasmania is characterized by a cooler climate, with a mild warm summer and a cold winter. High pollution days can also be experienced during winter due to the use of wood heaters in the southern states and the central and southern inland regions [41].

2.2. Data

2.2.1. PM₁₀ and PM_2.5 Daily Observations

For this study, we calculated daily PM_2.5 and PM₁₀ at locations with fixed-site monitors from the Centre for Safe Air’s (CSA) National Air Pollution Monitor Database, which combines hourly and daily regulatory and field monitor data from state and territory government monitoring networks [42,43]. We used the following procedure: (1) we obtained hourly raw data and set large negative values (<−20 μg/m³) to −20 μg/m³ (temporary negative readings may occur due to different factors, but with proper calibration and correction the data remain reliable); (2) we prioritized data from regular monitoring stations (TEOMs and BAMs) over field monitors (DustTrak and Partisol) where they were collocated and there was more than one observation for the same location-hour; (3) we linearly imputed missing hourly data where there was no more than one consecutive missing hour, using the “na_interpolation” function from the “imputeTS” package in R version 4.2.2 [44]; (4) we averaged hourly data by station to daily averages where at least 60% of hourly data were available (including imputed hourly data); and (5) we extracted existing daily data from the regulatory state and territory daily records and combined them with the daily averages derived from the hourly data. Where the two overlapped, priority was given to the hourly data and the daily values were discarded. Overlap occurs because there have been periods when multiple monitors were collocated.
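As an illustration only, steps (1), (3), and (4) of the procedure above could be sketched in Python as follows. This is not the authors' code (the study used R and the imputeTS package); the function names and the use of None for a missing hour are assumptions made for this sketch.

```python
from statistics import mean

FLOOR = -20.0        # large negative readings are set to this value (µg/m³)
MIN_COVERAGE = 0.60  # minimum share of valid hours needed for a daily mean

def clamp_negatives(hours):
    """Set readings below FLOOR to FLOOR; leave missing values (None) alone."""
    return [None if v is None else max(v, FLOOR) for v in hours]

def fill_single_gaps(hours):
    """Linearly interpolate a missing hour only when both neighbours are valid."""
    filled = list(hours)
    for i in range(1, len(hours) - 1):
        if hours[i] is None and hours[i - 1] is not None and hours[i + 1] is not None:
            filled[i] = (hours[i - 1] + hours[i + 1]) / 2
    return filled

def daily_mean(hours):
    """Return the daily mean, or None when fewer than 60% of hours are valid."""
    cleaned = fill_single_gaps(clamp_negatives(hours))
    valid = [v for v in cleaned if v is not None]
    if len(valid) < MIN_COVERAGE * len(cleaned):
        return None
    return mean(valid)
```

Because the gap check reads the original series rather than the partly filled one, runs of two or more missing hours are left untouched, which mirrors the "no more than one consecutive missing hour" rule.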

2.2.2. Spatial Predictors

For our analysis, we used a series of spatial predictors, which represent variables that vary in space but not time. These were grouped into four broad categories: (1) position and elevation, (2) land cover, (3) land use, and (4) emissions sources. The position and elevation category considered elevation, distance to ocean, longitude, and latitude. Nine variables were included for the land cover and land use categories, and data was extracted at different buffer sizes ranging from 50 to 10,000 meters. Specific buffers for each variable are presented in Supplementary Table S1. The land cover category considers the following five variables: tree cover (%), normalized difference vegetation index (NDVI), impervious surfaces (%), ratio of water bodies coverage, and ratio of parkland coverage. The land use category considers four variables, each representing the land coverage of a specific land use type (residential, commercial, industrial, or open areas) within the different buffer sizes. Finally, we developed multiple variables to represent emissions of point sources, roads, traffic, and wood heaters.

More detail on data sources, variable definitions, and buffers included for each variable are presented in Supplementary Table S1.

2.2.3. Spatiotemporal Predictors

Spatiotemporal predictors are those that vary both in space and time, regardless of whether this variation is daily, monthly, or yearly. For this study, we considered variables grouped into the following broad categories: (1) satellite-based air pollution estimates, (2) emission sources, (3) population, and (4) weather. The satellite-based air pollution estimates consisted mainly of outputs from the Global Modeling and Assimilation Office’s (GMAO) Modern-Era Retrospective analysis for Research and Applications, version 2 (MERRA-2) reanalysis data set [45]. These data are available at a 0.5° × 0.625° spatial resolution and hourly temporal resolution. We considered the following variables: total aerosol scattering aerosol optical thickness (AOT) at 550 nm, black carbon surface mass concentration, dust surface mass concentration PM_2.5, organic carbon surface mass concentration PM_2.5, SO₄ surface mass concentration, and sea salt surface mass concentration PM_2.5. For this study, we additionally estimated MERRA-2 PM_2.5 following recommendations by Buchard et al. [46]. Spatiotemporal emission sources represented landscape fire emissions through two variables calculated for various buffers ranging from 10,000 to 500,000 meters (details in Supplementary Table S1): (1) total burned vegetation area ratio, and (2) active fire density, using the active fire products from the Moderate Resolution Imaging Spectroradiometer (MODIS) [47,48]. Population density was calculated for each year and various buffers (50 to 10,000 meters) using 1 km grids developed by the Australian Bureau of Statistics [49]. Finally, weather variables including daily mean temperature, total rainfall, planetary boundary layer height, and solar exposure, among others, were obtained from the Bureau of Meteorology [50] and the fifth-generation European Centre for Medium-Range Weather Forecasts (ECMWF) ERA5-Land data set [51]. Drought was represented through the Standardized Precipitation Evapotranspiration Index (SPEI) [52,53] and its duration in months, calculated using the SPEI package in R [54]. In all cases, we calculated daily means or totals in the UTC+10 time zone, to reflect the mean time zone for Australia, and then aggregated them to monthly and yearly levels for most spatiotemporal variables.
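The daily aggregation in the UTC+10 time zone can be illustrated with a minimal Python sketch. It assumes hourly records arrive as timezone-aware timestamps paired with a value; the function name and input layout are hypothetical and are not taken from the study.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from statistics import mean

UTC10 = timezone(timedelta(hours=10))  # fixed UTC+10 offset, no daylight saving

def daily_means_utc10(records):
    """records: iterable of (timezone-aware datetime, value) pairs.
    Returns {calendar date in UTC+10: mean of the values on that date}."""
    by_day = defaultdict(list)
    for stamp, value in records:
        by_day[stamp.astimezone(UTC10).date()].append(value)
    return {day: mean(values) for day, values in sorted(by_day.items())}
```

An hour stamped 14:00 UTC therefore counts toward the following calendar day, since it falls at 00:00 in UTC+10.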

More detail on data sources, variable definitions, temporal aggregations, and buffers included for each spatiotemporal variable are presented in Supplementary Table S1.

2.2.4. Other Predictors

Additionally, we considered the following temporal predictors: date, year, month number of the year, day number of the month, week number of the year, wday (day name), yday (day number of the year, i.e., Julian day), and dow (day number of the week).

2.3. Methods

We estimated daily total PM_2.5 and the PM_2.5 component associated with extreme pollution episodes for the period 1 January 2001 to 30 June 2020 across the Australian continent at a 5 km × 5 km spatial resolution (equivalent to 307,468 grid cells), using a four-stage process, as summarized in Figure 1. In Stage 1 we imputed missing PM_2.5 using PM₁₀ data from collocated and cotemporally available monitoring sites and other predictor variables. In Stage 2 we applied a Random Forest machine learning algorithm [26,27] to model daily PM_2.5 by using a four-step process starting with default configuration parameters, considering all available predictor variables, and ending with a trimmed and tuned final model. In Stage 3 we predicted daily PM_2.5 across Australia using a 5 km × 5 km spatial resolution. Finally, in Stage 4, we estimated the daily PM_2.5 component associated with extreme pollution episodes by using an STL decomposition in combination with statistical outlier tests calculated for each modelled grid cell.

2.3.1. Stage 1: Data Cleaning and PM_2.5 Imputation

We imputed missing PM_2.5 values from monitoring-site observations using PM₁₀ and other available variables. This was only done for collocated and cotemporally available PM₁₀ observations (i.e., PM₁₀ was within the date range of available PM_2.5 data for each monitoring site). For this, we used a Random Forest to model (and then predict) PM_2.5 based on the following variables: PM₁₀, station, state, longitude, latitude, elevation, year, month, and weekday. We initially randomly split the data into training (90% of data) and testing (10% of data) sets. We developed the model using the training set and performed hyperparameter (mtry: the number of variables randomly sampled; and num_trees: the number of trees in the Random Forest) tuning to find the model with the highest out-of-bag R-squared. The Random Forest model was implemented using the ranger package in R [55,56]. The out-of-bag R-squared is calculated within the ranger package, which implements the jackknife-after-bootstrap or infinitesimal jackknife for bagging [57]. After PM_2.5 imputation, for the following stages we only used monitoring stations that had at least 1 full year of data.

2.3.2. Stage 2: PM_2.5 Modelling

We used a Random Forest algorithm to model daily PM_2.5 as a function of a series of spatial and spatiotemporal predictors, in a four-step process detailed as follows and summarized in Figure 1. Step 1 developed a base model using default hyperparameters and including all predictor variables. Step 2 excluded highly correlated variables and retained the most important variable from each group (e.g., variables calculated at different buffers were considered within the same group; see Supplementary Table S2 for the variables considered within each group). First, a correlation matrix for variables within groups was calculated using the ‘cor’ function in R, and then the ‘findCorrelation’ function from the caret package [58] with a cutoff of 0.8 was used to identify variables that needed to be removed to reduce pair-wise correlations. Step 3 applied hyperparameter tuning, varying values for mtry (10 to 30 in steps of 1) and num_trees (300 to 700 in steps of 100), and selected the model that yielded the highest out-of-bag R-squared. Additionally, we used permutation importance to identify the most important variables, defined as those with an importance higher than or equal to the mean importance score across all variables, following a method supported by Pedregosa et al. [59], as implemented in the widely used scikit-learn tool. Finally, in Step 4, our final model considered only these most important variables and the tuned hyperparameters from Step 3.
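The two variable-selection rules in Steps 2 and 3 can be illustrated with a short Python sketch. It is a simplification under stated assumptions: the greedy pruning below keeps variables in order of importance, which is not the heuristic used by caret's findCorrelation (that function removes the variable with the larger mean absolute correlation), and all function and variable names are hypothetical.

```python
from math import sqrt
from statistics import mean

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def prune_correlated(columns, importance, cutoff=0.8):
    """Within one variable group, visit variables from most to least important
    and drop any whose |r| with an already-kept variable exceeds the cutoff."""
    kept = []
    for name in sorted(columns, key=importance.get, reverse=True):
        if all(abs(pearson(columns[name], columns[k])) <= cutoff for k in kept):
            kept.append(name)
    return kept

def above_mean_importance(importance):
    """Keep variables whose importance is at least the mean importance."""
    threshold = mean(list(importance.values()))
    return [name for name, score in importance.items() if score >= threshold]
```

In this sketch the same variable (for example, population density) computed at several buffer sizes would form one group, so at most a few weakly correlated buffers survive per group before the importance threshold is applied.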

Throughout the model-development process, we used the out-of-bag R-squared as a measure of model performance. We calculated the performance of all models (Step 1–Step 4) across 10 spatial folds (spatial cross-validation) to test how these models performed on data from “unseen” monitoring sites.
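The idea behind spatial cross-validation is that whole monitoring sites, rather than individual observations, are held out. The Python sketch below shows one way to build such folds; random assignment of stations to folds is an assumption of this sketch (the text does not state how the 10 folds were constructed), and the function names and the 'station' key are hypothetical.

```python
import random

def spatial_folds(station_ids, n_folds=10, seed=42):
    """Assign each monitoring station (not each observation) to one fold."""
    stations = sorted(set(station_ids))
    random.Random(seed).shuffle(stations)
    return {station: i % n_folds for i, station in enumerate(stations)}

def split(rows, folds, held_out):
    """rows: list of dicts with a 'station' key. Returns (train, test),
    where test holds every row from stations in the held-out fold."""
    train = [r for r in rows if folds[r["station"]] != held_out]
    test = [r for r in rows if folds[r["station"]] == held_out]
    return train, test
```

Because no station contributes rows to both sides of a split, the test score reflects performance at sites the model has never seen.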

2.3.3. Stage 3: Prediction of Daily PM_2.5

We used the Random Forest model developed during Stage 2 to predict daily PM_2.5 concentrations at a 5 km × 5 km spatial resolution across Australia. We reviewed the output of this estimation and corrected it where needed. For example, whenever we identified a pixel with negative values, we flagged it and used neighboring pixels to interpolate a positive PM_2.5 concentration. To validate our predictions, we compared our results as follows:

- We compared (R-squared, Pearson’s correlation, and correlation plots) daily, monthly, and annual PM_2.5 concentrations with those values estimated for monitoring sites.
- We calculated the normalized mean bias (NMB) and the normalized mean absolute error (NMAE) for each monitoring site [60].
- We compared PM_2.5 concentration (using correlation plots) at a daily, monthly, and annual level with other available PM_2.5 models, such as Van Donkelaar [61], the CAMS global reanalysis (EAC4) [62], and the MERRA-2 aerosol reanalysis [45].
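For reference, the two site-level metrics can be written in a few lines of Python. The sketch assumes the commonly used definitions, NMB = Σ(P − O) / ΣO and NMAE = Σ|P − O| / ΣO, where P are predictions and O are observations; the exact formulation in [60] may differ, and the function names are hypothetical.

```python
def nmb(pred, obs):
    """Normalized mean bias: positive values indicate over-prediction."""
    return sum(p - o for p, o in zip(pred, obs)) / sum(obs)

def nmae(pred, obs):
    """Normalized mean absolute error: error magnitude relative to observations."""
    return sum(abs(p - o) for p, o in zip(pred, obs)) / sum(obs)
```

With observations of 10, 20, and 30 µg/m³ and predictions of 12, 18, and 33 µg/m³, the errors partly cancel in the NMB (3/60) but not in the NMAE (7/60).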

2.3.4. Stage 4: Seasonal-Trend Decomposition and Estimates of PM_2.5 from Extreme Pollution Days

After predicting daily PM_2.5, we decomposed it into two components: background (i.e., the expected concentration under ordinary conditions with absence of extreme pollution) and non-background (i.e., the component attributed to extreme events that deviate from the normal background). We then flagged extreme pollution days and calculated the PM_2.5 component attributable to an extreme pollution episode. The procedures were as follows:

STL decomposition: we applied seasonal-trend decomposition using loess (STL) [39] to our daily time series using the ‘stl’ function from the stats package in R [55]. This method decomposes a time series into three components: seasonal, trend, and irregular (remainder). We sought to optimize the removal of seasonal and trend influence by minimizing the autocorrelation in the remainder. To do this, we conducted a grid search over candidate seasonal window parameters (15, 25, 35, and 45) and selected the value that minimized the residual autocorrelation, estimated as the sum of the absolute values of the partial autocorrelation function (PACF) of the remainder (with PACF maximum lag = 38). On this basis, we used a seasonal window of 15 days and a trend window of 2 years (365.25 × 2 to account for leap years). Given that Australia was struck by unprecedented bushfires between the end of 2019 and the beginning of 2020, we excluded 2020 from the STL to avoid adding the extreme variability of that season.
Identification of days affected by potential extreme pollution episodes: for each pixel-day, we calculated the following flags (1 = yes, 0 = no), which can be used independently or in combination to identify days affected by extreme pollution episodes:
- flag_p95: This flag indicates if predicted daily PM_2.5 is above the 95th percentile calculated for each pixel across the whole period. This method has been previously used to identify fire smoke affected days [15,18,63].
- flag_2SD_remainder: This flag indicates if the remainder component of the STL decomposition is larger than two times the standard deviation of the remainder (i.e., remainder > 2 × SD_remainder). A similar method, using three times the standard deviation, has been used to identify statistical outliers and classify the corresponding dates as having an extreme pollution event [40].
Calculation of PM_2.5 component attributable to an extreme pollution episode: The PM_2.5 component attributable to an extreme air pollution episode corresponds to the remainder from the STL decomposition whenever flag_p95 and/or flag_2SD_remainder equals 1. The sum of the seasonal and trend components on such days estimates the expected magnitude of PM_2.5 had the extreme event not occurred (i.e., the attributable component is the difference between the predicted daily PM_2.5 from the model and the sum of the seasonal and trend components).
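The flagging and attribution steps above can be sketched in Python for a single grid cell. The sketch assumes the seasonal and trend components have already been obtained from an STL decomposition (not reproduced here) and treats a day as extreme when either flag equals 1, which is one of the combinations the text allows. The function name, the output keys, and the use of the sample standard deviation and an inclusive percentile are assumptions.

```python
from statistics import quantiles, stdev

def flag_extremes(pm25, seasonal, trend):
    """Return one dict per day with both flags and the attributable PM2.5."""
    # Remainder = predicted PM2.5 minus the expected (seasonal + trend) level.
    remainder = [p - s - t for p, s, t in zip(pm25, seasonal, trend)]
    p95 = quantiles(pm25, n=100, method="inclusive")[94]  # 95th percentile
    sd_rem = stdev(remainder)
    out = []
    for p, r in zip(pm25, remainder):
        flag_p95 = int(p > p95)
        flag_2sd = int(r > 2 * sd_rem)
        flagged = flag_p95 or flag_2sd
        out.append({
            "flag_p95": flag_p95,
            "flag_2SD_remainder": flag_2sd,
            # On flagged days the remainder is the attributable component.
            "extreme_pm25": r if flagged else 0.0,
        })
    return out
```

Requiring both flags instead of either one would be a stricter rule; swapping `or` for `and` in the sketch gives that variant.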

This method has been previously used for the identification of extreme air pollution episodes [40] and has been used to calculate the portion of PM_2.5 attributable to bushfire smoke in epidemiological studies [35] and to calculate the mortality burden attributable to extreme PM_2.5 air pollution events in Australian cities [64]. On days identified as having extreme pollution, the remainder component gives an estimate of the PM_2.5 magnitude attributable to the extreme pollution episodes on that day.

The flag_p95 flag is focused on identifying days with extreme absolute magnitude daily PM_2.5 compared to the long-term average of daily PM_2.5 for that location. The flag_2SD_remainder flag is focused on identifying days with unusually high seasonal PM_2.5 compared to the long-term seasonal trend of daily PM_2.5 for that location. Locations that experience substantial seasonal variation in daily PM_2.5 may have a different set of days identified as extreme events using the two flags, while locations that experience limited seasonal variation in daily PM_2.5 will likely have more similar sets of days identified as extreme events using the two flags.

3. Results

3.1. Stage 1: Data Cleaning and PM_2.5 Imputation

Between 2000 and 2020, and after the data-cleaning process (Figure 2), a total of 147 PM₁₀ and 185 PM_2.5 monitoring stations had at least one complete year of data. The network grew from 18 PM₁₀ and 17 PM_2.5 stations in 2000 to 126 PM₁₀ and 165 PM_2.5 stations in 2020. The jurisdiction (state or territory) with the highest number of PM_2.5 monitoring stations was New South Wales with 84, while the Northern Territory and the Australian Capital Territory had the fewest stations, with only three. Most monitoring stations are located in three climate zones: (1) Hot dry summer, cold winter, (2) Warm summer, cool winter, and (3) Mild warm summer, cold winter. The location of monitoring sites within jurisdictions and climate zones is presented in Figure 2, while descriptive statistics of the PM data used for this study are shown in Table 1. After imputing missing PM_2.5 values, the total number of observations available for modelling increased from 394,886 to 511,426 (~22% missing values).

The Random Forest model used to impute for missing PM_2.5 values had a very strong performance overall with an OOB R-squared of 81.1%, a testing R-squared of 80%, and a training R-squared of just under 95%. Model summaries, variable importance, and scatter plot of predictions vs. observations on testing set are presented in Supplementary Table S3 and Figures S1 and S2.

3.2. Stage 2: PM_2.5 Modelling

Overall, the models developed in all steps had a very high performance, with an OOB R-squared above 66%, a testing R-squared above 64%, and a training R-squared above 93%. The model improved substantially from step 1 (259 predictors and default hyperparameters) to step 4 (37 predictors), with OOB R-squared increasing from 66.7% to 71.5%, testing R-squared from 64.3% to 69.5%, and training R-squared from 93.8% to 95% (Supplementary Table S4). The 37 final predictor variables and the variable importance are presented in Supplementary Figure S3. The MERRA-2 satellite-based variables (organic carbon, aerosol optical depth), together with the planetary boundary layer height and fire density (500 km buffer), were amongst the five most important variables for our model. Across the 10 spatial folds, the model performance was marginally lower, as would be expected, with a mean R-squared of 58.1% for the final model (Supplementary Table S5). Similarly, performance across the spatial folds improved from step 1 to step 4. Figure 3 shows the R-squared values achieved with the final Random Forest.

3.3. Stage 3: Prediction of Daily PM_2.5

Figure 4 shows the correlation (Pearson’s) between predicted and observed PM_2.5 for each monitoring site, and a very high correlation was achieved for most sites (cor > 0.86) except for one site with a moderately high correlation (Moolawatana, located in Northeast South Australia, cor = 0.65). Descriptive statistics of observed vs. modelled daily PM_2.5 predictions by State/Territory are presented in Supplementary Table S6. Also, Supplementary Figures S4 and S5 present the estimated NMB and NMAE by monitoring station. For most monitoring stations the NMB and NMAE are close to zero, indicating that in general, there is a low level of under- or over-prediction and that the model’s predictions align well with observed data. Correlation plots (daily, monthly, and annual) between observed data and our predictions and other PM_2.5 models [45,61,62] are presented in Supplementary Figures S6–S8.

Figure 5 shows some of the results we estimated during the third stage of our modelling framework (Prediction of daily PM_2.5). Mean daily PM_2.5 concentrations vary greatly across Australia (Figure 5a), with the south (i.e., Tasmania) having low PM_2.5 (<5 µg/m³), mid-range values across southeastern Australia (i.e., eastern Victoria and eastern New South Wales) with PM_2.5 concentrations between 6 and 8.5 µg/m³, and higher values in the northern top end of the country (i.e., Northern Territory) with PM_2.5 concentrations above 12 µg/m³. Figure 5b shows a relatively low standard deviation across most of the country, except for the southeastern part of the country (eastern Victoria and southeastern New South Wales), which was affected by devastating bushfires during the 2019–2020 fire season. The mean PM_2.5 by jurisdiction for each financial year (July to June) in Figure 5c shows similar patterns, with lower mean concentrations in Tasmania, mid concentrations in New South Wales, Queensland, and Victoria, and higher concentrations in the Northern Territory. Likewise, extreme fire seasons stand out, with higher mean PM_2.5 concentrations during the 2003–2004 and the 2019–2020 fire seasons in the Australian Capital Territory. A high interannual variation can also be seen in the Northern Territory, likely related to differences in burning patterns across the Northern Savannas. Supplementary Figure S9 shows these results excluding the 2019–2020 financial year, where the southeastern part of the country presents a lower standard deviation, although still higher than the rest of the country. Other results, such as higher mean concentrations in the Northern Territory, remained the same.

3.4. Stage 4: Estimates of PM_2.5 from Extreme Pollution Days

Table 2 shows the mean and population-weighted (popw) mean number of days flagged as extreme pollution days for each year by each flag (flag_p95, flag_2SD_remainder) and both flags combined. Detail by State/Territory, year, and month is presented in Supplementary Figures S10–S12.
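The difference between the simple and the population-weighted mean can be sketched as follows. This is a minimal illustration, assuming each grid cell carries a count of flagged days for a given year and a population estimate; the function name and the three-cell example are hypothetical.

```python
def mean_and_popw_mean(flagged_days, population):
    """Return (simple mean, population-weighted mean) of flagged days.

    flagged_days[i]: number of days flagged in grid cell i for one year.
    population[i]:   number of residents in grid cell i (the weight).
    """
    if len(flagged_days) != len(population) or not flagged_days:
        raise ValueError("inputs must be non-empty and of equal length")
    total_pop = sum(population)
    if total_pop <= 0:
        raise ValueError("total population must be positive")
    simple = sum(flagged_days) / len(flagged_days)
    popw = sum(d * p for d, p in zip(flagged_days, population)) / total_pop
    return simple, popw


# Hypothetical cells: an unpopulated cell, a town, and a city.
simple, popw = mean_and_popw_mean([10, 20, 30], [0, 1000, 3000])
```

In this example the simple mean is 20 days, whereas the population-weighted mean is 27.5 days because most people live in the cell with the most flagged days.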

Case Studies

Our analysis includes four case studies: three cities across Australia (Launceston, Darwin, and Sydney) and one region in the remote mountains of Victoria with no PM_2.5 monitoring stations. Together, they illustrate the strengths and limitations of applying the methods from stage 4 to identify extreme PM_2.5 pollution days and temporal and spatial patterns. Figure 6 shows an example application in the three selected cities. These cities were selected to illustrate different characteristics: (1) Launceston, a small city (2020 pop = 71,000) located in the southern island of Tasmania, with a mild warm summer and cold winter, affected by seasonal woodsmoke, sporadic bushfires, and controlled landscape fires to reduce the risk of uncontrolled bushfires; (2) Darwin, a city (2020 pop = 158,000) located in the Northern Territory, with a hot humid dry season affected by seasonal landscape fires (mainly savanna burning) between May and October, and a wet season from November to April; and (3) Sydney, a large metropolitan city (2020 pop = 4.9 million) with a humid subtropical climate and a broad range of air pollution sources [63], seasonally affected by woodsmoke and sporadically affected by bushfires and controlled landscape fires to reduce the risk of uncontrolled bushfires, but with minimal seasonal variation. Different time periods that show how these cities are affected by extreme pollution have been selected: (1) Launceston: 2005–2007, (2) Darwin: 2011–2013, and (3) Sydney: July 2017–June 2020. We also conducted a case study in a location with no PM_2.5 monitoring stations (i.e., the Alpine National Park in Victoria) to assess temporal and spatial patterns of PM_2.5 during the well-defined Eastern Victoria Great Divide Bushfires of 2006–2007 [65].

As can be seen in Figure 6, both event threshold identification methods (flag_p95 and flag_2SD_remainder) can detect unusually high air pollution days in the three selected cities, although with some differences. In general, flag_2SD_remainder (panels B, D, and F) seems to be more sensitive to capturing days with unusually high seasonal pollution concentrations that are outliers compared to flag_p95, especially in highly seasonal locations such as Launceston and Darwin. On the other hand, and as expected, flag_p95 only captures very extreme values. In extreme situations, such as the Black Summer bushfires (panels E and F), both flags seem to perform similarly.

Supplementary Figure S13 shows the number of extreme pollution days detected by both flags for these three case study locations (Launceston, Darwin, and Sydney) by year and month between 2001 and 2020. Supplementary Tables S7–S9 present an estimate of the mean PM_2.5 extreme event component based on each flag and case study location by year. In the case of Launceston (Supplementary Table S7), we estimate that the mean PM_2.5 extreme component between 2005–2007, during extreme pollution days at that location, ranges between 5.8–7.6 µg/m³ (19–25 days) with flag_p95, and between 7.6–9.1 µg/m³ (13–19 days) with flag_2SD_remainder. We estimate these extreme event days contribute 0.27–0.30 µg/m³ (3.6–6.8%) and 0.27–0.47 µg/m³ (3.6–5.2%) to annual average PM_2.5 when using flag_p95 and flag_2SD_remainder, respectively. Most of the extreme PM_2.5 component during that period (2005–2007) is probably attributable to wood heater-related event days during winter (for example, due to temperature inversions or protracted cold periods with light winds) [66], but as can be seen in Figure 6A,B, these values are also influenced by bushfire episodes in the summer of 2006/2007 [15].

In Darwin (Supplementary Table S8), between 2011–2013, the mean PM_2.5 extreme component (during extreme pollution days) ranges between 8.4–8.8 µg/m³ (10–38 days) for flag_p95 and between 9.9–12.3 µg/m³ (8–24 days) with flag_2SD_remainder. Represented as annual PM_2.5, this is equivalent to 0.24–0.91 µg/m³ and 0.22–0.81 µg/m³ when using flag_p95 and flag_2SD_remainder, respectively. Savanna burning usually takes place during the early and late dry seasons (May to October), which coincides with these episodes. Similar to the case of Launceston, the dashed blue line in Figure 6A shows how increases in PM_2.5 during the dry season (May–October) are considered part of the usual pattern of background air pollution during this period (i.e., not attributable to extreme air pollution episodes). Once again, the seasonal decomposition is still able to identify unusually high or extreme air pollution events during these seasonal increases, likely due to meteorological conditions or particularly high smoke events caused by anomalies in the regular savanna burning in the region.

For Sydney (Supplementary Table S9), which is affected by air pollution from a wide range of emissions sources, we estimate that the mean PM_2.5 extreme component is between 9–10.7 µg/m³ (15–17 days) with flag_p95 and 10.4–11.8 µg/m³ (10–14 days) with flag_2SD_remainder for the period 2017–2018, likely primarily due to prescribed burning events in autumn (March–May) and spring (September–November) and the use of wood heaters during winter (June–August). During late 2019 and early 2020 southeastern Australia was affected by devastating bushfires, and the mean PM_2.5 extreme component (on extreme pollution days) in Sydney is estimated at 22.9 and 24.3 µg/m³, respectively, for flag_p95 and flag_2SD_remainder for 2019. We estimate that extreme pollution events in Sydney contributed 3.57 and 3.5 µg/m³ to annual PM_2.5 in 2019, using flag_p95 and flag_2SD_remainder, respectively. The diverse range of air pollution sources in Sydney and the resulting minimal seasonal variation in PM_2.5 means that the flags do not define the PM_2.5 components on all days affected by the primary seasonal sources of domestic wood heaters, hazard-reduction burns, and bushfires, but instead identify extreme event days related to these sources.

For illustration purposes, Supplementary Figures S14–S16 show a closer look at how these two flags can be used to identify extreme pollution days and how the extreme pollution PM_2.5 component can be calculated as the difference between daily PM_2.5 and the seasonal + trend values. For these case study locations and times, flag_p95 identifies a larger number of extreme pollution days compared to flag_2SD_remainder in Darwin and Launceston, likely due to the high degree of seasonality in PM_2.5 in these locations. For Sydney, by contrast, both flags identify similar days, likely due to the lower PM_2.5 seasonality at that location.
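The calculation described above can be sketched as follows. This is a minimal illustration, assuming the seasonal and trend components have already been obtained from an STL decomposition and that unflagged days contribute zero to the extreme component; the function name and the sample values are hypothetical.

```python
def extreme_component(pm25, seasonal, trend, flagged):
    """Daily PM2.5 (µg/m³) attributed to extreme pollution events.

    On flagged days the component is pm25 - (seasonal + trend);
    on all other days it is set to 0.0 (an assumption of this sketch).
    """
    if not (len(pm25) == len(seasonal) == len(trend) == len(flagged)):
        raise ValueError("all inputs must have the same length")
    component = []
    for value, s, t, is_flagged in zip(pm25, seasonal, trend, flagged):
        baseline = s + t  # usual background level for that day
        component.append(value - baseline if is_flagged else 0.0)
    return component


# Hypothetical three-day series with one flagged day in the middle.
pm25 = [8.0, 40.0, 9.0]
seasonal = [1.0, 2.0, 1.0]
trend = [6.0, 6.0, 6.0]
flagged = [False, True, False]
daily_extreme = extreme_component(pm25, seasonal, trend, flagged)
```

Here only the second day contributes, with 40.0 − (2.0 + 6.0) = 32.0 µg/m³ attributed to the extreme event.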

With respect to our case study in the remote mountains of Victoria, Supplementary Figure S17 shows an analysis of temporal and spatial patterns of modelled probable fire smoke PM_2.5 in the Alpine National Park and for the State of Victoria. Supplementary Figure S17A,B shows daily PM_2.5 at the Alpine National Park using both event threshold identification methods (flag_p95 and flag_2SD_remainder), while Supplementary Figure S17C,D shows mean PM_2.5 during days flagged with probable smoke. Our model identified extremely high PM_2.5 during this period and across the region. This was verified by the results of an independent report that drew on the results of a chemical transport model (CTM), which also assessed this event (Cope, Martin, CSIRO. 2023. e-mail message to author, October 5).

4. Discussion

This study is the first to estimate daily PM_2.5 at a high spatial resolution (5 km × 5 km grid cells) across continental Australia (for a 20-year period). The study also applies a seasonal decomposition to the daily PM_2.5 estimates and develops an approach to partition these estimates into a seasonal PM_2.5 component and a PM_2.5 component attributable to extreme air pollution events. We illustrate the application of this approach to four distinctive locations: (a) Launceston, a medium-sized town located in the north of Tasmania that is seasonally affected by woodsmoke during the cooler months as well as sporadic summer bushfires [67]; (b) Darwin, a city in the north of the country affected by seasonal savanna burning during the dry season (May to October) [68]; (c) Sydney, a large metropolitan city with a diverse range of air pollution sources that includes seasonal wood smoke and intermittent controlled burning (to reduce the risk of uncontrolled bushfires), and was heavily affected by extreme PM_2.5 due to the unprecedented Black Summer Bushfires (2019–2020) [69]; and (d) the remote mountains of Victoria, which were affected by the Eastern Victoria Great Divide Bushfire (1 December 2006 to 6 February 2007) [65]. The utility of the approach we defined to identify the specific source of air pollution responsible for the extreme event at a particular location is influenced by the usual seasonal pattern of air pollution exposure and the usual mix of air pollution emission sources at the location, as well as the nature of the extreme event. For example, major well-publicized bushfires are easier to identify as the source of the event than smoke from a localized controlled burn (to reduce the risk of uncontrolled bushfires).

4.1. Main Findings and Comparison to Other Studies

We estimated daily PM_2.5 with a Random Forest algorithm, achieving a model performance that is comparable to other international experiences [17,24,25]. Schneider et al. [25] developed a Random Forest model (using a multi-stage process) to estimate daily PM_2.5 concentrations at a 1 km grid across Great Britain between 2008–2018 and calculated a mean spatial CV R-squared of 0.658 and a mean temporal CV R-squared of 0.795. Stafoggia et al. [17] also used a Random Forest algorithm to estimate daily particulate matter (PM₁₀ and PM_2.5), nitrogen dioxide, and ozone in Sweden at a 1 km resolution (200 m resolution for Stockholm County) between 2005–2016 and obtained an out-of-bag (OOB) R-squared of 0.69. Reid et al. [24] used an ensemble machine learning model to estimate daily PM_2.5 (at county, zip code, and census tract level) in 11 western states of the United States between 2008–2018 and reported a spatial CV R-squared of 0.66 and a random CV R-squared of 0.73. Our approach, using 37 predictors (selected out of an initial set of 259 predictors), yielded an OOB R-squared of 0.72, a test R-squared of 0.70, a train R-squared of 0.95, and a spatial CV R-squared of 0.58.
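To make the distinction between these performance measures more concrete, the sketch below shows a coefficient of determination and a simple way of building spatial folds in which all observations from a monitoring site fall in the same fold, so that the held-out fold tests prediction at unseen locations. It assumes R-squared is computed as 1 − SS_res/SS_tot and that folds are assigned per site; the fold design actually used in this study may differ, and the function names are hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def spatial_folds(site_ids, n_folds):
    """Return a fold index per row so that each site stays in one fold.

    Sites are assigned to folds in round-robin order of their sorted
    identifiers (an assumption of this sketch, not the paper's design).
    """
    unique_sites = sorted(set(site_ids))
    fold_of_site = {site: i % n_folds for i, site in enumerate(unique_sites)}
    return [fold_of_site[site] for site in site_ids]


# Hypothetical rows from three monitoring sites split into two folds.
folds = spatial_folds(["A", "A", "B", "C", "B"], n_folds=2)
```

Because a held-out site shares no observations with the training data, a spatial CV R-squared is generally expected to be lower than a train or OOB R-squared, which is consistent with the ordering reported above.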

We also compared our estimates with other global PM models that include the Australian continent [45,61,62], and with observed PM at monitoring sites in Australia. Our model predictions have a low correlation with other models [45,62] at a daily level (≤0.45), moderate correlation with all three models at a monthly level (>0.54), and a relatively good correlation with a global model by Van Donkelaar et al. [61] at an annual level (0.66). The daily prediction from our model presented here and then aggregated at a monthly and annual level achieved far greater agreement than those of other models when compared to observed PM_2.5 monitor data for Australia (see Supplementary Figures S6–S8).

Our assessment of the PM_2.5 component from extreme pollution days at the four case study locations highlights some limitations of the STL approach to quantifying the PM_2.5 component, which arise because the approach focuses solely on extreme events by design. Our results for Launceston (dominated by emissions from seasonal domestic wood heaters during the cooler months) and Darwin (dominated by emissions from seasonal savanna burning during the dry season) differed from those estimated in some other studies for these locations [41,68,70], but were similar to previous estimates for Sydney (major metropolitan city with a broad range of emissions sources including seasonal wood smoke and sporadic bushfires and prescribed burning activity) [69].

In the case of Launceston, a previous study estimated that, for the island of Tasmania (where Launceston is the second largest city), wood heaters contribute an average of 2.7 µg/m³ to annual PM_2.5 (~54% of total annual PM_2.5), while landscape fire smoke contributes on average 0.86 µg/m³ (~17% of total PM_2.5) [70]. A more recent study estimated that wood heater emissions contribute 1.24 µg/m³ in Launceston [41]. In this study, we calculated that the PM_2.5 component from extreme events for Launceston, largely influenced by wood heater use, ranges between 0.27–0.47 µg/m³, which is lower than previous estimates. This is to be expected for two main reasons. Firstly, the focus of the methods presented here is on extreme pollution events. Secondly, given that wood heater smoke seems to be the primary source of seasonal variation in Launceston, the baseline (seasonal + trend components) contains a substantial seasonal component of that source.

We found a similar outcome with the city of Darwin, located in the Northern Territory. As shown by Jones et al. [68] and from what can be seen from Figure 6B, mean PM_2.5 concentrations during the 6-month dry season (May–Oct) are at least 5 µg/m³ higher than those experienced during the wet season (Nov–April), suggesting that the contribution of seasonal savanna burning to annual PM_2.5 is around 2.5 µg/m³ per year and so is a major source of PM_2.5 in this location. Despite this limitation, Figure 6B shows how extreme pollution days are identified with the methods presented here and how the magnitude of this varies by year (e.g., during 2012 there were more days with higher pollution compared to 2011 and 2013).
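The annual figure quoted above follows from simple time-weighting: an excess of 5 µg/m³ that applies for 6 of 12 months adds 5 × 6/12 = 2.5 µg/m³ to the annual mean. The sketch below applies the same weighting to a seasonal excess and to a set of extreme pollution days; the function name is hypothetical, and the episodic example uses illustrative values rather than results from this study.

```python
def annual_contribution(excess_ug_m3, period_length, year_length):
    """Contribution of an excess concentration to the annual mean.

    excess_ug_m3:  mean excess (µg/m³) while the source is active.
    period_length: time the excess applies (months or days).
    year_length:   length of the year in the same unit (12 or 365).
    """
    if not 0 <= period_length <= year_length:
        raise ValueError("period_length must lie within the year")
    return excess_ug_m3 * period_length / year_length


# Seasonal case from the text: 5 µg/m³ higher during a 6-month dry season.
seasonal = annual_contribution(5.0, 6, 12)

# Illustrative episodic case: 10 µg/m³ mean excess on 20 flagged days.
episodic = annual_contribution(10.0, 20, 365)
```

The first call returns 2.5 µg/m³, matching the estimate above; the second returns roughly 0.55 µg/m³, showing why a few extreme days add comparatively little to an annual mean even when their daily excess is large.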

For New South Wales, a previous study estimated an average of 10.8 µg/m³ during landscape fire smoke (LFS)-affected days (n = 58 days) for the 2019–2020 fire season, equivalent to 3.97 µg/m³ of mean annual PM_2.5 attributable to LFS [69]. This was an unprecedented fire season, with fire activity that lasted several weeks and blanketed large parts of southeast Australia with smoke, something not experienced in that region in the preceding 20 years [19]. Here, we estimate that during 2019 the annual contribution of extreme pollution episodes, largely dominated by the Black Summer Bushfires, was 3.57 and 3.52 µg/m³ (n = 57 and 53 days) when using flag_p95 and flag_2SD_remainder, respectively. This represents more than half the mean annual total PM_2.5 concentration estimated for Sydney and New South Wales in previous studies [63,71].

Finally, the methods presented here were also able to identify and estimate temporal and spatial variations in PM_2.5 during a bushfire event where there are no monitoring stations. During the summer of 2006–2007, the remote mountains of Victoria were struck by thunderstorms that started various fires across the region [65]. Results from our model show how daily PM_2.5 values surpassed 100 µg/m³, and how PM_2.5 varied across the State, with higher values concentrated around the Alpine National Park. These results were verified with previous results of a CTM (Cope, Martin, CSIRO. 2023. e-mail message to author, October 5).

4.2. Strengths and Limitations

The methods and analyses presented here have some limitations that must be acknowledged. Firstly, we relied on monitoring data managed by governmental entities. In most states, except for New South Wales and Tasmania, we had data for a low number of monitors. Furthermore, during early years it was more common to monitor PM₁₀ than PM_2.5. To overcome this limitation, we imputed missing PM_2.5 data in Stage 1, which substantially increased our available data for modelling (~395,000 to >511,000 data points). This could be further improved by exploring the inclusion of non-official monitoring data, such as that from low-cost sensors [72]. Nevertheless, a strength of our study during this stage is that we used data that are publicly available, and therefore the analysis presented here can be replicated and updated to incorporate more data for subsequent years as they become available and validated.

There are also some limitations from our modelling and prediction stages (Stage 2 and Stage 3) that should be noted. In Stage 2 (modelling), we used some satellite-derived predictors with relatively high spatial resolution (e.g., population density at a 1 km resolution) but others with coarser spatial resolution (e.g., MERRA-2 products with 0.5° × 0.625° spatial resolution). This could be improved by the inclusion of other high-spatial-resolution (e.g., 1 km) products such as the Multi-Angle Implementation of Atmospheric Correction (MAIAC) aerosol optical depth (AOD). Nevertheless, given the large number of missing values (i.e., due to cloud and/or snow) for the MAIAC AOD, imputation methods need to be explored [73,74] to be able to further improve the spatial resolution of predictions (Stage 3). Other approaches that could be investigated would be to increase spatial resolution in specific locations. For example, Stafoggia et al. [17] predicted pollution concentrations at a 1 km × 1 km spatial resolution with a nested finer grid of 200 m × 200 m for Stockholm County. In Australia, this same approach has been used when running a nested domain chemical transport model with a 27 km grid covering Australia, and 9 km and 3 km grids focusing on regions of higher population density [75]. Therefore, a finer 1 km grid could be used to predict daily PM_2.5 in high-population metropolitan areas. Finally, the predictive performance could potentially be improved through the use of more sophisticated and novel methods, such as ensemble-models or neural networks, among others [57,76,77,78].

During Stage 4, we used a statistical method (STL decomposition) to break down daily PM_2.5 into two components: background PM_2.5 (due to usual seasonally varying emissions) and PM_2.5 attributable to extreme pollution episodes. Nevertheless, this method is not linked to pollution emission sources (e.g., landscape fire activity, dust storms, domestic wood heater use). To overcome this limitation, we created two flags (flag_p95 and flag_2SD_remainder) that identify whether daily PM_2.5 and the remainder component (from the STL decomposition) exceed the 95th percentile (flag_p95) or two times the standard deviation of the remainder component (flag_2SD_remainder). These flags can identify whether the predicted PM_2.5 on that pixel-day is affected by an extreme pollution episode. This identification stage could be further improved by using other flags that help identify the presence of a bushfire or when a location is primarily affected by seasonal emissions from a single source (e.g., wood heater use). Nevertheless, we should emphasize that this method is relatively simple to implement, can be applied to very long time series even when missing values are present [39], and has proven to be effective for the identification of extreme pollution episodes and the estimation of PM_2.5 attributable to extreme pollution episodes.
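One possible reading of the two flags is sketched below. It assumes that flag_p95 marks days whose PM_2.5 exceeds the 95th percentile of the location's own series and that flag_2SD_remainder marks days whose STL remainder exceeds twice the standard deviation of the remainder. The nearest-rank percentile, the population standard deviation, and the function names are choices made for this sketch and are not taken from the study, whose exact flag definitions are given in its methods.

```python
import math
import statistics


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile of a non-empty sequence (pct as an integer)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def extreme_flags(pm25, remainder):
    """Return (flag_p95, flag_2sd_remainder) as lists of booleans.

    pm25:      daily PM2.5 for one pixel (µg/m³).
    remainder: STL remainder for the same days, computed elsewhere.
    """
    if len(pm25) != len(remainder) or not pm25:
        raise ValueError("inputs must be non-empty and of equal length")
    p95 = percentile_nearest_rank(pm25, 95)
    sd = statistics.pstdev(remainder)
    flag_p95 = [value > p95 for value in pm25]
    flag_2sd = [r > 2 * sd for r in remainder]
    return flag_p95, flag_2sd


# Hypothetical series: 19 ordinary days followed by one smoke-affected day.
pm25 = [5.0] * 19 + [50.0]
remainder = [0.0] * 19 + [40.0]
flag_p95, flag_2sd = extreme_flags(pm25, remainder)
```

In this toy series both flags mark only the final day. With real data the two can diverge, because the percentile flag responds to absolute concentration whereas the remainder flag responds to departures from the seasonal + trend baseline.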

Finally, the statistical approach presented here is only able to capture extremes and may not effectively capture the total proportion of PM_2.5 pollution attributable to source-specific emissions, particularly in locations where that source is the main reason for the seasonal trend. In our case studies we observed how these methods adequately capture PM_2.5 deviations during extreme pollution events (e.g., the Eastern Victoria Great Divide Bushfires and the 2019–2020 fire season). In cases where pollution sources are seasonal (e.g., seasonal savanna burning and winter wood heating smoke), this approach is limited. It is therefore essential that users of these data carefully consider how their specific research question or application of the data links to the exposure assessment options available in the seasonal decomposition and extreme event flags we provide. Alternatively, users may decide to apply their own flags for focused decomposition for their specific study region/s.

4.3. Policy Implications and Future Research

The outputs of this project have far-reaching potential impacts beyond what is presented here. For example, every year jurisdictions across Australia are required to report a summary of the air pollution measured across different monitoring sites, but many Australian jurisdictions have a very low monitoring density. Predictions from models like ours can help identify locations across the country that have experienced high pollution levels but are not being monitored. Also, environmental epidemiological research can greatly benefit from the exposure assessment options provided by the predictions based on this analysis. Across Australia, different research groups have a great interest in further studying the effects of air pollution on a broad range of health outcomes. This includes analyses of short-, medium-, and long-term exposure, and analyses on total population or different sub-groups (e.g., children, elders, indigenous communities, populations with lower socioeconomic status, etc.). The 20 years of daily predictions presented here can greatly support this research. The approach presented here to identify extreme pollution days and calculate the attributable PM_2.5 component will help researchers and policymakers undertake different epidemiological analyses, health impact analyses, and assessments of interventions to reduce air pollution exposure and associated health impacts during extreme events (e.g., the 2019–2020 fire season).

Further development of our modelled PM_2.5 predictions should be considered in the next few years, such as including other predictors with finer spatial scales (e.g., MAIAC AOD), exploring the use of blended approaches that can incorporate outputs from more detailed modelling (e.g., chemical transport models), and the development of nested finer spatial resolutions for higher-population-density locations like the main metropolitan cities (e.g., Sydney, Melbourne). Also, following more recent trends in data-driven research, alternatives to the STL should be explored. This will allow researchers and users to not only identify days with extreme pollution, but also identify (and calculate the attributable PM_2.5 component) for seasonal pollution events (e.g., wood heater emissions, savanna burning) and for sporadic events that generate pollution at levels not detected by the STL decomposition (e.g., prescribed burning activities). Recent studies have already started to use machine learning methods to estimate the fire smoke PM_2.5 component [32,79,80]. Finally, it is necessary to have systems, methods, and protocols that keep data sets up to date so that new PM_2.5 surfaces can be estimated with low time lags (e.g., at most, 1 year behind).

5. Conclusions

This study predicted daily PM_2.5 and extreme events across continental Australia at a 5 km spatial resolution for a 20-year period using a four-stage process that connected seasonal and trend decomposition using loess (STL) and the Random Forest algorithm. We started out with 259 spatial and spatiotemporal predictors and ended up with a trimmed (37 predictors) and tuned model. Our model performed well (OOB R-squared 71.5%, spatial CV R-squared 58.1%), improved across the different modelling stages, and compared favorably to results from recent studies using similar methods. We also presented a method (the STL) to decompose daily PM_2.5 into background and PM_2.5 components attributable to extreme pollution episodes, such as bushfires, dust storms, or high-pollution days from wood heater use. We illustrated the use of this method in four distinct case study locations across Australia. The daily PM_2.5 predictions presented here are currently, to our knowledge, the best estimates available for Australia and will help health researchers investigate the short- and long-term health effects of air pollution, including those associated with extreme pollution events, such as bushfires.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/atmos15111341/s1, Table S1. Summary of variables and datasets used for modelling and predicting; Table S2. Groups of variables considered for excluding highly correlated variables; Table S3. PM_2.5 imputation model summary; Figure S1. Variable importance of PM_2.5 imputation model; Figure S2. R-squared on testing set (10% left out) of PM_2.5 imputation model; Figure S3. Variable importance of final model in Stage 2: PM_2.5 modelling (37 variables); Table S4. Daily PM_2.5 model performance summaries for steps 1-4; Table S5. Daily PM_2.5 model performance for steps 1-4 across spatial folds; Table S6. Descriptive statistics of observed (including imputed values) PM_2.5 monitoring data vs. modelled PM_2.5 for 2001-2020; Figure S4. Normalised mean bias (NMB) between predicted and observed PM_2.5 for each monitoring site; Figure S5. Normalised mean absolute error (NMAE) between predicted and observed PM_2.5 for each monitoring site; Figure S6. Correlation plots with other PM_2.5 models – daily; Figure S7. Correlation plots with other PM_2.5 models – monthly; Figure S8. Correlation plots with other PM_2.5 models – annual; Figure S9. PM_2.5 prediction results; Figure S10. Mean # of extreme pollution days by year and month calculated for the Northern Territory, Queensland, and Western Australia using flag_p95 and flag_2SD_remainder; Figure S11. Mean # of extreme pollution days by year and month calculated for New South Wales, the Australian Capital Territory, and South Australia using flag_p95 and flag_2SD_remainder; Figure S12. Mean # of extreme pollution days by year and month calculated for Victoria and Tasmania using flag_p95 and flag_2SD_remainder; Figure S13. # of extreme pollution days by year and month calculated for Launceston, Darwin and Sydney using flag_p95 and flag_2SD_remainder; Figure S14. Example of extreme air pollution days between October 2006 – December 2007 in Launceston identified with: (a) 95th percentile and (b) 2SD remainder flags; Figure S15. Example of extreme air pollution days between January 2012 – December 2012 in Darwin identified with: (a) 95th percentile and (b) 2SD remainder flags; Figure S16. Example of extreme air pollution days between July 2019 – June 20 in Sydney identified with: (a) 95th percentile and (b) 2SD remainder flags; Figure S17. Temporal and spatial example of extreme air pollution during the 2006-07 Eastern Victoria Great Divide bushfires in Victoria.

Author Contributions

Conceptualization, N.B.-A., G.G.M. and I.C.H.; methodology, N.B.-A., G.G.M., J.V.B. and I.C.H.; software, N.B.-A., J.V.B., K.G., C.Y. and I.C.H.; validation, M.C. and I.C.H.; formal analysis, N.B.-A. and I.C.H.; investigation, N.B.-A. and I.C.H.; resources, I.C.H.; data curation, N.B.-A., K.G., C.Y. and I.C.H.; writing—original draft preparation, N.B.-A., G.G.M. and I.C.H.; writing—review and editing, N.B.-A., G.G.M., J.V.B., K.G., C.Y., F.H.J., Y.G., M.C. and I.C.H.; visualization, N.B.-A., C.Y., and I.C.H.; supervision, G.G.M. and I.C.H.; project administration, G.G.M. and I.C.H.; funding acquisition, G.G.M. and I.C.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was undertaken with the assistance of resources from the Clean Air Research Data and Analysis Tools platform (CARDAT), which is supported by funds from The Centre for Safe Air (CSA; https://safeair.org.au/, accessed on 28 October 2024), which is funded by the National Health and Medical Research Council (2015584), the Curtin WHO Collaborating Centre for Climate Change and Health Impact Assessment, and the Australian Research Data Commons (ARDC) AirHealth Data Bridges project (https://doi.org/10.47486/PS022, accessed on 28 October 2024). The Bushfire Smoke Exposure project received seed funding project support from the CSA, as well as the ARDC Bushfire Data Challenges project (https://ardc.edu.au/project/assessing-the-impact-of-bushfire-smoke-on-health/, accessed on 28 October 2024) and the Australian National Health and Medical Research Council (APP2004514) Ideas Grant (https://www.nhmrc.gov.au/funding/find-funding/ideas-grants, accessed on 28 October 2024) - Bushfire smoke exposure during pregnancy and epigenetic changes in offspring. Nicolas Borchers-Arriagada was supported by a Sohn Hearts and Mind Research Fellowship and a Postdoctoral Research Fellowship from the Menzies Institute for Medical Research, University of Tasmania, Australia.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the Clean Air Research Data and Analysis Tools platform (CARDAT), at https://doi.org/10.17605/OSF.IO/WQK4T, accessed on 28 October 2024.

Data Citation

Centre for Safe Air, Bushfire Smoke Team, 2023. Bushfire-specific PM_2.5 output from v1.3 based on satellite and other land use and other predictors for Australia 2001–2020 produced for the CAR Bushfire Smoke Exposures project. Downloaded from the Clean Air Research Data and Analysis Tools (CARDAT) platform (https://cardat.github.io, accessed on 28 October 2024).

Acknowledgments

We acknowledge the HEAL (Healthy Environments and Lives) National Research Network, which receives funding from the National Health and Medical Research Council (Grant No. 2008937) and the Centre for Safe Air, NHMRC Centre for Research Excellence for their support of this research (https://ror.org/04ccf0j10, accessed on 28 October 2024). We also acknowledge the use of data and/or imagery from NASA’s Fire Information for Resource Management System (FIRMS) (https://earthdata.nasa.gov/firms, accessed on 28 October 2024), part of NASA’s Earth Observing System Data and Information System (EOSDIS).

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. Summarized methods.

Figure 2. Australian States/Territories with 185 PM monitoring stations (black dots), and climate (temperature/humidity) zones as defined by the Australian Bureau of Meteorology (http://www.bom.gov.au/climate/maps/averages/climate-classification/ (accessed on 15 September 2024)). NT = Northern Territory, QLD = Queensland, NSW = New South Wales, ACT = Australian Capital Territory, TAS = Tasmania, VIC = Victoria, SA = South Australia, WA = Western Australia.

Figure 3. R-squared between predicted and observed PM_2.5: (A) out of bag—daily, (B) testing set—daily, (C) training set—daily, (D) complete dataset—daily, (E) complete dataset—monthly, and (F) complete dataset—annual. NOTE: dashed red line represents the identity function.

Figure 4. Correlation (Pearson’s) between predicted and observed PM_2.5 for each monitoring site.

Figure 5. PM_2.5 prediction results (μg/m³): (a) mean PM_2.5 concentrations Jan 2001–June 2020, (b) standard deviation (SD) of PM_2.5 concentrations Jan 2001–June 2020, (c) population-weighted mean PM_2.5 concentration by financial year (*) 2001–2020 and State/Territory. (*) A financial year starts on 1 July and ends on 30 June (i.e., the 2001 financial year runs from 1 July 2001 to 30 June 2002).
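
As an illustrative aside on panel (c), the sketch below shows how a date could be assigned to a financial year under the convention stated in the caption, and how a population-weighted mean is formed as the sum of population times concentration divided by total population. The function names and the example values are hypothetical and are not taken from the study's code or data.

```python
from datetime import date


def financial_year(d: date) -> int:
    """Return the financial-year label for a date.

    Follows the caption's convention: financial year N runs from
    1 July N to 30 June N+1.
    """
    return d.year if d.month >= 7 else d.year - 1


def population_weighted_mean(concentrations, populations):
    """Weight each area's concentration by its population.

    Both arguments are equal-length sequences, one entry per area
    (for example, one entry per grid cell within a State/Territory).
    """
    if len(concentrations) != len(populations):
        raise ValueError("inputs must have the same length")
    total_population = sum(populations)
    if total_population <= 0:
        raise ValueError("total population must be positive")
    weighted_sum = sum(c * p for c, p in zip(concentrations, populations))
    return weighted_sum / total_population


# Hypothetical values, for illustration only.
example_fy = financial_year(date(2002, 6, 30))                      # 2001
example_mean = population_weighted_mean([5.0, 10.0], [1000, 3000])  # 8.75
```

Weighting by population means that a sparsely populated area with high concentrations contributes little to the result, which is why a population-weighted series can differ markedly from a simple spatial mean.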

Figure 6. Example of extreme air pollution days identified with 95th percentile and 2SD remainder flags for: (A,B) Launceston 2005–2007 showing three winter smoke seasons and the 2006/2007 bushfire season, (C,D) Darwin 2011–2013 showing the impact of smoke during three dry seasons, and (E,F) Sydney July 2017 to June 2020 showing three winter smoke seasons and the devastating 2019/2020 Black Summer bushfires. NOTES: (1) For comparison purposes, the y-axes of panels (E,F) do not show values above 50 µg/m³. (2) For illustration purposes, the “seasonal + trend” and “2SD remainder + seasonal + trend” time series have been smoothed, and probable smoke days are flagged using these smoothed series.
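
The two flags named in the caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the remainder component of a seasonal-trend decomposition has already been computed for one location, takes the 95th percentile over that location's full series, and sets the remainder threshold at the remainder mean plus two standard deviations. The function name, the percentile method and the combination rule (a day is flagged if either test is met) are hypothetical choices.

```python
from statistics import mean, pstdev, quantiles


def flag_extreme_days(pm25, remainder):
    """Flag days by a 95th-percentile test and a 2SD-remainder test.

    pm25      -- daily PM2.5 values for one location
    remainder -- remainder component for the same days, assumed to come
                 from a seasonal-trend decomposition done beforehand
    Returns one dict of booleans per day.
    """
    if len(pm25) != len(remainder):
        raise ValueError("inputs must have the same length")

    # 95th percentile of the daily series (assumption: inclusive method).
    p95 = quantiles(pm25, n=100, method="inclusive")[94]

    # Assumption: threshold is the remainder mean plus two standard deviations.
    remainder_threshold = mean(remainder) + 2 * pstdev(remainder)

    flags = []
    for value, rem in zip(pm25, remainder):
        above_p95 = value > p95
        above_2sd = rem > remainder_threshold
        flags.append({
            "p95": above_p95,
            "remainder_2sd": above_2sd,
            # Assumption: the combined flag is the union of the two tests.
            "either": above_p95 or above_2sd,
        })
    return flags


# Hypothetical series: values 1..100 with a single remainder spike on day 0.
example_pm25 = [float(v) for v in range(1, 101)]
example_remainder = [10.0] + [0.0] * 99
example_flags = flag_extreme_days(example_pm25, example_remainder)
```

In Table 2 the combined column is never smaller than either single-flag column, which is consistent with treating the combined flag as a union, although the source does not state this explicitly.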

Table 1. Descriptive statistics of PM₁₀ and PM_2.5 monitoring data for 2000–2020.

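State/Territory	PM₁₀ N Sites	PM₁₀ N Obs	PM₁₀ p50	PM₁₀ p5–p95	PM_2.5 N Sites	PM_2.5 N Obs	PM_2.5 p50	PM_2.5 p5–p95	PM_2.5 (Including Imputed Values) N Sites	PM_2.5 (Including Imputed Values) N Obs	PM_2.5 (Including Imputed Values) p50	PM_2.5 (Including Imputed Values) p5–p95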
ACT	3	8236	9.77	3–25.5	3	8073	5.43	1.6–20.2	3	10,492	5.33	1.7–19.4
NSW	84	193,291	16.03	5.7–38	84	130,899	6.13	1.4–16.4	84	195,390	6.11	1.9–15.2
NT	3	6949	17.66	6.9–40.2	3	6948	6.06	0.9–22.6	3	6953	6.06	0.9–22.6
QLD	31	72,611	14.78	6.4–32.9	31	56,639	5.00	1.7–13.2	31	73,027	5.33	1.9–12.9
SA	8	11,556	17.25	8.1–36.3	8	15,195	6.50	3.1–12	8	16,337	6.35	3–11.9
TAS	1	2043	14.30	7.3–33.9	35	111,279	2.70	0–17.4	35	112,635	2.80	0–17.3
VIC	9	40,262	15.71	7.6–34.9	13	28,159	5.71	1.9–14.4	13	52,285	6.05	2.5–13.9
WA	8	33,887	15.65	8–30.7	8	37,694	7.20	3.8–14	8	44,307	7.25	3.9–14
National	147	368,835	15.58	6.1–35.9	185	394,886	5.29	0.9–15.8	185	511,426	5.56	1.1–15

p5 = 5th percentile, p50 = 50th percentile, p95 = 95th percentile.

Table 2. Number of days flagged as extreme pollution by year and month.

Year	95 Pct (Mean)	95 Pct (Popw Mean)	2SD Remainder (Mean)	2SD Remainder (Popw Mean)	95 Pct + 2SD Remainder (Mean)	95 Pct + 2SD Remainder (Popw Mean)
2001	9.6	13.2	13.2	15.0	14.0	16.5
2002	8.9	22.6	12.7	23.3	13.1	26.3
2003	6.1	22.0	8.1	23.2	8.5	25.1
2004	6.7	9.8	8.5	10.8	9.3	12.5
2005	2.9	7.6	4.9	10.2	5.3	11.4
2006	7.8	15.1	11.0	19.1	11.5	19.9
2007	5.5	7.9	8.0	9.8	8.5	11.1
2008	2.8	4.9	4.1	6.9	4.4	7.5
2009	8.4	22.0	12.4	25.4	12.8	26.9
2010	2.6	5.7	4.0	8.9	4.2	9.6
2011	16.9	10.3	22.8	14.2	23.4	15.1
2012	18.7	9.2	23.4	12.7	24.5	13.7
2013	6.6	13.1	9.0	15.6	9.6	17.4
2014	6.9	10.8	9.6	14.0	10.2	15.3
2015	6.4	9.6	10.2	13.2	10.6	14.2
2016	2.9	11.1	4.5	12.7	4.7	14.2
2017	5.9	16.0	8.5	17.0	8.7	19.1
2018	6.8	16.3	8.8	17.5	9.4	19.6
2019	15.8	38.5	18.9	40.2	20.0	43.6
2020 (*)	5.8	18.5	6.4	19.3	6.9	20.3

(*) PM_2.5 values were only calculated until 30 June 2020. Popw = population weighted. NOTE: Results presented in this table are descriptive statistics only, and no statistical comparisons have been made.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
