1. Introduction
Air pollution is affecting the global climate, ecosystems and human health [1,2]. It is responsible for millions of deaths all over the world [3]. Atmospheric pollution impacts human health, particularly in urban environments [4]. The concentration of pollutants follows the population distribution among areas due to human activities [5,6]. One of the atmospheric components most directly related to pollution is nitrogen dioxide (NO2), which is released mainly from diesel and petrol engines, as reported in [7], with road transportation contributing approximately 40% of the land-based NOx emissions in European countries. NO2 is one of the most active gaseous pollutants emitted in the industrial era and is highly correlated with human industrial activities.
Two main meteorological components, wind speed and wind direction, directly influence the dispersion of highly concentrated pollutants emitted into the atmosphere. Thus, low and uniform wind speeds are favorable conditions for gaseous pollutant accumulation near the source [8], while high and turbulent winds disperse gaseous pollutants.
Nitrogen oxide (NOx) is the generic name of a gaseous mixture normally containing various amounts of nitrogen and oxygen and represents a "family" of compounds (N2O, NO2, NO, N2O3, N2O2, N2O4, N2O5) [9], with NO2 and NO as the main components. Most of the NOx found in the atmosphere is produced by human activity, and its main action path is active participation in acid rain formation through interaction with atmospheric gases, forming HNO3. Moreover, these rains actively contribute to the accumulation of nitrates in the soil [10]. Other action paths of NOx are its contributions to smog formation and to decreasing water quality.
The main components of NOx are usually monitored separately, since both of them (NO and NO2) cause specific issues. NO is a precursor for O3 depletion within the upper layers of the atmosphere, and the main sources of NO emissions are jet engines. Moreover, even though NO does not directly affect the environment, it participates in the formation of nitric acid and particulate nitrate [11]. On the other hand, NO2 acts as a "catalyst" between NO and O3 and is the main precursor for nitric acid formation. NO2 is mainly produced by NO oxidation within the atmospheric layers. Other sources of NO2 formation include the burning of fossil fuels and biomass and diesel engines. Within urban areas, NO2 concentrations vary between 0.1 and 0.25 ppm [12]. It should be mentioned that NO2 is four times more toxic than NO, and children are primarily affected by it. NO2 is very toxic for living creatures [13]. During exposure to NO2, the lungs are drastically affected, and high concentrations may be fatal. People exposed to low NO2 concentrations for long periods of time may suffer from respiratory issues [14]; therefore, NO2 levels within the atmospheric environment are regulated and closely monitored [15,16].
Tracking NO2 emissions and predicting their concentrations represent important steps toward controlling pollution and setting rules to protect people's health both indoors, such as in factories, and in outdoor environments.
Air pollution forecasting techniques include numerical models and statistical models [17]. Numerical models simulate the transformation and diffusion of air pollutants and reflect how they change. However, they require a large amount of meteorological information, air pollutant discharge source data and atmospheric monitoring data. These models need to master the mechanism of pollution change, and their calculation time is long [18].
NO2 concentration prediction is a nonlinear, multivariable problem with strong coupling between predictors, so NO2 numerical forecasting is an extraordinarily complex systems engineering problem. Statistical models are widely used in operational prediction due to their easy calculation, low data requirements and high precision. Nevertheless, most statistical models rely on linear regression theory; given that the relationship between pollutant concentration and weather conditions is nonlinear, linear regression is difficult to apply to such strongly coupled systems [19]. For air quality prediction, LSTM-based models can produce better performance than statistical models such as ordinary least-squares regression and Bayesian ridge regression, or machine learning models such as support vector regression, the multilayer perceptron and random forest regression [20]. The LSTM shows the same superiority with respect to ARIMA [21]. When the most important recurrent neural network architectures (standard, LSTM and GRU) were compared for air quality prediction, GRU was found to exceed the performance of LSTM and standard networks [22]. Moreover, a survey on machine learning algorithms used for air quality prediction showed that LSTM and the multilayer perceptron are the most used models for such tasks [23].
Artificial intelligence (AI) techniques have been extensively applied in a variety of research areas [24,25].
Regarding the use of machine learning (ML) for air quality prediction, there are many studies on how related techniques are used. The studies discuss various types of pollutants, such as PM10, PM2.5, NO, NO2, etc. Another approach is to build hybrid models combining core statistics and machine learning, such as WANN, whereby the wavelet transform is applied prior to feeding the data into an artificial neural network [26].
In [27], a model using artificial neural networks (ANNs) was developed to forecast the pollutant concentrations of PM10, PM2.5, NO2 and O3 for the current day and the subsequent 4 days in a highly polluted region (32 different locations in Delhi). The model was trained using meteorological parameters and hourly pollution concentration data for the year 2018, and then used for generating air quality forecasts in real time. In [28], the authors developed new machine learning models, namely random forest (RF) and support vector regression (SVR), to estimate PM2.5 concentrations across Malaysia for the first time, covering the years 2018 and 2019.
For gaseous pollutants such as NO or NO2, modeling should accommodate high levels of variability and nonlinearity; therefore, artificial neural networks (ANNs) are used. The authors of [29] developed a multilayer perceptron (MLP) artificial network in order to model NO and NO2 pollution in London, and the main conclusion was that the pollutants' variations can be modeled by using the time of day and the day of the week as input variables. Moreover, the effectiveness of the ANN/MLP approach was assessed by performing a sensitivity analysis following three predefined scenarios [30]. The main conclusion of the study was that the calculated values within all three scenarios were similar to the values measured onsite.
MLP was also used to model NO [31], NO2 [32] and O3 [33] within a port area (Shanghai port) and a city area (Zagreb). The obtained results were similar to the measured concentrations of the above-mentioned gaseous pollutants.
Another useful characteristic of MLP is that it allows forecasts of gaseous pollution to be drafted, as shown in [34], where a three-day forecast of NO2 and O3 pollution was produced for the city of Athens. A study comparing the performances of MLP and linear regression [35] was drafted for NO2 and O3. The obtained results proved very good in terms of predicting pollution using support vector regression, as demonstrated in [36,37]. Other studies also searched for the best model for series forecasting, utilizing various tools, from support vector regression (SVR) and the time series fuzzy inference system (TSFIS) to MLP, for the prediction of NOx and O3. Other researchers used generalized regression neural networks (GRNN), SVR, MLP and radial basis function (RBF) neural networks for predicting NO2 in urban areas [38,39,40].
Complex studies such as that of [26] used a mixture of three methods and one test. The methods consist of the interquartile range (IQR), isolation forest and local outlier factor (LOF), while the test is the generalized extreme studentized deviate (GESD) test. The models built within paper [26] are the autoregressive integrated moving average (ARIMA), generalized regression neural networks (GRNN) and a hybrid ARIMA-GRNN, and their processing was possible after the removal of aberrant values. The main results of the study emphasized that the first approach produced the best performance in terms of statistical modeling, but, nevertheless, the best model was obtained by the hybrid ARIMA-GRNN.
The scope of this paper is to conceptualize and build a machine learning-based model to predict the hourly levels of NO2 at one selected location in Bucharest where historical data are available. Among the existing machine learning techniques, artificial neural networks represent one of the best solutions for this type of predictive task. LSTM and GRU are architectures designed for time series data, having in place the right mechanisms for capturing long-term and short-term dependencies in the data. To ensure that the approach and the results are meaningful in terms of performance, but also for professionals and the scientific community, the theoretical fundamentals and model evaluation are presented in such a way that they can be replicated or compared with other similar research.
2. Methodology
Bucharest [41,42], the capital of Romania, is the largest Romanian city and the country's main political, administrative, economic, financial, educational, scientific and cultural center. It is located in the SE part of the country, on the banks of the Dâmbovița River, less than 60 km (37.3 mi) north of the Danube River and the Bulgarian border. Bucharest has a climate at the boundary between continental and humid subtropical, with hot, humid summers and cold, snowy winters. Due to its position on the Romanian Plain, the city's winters can be windy, although some of the winds are mitigated by urbanization. Winter temperatures often drop below 0 °C, sometimes even to −20 °C. During summer, the average temperature is 23 °C (the average for July and August). Temperatures frequently reach 35 to 40 °C in midsummer in the city center. Although the average precipitation and humidity during the summer are low, occasional heavy storms occur. During spring and autumn, daytime temperatures vary between 17 and 22 °C, and precipitation during spring tends to be higher than in summer, with more frequent yet milder periods of rain. Bucharest has a relatively developed industrial area at its outskirts, and household heating still depends on large thermo-energetic plants, even though a share of households have their own heating systems. Traffic is the most important source of air quality degradation in the capital, according to the Research Report on the State of the Environment in Bucharest. More specifically, 80% of air pollution in the metropolis comes from traffic. Road traffic contributes 90% of carbon monoxide emissions, 59% of nitrogen oxide emissions, 45% of volatile organic compounds and 95% of lead emissions, according to the report recently released by the Environment Platform for Bucharest. This is not surprising, given that there are approximately 1.84 million vehicles circulating in the city, of which 1.5 million are registered in Bucharest. A total of 80% of these are cars, of which more than half are over 12 years old. Less than a quarter of personal cars can be considered new, i.e., less than four years old, and 43% are diesel, according to data from the same study [43].
Overall, all these factors contribute to a significant increase in pollution, especially during the cold season.
2.1. Air Quality Data
There are 41 centers where the National Air Quality Monitoring Network of Romania collects data, which are then transmitted to and validated at the Air Quality Assessment Centre of the National Agency for Environmental Protection. Specific laws regulate the gaseous pollutants' concentrations and allow the classification of agglomerations into 3 different classes (A, B or C) based on pollution measurements and assessment. The measured concentrations obtained from the measuring stations of the above-mentioned network are mathematically modeled in order to assess the dispersion of the gaseous pollutants.
Law 104/2001 [35,44] sets the limits for various pollutants as follows: NOx/NO2 alert threshold: 400 µg/m3; hourly limit for human health protection: 200 µg/m3; annual average limit for human health protection: 40 µg/m3; annual average limit for vegetation protection: 30 µg/m3.
Following the worldwide trend stated in [45], the air quality of the largest Romanian cities, i.e., Bucharest, Cluj-Napoca, Timisoara, etc., has been decreasing each year [46,47,48,49,50].
Figure 1 shows the air quality monitoring stations around Bucharest. It should be mentioned that only stations B-3 and B-6 are traffic-type stations.
The monitoring stations within Bucharest's administrative area that measure NOx are the following: B-1 (urban background/urban), B-2 (industrial/urban), B-3 (traffic/urban), B-4 (industrial/urban), B-5 (industrial/urban), B-6 (traffic/urban) and B-9 (urban background/urban); only B-3 and B-6 are traffic measuring stations. Since measuring station B-9 has not recorded any NOx values within the last 5 years, it was not taken into account.
Table 1 shows the average measured values of the last 5 years.
As can be observed in Table 1, the traffic measuring stations recorded average values above the enforced limit of 40 µg/m3, except for the year 2022. This may be associated with the post-pandemic period, the enforced regulations regarding vehicle movement and the improvement of Bucharest's air quality. It is well known that several laws and regulations have been adopted by the municipality in order to improve air quality within the city [44,51]. Other aspects that may have influenced the low value registered for 2022 are the higher winter temperatures, which led to a decrease in residential fossil-fuel use, an increased usage of public transport, etc.
Since the highest values were recorded at station B-6, this station was used within this paper. The dataset used consists of 8760 records representing one year of hourly data between 1 August 2021 and 31 July 2022. The dependent variable B-6 is available online at www.calitateaer.ro (accessed on 12 September 2022).
Upon analyzing the variation of NO2 levels at station B-6 within the entire dataset (Figure 2), it can be seen that there was a decrease in pollution beginning in December 2021.
2.2. Meteorological Data
The independent variables representing hourly weather data are available via the Visual Crossing weather application programming interface (API) [52]. The meteorological station is the Filaret station, located within Bucharest, a few hundred meters from the air-quality-monitoring station B-6.
As can be seen in Table 2, there are missing values for both the dependent and some of the independent variables.
The missing data can be explained in two ways. First, for variables such as snow depth or solar energy, the absence of values in the extract, transform and load flow is sensible, since it is not expected to snow throughout the entire year or for solar energy to be available 24 h every day. Second, in the case of B-6, the missing data cannot be explained by any phenomenon other than a miscommunication that likely led to data loss. For this second scenario, the missing data are filled using polynomial interpolation.
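As an illustration, gap filling by polynomial interpolation can be sketched as follows; the series values and the polynomial order are illustrative assumptions, not the study's actual data:

```python
import numpy as np

# Hypothetical hourly NO2 readings [ug/m3] with gaps marked as NaN
# (the real series comes from station B-6; these values are illustrative).
no2 = np.array([34.0, 36.5, np.nan, np.nan, 41.0, 39.5, np.nan, 37.0])

t = np.arange(len(no2))
known = ~np.isnan(no2)

# Fit a low-order polynomial to the known points, then evaluate it at
# the missing timestamps (the order used in the study is an assumption).
coeffs = np.polyfit(t[known], no2[known], deg=3)
filled = no2.copy()
filled[~known] = np.polyval(coeffs, t[~known])

assert not np.isnan(filled).any()
```

The known values are kept untouched; only the NaN positions receive the fitted polynomial's values.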
The variables used and their abbreviations are as follows: temp = ambient temperature [°C], feelslike = real feel (temperature) [°C], dew = dew point [°C], humidity = relative humidity [%], precip = precipitation [mm], precipprob = precipitation chance [%], snow = snow [mm], snowdepth = snow depth [mm], windgust = wind gust [km/h], windspeed = wind speed [km/h], winddir = wind direction [degrees], sealevelpressure = sea level pressure [mb], cloudcover = cloud cover [%], visibility = visibility [km], solarradiation = solar radiation [W/m2], solarenergy = solar energy [MJ/m2], uvindex = UV index, severerisk = severe risk.
As part of the data preprocessing step, features representing the month, day of the month, day of the week, hour and year are extracted from the timestamp. From the perspective of traffic peaks and their impact on air quality, the day of the week as well as the hour are important features. For temperature and wind speed, new features are generated as rolling averages over the previous 6, 12 and 24 h. Apart from the numerical data described above, there are two additional categorical variables representing precipitation type and cloud cover. Even though these two might be redundant, they are encoded during the data preprocessing step and used further. To avoid the bias induced by multicollinearity, all features with a Pearson correlation coefficient higher than 0.9 were removed. At the end of the preprocessing step, 42 features were kept. The features removed due to high correlation are "feels like", "solar energy", "UV index" and "year". The number of features is important in selecting the number of neurons on the hidden layers; the three selected candidates are calculated as n/2, n and 2n + 1, where n is the number of input features.
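The preprocessing steps described above (calendar features, rolling averages and the correlation-based filter) can be sketched as follows; the two-column frame and its random values are illustrative only:

```python
import numpy as np
import pandas as pd

# Illustrative frame; column names mirror the paper's weather variables.
rng = np.random.default_rng(0)
idx = pd.date_range("2021-08-01", periods=48, freq="h")
df = pd.DataFrame({"temp": rng.normal(23, 3, 48),
                   "windspeed": rng.normal(12, 4, 48)}, index=idx)

# Calendar features extracted from the timestamp.
df["month"] = df.index.month
df["dayofweek"] = df.index.dayofweek
df["hour"] = df.index.hour

# Rolling averages over the previous 6, 12 and 24 h for temp and wind speed.
for col in ("temp", "windspeed"):
    for w in (6, 12, 24):
        df[f"{col}_roll{w}"] = df[col].rolling(w, min_periods=1).mean()

# Drop the later feature of every pair with a Pearson |r| above 0.9.
corr = df.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [c for c in upper.columns if (upper[c] > 0.9).any()]
df = df.drop(columns=to_drop)

# Hidden-layer size candidates derived from the feature count n.
n = df.shape[1]
candidates = (n // 2, n, 2 * n + 1)
```

With the paper's n = 42 kept features, the candidates would be 21, 42 and 85 neurons.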
2.3. Machine Learning Recurrent Neural Network (RNN) Models
Artificial neural networks (ANNs) represent an area in which concepts derived from other major knowledge domains, such as biology, mathematics, programming, engineering, statistics or informatics, are merged with the aim of mimicking the way human neurons function. The neuron is the main structural element of the brain; the human brain is composed of between 1 × 10^11 and 2 × 10^11 neurons.
By default, ANNs are considered capable of generalizing very well in very specific, well-defined use cases. Moreover, they are expected to be capable of modeling nonlinear data (with no direct relationship between independent and dependent variables) while offering scalability and rational, contextualized outcomes.
A pollution-prediction problem where historical data are known can be considered as falling under the scope of supervised learning. By definition, supervised learning means that a model is trained with both independent and dependent variables available in the training dataset. During training, the predictions made by the model at intermediate steps are compared to the known actual values. Based on the error between the predictions and the actual values, the model parameters are adjusted. When the error is reasonable in relation to the studied problem, the training is stopped and the last configuration of the parameters is kept as the final one.
Therefore, the main biological neuron data processing and propagation mechanisms are implemented in ANNs. Artificial neurons are for ANNs what biological neurons are for the human brain (Figure 3). It is common for neurons to have multiple inputs and one output. Neuron inputs are signals coming from the outside environment or from other neurons of the network, while the output is the signal the neuron propagates back to the environment or to another neuron of the network. Each connection between neurons has its own synaptic weight attached, where the information is stored. The synaptic weight represents, roughly, how important an input is for the neuron. These weights are adjusted during training until the error is minimized according to the defined criteria.
For use cases where the available data consist of a time series, the type of neural network used must be suited to this type of data. Of course, feedforward neural networks can provide reasonable performance, but other aspects, mostly related to time dependency, must be taken into consideration.
Recurrent neural networks (RNNs) represent a popular choice among professionals for time series-based problems. There are multiple types of RNNs. The standard RNNs [53] have the simplest mechanisms for processing the input data and delivering predictions while trying to minimize errors [54].
The mechanisms are simple and straightforward, but the two main issues of standard RNNs are exploding or vanishing gradients and data morphing. However, it is not mandatory for either of them to occur [55,56].
2.3.1. Long Short-Term Memory Networks (LSTM)
A more complex RNN architecture has been proposed in order to overcome the weaknesses of the standard RNN (Figure 4). This new architecture is called long short-term memory (LSTM) [57].
In addition to the standard RNN, a new mechanism for keeping and taking into consideration the short-term and long-term dependencies within the data has been implemented. This new system consists of three logic gates that govern the way information flows through the network. Concretely, the relevant information is kept and the irrelevant information is discarded. The term cell is coined and incorporates the new mechanisms. At each time step, there are three inputs (the input data at time step t, the hidden state at time step t − 1 and the cell state at time step t − 1) and two outputs (the hidden state at time step t and the cell state at time step t). The logic gate system consists of an input gate, an output gate and a forget gate. The input gate takes into consideration the information coming from the current time step's input vector, $x_t$, and the previous step's hidden state vector, $h_{t-1}$. Both have their own synaptic weight matrices assigned, $W_i$ for the current step's input and $U_i$ for the previous state. Each dot product is computed and the results are summed; at the end, the bias, $b_i$, is also added. On top of this, a sigmoid function is applied, which keeps the values between 0 and 1 (Equation (1)):

$$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i) \quad (1)$$

As part of the input gate, a new candidate for the cell state, $\tilde{c}_t$, is calculated based on the current time step's input, $x_t$, and the previous step's hidden state, $h_{t-1}$. This layer has its own synaptic weights and bias. On top of the computed values, a tanh activation function is applied, as in Equation (2):

$$\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c) \quad (2)$$

In a similar way to the input gate, the forget gate decides whether the information coming from the previous hidden state and the current step's input should be forgotten (Equation (3)). As expected, the weights $W_f$, $U_f$ and the bias $b_f$ belong to this gate:

$$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f) \quad (3)$$

The relevant information is passed through the input and forget gates, and then, taking into consideration the previous cell state, $c_{t-1}$, the new cell state for time step t is calculated using Equation (4):

$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \quad (4)$$

The two outputs are computed using Equations (5) and (6):

$$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o) \quad (5)$$

$$h_t = o_t \odot \tanh(c_t) \quad (6)$$

where $o_t$ represents the output gate's value for the current time step and $h_t$ represents the current time step's hidden state. The time dependencies are kept in the cell state, designed for long-term memory, and in the hidden state, designed for short-term memory. With the gating system in place, the network predicts at time step t using relevant information gained upstream starting from step t − 1.
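To make Equations (1)-(6) concrete, the following is a minimal NumPy sketch of a single LSTM cell step. The parameter layout, toy dimensions and random initialization are illustrative only; the models trained in this paper rely on a framework implementation rather than hand-written cells.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, P):
    """One LSTM cell step implementing Equations (1)-(6).

    P holds the weight matrices W_*, U_* and biases b_* for the
    input (i), forget (f), output (o) gates and cell candidate (c).
    """
    i_t = sigmoid(P["W_i"] @ x_t + P["U_i"] @ h_prev + P["b_i"])       # Eq. (1)
    c_hat = np.tanh(P["W_c"] @ x_t + P["U_c"] @ h_prev + P["b_c"])     # Eq. (2)
    f_t = sigmoid(P["W_f"] @ x_t + P["U_f"] @ h_prev + P["b_f"])       # Eq. (3)
    c_t = f_t * c_prev + i_t * c_hat                                   # Eq. (4)
    o_t = sigmoid(P["W_o"] @ x_t + P["U_o"] @ h_prev + P["b_o"])       # Eq. (5)
    h_t = o_t * np.tanh(c_t)                                           # Eq. (6)
    return h_t, c_t

# Tiny example: 3 input features, 2 hidden units, random parameters.
rng = np.random.default_rng(42)
n_in, n_h = 3, 2
P = {}
for g in "ifoc":
    P[f"W_{g}"] = rng.normal(size=(n_h, n_in))
    P[f"U_{g}"] = rng.normal(size=(n_h, n_h))
    P[f"b_{g}"] = np.zeros(n_h)

h, c = np.zeros(n_h), np.zeros(n_h)
for x in rng.normal(size=(5, n_in)):   # unroll the cell over 5 time steps
    h, c = lstm_step(x, h, c, P)
```

Note that the hidden state is bounded by the output gate and the tanh of the cell state, while the cell state itself can accumulate information over many steps.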
2.3.2. Gated Recurrent Unit (GRU)
Another type of recurrent neural network inspired by the standard one is the gated recurrent unit (GRU). This architecture (Figure 5) rapidly became popular after it was first presented in 2014 [58].
Similar to the LSTM, the information flow within the GRU is governed by a gate system, but with two gates instead of three. The notion of the hidden state is kept, while the notion of the cell state is discarded from the design of the GRU. These decisions lead to a shorter training time due to the reduced computational load.
The reset gate processes data from a short-term perspective. The functionality of this gate is similar to that of the LSTM's forget gate and is governed by Equation (7):

$$r_t = \sigma(W_r x_t + U_r h_{t-1} + b_r) \quad (7)$$
The update gate is used for the purpose of long-term memory and is implemented by Equation (8):

$$z_t = \sigma(W_z x_t + U_z h_{t-1} + b_z) \quad (8)$$
The same activation function, the sigmoid, is used for both the reset and update gates. The difference is in the weight matrices and biases. As expected, the closer the values in the weight matrices are to 1, the more relevant the data are.
The hidden state, $h_t$, is calculated in two steps. The first step, Equation (9), calculates a new candidate hidden state for time step t, $\tilde{h}_t$:

$$\tilde{h}_t = \tanh(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h) \quad (9)$$

The key to understanding the GRU's mechanism for information moving upstream is the way the previous hidden state, $h_{t-1}$, is multiplied by the reset gate vector, $r_t$. All previously acquired information is discarded if the values equal 0 and kept if the values equal 1.
The second step calculates the hidden state for time step t as in Equation (10). The information passing through the update gate, $z_t$, the hidden state candidate, $\tilde{h}_t$, and the previous hidden state, $h_{t-1}$, modulate the output of the hidden state at time step t:

$$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t \quad (10)$$
For both LSTM and GRU, as new time steps are added, the equations above are recomputed. LSTM and GRU are sophisticated designs suited to time series data. The gating systems implemented in both provide the much-needed mechanisms to capture time dependencies and to avoid data morphing and overall information loss.
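Similarly, Equations (7)-(10) can be sketched as a single GRU step in NumPy. Only three weight sets are needed instead of the LSTM's four, which is where the reduced computational load comes from; the dimensions and random parameters below are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, P):
    """One GRU step implementing Equations (7)-(10)."""
    r_t = sigmoid(P["W_r"] @ x_t + P["U_r"] @ h_prev + P["b_r"])            # Eq. (7)
    z_t = sigmoid(P["W_z"] @ x_t + P["U_z"] @ h_prev + P["b_z"])            # Eq. (8)
    h_hat = np.tanh(P["W_h"] @ x_t + P["U_h"] @ (r_t * h_prev) + P["b_h"])  # Eq. (9)
    h_t = (1.0 - z_t) * h_prev + z_t * h_hat                                # Eq. (10)
    return h_t

# Tiny example: 3 input features, 2 hidden units, random parameters.
rng = np.random.default_rng(7)
n_in, n_h = 3, 2
P = {f"W_{g}": rng.normal(size=(n_h, n_in)) for g in "rzh"}
P |= {f"U_{g}": rng.normal(size=(n_h, n_h)) for g in "rzh"}
P |= {f"b_{g}": np.zeros(n_h) for g in "rzh"}

h = np.zeros(n_h)
for x in rng.normal(size=(5, n_in)):   # unroll over 5 time steps
    h = gru_step(x, h, P)
```

The update gate $z_t$ blends the old hidden state with the new candidate, so a single vector plays the roles that the LSTM splits between its cell and hidden states.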
2.3.3. Model Performance
Model performance is analyzed from two perspectives. In the first, the model's generalization capability is assessed using KPIs based on error calculation. The second is represented by the learning curves, which describe the learning process while the model converges. A rigorous approach to KPI setting must provide the opportunity to understand the magnitude of the error relative to the data used, but must also be independent of the dataset, so that other researchers have the opportunity to compare the results of their work with the current ones. The mean absolute error (MAE) is a measure of the error between the actual and predicted values, computed as the average of the absolute errors (Equation (11)):

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} |y_i - \hat{y}_i| \quad (11)$$

The coefficient of determination, $R^2$, is a measure of how well the independent variables can explain the variance of the dependent variable (Equation (12)):

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \quad (12)$$

where $y_i$ and $\hat{y}_i$ are the actual and predicted values and $\bar{y}$ is the mean of the actual values. MAE is relative to the dataset, while $R^2$ is independent of it. Additionally, the training time is also added to the performance-related KPIs.
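As a sketch, the two KPIs of Equations (11) and (12) can be computed directly in NumPy; the sample arrays below are illustrative:

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error, Equation (11)."""
    return np.mean(np.abs(y_true - y_pred))

def r2(y_true, y_pred):
    """Coefficient of determination, Equation (12)."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

# Illustrative hourly NO2 values [ug/m3], actual vs. predicted.
y_true = np.array([40.0, 55.0, 38.0, 60.0])
y_pred = np.array([42.0, 50.0, 41.0, 58.0])

print(mae(y_true, y_pred))           # 3.0
print(round(r2(y_true, y_pred), 3))  # 0.882
```

MAE carries the units of the target (µg/m3 here), which is why it is dataset-relative, while $R^2$ is dimensionless and comparable across studies.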
Being in a supervised learning paradigm, the initial dataset is split into two distinct parts. The training subset, representing 70% of the initial dataset, is used for training purposes only, and the remaining 30%, representing the testing subset, is used only for testing; none of the records in the testing subset were part of the training. The split is performed while keeping the temporal dependencies: all the entries are ordered by date, which means that no record in the testing subset occurred earlier than the latest record in the training subset. The selected metrics are computed for the testing subset only, since the only relevant performance metrics for this kind of modeling are those obtained on data not used during training.
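A chronological split of this kind can be sketched as follows; the array contents are illustrative, and only the 8760-record size and the 70/30 ratio come from the text above:

```python
import numpy as np

# Illustrative dataset: 8760 hourly records ordered by timestamp,
# mirroring the one-year B-6 dataset described above.
n_records = 8760
X = np.arange(n_records * 3, dtype=float).reshape(n_records, 3)  # features
y = np.arange(n_records, dtype=float)                            # NO2 target

# Chronological 70/30 split: no shuffling, so every test record is
# later in time than every training record.
split = int(n_records * 0.7)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]
```

Because the rows stay in date order, slicing at the 70% mark is enough to guarantee the temporal separation between the subsets.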
To identify the best model for the use case, various configurable parameters had to be tested before deciding on the final model. Neural network-based models are highly configurable, so the selection of the best model can become expensive in terms of time and computation. For both LSTM and GRU, the parameters in Table 3 are tested. The model configurations are built as unique combinations of the parameters listed there. A total of 5832 models were trained and tested.
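A grid search of this kind can be sketched by enumerating the Cartesian product of the parameter lists. In the sketch below, the parameter names and values are hypothetical placeholders standing in for Table 3 (only the hidden-unit candidates 21, 42 and 85, i.e., n/2, n and 2n + 1 for n = 42 features, come from the text); the list sizes are chosen so that the grid yields the 5832 configurations mentioned above.

```python
from itertools import product

# Hypothetical search space standing in for Table 3: six parameters with
# three candidate values and three with two give 3**6 * 2**3 = 5832 models.
grid = {
    "units": [21, 42, 85],                      # n/2, n, 2n + 1 for n = 42
    "batch_size": [32, 64, 128],                # hypothetical values
    "epochs": [50, 100, 150],                   # hypothetical values
    "dropout": [0.0, 0.2, 0.5],                 # hypothetical values
    "learning_rate": [1e-2, 1e-3, 1e-4],        # hypothetical values
    "activation": ["tanh", "relu", "sigmoid"],  # hypothetical values
    "layers": [1, 2],                           # hypothetical values
    "optimizer": ["adam", "rmsprop"],           # hypothetical values
    "cell": ["LSTM", "GRU"],
}

# Every unique combination of the parameter lists is one model configuration.
configs = [dict(zip(grid, combo)) for combo in product(*grid.values())]
```

Each `configs` entry would then be passed to a model-building routine, trained, and scored on the testing subset.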
3. Results and Discussion
In terms of generalization capabilities, both LSTM and GRU have maximum $R^2$ values below 0.8. As can be seen in Figure 6 and Figure 7, most of the tested models achieve peak performance. This leads to the conclusion that the neural-based models used show potential in providing generalization capabilities under various configurations of parameters. Having such a large number of models performing at the maximum is a confirmation of the versatility of LSTM and GRU, but it also builds confidence in the final selected model.
To narrow the list, the three best LSTM and the three best GRU models were selected and plotted in Figure 8. The performances of all selected models are very close, so the decision on the final model depends on criteria other than generalization capability. The six models compared in Figure 8 have the parameters presented in Table 4.
In terms of training time, as expected, there are important differences between the considered models (Figure 9). Training time variation within a model type can be explained by batch size and epochs: the smaller the batch size, the higher the training time, and the larger the number of epochs, the higher the training time. Of course, other parameters have an impact as well, but it can be considered negligible in the overall context.
Another aspect, mentioned in the theoretical fundamentals, is that when assessing GRU and LSTM there will be a difference due to the smaller number of equations that govern the GRU compared to the LSTM.
An approach to assessing how the models converge is to plot learning curves. This type of visualization shows how the loss evolves epoch by epoch during training. The visualization is based on the epochs, since this hyperparameter is the only one in the list controlling the degree to which the models are trained. The learning curves, where the blue line represents learning on the training subset and the green line represents testing the model on the testing subset, can indicate whether over-fitting or under-fitting occurs. For the selected LSTM (Figure 10) and GRU (Figure 11) models, the solutions converged correctly, without spikes or multiple intersections of the graphs. The convergence of the GRU model is smoother compared to that of the LSTM.
Once the predictive capability is revealed by computing the metrics, another way to assess the predictions with respect to the actual values is to visualize them. Three time frames were selected for visualization purposes, one extracted from the testing subset and two from outside the modeling dataset. The three time frames do not overlap and cover different days of the week. The first interval, extracted from the testing subset, contains predictions for 7–8 July 2022, Thursday and Friday (Figure 12). The second interval contains data for 10–11 October 2022, Monday and Tuesday (Figure 13). The third interval, 12–13 November, addresses the predictions for a weekend (Figure 14). The graphs were produced with GRU_2.
As can be seen in all three visualizations, the predicted values are close to the actual measured values. The selected model can predict when the NO2 level will increase or decrease, but in some cases the inaccuracy lies in the magnitude of the change. In this way, the tendency of the model shifts from overestimation to underestimation and vice versa. However, for all three evaluated time intervals, there are also subintervals in which the predicted values match the measured values exactly.
Finally, a better understanding of the prediction results can be achieved by assessing the feature importance. As can be seen in Figure 15, the most important features are wind speed, temperature, dew point, humidity and cloud cover. Feature importance is calculated by permuting the features: based on the impact the permutation makes, the importance of a feature can be assessed. When a feature is not important, the model performance is not altered much; when a feature is important, the model performance is altered in a perceptible way.
The remaining features are summed up in the last category, "other". Having wind speed and temperature as top contributors was expected, being in line with what other researchers have found. Wind speed is important in dispersing pollution: the higher the wind speed, the faster the dispersion, as the polluting particles are moved away from the source. On the other hand, the impact of temperature can be explained by basic physics: throughout the studied area, convection transports pollutants from ground level to higher altitudes, thus reducing the measured pollution. However, the impacts of wind speed and temperature must be studied in more detail due to their complex interactions with the environment.
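The permutation procedure described above can be sketched as follows, with a trivial stand-in for the trained network; the helper names and toy data are illustrative:

```python
import numpy as np

def permutation_importance(model_fn, X, y, metric, n_repeats=5, seed=0):
    """Permutation feature importance: shuffle one column at a time and
    measure how much the error metric degrades relative to the baseline."""
    rng = np.random.default_rng(seed)
    baseline = metric(y, model_fn(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        scores = []
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])   # break the feature-target link
            scores.append(metric(y, model_fn(Xp)))
        importances[j] = np.mean(scores) - baseline  # error increase
    return importances

# Toy stand-in for the trained network: the target depends on column 0
# only, so permuting column 0 should hurt far more than permuting column 1.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))
y = 3.0 * X[:, 0]
model_fn = lambda X: 3.0 * X[:, 0]         # "perfect" model of the toy truth
mae = lambda yt, yp: np.mean(np.abs(yt - yp))

imp = permutation_importance(model_fn, X, y, mae)
```

An unimportant feature leaves the metric essentially unchanged when shuffled, so its importance score stays near zero, exactly as described for the "other" features in Figure 15.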