1. Introduction
Global economic growth has bolstered many sectors, but this prosperity has been accompanied by a grave environmental repercussion: escalating concentrations of pollutants in the atmosphere. Human activities associated with urbanization, industrialization, and economic development have significantly contributed to this growing ecological crisis. While these advancements have propelled society forward, they have also produced harmful pollutants that adversely affect the environment and human health. The resulting air pollution has emerged as a multifaceted global concern with social, economic, political, and legislative dimensions [1,2].
In the United States, the urgency of air quality concerns is evident in the mandate for the US Environmental Protection Agency (EPA) to reassess the National Ambient Air Quality Standards (NAAQS) every five years. This regulatory obligation reflects the dynamic nature of air pollution challenges and the need for adaptive strategies to mitigate their adverse effects. Similarly, in Europe, air pollution is a significant threat to environmental health, causing respiratory ailments, cardiovascular diseases, and premature deaths. The statistics linking morbidity and mortality rates to air pollution are alarming: approximately nine million deaths annually are linked to it [3,4,5,6,7].
WHO figures show that almost all of the world’s population (99%) breathes air that exceeds guideline limits and contains high levels of pollutants [8]. Global air pollution increased by 8% from 2008 to 2013, with low- and middle-income countries showing the highest urban air pollution levels [8]. Nevertheless, pollution levels in some European cities also exceed the limit values for pollutant concentrations [6]. Emissions from transport, industrial facilities, fires, and climate-related storms contribute to environmental degradation and affect public and individual health [1,7].
The transport sector emits air pollutants and increases air pollution levels [1]. There is mounting evidence that emissions from aviation are growing faster than those from any other mode of transport [2]. Aircraft release carbon dioxide (CO2), carbon monoxide (CO), hydrocarbons (HC), nitrogen oxides (NOx), suspended particulate matter (PM), and sulfur oxides (SOx) [1,2]. Aviation emissions also contribute to global climate change [1]. Moreover, air pollution is associated with numerous adverse health effects and many diseases [1].
COVID-19 restrictive measures were imposed internationally in the transport field to limit the spread of the virus. During the pandemic, these restrictions and the resulting reduction in anthropogenic activities improved the otherwise gloomy picture of air quality. Many studies have reported a reduction in air pollution during the pandemic due to the abovementioned measures [9,10]. In this vein, nitrogen oxide (NOx) and particulate matter (PM2.5, PM10) concentrations were significantly reduced, while ground-level ozone (O3) levels rose [11,12]. Carbon monoxide (CO) and sulfur dioxide (SO2) also decreased during the restriction period, although the reduction was not steady [13,14].
In addition, limited population mobility and social distancing policies led to a downward trend in COVID-19 cases [10]. Evaluating the aftermath of the COVID-19 pandemic on society and the economy is essential for shaping responses and adapting governmental measures and policies [15]. To this end, the United Nations engaged its 131 country teams serving 162 countries to support governments and develop effective public health preparedness and response policies against the COVID-19 pandemic [15].
Socio-economic changes related to gender inequalities have also been observed, as women’s workload increased because of both childcare and remote professional work at home [16]. According to a study by NIVEL (Netherlands Institute for Health Services Research), vulnerable groups in society have been strongly affected by the COVID-19 pandemic [17]. The study is based on records from general practitioners and data collected from Statistics Netherlands (CBS). Low-income families and disadvantaged social groups appeared to be more vulnerable. Mental health also seems to have been seriously affected in several population groups, as well as in people with pre-existing mental health problems [17,18]. Furthermore, according to the World Health Organization (WHO), the COVID-19 pandemic gave rise to discriminatory behavior driven by social stigma against several disadvantaged ethnic groups and people affected by the SARS-CoV-2 virus [19].
The selection of the given airport was based on the traffic, configuration, and area of the airport, as well as the variety of aircraft types operating at the airport. It will also be compared with prevailing environmental conditions, air pollution, regional development, and public health policy-making. Although various prediction models have been proposed by scientists in the field [20], there is still a need for more accurate models to develop effective prevention and control strategies in cases where threshold values rise to unacceptable levels for public health.
Once the pollutant emissions have been calculated, their concentrations in the areas of interest can be estimated in two ways: with atmospheric dispersion models and with on-site measurements. Both provide snapshots of pollutant concentrations in the atmosphere at a specific location and time.
Our study aims to record the current air pollution at the airport and assess the factors that influence the existing management model, with the intention of optimizing it through a decision support system. While there is a wealth of research on air pollution, information on air pollution from airports remains limited. The existing studies in our country have mainly focused on “Eleftherios Venizelos”, the largest airport in Greece, which is closer in profile to a standard European airport. The lack of data and data recording at the regional airports, especially those in Eastern Macedonia and Thrace, motivated the present study. In addition, the restrictive policy imposed during the pandemic offers an ideal setting for comparative studies of the impact of aviation activity on air quality levels.
To summarize, this study aims to assess the air pollution at Alexandroupolis Regional airport in Greece and the influence of meteorological parameters on the dispersion of pollutants related to air transport operations. Furthermore, this study applies machine learning techniques to develop a methodological approach for predicting air pollutants and identifying critical environmental conditions that affect air pollution. Specifically, it will be assessed whether aviation emissions contribute significantly to local air pollution, with NOx, CO2, and PM2.5 being the dominant pollutants. Furthermore, we assume that meteorological conditions may impact the concentration levels of contaminants. Lastly, we present various machine learning models intending to increase the predictive ability for estimating pollutant levels, offering a valuable tool for air quality management, mainly at regional airports.
2. Materials and Methods
As we stated previously, our interest was focused on a study of air traffic in a regional Greek airport of Eastern Macedonia and Thrace, which is the airport of Alexandroupolis (Figure 1).
The Alexandroupolis “Democritus” civil airport is approximately 7.0 km east of Alexandroupolis in Evros Prefecture, Thrace (Northeastern Greece). The airport is named in honor of the ancient atomist philosopher Democritus, who hailed from Avdira near Xanthi in Thrace.
Completed in 2011, the airport comprises terminal and administrative buildings covering over 8500 m2. Its coordinates are latitude 40°51′21″ North and longitude 25°57′22″ East. It comprises one terminal building, an administration building, a control tower, and a fire brigade station. It also holds a Category VI (6) rating for Airport Fire Fighting and provides four (4) parking positions for medium-sized aircraft (http://www.ypa.gr/en/our-airports/kratikos-aerolimenas-alejandroypolhs-dhmokritos-kaald, accessed on 1 December 2023).
Our study covered the period from January 2019 to December 2020. This period captures the effect of the COVID-19 containment measures, which reduced flight numbers and, consequently, aircraft pollutant emissions. All air traffic and fleet composition data were obtained from the Hellenic Civil Aviation Authority (CAA).
Air pollutants from aircraft operations disperse based on several meteorological factors, including wind speed, temperature, and atmospheric stability. The dispersion patterns determine the extent to which pollutants such as NOx, CO, SOx, and PM2.5 reach populated areas near airports. Wind direction and speed significantly affect the transport of contaminants, while temperature inversions can trap emissions near the ground, leading to higher exposure levels in nearby communities. Manisalidis (2023) highlights that exposure to these pollutants is directly linked to respiratory and cardiovascular diseases, increased hospital admissions, and long-term health complications [21].
The emissions have been calculated using each aircraft’s emission factors, following the standard LTO emission factor methodology and the analytical methodology incorporated in the EMEP/EEA Air Pollutant Emission Inventory Guidebook, which includes emissions released at ground level and up to an altitude of 3000 feet, following the International Civil Aviation Organization (ICAO) guidelines. Specifically, pollutants such as NOx, CO, and PM2.5 were primarily assessed at ground level, where aircraft taxiing, takeoff, and landing emissions occur. However, some dispersion of pollutants into the lower atmosphere is expected, influenced by meteorological conditions such as wind speed, precipitation, temperature, and atmospheric stability [1]. By incorporating these factors, the study comprehensively assesses aviation emissions and their potential impact on local air quality.
Briefly, the emissions in this study were calculated using each aircraft’s emission factors according to the standard LTO emission factor methodology and the methodological approach described in the EMEP/EEA Air Pollutant Emission Inventory Guidebook [22]. The total emissions $E_{m,a,p,i}$ of a given pollutant $p$ from aircraft type $i$ at airport $a$ over a specific period $T$ were estimated using the simplified approach:

$$E_{m,a,p,i} = EF_{p,i} \times \Delta_{a,i} \times 10^{-6}$$

where:
$E_{m,a,p,i}$ = Emissions of pollutant p from aircraft type i at airport a for time period T (t/T);
$EF_{p,i}$ = Emission factor for pollutant p for aircraft type i (g/LTO);
$\Delta_{a,i}$ = Number of LTO cycles for aircraft type i at the airport a (LTO/T).
The factor 10⁻⁶ converts emissions from grams (g) to metric tons (t), in line with international emission reporting standards. This activity-based approach ensures that emissions are estimated from actual aircraft operations, particularly within the Landing and Take-Off (LTO) cycle, which includes approach, taxi-in, taxi-out, takeoff, and climb-out up to 3000 feet. By adopting this standardized methodology, the study provides a robust and internationally recognized framework for evaluating aviation-related air pollution [23,24].
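As an illustration only, the activity-based calculation described above (emission factor per LTO cycle, multiplied by the number of LTO cycles and converted from grams to metric tons) can be sketched in Python. The emission factor values and the function names below are hypothetical placeholders chosen for this sketch; they are not taken from the EMEP/EEA Guidebook or from the study's dataset.

```python
# Minimal sketch of the LTO emission calculation described in the text.
# All emission factors are HYPOTHETICAL placeholder values (g per LTO cycle),
# not Guidebook values and not the values used in the study.
HYPOTHETICAL_EF_G_PER_LTO = {
    ("A320", "NOx"): 10000.0,
    ("A320", "CO"): 6000.0,
    ("AT72", "NOx"): 2000.0,
}

GRAMS_TO_TONNES = 1e-6  # the conversion factor 10^-6 from the text


def lto_emissions_tonnes(aircraft_type, pollutant, lto_cycles,
                         factors=HYPOTHETICAL_EF_G_PER_LTO):
    """Emissions (t) of one pollutant from one aircraft type: EF * LTO cycles * 1e-6."""
    emission_factor = factors[(aircraft_type, pollutant)]
    return emission_factor * lto_cycles * GRAMS_TO_TONNES


def total_emissions_tonnes(pollutant, lto_cycles_by_type,
                           factors=HYPOTHETICAL_EF_G_PER_LTO):
    """Sum the per-type emissions of one pollutant over all aircraft types."""
    return sum(
        lto_emissions_tonnes(aircraft_type, pollutant, cycles, factors)
        for aircraft_type, cycles in lto_cycles_by_type.items()
    )


if __name__ == "__main__":
    # Hypothetical monthly activity: 50 LTO cycles of A320, 100 of AT72.
    monthly_cycles = {"A320": 50, "AT72": 100}
    print(total_emissions_tonnes("NOx", monthly_cycles))
```

Keeping the emission factors in a lookup keyed by aircraft type and pollutant mirrors the indices p and i in the definitions above, so monthly totals are obtained by summing over the fleet.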
Meteorological parameters: Weather parameters, comprising monthly average temperature (°C), rain (mm), average sunshine duration (INST), and maximum wind (Beaufort), were acquired from the nearby meteorological station (https://w1.meteo.gr/Gmap.cfm, accessed on 1 April 2023). The automatic airport station measured all basic meteorological parameters in the area and is representative of the weather conditions and climate of the airport area.
The dataset consists of monthly aviation emissions and meteorological data collected for Alexandroupolis airport in 2019 and 2020. It contains the following fields: Year: the calendar year of data collection; Total Traffic: the total number of aircraft movements recorded monthly; Aircraft Type: the specific aircraft models operating during the month; Month: the corresponding month of data collection; CO2 (kg): the total carbon dioxide emissions from aircraft operations; NOx (kg): the nitrogen oxide emissions from aircraft engines; CO (kg): the volume of carbon monoxide emissions; HC (kg): hydrocarbon emissions from aircraft fuel combustion; SOx (kg): sulfur oxide emissions attributed to aviation activities; Fuel Consumption (kg): the total fuel burned during operations; PM2.5 (kg): fine particulate matter (PM2.5) emissions from aircraft operations. Meteorological parameters (temperature, rainfall, sunshine, and wind speed) are monthly averages representing prevailing weather conditions; the emissions data are based on average emission factors for different aircraft types, calculated according to monthly total aircraft traffic.
The aviation emissions dataset consists of monthly cumulative values, meaning that for each pollutant (e.g., CO2, NOx, PM2.5), the total monthly emissions are recorded based on the sum of emissions from all aircraft operations within the month. Accordingly, the meteorological data consist of monthly averages, with temperature, wind speed, precipitation, and sunshine duration averaged over the corresponding month for 2019 and 2020. This distinction ensures emissions reflect the total aviation activity while meteorological data represent prevailing atmospheric conditions. To enhance clarity, Supplementary Materials Table S1 presents a sample of the dataset used in the study, demonstrating how emissions and meteorological parameters are structured for analysis.
The dataset comprises 168 measurements covering the two years 2019 and 2020. It records aviation emissions as monthly cumulative values and meteorological parameters as monthly averages. Limitations in data availability and the operational characteristics of the border regional airport of Alexandroupolis restricted the study period to two years. Unfortunately, the meteorological records before this period present some inconsistencies, making it difficult to ensure reliable long-term data. However, the chosen period makes it possible to study the impact of the pandemic crisis on air traffic in 2020. In particular, the sharp decline in aviation activity due to restrictions imposed on air travel at national and international levels provides a unique opportunity to compare pre-pandemic emissions with emissions during the pandemic, offering valuable insights into how operational disruptions affect air quality.
This study provides a substantial snapshot of aviation-related emissions. Future research efforts should focus on a larger dataset, e.g., 2015–2025, to decipher long-term trends, the rate of air traffic recovery, and the corresponding impacts. However, such a large-scale study would require consistent methodologies for collecting all data over many years to ensure its comparability and reliability.
To summarize, 168 measurements were catalogued for all parameters, covering air pollutants from aviation emissions and meteorological variables in 2019 and 2020. Descriptive statistics and the Pearson correlation coefficient were applied to the meteorological and pollutant variables at the 0.01 significance level, unless otherwise stated.
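For readers unfamiliar with the statistic, the Pearson correlation coefficient between two monthly series can be computed as in the following minimal Python sketch. The function name and the example series are assumptions made for illustration; the study's own computations were presumably carried out with a statistics package, and the significance test at the 0.01 level is not reproduced here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical monthly values, for illustration only.
    fuel_kg = [1200.0, 1500.0, 900.0, 2000.0]
    co2_kg = [3800.0, 4700.0, 2900.0, 6300.0]
    print(round(pearson_r(fuel_kg, co2_kg), 3))
```

Values close to +1 or −1 indicate a strong linear relationship, while values near 0 indicate a weak or absent one.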
Data Description, Machine Learning Models, and Evaluation Metrics
Machine learning, a subset of Artificial Intelligence, focuses on granting computers the capacity to acquire the skills needed to execute particular tasks without explicit human programming. It revolves around creating models that can absorb knowledge from data and subsequently use it to make informed decisions or predictions when confronted with new available data (Figure 2) [25].
Linear Regression is a straightforward choice for basic predictive tasks, performing well on high-dimensional, sparse datasets. Decision trees are non-parametric models that efficiently navigate data using simple tests and are well suited to nonlinear decision boundaries. In regression, an ensemble of decision trees creates a combined Gaussian distribution prediction. Gradient boosting is a powerful technique for regression that builds trees incrementally while minimizing error, and it excels in handling complex problems with a stepwise approach. Bayesian inference aids data analysis and learning and is useful where prediction uncertainty must be assessed, as in fields like medicine. Neural networks can also be used for regression, offering adaptability in modelling nonlinear functions, especially in complex scenarios [26,27,28,29,30,31].
In the present study, the air pollution database for the city of Alexandroupolis, Greece, was considered, and an attempt was made to predict the emission levels of PM2.5 using various machine learning methods. Five basic algorithms, Bayesian Linear Regression, Boosted Decision Tree, Linear Regression, Decision Forest Regression, and Neural Network Regression, were used for the regression analysis. The machine learning algorithms used in this research are presented here briefly: Decision trees are non-parametric models that efficiently navigate data using simple tests and are well suited to nonlinear decision boundaries. In regression, an ensemble of decision trees creates a combined Gaussian distribution prediction. Gradient boosting is a powerful technique for regression, incrementally building trees while minimizing error. It excels in handling complex problems with a stepwise approach. Linear Regression is a straightforward choice for basic predictive tasks, performing well on high-dimensional, sparse datasets. Bayesian inference aids data analysis and learning. Finally, neural networks can be used for regression, offering adaptability in modelling nonlinear functions, especially in complex scenarios (https://learn.microsoft.com/en-us/azure/machine-learning/component-reference/boosted-decision-tree-regression?view=azureml-api-2, accessed on 1 March 2024) [2,32].
Classical regression-based algorithms, especially machine learning ones like Decision Trees and Random Forest, have been widely applied in forecasting air quality levels and characteristics. Regarding predictor variables, three main categories were discerned: variables associated with pollutant concentrations, meteorological parameters, and variables about temporal and spatial characteristics [33]. In the same survey, PM2.5 was the most predicted pollutant among the analyzed documents. Consequently, a combination of variables from the three categories mentioned above was chosen to anticipate PM2.5 concentration in this research study.
Figure 3 illustrates the research workflow of the proposed system.
The evaluation metrics, such as Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Relative Absolute Error (RAE), Relative Squared Error (RSE), and Coefficient of Determination (R2), were used to assess all the machine learning methods. Evaluation metrics are valuable tools for determining the performance of machine learning models, and they can be categorized into two groups. On the one hand, range-dependent metrics are used to compare different models on the same dataset. On the other hand, percentage metrics facilitate model comparisons independently of the dataset. Some of the most commonly used metrics in the analyzed studies in regression are R2 and MAPE (20.1% each). RMSE/MSE and MAE are prevalent among the range-dependent metrics, appearing in 68.45% and 46.3% of the publications, respectively [34].
The evaluation framework of the models can be briefly explained as follows: for the error metrics (mean absolute error, root mean squared error, relative absolute error, and relative squared error), lower values indicate stronger predictive ability. By contrast, for the coefficient of determination (R2), higher values indicate better predictive capacity [34].
Supplementary Materials Figure S1 presents all modelling parameters extracted from the Microsoft Azure Studio Classic for regression machine learning algorithms.
Finally, beyond the evaluation metrics mentioned above, it is also essential to understand what drives the predictions of the machine learning models. In this vein, the Permutation Importance method (PIM) was used to interpret the best-performing machine learning model [35,36]. The principle is simple: if a variable is essential for the model’s predictive ability, shuffling its values will significantly reduce the model’s accuracy. Conversely, if shuffling the values of a variable leaves the predictive ability of the model unchanged, then this variable does not significantly affect the predictions. Furthermore, the method can be applied to any machine learning model, and it does not require retraining the model but only rearranging the input values. Consequently, the researcher obtains a direct estimate of the importance of all features based on the change in model performance [37,38,39].
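The shuffling procedure described above can be sketched as follows. This is a generic, model-agnostic illustration written for this text, not the implementation used in the study: the function names, the use of mean squared error as the score, and the number of repeats are all assumptions.

```python
import random


def mean_squared_error(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, rows, y, n_repeats=10, seed=0):
    """Average increase in MSE when each feature column is shuffled in turn.

    `predict` is any already-trained model exposed as a function that maps a
    list of feature rows to a list of predictions; the model is never retrained.
    """
    rng = random.Random(seed)
    baseline = mean_squared_error(y, predict(rows))
    importances = []
    for j in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in rows]
            rng.shuffle(column)  # break the link between feature j and the target
            shuffled = [
                list(row[:j]) + [value] + list(row[j + 1:])
                for row, value in zip(rows, column)
            ]
            increases.append(mean_squared_error(y, predict(shuffled)) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


if __name__ == "__main__":
    # Toy model that uses only the first feature; the second is irrelevant.
    def toy_model(rows):
        return [2.0 * row[0] for row in rows]

    features = [[float(i), float((i * 7) % 5)] for i in range(20)]
    target = [2.0 * row[0] for row in features]
    print(permutation_importance(toy_model, features, target))
```

In the toy example, shuffling the first feature increases the error while shuffling the second leaves it unchanged, which is the behaviour the method relies on to rank features.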
3. Results
As stated, all information about air traffic and fleet composition, depicted in Figure 4, was gathered from the Hellenic Civil Aviation Authority (CAA).
Figure 4 shows the aircraft types that used Alexandroupolis airport in 2019 and 2020. In 2019, the A320 had the largest share (35%), followed by the AT43 (22%). The AT45 and AT72 accounted for 16% and 14%, respectively, while the A319 and DH8D aircraft had shares below 10%. A similar picture emerges for 2020. Specifically, the A320 held the largest share with 34%, while the AT72 share increased to 24%. The shares of the AT45 (15%) and AT43 (12%) also changed compared to the previous year. The share of DH8D aircraft increased to 10%, and other aircraft accounted for 2%, indicating a slight diversification in the fleet composition.
Figure 5 illustrates the two pie charts representing the percentage contribution of the pollutants under study for 2019 and 2020.
In 2019, carbon dioxide had the largest share, accounting for 75.5% of the total, followed by fuel at 24%, while the remaining pollutants (NOx, CO, HC, SOx, and PM2.5) had much smaller shares. The same pattern appears in 2020: carbon dioxide represents the largest share, while fuel, although decreasing compared to 2019, still holds the second-largest share.
Furthermore, although the proportions of the remaining pollutants remain approximately the same, an overall decrease is evident, which can be attributed to the sharp reduction in air transport activity due to the restrictive measures imposed during the pandemic crisis. The largest decrease was observed for hydrocarbons (62.2%), followed by PM2.5 (35.4%), CO (33.4%), CO2 (28.1%), NOx (26.5%), SOx (25.3%), and fuel consumption (22.9%). These results highlight the significant influence of air traffic volume on emission levels and emphasize the need for further mitigation strategies to control aviation-related pollution. In conclusion, carbon dioxide is the pollutant with the largest percentage contribution to emissions.
Figure 6 depicts the monthly comparison of pollutant emissions between 2019 and 2020 through seven bar graphs. Each graph describes the variation of a specific pollutant per month, with yellow representing 2019 emissions and orange representing 2020 emissions.
Figure 6 shows the monthly carbon monoxide emissions, which were lower in 2020 than in 2019. Moreover, hydrocarbon emissions in 2020 are significantly lower, mainly at the beginning of the year. The monthly nitrogen oxide emissions in Figure 6 also show decreasing trends in 2020 compared to 2019. The PM2.5 values also decreased in 2020 compared to 2019, with the difference being less pronounced than for the other pollutants. Monthly sulfur oxide emissions likewise show a smaller change, decreasing in 2020. Finally, fuel consumption also follows a downward trend in 2020. Generally speaking, pollutant emissions are lower in 2020 due to changes in activities that affect fuel combustion and gas emissions, such as the restrictive measures imposed during the COVID-19 pandemic (Supplementary Materials Figure S2 visualizes all monthly trends of pollutant emissions in 2019 and 2020).
Table 1 gives the descriptive statistics of the meteorological and pollutant variables, and Figure 7 gives the Pearson correlation coefficients between them.
The above table shows the descriptive characteristics of the measurements for the variables that determine air pollution and the measurements of the collected meteorological parameters. It shows the range, minimum, and maximum values of the parameters, the average, the variance, the standard deviation, and the standard error of the variable’s values.
Nitrogen oxides (NOx) show a wide range of values and high standard deviations, indicating large fluctuations in the recorded values of their presence. Hydrocarbons (HC) and sulfur oxides (SOx) show variability in their concentrations, with lower values. Regarding PM2.5, occasional peaks in their presence are observed from the values of their descriptive statistics. Regarding fuel and carbon dioxide emissions, fuel consumption has the most extensive range (0–48904.8) and an average value of 9026.7, indicating significant fuel consumption and usage.
From the recording of meteorological parameters, the average temperature appears to have a value of 16.9 °C, and precipitation has an average value of 45.45 mm, indicating fluctuating weather conditions. The total traffic based on the recorded flights reaches an average of 16, with an average sunshine duration of 228 min. In conclusion, the high variation of pollutants such as NOx, CO, and CO2 indicates the seasonal variation of flights. Concurrently, the relatively low levels of PM2.5 values against the background of the fluctuation in the values of meteorological parameters (wind, rain) indicate the variability in the dispersion levels of pollutants.
Figure 7 outlines the visual representation of the Pearson correlation coefficients between different pollutant concentrations (e.g., NOx, CO, SOx, PM2.5, CO2) and meteorological parameters (e.g., mean temperature, rain precipitation, wind speed). Briefly, the red tones imply positive correlations (closer to +1), the blue tones indicate negative correlations (closer to −1), while white or light colors demonstrate weak or absent correlations between the variables.
Strong positive correlations are shown between CO2 and NOx, SOx and NOx, and fuel with CO2 and NOx, possibly due to their common sources in fuel combustion and air traffic emissions. Fuel consumption also contributes to carbon dioxide emissions. Finally, PM2.5 is strongly associated with NOx and SOx.
Conversely, moderate correlations between total traffic, PM2.5, and CO2 are shown, as more air traffic leads to increased suspended particles and carbon dioxide emissions.
Rainfall shows negative correlations with PM2.5 and sunshine duration, since rainfall is known to reduce the concentrations of these particles. The meteorological parameters temperature and wind speed do not significantly affect pollutant concentrations, with the corresponding Pearson correlation coefficients at levels that indicate weak or no correlation. In terms of statistical significance, aircraft fuel consumption and total traffic appear to be the primary drivers of air pollution, significantly affecting emissions of the above pollutants.
Figure 8 compares the proposed algorithms. Bayesian Linear Regression and Linear Regression performed better than the other algorithms, with almost the same Coefficient of Determination (R2) score. Linear Regression had the lowest value of Relative Squared Error. In contrast, the Bayesian Linear Regression algorithm had the lowest value for the metrics Mean Absolute Error, Root Mean Squared Error, and Relative Squared Error. The remaining algorithms show mixed trends in their evaluation metrics, with some excelling in one metric and others in another.
Notably, Neural Network Regression showed poor error performance despite attaining a good Coefficient of Determination value. This is evident in its elevated values across multiple error metrics, including Mean Absolute Error, Root Mean Squared Error, Relative Absolute Error, and Relative Squared Error.
Conversely, the Boosted Decision Tree algorithm demonstrated moderate performance across most metrics, except the Coefficient of Determination, where it ranked second-lowest among the algorithms. Finally, Decision Forest Regression emerged as the third-best performer among the algorithms under consideration. This conclusion is supported by its better performance in Mean Absolute Error, Root Mean Squared Error, Relative Absolute Error, Relative Squared Error, and Coefficient of Determination compared to the remaining machine learning algorithms, as illustrated in Figure 8.
Figure 9 shows the effect of features on the prediction of PM2.5 levels for the Bayesian Linear Regression model. The analysis is based on the Permutation Importance Method, and the importance of the features is depicted in two different ways.
Panel A visualizes the variation in each feature’s importance for predicting PM2.5 levels. Each point shows the degree to which a feature contributes to the model. CO, NOx, FUEL, CO2, SOx, HC, and TOTAL TRAFFIC have different levels of influence. The TOTAL TRAFFIC feature shows the smallest dispersion, in contrast to CO and NOx, which show the largest variability.
Figure 9B shows the mean value importance of the features. TOTAL TRAFFIC emerges as the most critical factor influencing the prediction of PM2.5 levels, followed by the concentrations of HC, SOx, CO2, and FUEL pollutants. On the contrary, CO shows minor importance, which demonstrates that it is not a critical factor for the change in PM2.5 levels.
4. Discussion
The European Green Deal prioritizes addressing air pollution, recognizing its critical impact on public health and the environment. Proactive measures aim to pave the way for a healthier and cleaner future for all European residents. The risks posed by air pollution are severe, contributing significantly to respiratory illnesses, cardiovascular complications, and premature mortality. To combat this pressing issue, the European strategy revolves around comprehensive actions, including reducing emissions from transport, industry, and agriculture [40]. In addition, air quality has begun to be investigated in terms of its impact on mental disorders, with studies attempting to elucidate the role of PM2.5, for example, in the development of depression, schizophrenia, anxiety, and bipolar disorder [41].
In the present study, the Bayesian Linear Regression and Linear Regression models performed well in predicting PM2.5 concentrations related to aviation emissions, with R² of 0.96 and 0.97, respectively. This indicates high accuracy when the concentrations of other pollutants and meteorological factors are taken into account.
The prediction models also effectively captured the impact of aviation activity on NO
x and CO
2 emissions, with Pearson correlation coefficients of 0.92 and 0.89, respectively (
Figure 7). This highlights the critical role of aviation activity and the corresponding fuel consumption in the concentration of these pollutants. This finding also aligns with the existing literature on the impact of aviation activity on the concentration levels of various pollutants [
42,
43,
44].
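As a reminder of how a Pearson correlation coefficient of this kind is obtained, the sketch below computes it from two paired series. The variable names and the monthly values are hypothetical and serve only to illustrate the calculation; they are not the study's measurements.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)

# Hypothetical monthly flight counts and NOx emissions (invented values).
flights = [100, 120, 90, 150, 130]
nox = [2.1, 2.4, 1.9, 3.2, 2.6]
r = pearson_r(flights, nox)
print(round(r, 2))
```

Values close to 1 indicate that the two series rise and fall together almost linearly, which is the sense in which coefficients such as 0.92 and 0.89 point to a strong association.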
It should also be emphasized that the models’ predictions move in the same direction as the measured sharp reduction in emissions in 2020, the first year of the pandemic. Specifically, the reductions of 28.1% for CO
2 and 26.5% for NO
x (
Figure 5) reflect the actual reductions that resulted from the corresponding decrease in air traffic during this period. Other studies that evaluated the impact of the pandemic on aviation activity and air quality have confirmed this pattern [
45,
46].
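The percentage reductions quoted above follow from a simple year-on-year comparison. The short sketch below shows the arithmetic; the annual totals are hypothetical placeholders chosen only to illustrate the formula and are not the study's emission figures.

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent (negative = reduction)."""
    return (after - before) / before * 100.0

# Hypothetical annual emission totals in arbitrary units (invented values).
co2_2019, co2_2020 = 1000.0, 719.0
co2_change = percent_change(co2_2019, co2_2020)
print(round(co2_change, 1))
```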
Lockdown restrictions affected air quality: concentrations of air pollutants decreased significantly during the pandemic, mainly due to reduced anthropogenic activities. In Greece, a significant drop in urban air pollution has also been reported, for example, reductions in NO emissions of up to 78% at urban stations and of 45% at Athens International Airport, while NO
2 levels decreased by 73% in the two largest cities of Greece, Athens and Thessaloniki. This decrease is also due to the general reduction in aviation activity and, by extension, emissions. Lastly, it was observed that pollutants such as NO
2 showed a sharp decrease. In contrast, pollutants such as PM
2.5 and PM
10 showed more variable trends influenced by meteorological variables such as wind dispersion and dust transport [
47,
48].
Despite their high predictive ability, the machine learning models do not account for small deviations attributable to factors such as meteorological fluctuations and local pollution sources. A common recommendation for improving air quality forecast models is therefore to incorporate more variables that reflect weather conditions [
49,
50,
51].
Artificial intelligence models for forecasting environmental pollution are not a new concept. Research on applying artificial intelligence to atmospheric pollution has surged since 2017. Within the domain of air pollution, machine learning models, particularly regression techniques, are widely adopted for analyzing the distributions of air pollutants, particularly when focusing on PM
2.5 concentration and its implications for public health [
52].
Another study [
53] compared several algorithms (MLR, KNN, M5P, RF, SVM, and MLP) to predict various pollutant concentrations in Valencia, Spain. Notably, RF achieved the highest accuracy [
53]. Ameer et al. (2019) performed a similar comparison involving four models (RF, DT, MLP, Boosting) to predict PM
2.5 levels in several Chinese cities, with RF demonstrating superior accuracy [
54]. Li et al. (2019) pitted Logistic Regression against RF for forecasting AQI in California, and RF emerged as the more accurate predictor [
55]. Pasupuleti et al. (2020) considered three algorithms (RF, DT, MLR) to forecast the concentration of various pollutants in Spain, with RF again showing the highest accuracy [
56].
In a different context, Kaur Bamrah et al. (2020) compared various regressor methods (MLP, RF, DT, and SVR) for predicting AQI in India, incorporating terrain features [
57]. In these studies, RF consistently achieved the highest accuracy. Yarragunta et al. (2021) compared six regression algorithms (DT, KNN, SVR, MLR, RF, and Naive Bayes) to predict AQI in Delhi, and the DT algorithm secured the highest accuracy in this particular case [
58]. Chakradhar Reddy et al. (2021) conducted a comprehensive comparison of six supervised ML models (LR, RF, DT, SVR, KNN, and Naive Bayes) for forecasting AQI in New Delhi [
58], and the results indicated that DT achieved notably high accuracy, approaching 100% [
59,
60].
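Comparisons such as those summarized above typically score each candidate algorithm on data held out from training. The sketch below illustrates the procedure with k-fold cross-validation and two deliberately simple models. The functions `fit_linear`, `fit_nearest_neighbour`, and `cross_validated_rmse` and the synthetic data are assumptions made for illustration and do not reproduce any of the cited studies.

```python
import math

def fit_linear(xs, ys):
    """Ordinary least squares for one feature; returns a prediction function."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

def fit_nearest_neighbour(xs, ys):
    """1-nearest-neighbour regressor; returns a prediction function."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def cross_validated_rmse(fit, xs, ys, k=5):
    """RMSE over all held-out points, using k interleaved folds."""
    squared_errors = []
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        test = [i for i in range(len(xs)) if i % k == fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        squared_errors += [(ys[i] - model(xs[i])) ** 2 for i in test]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Synthetic, exactly linear data (invented), so the linear model is favoured.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
scores = {
    "linear": cross_validated_rmse(fit_linear, xs, ys),
    "nearest_neighbour": cross_validated_rmse(fit_nearest_neighbour, xs, ys),
}
print(scores)
```

The ranking here is a consequence of the synthetic data. As the studies above show, which algorithm performs best depends on the dataset and the features available.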
In the present research, the Bayesian Linear Regression and Linear Regression algorithms were the most accurate. Particulate matter concentrations, specifically PM
2.5, are predominantly influenced by pollution emissions and prevailing weather conditions [
61]. Kou et al. (2021) [
61] examined the meteorological impact on PM
2.5-related air quality in China over the four years from 2016 to 2019, using a high-resolution atmospheric composition reanalysis dataset [
62]. The correlation between weather patterns and air quality was further investigated. The results indicated that, in tandem with China’s stringent enforcement of its clean air policy from 2016 to 2019, meteorological conditions played a constructive role in enhancing air quality [
20]. In a separate investigation, Alpan and Sekeroglu (2020) employed machine learning algorithms to predict six pollutant levels, integrating meteorological data such as precipitation and temperature [
62]. The Random Forest algorithm demonstrated a high predictive capability across two distinct datasets. The authors asserted that accurate forecasts of pollutant concentrations could be achieved solely by utilizing meteorological data [
63].
Ambient air pollution is a significant global health concern, contributing to over 3 million premature deaths worldwide, with Low- and Middle-Income Countries (LMICs) bearing the majority of this burden. In these countries, megacities facing air pollution levels classified as public health hazards resort to emergency measures such as red alerts and vehicle-rationing interventions (VRIs). Even during such interventions, the cities studied experienced increased cardiopulmonary mortality, emphasizing the need for short- and long-term strategies to manage the health impacts of air pollution [
64].
Analyzing the dynamics behind fine particulate matter (PM
2.5) and ozone (O
3) pollution across key regions in China, extensive studies employed the Weather Research and Forecasting/Community Multiscale Air Quality (WRF/CMAQ) system from 2013 to 2019. The model demonstrated high accuracy when evaluated against observed pollutant concentrations in major areas such as the North China Plain, Yangtze River Delta, Pearl River Delta, Chengyu Basin, and Fenwei Plain, although it slightly overestimated PM
2.5 in one region. Notably, nitrate (NO
3−) and ammonium (NH
4+) emerged as vital PM
2.5 components in heavily polluted zones. This analysis highlighted negative correlations between PM
2.5 and O
3 in most areas, underscoring the model’s ability to simulate China’s long-term air quality trends, which is crucial for effective emission control strategies [
65].
Furthermore, understanding pollutant emission sources is crucial for effective mitigation. Air quality data from urban, suburban, industrial, and rural areas in Jining, Shandong Province, China, were compared in terms of the characteristics and health risks associated with air pollutants. Variations in PM
2.5, PM
10, SO
2, NO
2, and CO concentrations between 2017 and 2018 were observed, with O
3 concentrations increasing. Functional areas exhibited similar seasonal variations and diurnal patterns, with O
3 contributing significantly to exposure excess risks (ERs). Premature deaths attributable to air pollutants were calculated, highlighting O
3 as the significant contributor. Pollution transport from industrial areas to urban and suburban regions played a crucial role in determining air quality, emphasizing urgent measures to reduce O
3 pollution, particularly considering the prevalent ozone formation regime in industrial areas [
66].
Air pollution and climate change exhibit intricate interdependencies, where climate fluctuations impact air pollution dynamics and vice versa. This relationship is complex, with emissions of air pollutants affecting climate through radiative forcing and climate changes altering the physical, chemical, and biological processes linked to air pollution. High-pressure weather conditions tend to be associated with elevated PM
2.5 and O
3 levels. Seasonally, PM
2.5 concentrations are higher during the winter, while O
3 concentrations are higher during the summer [
67]. Despite recognition of these interactions, uncertainties persist, and deeper insight is needed to understand their mechanisms and consequences. Additionally, the co-emission of greenhouse gases (GHGs) with air pollutants suggests the potential for synergistic mitigation strategies. Yet the existing literature lacks an in-depth understanding of these co-benefits [
68].
Notably, research has shown a link between long-term exposure to PM
2.5 and child mortality, with studies confirming this pattern in countries in Asia, Africa, and Latin America and an additional decline in living standards due to air pollution [
69,
70,
71,
72]. Given the impact of air pollution on the incidence and severity of respiratory diseases, especially among the elderly and children, machine learning methods have been applied to link air pollutants, seasonal variation, and climate data. A study in Taizhou, China, used various machine learning models, including Linear Regression, Random Forest (RF), AdaBoost, and Neural Networks, to investigate the relationship between air pollutant concentrations and pediatric respiratory diseases. The findings reveal significant seasonal fluctuations in both the number of pediatric respiratory outpatients and the concentrations of air pollutants. NO
2, CO, particulate matter (PM
10 and PM
2.5), and outpatient numbers peak during the winter, indicating a substantial impact of air pollution on pediatric respiratory diseases. Regression models demonstrate that ML methods capture clinic visit trends and turning points, with nonlinear models outperforming their linear counterparts. Notably, the RF model emerged as the most effective [
73].
The burden of air pollution is unevenly distributed across factors such as age and gender. An additional study showed that short-term exposure to air pollutants, mainly gaseous pollutants such as NO
2 and CO, is linked with an increased risk of hospital visits for anxiety disorders (AD) in a city in southern China with low pollution concentrations. Women aged 45 to 64 appear to be the most affected, providing evidence that air pollution may be a risk factor even for anxiety disorders [
74].
The influence of meteorological factors such as wind is significant for the dispersion of pollutants, specifically PM
2.5, and air masses at a local level. The fluctuation of air masses at a seasonal level can also affect the transport of these particles and alter the level of air quality [
75]. In our research, four meteorological parameters were considered when estimating air quality in Alexandroupolis. Meteorology, atmospheric reactivity, and regional emissions are among the main contributors to the temporal variability of PM
2.5 concentration and air quality, as shown by a study that applied statistical methods to daily PM
2.5 measurements and meteorological parameters in India [
26]. The airport in Alexandroupolis primarily caters to domestic flights and experienced minimal alterations in flight frequency and fleet composition from 2019 to 2020. Nevertheless, in both years, there was a notable rise in emission concentrations—including fuel, NO
x, CO, HC, SO
x, and PM—during the summer and New Year seasons, coinciding with increased travel activity.
Temperature also plays an important role in CO emissions, as low temperatures reduce aircraft fuel evaporation, leading to inefficient combustion and, in turn, increased carbon monoxide emissions. Humidity was also positively correlated with the above aircraft exhaust emissions [
74,
75,
76].
A study of emissions and pollutants at other airports in Greece, which used a different methodological approach, found that NO
2 concentrations exceeded regulatory limits in almost 30% of cases under specific meteorological conditions. In the same study, the PM
10 and SO
2 concentration levels were within the limits set by air quality standards. In the present study, the maximum value recorded for PM
2.5 was 5.6 and for SO
x was 41, indicating that smaller, regional airports show a different dynamic and distribution of these pollutants, possibly due to, among other things, local emissions from aircraft and ground vehicles. Finally, although the approach in the above study involves static dispersion modelling, the machine learning models here offer a slight advantage because they allow real-time estimation of pollutant levels based on changing meteorological conditions and airport activity. This dynamic capability benefits proactive air quality management, whereas dispersion models mainly provide ex-post emission assessments [
77,
78].
Finally, a study examined the impact of lockdowns during the pandemic at the two largest airports in Greece. At the airport of the capital, Athens, NO
2 and CO concentrations decreased by 45% and 30%, respectively, highlighting the dominant role of air transport and aviation in urban air pollution. Although no data are yet available for Alexandroupolis airport to compare pollutant concentrations before, during, and immediately after the pandemic, we can speculate that other factors, such as extreme weather phenomena like the dust transport observed in Greece and agro-industrial activities around the airport, may also contribute to atmospheric pollution, including PM, at airports [
47,
48,
79,
80].