2.2. Reviewed AQI Systems
‘Comparison of the Revised Air Quality Index with the PSI and AQI indices’ [5] notes that the pollution standards index (PSI) was established in response to the increasing number of people suffering from respiratory problems and was subsequently developed into the AQI. The RAQI was developed as an alternative to PSI and AQI and achieved better outcomes, as it covers a wider range of pollutants and concentration levels. RAQI produced more accurate results than PSI and AQI and was better able to distinguish between individual pollutants. However, the cost of establishing a monitoring system covering PM2.5 is prohibitively expensive for many countries to implement, notwithstanding serious global O3 problems.
‘Novel, fuzzy-based air quality index (FAQI) for air quality assessment’ [6] proposed the FAQI, which applies fuzzy logic to different pollutants using different weighting factors. The FAQI was suggested as a more sensitive tool that addresses limitations of the legacy AQI. Its results were compared with those of the USEPA AQI (United States Environmental Protection Agency), and the authors recommended the FAQI as a comprehensive, reliable method for decision-makers.
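To make the fuzzy-logic idea concrete, the sketch below shows one common way such an index can be assembled: each pollutant concentration is fuzzified with triangular membership functions, defuzzified to a crisp sub-index, and the sub-indices are combined with pollutant weights. The breakpoints, category values, and weights here are hypothetical illustrations, not the parameters of the cited FAQI study.

```python
# Illustrative fuzzy-weighted air quality sub-index aggregation.
# All numeric parameters below are hypothetical examples.

def triangular(x, a, b, c):
    """Triangular fuzzy membership: rises from a to a peak at b, falls to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fuzzy_sub_index(conc, categories):
    """Map a concentration to a crisp sub-index by centroid-style defuzzification.
    categories: list of (a, b, c, category_index_value) triangles."""
    memberships = [(triangular(conc, a, b, c), idx) for a, b, c, idx in categories]
    total = sum(m for m, _ in memberships)
    if total == 0:
        return 0.0
    return sum(m * idx for m, idx in memberships) / total

# Hypothetical category triangles (concentrations in ug/m3 -> index values):
pm10_cats = [(-50, 0, 80, 25), (40, 100, 180, 75), (120, 250, 400, 150)]
no2_cats = [(-40, 0, 70, 25), (30, 90, 160, 75), (100, 200, 350, 150)]

# Weighted aggregation across pollutants (weights are illustrative):
weights = {"pm10": 0.6, "no2": 0.4}
sub = {"pm10": fuzzy_sub_index(90, pm10_cats),
       "no2": fuzzy_sub_index(60, no2_cats)}
faqi = sum(weights[p] * sub[p] for p in weights)
```

A concentration that partially belongs to two categories contributes to both, which is what makes the fuzzy index more sensitive near category boundaries than a hard-threshold AQI.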
‘Towards an improved air quality index’ [7] claimed that no universally applicable method covers all air quality situations and pointed out that AQI methods can differ in the number of pollutants covered, the air quality level categories and their boundary points, and the sampling period. The current AQI paradigm has limitations, as it is difficult to compare air quality levels across countries. The authors suggested adding PM2.5-specific standards, indexing specific pollutant sources (‘traffic areas’, ‘industries’, and ‘others’), adding a ‘natural events’ factor, and producing estimates when monitoring equipment is not operating. They acknowledged the complexity of developing a more meaningful AQI and pointed out the need to study long-term factors for air pollutants, along with health-related descriptions.
‘A comparative study of air quality index based on factor analysis and EPA methods for an urban environment’ [8] proposed a national air quality index (NAQI) based on factor analysis to cover gaps in the EPA’s method. The authors claimed that the NAQI could be used to compare daily and seasonal pollution levels across different areas, allowing seasonal trends to be monitored.
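A factor-analysis-based index of this kind typically derives pollutant weights from the data itself rather than from fixed regulatory breakpoints. The sketch below illustrates one minimal variant of that idea, weighting pollutants by their loadings on the first principal component of the standardized concentration matrix; the data and the exact weighting scheme are hypothetical, not the procedure of the cited study.

```python
# Illustrative factor-analysis-style weighting of pollutants.
# The synthetic data and the PCA-based weighting are hypothetical examples.
import numpy as np

def factor_weights(X):
    """Derive pollutant weights from the dominant principal component of the
    standardized concentration matrix X (rows = days, cols = pollutants)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    corr = np.corrcoef(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)   # eigenvalues in ascending order
    loadings = np.abs(eigvecs[:, -1])         # loadings of the dominant factor
    return loadings / loadings.sum()          # normalize to weights summing to 1

# Hypothetical daily concentrations for [PM10, NO2, SO2]; PM10 and NO2 share a
# common driver, SO2 is independent noise:
rng = np.random.default_rng(0)
base = rng.normal(size=(100, 1))
X = np.hstack([50 + 20 * base + rng.normal(size=(100, 1)),
               40 + 15 * base + rng.normal(size=(100, 1)),
               10 + 2 * rng.normal(size=(100, 1))])

w = factor_weights(X)
daily_index = X @ w   # one composite index value per day
```

Because PM10 and NO2 co-vary in the synthetic data, they receive most of the weight, which is the sense in which such an index adapts to local pollution structure and enables daily and seasonal comparisons.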
‘Comparing urban air quality in Europe in real-time: A review of existing air quality indices and the proposal of a common alternative’ [9] proposed a new common AQI (CAQI) to enable comparison of air quality levels across Europe. It consists of two indices, one for roadside sites and the other for average city background conditions. This structure is intended to bring consistency when comparing diverse parameters.
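Breakpoint-based indices such as the CAQI map each pollutant onto a common scale by linear interpolation between class boundaries and take the maximum sub-index as the overall value. The sketch below illustrates that mechanism; the breakpoint grids are hypothetical examples, not the official CAQI tables.

```python
# Illustrative breakpoint-based sub-index with linear interpolation.
# Breakpoint grids below are hypothetical, not the official CAQI tables.

def sub_index(conc, breakpoints):
    """breakpoints: list of (conc_lo, conc_hi, idx_lo, idx_hi) bands."""
    for c_lo, c_hi, i_lo, i_hi in breakpoints:
        if c_lo <= conc <= c_hi:
            return i_lo + (i_hi - i_lo) * (conc - c_lo) / (c_hi - c_lo)
    return float(breakpoints[-1][3])  # above the top band: cap at the maximum

# Hypothetical hourly grids (ug/m3) mapped onto a common 0-100 scale:
GRIDS = {
    "pm10": [(0, 25, 0, 25), (25, 50, 25, 50), (50, 90, 50, 75), (90, 180, 75, 100)],
    "no2":  [(0, 50, 0, 25), (50, 100, 25, 50), (100, 200, 50, 75), (200, 400, 75, 100)],
}

def caqi(concs):
    """Overall index = worst (maximum) pollutant sub-index."""
    return max(sub_index(concs[p], GRIDS[p]) for p in concs)

index = caqi({"pm10": 70, "no2": 120})
```

Running the same calculation with a roadside grid and a background grid is what yields the two companion indices the proposal describes.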
2.3. Systematic Reviews
Existing research has observed the preponderance of certain algorithms in the field of AQP, and exploring other algorithms could open further paths toward greater accuracy. Moreover, the existing literature suggests that there are limitations associated with including certain pollutants in predictions.
‘Time series forecasting using artificial neural networks methodologies: A systematic review’ [2] studied new ANN models (developed during 2006–2016) and presented evidence that hybrid models predicted more accurately than traditional ANN models (such as back-propagation with a single hidden layer), despite the lack of a systematic process for hybrid model development. The study examined new models in terms of architecture, complexity, relevant variable selection, parameter estimation, implementation, and evaluation, and recommended further research to specify criteria for relevant variable selection (i.e., the basis for selection), methodological development for the selection of ANN architectures, the creation of evaluation models, and a methodology for testing model generalization.
‘Machine learning approaches for outdoor air quality modeling: A systematic review’ [10] analyzed 46 papers to determine why some algorithms are selected over others for prediction. It addressed the need for ML-based statistical models to overcome the limitations of deterministic techniques and to model non-linear relationships between concentrations with the required accuracy, and it detailed how algorithms are applied to enhance accuracy (principles of algorithms). Its main findings showed that estimation problems are usually addressed with ensemble learning and regression, forecasting problems are mostly addressed with NNs and SVMs, and challenges remain in predicting pollution peaks and certain contaminants (such as nanoparticles). The authors claimed that ML research is mainly undertaken in Europe and North America and chiefly focuses on estimating pollutant concentrations (using ensemble learning and regression analysis) and on forecasting problems (using NNs and SVMs), with priority given to accuracy over interpretability.
Estimation tends to be more precise than forecasting, which exhibits greater variability. More complex methods, such as deep learning, are needed to accommodate the difficulty of predicting air pollution ahead of time (days or hours), although such methods have the drawback of being very computationally demanding. The study emphasized the suitability of ML for predicting air quality: traditional deterministic methods struggle to model fine PM, while ML approaches (estimation and forecasting) showed high accuracy relative to other emission gases, though with lower precision for peak values. Accuracy was reported to be higher for medium and small peaks than for high pollutant concentrations (high peaks), while forecasting performance for some gases, such as CO and NOx, remains limited. The assessed models performed better under peak weather conditions. The study suggested future directions: developing models that enhance pollution peak prediction and models that improve prediction of critical pollutants such as CO and NOx.
‘Machine learning algorithms to forecast air quality: A survey’ [11] reviewed 155 publications and demonstrated a direct correlation between the most polluted and the most studied countries, noting an increasing trend in the number of ML models used in pollution studies. Nearly half of the studied papers used AQI as the pollutant measure, and among air pollutant concentrations, PM2.5 was the most predicted (54 papers). The most used pollutant features were weather variables.
In terms of ML techniques, DL methods were more widely used than regression algorithms, and hybrid algorithms combine both types. Specifically, the most used algorithms were LSTM and MLP, while CNN, RNN, GRU, and auto-encoders were used less commonly. The most used regression algorithms were SVR and RF, while DT, ARIMA, KNN, and Boosting were used less frequently. The review noted the increasing use of deep transformer networks. It also observed that recent studies have correlated air quality and climate change, creating a need for models that provide early warning of climate change consequences that could be caused by air pollution (for sustainable cities and societies). Graph NNs have recently become more popular for air quality forecasting, as they can model dynamic interactions (e.g., between different cities, neighborhoods, and streets) with distance-based weights. There are also recent applications of temporal convolutional networks (TCNs), specifically for PM2.5, and of complex event processing (CEP) for air quality forecasting.
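The distance-based weighting used when building a station graph for such graph NNs is often a Gaussian kernel over pairwise distances, thresholded to keep the graph sparse. The sketch below shows that construction; the station coordinates, kernel bandwidth, and cutoff are hypothetical examples.

```python
# Illustrative distance-based adjacency matrix for a monitoring-station graph,
# as commonly used to feed a graph NN. All parameters are hypothetical.
import math

def gaussian_adjacency(coords, sigma=10.0, cutoff=25.0):
    """coords: list of (x, y) station positions in km.
    Returns a dense adjacency matrix with w_ij = exp(-d_ij^2 / sigma^2),
    zeroed beyond the cutoff distance and on the diagonal."""
    n = len(coords)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = math.dist(coords[i], coords[j])
            if d <= cutoff:
                A[i][j] = math.exp(-(d * d) / (sigma * sigma))
    return A

stations = [(0, 0), (5, 0), (40, 40)]   # third station lies beyond the cutoff
A = gaussian_adjacency(stations)
```

Nearby stations get edge weights close to 1 and distant ones drop toward 0, so a graph NN propagating pollutant information along these edges naturally emphasizes spatial neighbors.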
‘Statistical approaches for forecasting primary air pollutants: A review’ [4] quantitatively analyzed research published between 1990 and 2018, identifying trends. It found that most papers focused on air pollution and its relation to health diseases, urban pollution exposure models, and land use regression (LUR) methods. PM, NOx, and O3 were the most studied pollutants; ANN was markedly preferred for studying PM and O3, while LUR was mostly used in NOx studies. Hybrid methods (combinations of models) became the most used approach between 2010 and 2018. The authors expected future mixed statistical methods to predict multiple pollutants at the same time. Interactions between pollutants are a challenging part of future air pollution prediction research, and there is an increasing trend toward studying PM and its influence on air pollution. The reviewed papers showed that PM is the most studied emission, followed by NOx and O3, while the most used methods are ANN, LUR, multiple linear statistical analysis, and multi-method coupling models. The work highlighted the importance of early warning system studies, pointed out the increase in accuracy of AQP studies over years of effort in the domain, and noted that gaps and further work remain. It stressed the necessity of studying the interactions among air pollutants, human health, and the urban environment, including the interactions between pollutants, in particular PM-NOx and PM-O3 (as the main combinations of interest).
‘A systematic literature review of deep learning neural network for time series air quality forecasting’ [3] reviewed recent deep learning applications for time series air quality forecasting; combining multiple components into hybrid forecasting models was suggested for potentially superior performance and improved accuracy. However, hybrid models may increase computational complexity and reduce time efficiency, which is a downside of their use. The main deep learning components studied were feature extraction, data decomposition, and spatiotemporal dependency. Various combinations of deep learning input parameters were presented for different problem requirements (the different applications studied).
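The decomposition component behind many of these hybrids follows a common pattern: split the pollutant series into a smooth part and a residual, forecast each component separately, and recombine. The sketch below illustrates that pattern with trivial placeholder forecasters; the series, window size, and component models are hypothetical stand-ins (a real hybrid would use, e.g., an LSTM for one component and a statistical model for the other).

```python
# Illustrative decomposition-then-forecast pattern used by hybrid models.
# The data and the placeholder component forecasters are hypothetical.

def moving_average(series, window=3):
    """Centered moving average, shrinking the window at the edges."""
    half = window // 2
    out = []
    for i in range(len(series)):
        lo, hi = max(0, i - half), min(len(series), i + half + 1)
        out.append(sum(series[lo:hi]) / (hi - lo))
    return out

def hybrid_forecast(series, window=3):
    trend = moving_average(series, window)
    residual = [s - t for s, t in zip(series, trend)]
    # Placeholder component forecasters: persistence for the trend,
    # mean reversion (zero) for the residual.
    trend_next = trend[-1]
    residual_next = 0.0
    return trend_next + residual_next

pm25 = [35, 40, 38, 45, 50, 48, 55]   # hypothetical hourly PM2.5 values
forecast = hybrid_forecast(pm25)
```

The added cost the review flags comes from running one model per component plus the decomposition itself, which is where hybrid models trade time efficiency for accuracy.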
‘Machine learning algorithms in air quality modeling’ [12] analyzed 38 studies applying ML techniques, examining input predictors and the impact of inputs on prediction accuracy improvements while considering the geographical locations of the studies. It explored techniques applied to pollutant concentration (forecasting/estimation), including linear regression, NNs, SVMs, and ensemble learning algorithms. The study concluded that ML techniques are usually applied in North America and Europe, and multicomponent (factorial) analysis showed that pollution estimation was performed using ensemble learning and linear regression, whereas forecasting commonly used NNs and SVMs. The study reported that ensemble learning and regression outperformed NNs and SVMs, noting the estimation models’ low variability and standard deviation. Forecasting was found to remain very limited with NNs and SVMs, and the study advised that other models and pollutants should be considered (specifically NOx and SO2; currently, there is more focus on PM10 and PM2.5). The authors also suggested considering other models (such as ensemble learning) to improve model accuracy.
The above analysis highlights recent literature on the features used and models designed for AQP, considering the important factors that could affect gaseous concentrations within the complexity of the atmosphere. The existing literature shows limitations in comparative geographical contexts (e.g., comparing countries or cities) in AQP analysis. Another major point is the lack of a unified framework for representing AQI across countries, which makes comparison of pollution levels almost impossible; hence, there is a need for a methodology to build an AQI framework. The current study seeks to contribute to emerging work in this area.