Air pollution has become one of the most critical challenges faced by modern societies. Air pollutants originate from both natural and man-made sources. Natural sources of pollutant emissions include wildfires and volcanic activity. However, human activities, such as burning fossil fuels and industrial processes, contribute significantly to overall pollutant emissions [1]. Another substantial man-made contributor to pollutant emissions is the ever-increasing demand for electricity. According to International Energy Agency reports, global electricity generation in 2019 was 129% higher than in 1990, reaching 27,000 terawatt-hours. Fossil fuels accounted for 62.8% of the world’s electricity production in the same year, which means that a significant portion of global electricity demand is met by fossil fuel power plants. These power plants burn coal, oil, and natural gas to generate electricity. Burning fossil fuels releases significant amounts of hazardous gases into the atmosphere, including ozone (O3), carbon dioxide (CO2), carbon monoxide (CO), nitrogen oxides (NOx), sulfur oxides (SOx), and hydrocarbons. These gases can have adverse effects on both human health and the environment. Addressing air pollution is a complex and ongoing process that requires collaboration among all members of society and governments. In the meantime, pollution forecasting methods make it possible to issue advance warnings about air quality to the public, authorities, and decision makers.
A wide variety of data-driven approaches aim to predict air pollutant concentrations. These methods can be broadly categorized into two main groups: statistical and artificial intelligence (AI)-based methods. Statistical methods rely on historical data to predict future values. The most frequently used are the Autoregressive Moving Average (ARMA) and Autoregressive Integrated Moving Average (ARIMA) models. While statistical methods can capture linear features in time series data, they may not adequately handle non-linear characteristics. AI-based approaches learn from past observations and patterns to predict future values. Examples include artificial neural networks (ANNs), extreme learning machine (ELM), multi-layer perceptron (MLP), support vector machine (SVM), long short-term memory (LSTM), bidirectional LSTM (BiLSTM), recurrent neural networks (RNNs), generative adversarial networks (GANs), convolutional neural networks (CNNs), and gated recurrent units (GRUs). Unlike statistical methods, AI-based methods capture non-linear features well; however, they are sensitive to the values of input and learning parameters, risk falling into local optima, and are computationally complex [2,3]. To overcome these limitations, researchers have developed hybrid models. Hybrid models combine a range of methods, including data decomposition, feature selection, optimization algorithms, and learning approaches, to predict future values accurately. Decomposition methods break a sequence down into multiple sub-sequences and reduce noise. Feature selection methods choose the most effective input features, which can significantly enhance the accuracy of the prediction model. Since decomposition, feature selection, and learning approaches have many tunable parameters, using optimization algorithms to find good values for these parameters enhances prediction accuracy and reduces training time.
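To make the structure of such a hybrid model concrete, the following minimal Python sketch chains the four stages described above on synthetic data. It is an illustration under stated assumptions, not the model of any cited study: a trailing moving average stands in for decomposition methods, Pearson correlation is used for feature selection, a small grid search stands in for an optimization algorithm, and ridge regression stands in for a neural network. All function names, parameter values, and the synthetic series are hypothetical.

```python
import numpy as np


def causal_smooth(series, window=5):
    """Trailing moving average: a simple denoising stand-in for decomposition."""
    padded = np.pad(series, (window - 1, 0), mode="edge")
    return np.convolve(padded, np.ones(window) / window, mode="valid")


def make_lagged(inputs, target, n_lags):
    """Columns are `inputs` delayed by 1..n_lags steps; y is the aligned target."""
    n = len(target)
    X = np.column_stack([inputs[n_lags - k : n - k] for k in range(1, n_lags + 1)])
    return X, target[n_lags:]


def select_by_pearson(X, y, top_k):
    """Indices of the top_k columns by absolute Pearson correlation with y."""
    corr = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
    return np.sort(np.argsort(corr)[::-1][:top_k])


def fit_ridge(X, y, alpha):
    """Closed-form ridge regression with an unpenalized intercept."""
    Xb = np.column_stack([np.ones(len(X)), X])
    penalty = alpha * np.eye(Xb.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ y)


def predict(X, coef):
    return coef[0] + X @ coef[1:]


def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))


def run_pipeline(series, n_lags=8, top_k=3, alphas=(0.01, 0.1, 1.0, 10.0)):
    # Stage 1: denoise the inputs (only past values are used, so no look-ahead).
    smooth = causal_smooth(series)
    X, y = make_lagged(smooth, series, n_lags)
    i_val, i_test = int(0.6 * len(y)), int(0.8 * len(y))

    # Stage 2: feature selection on the training split only.
    cols = select_by_pearson(X[:i_val], y[:i_val], top_k)
    X = X[:, cols]

    # Stage 3: hyperparameter search on the validation split.
    scores = {}
    for alpha in alphas:
        coef = fit_ridge(X[:i_val], y[:i_val], alpha)
        scores[alpha] = rmse(predict(X[i_val:i_test], coef), y[i_val:i_test])
    best_alpha = min(scores, key=scores.get)

    # Stage 4: refit on train + validation data, then evaluate on the test split.
    coef = fit_ridge(X[:i_test], y[:i_test], best_alpha)
    test_rmse = rmse(predict(X[i_test:], coef), y[i_test:])
    # Baseline: always predict the mean of the data seen before the test split.
    naive_rmse = rmse(np.full(len(y) - i_test, y[:i_test].mean()), y[i_test:])
    return {"columns": cols, "alpha": best_alpha,
            "test_rmse": test_rmse, "naive_rmse": naive_rmse}


# Synthetic series with a 24-step cycle plus noise (made-up data).
rng = np.random.default_rng(0)
t = np.arange(600)
series = 50 + 10 * np.sin(2 * np.pi * t / 24) + rng.normal(0, 1.0, size=t.size)
result = run_pipeline(series)
print(result)
```

Feature selection and hyperparameter search use only the training and validation splits, so the test error is not biased by either step. In a real hybrid model, each stand-in would be replaced by the corresponding method from the literature, which typically increases both accuracy and the number of parameters to tune.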
In recent years, many studies have predicted air pollutant concentrations using the above-mentioned methods. For example, the authors in [4] employed a hybrid model comprising a data decomposition technique, a multi-objective optimization algorithm, and ELM to predict air pollution. Gu et al. [5] applied a hybrid prediction model combining Nonlinear Autoregressive Moving Average with Exogenous Input and neural networks to predict particulate matter 2.5 (PM2.5). Researchers in [6] developed an air pollution prediction model comprising two decomposition methods, an optimization algorithm, and BiLSTM neural networks. They first decomposed the air pollutant sequence into sub-series with complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN). Subsequently, they used variational mode decomposition (VMD) as a secondary decomposition method for further denoising, and applied the grey wolf optimizer to find the optimal values of the VMD parameters. Finally, BiLSTM neural networks were employed to train the model. In [7], the authors introduced a model for PM2.5 prediction that combined GRUs in an encoder–decoder structure. They demonstrated that their model outperformed many benchmark models. Asaei-Moamam et al. [8] proposed an air quality prediction framework that includes a GAN. Tao et al. [9] introduced a model for air pollution prediction in which partial correlation and simulated annealing were applied for feature selection. Subsequently, they used a combination of extremely randomized trees (ERT) and LSTM for the learning process and optimized the LSTM hyperparameters with Bayesian optimization. In another study [10], the authors used LSTM for the learning process and a genetic algorithm (GA) to fine-tune the LSTM hyperparameters. Researchers in [11] used SVM to predict the air pollution index. The authors in [12] developed an air pollution model using Pearson correlation for feature selection and BiLSTM with an attention mechanism for the learning process. Bekkar et al. [13] developed a model to predict PM2.5 that used Pearson correlation for feature selection, followed by a combination of CNN and LSTM for the learning process. The authors in [14] employed a combination of LSTM and a deep autoencoder to predict PM concentration. In another study [15], the authors used a combination of linear regression, ANN, and LSTM to forecast PM2.5 concentration. To predict the concentrations of PM10 and PM2.5, the authors in [16] employed a combination of SVM, geographically weighted regression, ANN, and an autoregressive nonlinear neural network with external input. Mihirani et al. [17] proposed a model to predict PM2.5, SO2, NO2, and CO. They used various methods, including linear regression, lasso regression, random forest regression, and K-nearest neighbor regression. Their experimental findings demonstrated that random forest regression outperformed the other models. Srivastava et al. [18] proposed using SVM, a random forest classifier, logistic regression, linear regression, and random forest regression to forecast air pollution. Their results demonstrated that random forest regression and random forest classification outperformed the other models. In another study [19], the authors introduced a novel air pollution prediction method based on spiking neural networks. The authors in [20] compared different deep learning models (LSTM, Bi-LSTM, Bi-RNN) and a statistical method (Kernel Ridge Regression) for air quality index prediction. Their findings demonstrated that the Bi-RNN model significantly outperformed all other models. Considering the difficulty of air pollution monitoring in megacities, Rabie et al. [21] developed a hybrid forecasting model comprising CNN and BiLSTM neural networks. Ozone pollution is highlighted in [22] as a major concern that contributes to climate warming and affects crop productivity. In response, the researchers employed a time series forecasting approach to analyze and predict future ozone levels. They also introduced a time selection layer for deep learning models to improve feature selection, enhancing prediction accuracy, model performance, and interpretability.
This study aims to develop an air pollution prediction model based on a two-step feature selection approach, an optimization algorithm, and neural networks. For this purpose, we used real-world air pollutant data collected from the Statistics Center of the Kerman Combined Cycle Power Plant in Kerman, Iran, from May to September 2019. The main contributions and novelties of this study include the following: