Article

A Short-Term Air Pollutant Concentration Forecasting Method Based on a Hybrid Neural Network and Metaheuristic Optimization Algorithms

Hossein Jalali, Farshid Keynia, Faezeh Amirteimoury and Azim Heydari
1 Department of Energy Management and Optimization, Institute of Science and High Technology and Environmental Sciences, Graduate University of Advanced Technology, Kerman 7631885356, Iran
2 Department of Computer Engineering and Information Technology, Islamic Azad University of Kerman, Kerman 7635131167, Iran
3 Department of Astronautical, Electrical and Energy Engineering (DIAEE), Sapienza University of Rome, 00184 Rome, Italy
* Author to whom correspondence should be addressed.
Sustainability 2024, 16(11), 4829; https://doi.org/10.3390/su16114829
Submission received: 9 March 2024 / Revised: 24 May 2024 / Accepted: 31 May 2024 / Published: 5 June 2024

Abstract: In the contemporary era, global air quality has been adversely affected by technological progress, urban development, population expansion, and the proliferation of industries and power plants. Recognizing the urgency of addressing the consequences of air pollution, predicting the concentration levels of air pollutants has become crucial. This study focuses on the short-term prediction of nitrogen dioxide (NO2) and sulfur dioxide (SO2), prominent air pollutants emitted by the Kerman Combined Cycle Power Plant, from May to September 2019. The proposed method utilizes a new two-step feature selection (FS) process, a hybrid neural network (HNN), and the Coot optimization algorithm (COOT). This combination of FS and COOT selects the most relevant input features while eliminating redundant ones, leading to improved prediction accuracy. The application of the HNN for training further enhances the accuracy significantly. To assess the model's performance, two datasets containing real data from two different parts of the Combined Cycle Power Plant in Kerman, Iran, from 1 May 2019 to 30 September 2019 (datasets A and B) are utilized. Subsequently, the mean square error (MSE), mean absolute error (MAE), root mean square error (RMSE), and mean absolute percentage error (MAPE) were employed to assess the accuracy of FS-HNN-COOT. Experimental results showed that the MSE of FS-HNN-COOT for NO2 ranged from 0.002 to 0.005, the MAE from 0.016 to 0.0492, the RMSE from 0.0142 to 0.0736, and the MAPE from 4.21% to 8.69%. Likewise, for SO2, the MSE, MAE, RMSE, and MAPE ranged from 0.0001 to 0.0137, 0.0108 to 0.0908, 0.0137 to 0.1173, and 9.03% to 15.93%, respectively.

1. Introduction

It is evident that air pollution has become one of the most critical challenges faced by modern societies. Air pollutants originate from a variety of sources, both man-made and environmental. Naturally occurring causes of pollutant emissions into the atmosphere include wildfires and volcanic activities. However, human activities, such as burning fossil fuels and industrial processes, contribute significantly to overall pollutant emissions [1]. Another substantial man-made contributor to pollutant emissions is the ever-increasing demand for electricity. According to reports from the International Energy Agency in 2019, global electricity generation had surged by 129% compared to 1990, reaching 27,000 terawatt-hours. Notably, fossil fuels accounted for 62.8% of the world's electricity production in the same year, showing that a significant portion of global electricity demand is met by fossil fuel power plants. These power plants burn coal, oil, and natural gas to generate electricity, releasing significant amounts of hazardous gases, including ozone (O3), carbon dioxide (CO2), carbon monoxide (CO), nitrogen oxides (NOx), sulfur oxides (SOx), and hydrocarbons, into the atmosphere. These gases can have adverse effects on both human health and the environment. Addressing air pollution is a complex and ongoing process that necessitates collaboration among all members of society and governments. Efficient pollution forecasting methods, however, facilitate issuing advance warnings about air quality to the public, authorities, and decision makers.
There is a wide variety of approaches aimed at predicting air pollutant concentrations using data-driven methods. These methods can be broadly categorized into two main groups: statistical and artificial intelligence (AI)-based methods. Statistical methods rely on historical data to predict a future event. The most frequently used methods for this purpose are the Autoregressive Moving Average (ARMA) and Autoregressive Integrated Moving Average (ARIMA). While statistical methods can capture linear features in time series data, they may not adequately handle non-linear characteristics. AI-based approaches use past experiences, observations, and patterns to predict future values. Examples of AI-based methods include artificial neural networks (ANNs), Extreme Learning Machine (ELM), Multi-Layer Perceptron (MLP), support vector machine (SVM), long short-term memory (LSTM), bidirectional LSTM (BiLSTM), recurrent neural networks (RNNs), generative adversarial networks (GANs), convolutional neural networks (CNNs), and gated recurrent unit (GRU). Unlike statistical methods, AI-based methods can capture non-linear features well; however, they are sensitive to the values of input and learning parameters, risk falling into local optima, and are computationally complex [2,3]. To overcome these limitations, researchers have developed hybrid models. Hybrid models employ a wide range of methods, including data decomposition, feature selection, optimization algorithms, and learning approaches, to accurately predict future values. Decomposition methods break down a sequence into multiple sub-sequences and reduce noise. Feature selection methods choose the most effective input features, and using them enhances the accuracy of the prediction model significantly. Since decomposition, feature selection, and learning approaches have many fine-tuning parameters, using optimization algorithms to find the optimal values of these parameters enhances the prediction accuracy and reduces the training time.
In recent years, many studies have been conducted to predict air pollution concentrations using the above-mentioned methods. For example, the authors in [4] employed a hybrid model including a data decomposition technique, a multi-objective optimization algorithm, and ELM to predict air pollution. In their work, Gu et al. [5] applied a hybrid prediction model including Nonlinear Auto Regressive Moving Average with Exogenous Input and neural networks to predict particulate matter 2.5 (PM2.5). Researchers in [6] developed an air pollution prediction model including two decomposition methods, an optimization algorithm, and BiLSTM neural networks. They first decomposed the air pollutant sequence with the complete ensemble empirical mode decomposition with adaptive noise method (CEEMDAN) and obtained several sub-series. Subsequently, they used variational mode decomposition (VMD) as a secondary decomposition method for further denoising. They also applied an optimization algorithm named the grey wolf optimizer to find the optimal values of the VMD parameters. Finally, BiLSTM neural networks were employed to train the model. In [7], the authors introduced a model for the prediction of PM2.5. Their proposed model included a combination of GRUs based on an encoder–decoder architecture. They demonstrated that their model outperformed many benchmark models. Asaei-Moamam et al. in [8] proposed an air quality prediction framework including a GAN network. Tao et al. [9] introduced a model for air pollution prediction. In their work, they applied partial correlation and simulated annealing methods for the feature selection process. Subsequently, they used a combination of extremely randomized trees (ERT) and LSTM for the learning process, and they optimized the hyperparameters of LSTM with Bayesian optimization. In another study [10], the authors used LSTM for the learning process of air pollution prediction and utilized a genetic algorithm (GA) to fine-tune the hyperparameters of LSTM. Researchers in [11] used SVM to predict the air pollution index. The authors in [12] developed an air pollution model using Pearson correlation for feature selection and BiLSTM with an attention mechanism for the learning process. Bekkar et al. [13] developed a novel model to predict PM2.5. Their model included Pearson correlation for feature selection; subsequently, they applied a combination of CNN and LSTM for the learning process. The authors in [14] employed a combination of LSTM and a deep autoencoder to predict PM concentrations. In another study [15], the authors used a combination of linear regression, ANN, and LSTM to forecast PM2.5 concentrations. To predict the concentrations of PM10 and PM2.5, the authors in [16] employed a combination of SVM, geographically weighted regression, ANN, and an auto-regressive nonlinear neural network with external input. Mihirani et al. in [17] proposed a model to predict PM2.5, SO2, NO2, and CO. They used various methods, including linear regression, lasso regression, random forest regression, and K-nearest neighbor regression. Their experimental findings demonstrated that random forest regression outperformed the other models. In their work [18], Srivastava et al. proposed using SVM, a random forest classifier, logistic regression, linear regression, and random forest regression to forecast air pollution. Their results demonstrated that random forest regression and random forest classification outperformed the other models. In another study [19], the authors introduced a novel air pollution prediction method based on Spiking Neural Networks.
The authors in [20] compared different deep learning models (LSTM, Bi-LSTM, Bi-RNN) and a statistical method (Kernel Ridge Regression) for air quality index prediction. Their findings demonstrated that the Bi-RNN model significantly outperformed all other models. Considering the difficulty of air pollution monitoring in megacities, Rabie et al. [21] developed a hybrid forecasting model including CNN and BiLSTM neural networks. Ozone pollution is highlighted in [22] as a major concern contributing to climate warming and reduced crop productivity. In response to this issue, the researchers employed a time series forecasting approach to analyze and predict future ozone levels. They also introduced a new method called the time selection layer in deep learning models to improve feature selection, enhancing prediction accuracy, model performance, and interpretability.
This study aims at developing an air pollution prediction model based on a two-step feature selection approach, an optimization algorithm, and neural networks. For this purpose, we used real-world air pollutant data collected from the Kerman Combined Cycle Power Plant, Kerman, Iran, from May to September 2019. The main contributions and novelties of this study include the following:
  • This paper employs a two-step feature selection model (FS) to carefully choose the most effective input variables, recognizing their crucial role in enhancing the forecasting model’s performance.
  • To optimize the two-step feature selection process, this paper uses the COOT optimization algorithm.
  • A novel forecasting model (FS-HNN-COOT) is introduced for the prediction of NO2 and SO2 emissions from the Combined Cycle Power Plant. Also, to fine-tune the hyperparameters of the HNN, the COOT optimization algorithm was employed.
  • The impact of air pollution from manufacturing industry production is investigated across various months, utilizing two datasets. The effectiveness of the analysis is validated using real-world datasets.
The remainder of this paper is organized as follows. Section 2 describes the case study and Section 3 presents the methodology. The simulation and discussion of results are presented in Section 4. Finally, Section 5 and Section 6 present the study limitations and the conclusion.

2. Case Study

Kerman Combined Cycle Power Plant, located at the third kilometer of Baghin road and the twentieth kilometer of the Kerman–Rafsanjan highway with geographical coordinates 11,230 N and 274,856 S, has a capacity of 1912 MW. It comprises eight gas units of 159 MW each and four steam units of 160 MW each. The power plant occupies 120 hectares of land, of which about 60 hectares are green space. Its energy source is about 70% natural gas and 30% diesel fuel. To generate one megawatt-hour (MWh) of electricity, the Kerman Combined Cycle Power Plant uses approximately 335 cubic meters of natural gas and 335 L of diesel fuel (with a ±10% tolerance).

Air Pollution Data

To assess the proposed method, we collected meteorological data, including wind speed, air temperature, and air pollutant concentrations (specifically nitrogen dioxide (NO2) and sulfur dioxide (SO2)) from the Kerman Combined Cycle Power Plant in Iran. The data spanned from 1 May 2019 to 30 September 2019. Notably, we recorded SO2 and NO2 levels at 3 h intervals. Our data collection involved two sets: Set A and Set B. Table 1 and Table 2 present statistical information on air pollutants for both sets, including average, minimum, maximum, and standard deviation values by month.

3. Methodology

3.1. Normalization

Normalization refers to a statistical method in which values measured on different scales are adjusted to a common scale. For example, in this study, the scales of the meteorological data, such as air temperature, differ from those of the air pollutant data, and air temperature values are not directly comparable to air pollutant values. Consequently, all input values are rescaled using the feature scaling method to fall within the range of [0, 1]. Equation (1) describes normalization within the range of [0, 1]; in this equation, Xi is an original value, X′i is the normalized value, and Xmax and Xmin are the maximum and minimum of the original values:
$$X'_i = \frac{X_i - X_{min}}{X_{max} - X_{min}} \tag{1}$$
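As a minimal illustration, the rescaling of Equation (1) can be sketched in Python as follows (the toy series are illustrative, not the study's data):

```python
import numpy as np

def min_max_normalize(x):
    """Rescale a 1-D array to [0, 1] following Equation (1)."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    return (x - x_min) / (x_max - x_min)

# Toy example: air temperature (deg C) and an NO2 series on different scales.
temperature = np.array([18.0, 24.5, 31.2, 27.9])
no2 = np.array([0.73, 1.21, 2.02, 1.47])

print(min_max_normalize(temperature))  # both series now lie in [0, 1]
print(min_max_normalize(no2))
```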

3.2. Data Preparation

Data preparation refers to the process of cleaning and transforming the raw data. Converting a time series into a supervised learning model involves transforming the sequential data into a tabular format suitable for standard machine learning algorithms. Time series data consist of observations collected at regular intervals over time. Each observation has a timestamp and a corresponding value. In this study, we have four time series, including wind speed (W), air temperature (T), NO2 concentration (N), and SO2 concentration (S) time series. Therefore, we constructed four initial matrices. Then, these four matrices were combined to build the input matrix (known as the feature matrix). Target variables are the next time point’s value of NO2 or SO2. Shifting the target variables by the desired prediction horizon (BS) results in the generation of the target vector. The columns of the target vector and input matrix represent the number of training samples (TS). We used Equations (2) and (3) to build the initial input matrix and target vector.
$$\mathrm{InputMatrix}_{BS \times TS} =
\begin{bmatrix}
IV_1 \\ \vdots \\ IV_i \\ IV_{i+1} \\ \vdots \\ IV_{2i} \\ IV_{2i+1} \\ \vdots \\ IV_{3i} \\ IV_{3i+1} \\ \vdots \\ IV_{4i}
\end{bmatrix}
=
\begin{bmatrix}
W_{h-j} & \cdots & W_{h-2} & W_{h-1} \\
\vdots & & & \vdots \\
W_{h-(j+i-1)} & \cdots & W_{h-(1+i)} & W_{h-i} \\
T_{h-j} & \cdots & T_{h-2} & T_{h-1} \\
\vdots & & & \vdots \\
T_{h-(j+i-1)} & \cdots & T_{h-(1+i)} & T_{h-i} \\
N_{h-j} & \cdots & N_{h-2} & N_{h-1} \\
\vdots & & & \vdots \\
N_{h-(j+i-1)} & \cdots & N_{h-(1+i)} & N_{h-i} \\
S_{h-j} & \cdots & S_{h-2} & S_{h-1} \\
\vdots & & & \vdots \\
S_{h-(j+i-1)} & \cdots & S_{h-(1+i)} & S_{h-i}
\end{bmatrix} \tag{2}$$

$$\mathrm{Target}_{1 \times j} = \begin{bmatrix} x_{h-(j-1)} & \cdots & x_{h-2} & x_{h-1} & x_h \end{bmatrix} \tag{3}$$
In the next step, the input matrix and target vector are divided into two subsets, including training and test sets. A total of 80% of the data were allocated to training and the remaining 20% to testing.
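The construction of Equations (2) and (3) and the chronological split can be sketched as follows; the window length j, the random toy series, and the function name are illustrative assumptions rather than the authors' implementation:

```python
import numpy as np

def to_supervised(series_dict, target_key, j):
    """Stack j lagged values of each series into the feature matrix and pair
    them with the next value of the target series (cf. Equations (2)-(3))."""
    keys = list(series_dict)
    n = len(series_dict[target_key])
    X, y = [], []
    for t in range(j, n):
        row = []
        for k in keys:  # W, T, N, S blocks, concatenated one after another
            row.extend(series_dict[k][t - j:t])
        X.append(row)
        y.append(series_dict[target_key][t])  # next time point of NO2 (or SO2)
    return np.array(X), np.array(y)

rng = np.random.default_rng(0)
series = {"W": rng.random(100), "T": rng.random(100),
          "N": rng.random(100), "S": rng.random(100)}
X, y = to_supervised(series, target_key="N", j=4)

# Chronological 80/20 split, as described above.
split = int(0.8 * len(X))
X_train, X_test, y_train, y_test = X[:split], X[split:], y[:split], y[split:]
```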

3.3. Two-Step Feature Selection Model

In the process of developing a predictive model, feature selection refers to the procedure of applying some algorithms to reduce the dimensionality of data. Applying feature selection methods results in the elimination of irrelevant, redundant, and inconsistent input features [23]. Furthermore, feature selection methods improve the performance of the model, reduce computational complexity and training time, diminish required storage space, and build a model with generalizability. In this section, a two-step feature selection approach based on mutual information (MI) is presented (Figure 1). In the first step, to remove irrelevant features, MI measures the mutual dependence between the target vector and each feature (x).
  • The removal of irrelevant features
Take, for instance, Stotal = {x1, x2, x3, …}, a set of input features, and y, the target vector. We calculated MI by Equation (4). In this equation, P(xi, yj) represents the joint probability distribution of the input variable xi and the target variable yj, and P(x) represents the marginal probability distribution of the random variable x.
$$MI(x, y) = \sum_{i=1}^{n} \sum_{j=1}^{m} P(x_i, y_j) \, \log_2 \frac{P(x_i, y_j)}{P(x_i) \, P(y_j)} \tag{4}$$
The higher the MI (x, y) value, the higher the correlation between the input variable (xi) and target vector. Furthermore, Equation (5) states that if the correlation value between an input variable and target vector is equal to or higher than TH1, the input variable is relevant to the target vector and should be selected as one of the input variables of the prediction model.
$$MI(x, y) \geq TH_1 \tag{5}$$
In the second step, to eliminate redundant features, MI measures the mutual dependence between every two variables of the initial matrix.
  • The removal of redundant features
After the removal of irrelevant features from Stotal, redundant features must be eliminated. To this end, take S1 as a set of input variables completely relevant to the target vector, more specifically, S1 ⊂ Stotal. To find redundant features, we used Equations (6) and (7). In these equations, xi and xj are two selected inputs.
$$RE(x_i, x_j) = IG(x_i; x_j; y) \tag{6}$$
$$IG(x_i; x_j; y) = MI(\{x_i, x_j\}; y) - MI(x_i, y) - MI(x_j, y) \tag{7}$$
After calculating IG, we used Equation (8) in order to find redundant features.
$$IG(x_i; x_j; y) \geq TH_2 \tag{8}$$
According to Equation (8), two input variables xi and xj that possess an IG value equal to or higher than TH2 are considered redundant, and one of them should be eliminated. In this study, we calculated TH1 and TH2 using optimization methods, which are described in the following sections. After the removal of irrelevant and redundant features, the final set S2, S2 ⊂ S1, is the input matrix of the neural networks. Figure 1 represents the process of two-step feature selection.
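A compact sketch of the two-step procedure is given below. It estimates MI with simple histogram binning (natural-log based, whereas Equation (4) uses log2; the difference is only a constant factor) and treats TH1 and TH2 as given; the binning scheme and helper names are illustrative assumptions:

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def discretize(v, bins=8):
    """Bin a continuous variable so MI can be estimated from counts."""
    return np.digitize(v, np.histogram_bin_edges(v, bins=bins)[1:-1])

def mi(a, b):
    return mutual_info_score(discretize(a), discretize(b))

def joint_mi(a, b, y):
    pair = discretize(a) * 100 + discretize(b)  # encode the (a, b) pair as one symbol
    return mutual_info_score(pair, discretize(y))

def two_step_fs(X, y, th1, th2):
    # Step 1: keep features whose MI with the target reaches TH1 (Equation (5)).
    relevant = [i for i in range(X.shape[1]) if mi(X[:, i], y) >= th1]
    # Step 2: drop a feature whose IG with an already-kept feature reaches TH2 (Equation (8)).
    selected = []
    for i in relevant:
        redundant = any(
            joint_mi(X[:, i], X[:, j], y) - mi(X[:, i], y) - mi(X[:, j], y) >= th2
            for j in selected)
        if not redundant:
            selected.append(i)
    return selected
```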

3.4. Learning Models

Air pollution data are nonlinear and time-variant. Due to these characteristics, a simple neural network is not able to create a high-performance prediction model. In comparison to a standard neural network, a hybrid model achieves enhanced performance and requires fewer training examples. In this study, we employed a learning model comprising three MLP neural networks trained using the Levenberg–Marquardt (LM) and Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithms. LM appears to be the fastest method to train medium-sized neural networks. BFGS is a type of second-order optimization algorithm. It belongs to a class of algorithms referred to as Quasi-Newton methods.
Weights are the learnable parameters of a neural network. When a neural network is trained, it is initialized with a set of random weights; during the training process, the weights are optimized and then transferred to the next network, which uses the incoming weights to continue the training process. The proposed model consists of three MLPs. The first and third MLPs are trained with LM as the learning algorithm, and the second MLP is trained with BFGS. After the first MLP is trained, its final weights are used as the initial weights of the second MLP. The second MLP refines the incoming weights, produces new weights, and transfers them to the third MLP. The last MLP employs the incoming weights and makes the final prediction.
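A minimal NumPy/SciPy sketch of this weight hand-off is shown below, using SciPy's Levenberg–Marquardt routine (least_squares(method="lm")) and BFGS (minimize(method="BFGS")); the single-hidden-layer architecture, toy data, and sizes are illustrative assumptions, not the paper's configuration:

```python
import numpy as np
from scipy.optimize import least_squares, minimize

def unpack(w, n_in, n_hidden):
    """Split a flat weight vector into [W1, b1, W2, b2]."""
    i = n_hidden * n_in
    W1 = w[:i].reshape(n_hidden, n_in)
    b1 = w[i:i + n_hidden]
    W2 = w[i + n_hidden:i + 2 * n_hidden]
    b2 = w[-1]
    return W1, b1, W2, b2

def forward(w, X, n_in, n_hidden):
    W1, b1, W2, b2 = unpack(w, n_in, n_hidden)
    return np.tanh(X @ W1.T + b1) @ W2 + b2  # one hidden layer, linear output

def residuals(w, X, y, n_in, n_hidden):
    return forward(w, X, n_in, n_hidden) - y

def mse(w, X, y, n_in, n_hidden):
    return float(np.mean(residuals(w, X, y, n_in, n_hidden) ** 2))

rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 4))        # toy stand-in for the selected features
y = np.sin(X.sum(axis=1))             # toy target
n_in, n_hidden = X.shape[1], 8
w0 = rng.normal(scale=0.1, size=n_hidden * n_in + 2 * n_hidden + 1)

# Stage 1: MLP trained with Levenberg-Marquardt.
w1 = least_squares(residuals, w0, method="lm", args=(X, y, n_in, n_hidden)).x
# Stage 2: the stage-1 weights seed an MLP trained with BFGS.
w2 = minimize(mse, w1, method="BFGS", args=(X, y, n_in, n_hidden)).x
# Stage 3: the stage-2 weights seed the final LM-trained MLP.
w3 = least_squares(residuals, w2, method="lm", args=(X, y, n_in, n_hidden)).x
print("final training MSE:", mse(w3, X, y, n_in, n_hidden))
```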
The most common error functions for regression problems are mean square error (MSE, Equation (9)), mean absolute error (MAE, Equation (10)), root mean square error (RMSE, Equation (11)), and mean absolute percent error (MAPE, Equation (12)).
$$MSE = \frac{1}{N} \sum_{i=1}^{N} \left( x_i^{ACT} - x_i^{FOR} \right)^2 \tag{9}$$
$$MAE = \frac{1}{N} \sum_{i=1}^{N} \left| x_i^{ACT} - x_i^{FOR} \right| \tag{10}$$
$$RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( x_i^{ACT} - x_i^{FOR} \right)^2} \tag{11}$$
$$MAPE = \frac{100}{N} \sum_{i=1}^{N} \left| \frac{x_i^{ACT} - x_i^{FOR}}{x_i^{ACT}} \right| \tag{12}$$
In these equations, xiFOR is the ith predicted value and xiACT is the ith actual value.
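For reference, Equations (9)–(12) translate directly into code; a small helper of this kind is assumed in the sketches that follow:

```python
import numpy as np

def regression_errors(actual, forecast):
    """Return MSE, MAE, RMSE, and MAPE as defined in Equations (9)-(12)."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    err = actual - forecast
    mse = np.mean(err ** 2)
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(mse)
    mape = 100.0 * np.mean(np.abs(err / actual))
    return mse, mae, rmse, mape
```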

3.5. Optimization Algorithms

Optimization is defined as the problem of finding the best solution from all possible solutions. Single-objective problems are an integral part of research and practical applications, but most real-world problems are multi-objective. In single-objective optimization problems, there is just one objective function to be optimized, whereas multi-objective problems involve two or more objective functions to be optimized simultaneously.

3.5.1. Non-Dominated Sorting Genetic Algorithm-II (NSGA-II)

Non-dominated Sorting Genetic Algorithm II (NSGA-II), proposed by Deb et al. in 2002, is a multi-objective optimization algorithm designed to address limitations of the earlier NSGA algorithm [24]. NSGA faced challenges like high computational cost and difficulty in setting the optimal value for the sharing parameter. NSGA-II addresses these issues by employing a fast non-dominated sorting approach and incorporating elitism, resulting in lower computational complexity. Additionally, NSGA-II excels at finding well-distributed solutions (superior spread) and achieving convergence near the true Pareto-optimal front. We will now delve into the core concepts of NSGA-II.
  • Dominance concept
Solution A dominates solution B if solution A is not worse than solution B in all objectives and solution A is superior to solution B in at least one objective. Because the dominance concept makes it possible to compare solutions with multiple objectives, it is employed in multi-objective optimization problems to find non-dominated solutions.
  • Crowding distance
To assess the density of solutions around a specific solution within the population, the mean distance between two points on either side of this solution along each objective function is computed. This measure acts as an approximation of the perimeter of the cuboid formed by considering the nearest neighbors as vertices, referred to as the crowding distance. For the ith solution within its front, the crowding distance is defined as the average side length of the cuboid. Equations (13)–(15) have been used to calculate the crowding distance.
$$d_i^1 = \frac{f_1(X_{i+1}) - f_1(X_{i-1})}{f_1^{max} - f_1^{min}} \tag{13}$$
$$d_i^2 = \frac{f_2(X_{i+1}) - f_2(X_{i-1})}{f_2^{max} - f_2^{min}} \tag{14}$$
$$d_i = d_i^1 + d_i^2 \tag{15}$$
  • Crowded-Comparison Operator
The crowded-comparison operator (<n) directs the selection process at different algorithmic stages, aiming for a uniformly distributed Pareto-optimal front. Each individual in the population is assumed to have two attributes: nondomination rank (irank) and crowding distance (idistance). Consequently, a partial order (<n) is established, as described in Equation (16). When comparing two solutions with distinct nondomination ranks, preference is given to the solution with the superior rank. In cases where both solutions belong to the same front, priority is given to the solution situated in a less crowded region.
$$i \prec_n j \quad \text{if} \quad (i_{rank} < j_{rank}) \ \text{or} \ \left( (i_{rank} = j_{rank}) \ \text{and} \ (i_{distance} > j_{distance}) \right) \tag{16}$$
  • Main Loop
In each generation t, the offspring population Qt is first created from the parent population Pt using the genetic algorithm operators. These two populations are then combined into a new population, named Rt, with a population size of 2N. Afterward, Rt is partitioned into non-dominated fronts, and the next population is filled front by front, starting with the first non-dominated front, then the second, and so on. Since Rt has size 2N, it is not possible to accommodate all the fronts in the N available slots. To reduce Rt to the standard size N, which is the size of the parent population, the crowding distance is calculated for all members of the Rt fronts, which are then sorted with the crowded-comparison operator, and the first N individuals are selected to form the parent population Pt+1. Selection, crossover, and mutation are then applied to Pt+1 to create a new offspring population Qt+1 of size N. This cycle continues until the stopping conditions are met. Figure 2 shows the main loop of the NSGA-II algorithm.
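A compact sketch of this survivor-selection step is given below; it uses a naive O(n²) non-dominated sort rather than NSGA-II's fast variant, and assumes both objectives are to be minimized:

```python
import numpy as np

def dominates(a, b):
    """a dominates b: no worse in every objective, strictly better in at least one."""
    return np.all(a <= b) and np.any(a < b)

def non_dominated_fronts(F):
    """Split the objective matrix F (n x m, minimization) into Pareto fronts."""
    remaining = set(range(len(F)))
    fronts = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(F[j], F[i]) for j in remaining if j != i)]
        fronts.append(front)
        remaining -= set(front)
    return fronts

def crowding_distance(F, front):
    """Equations (13)-(15): normalized gap between each solution's neighbors."""
    d = {i: 0.0 for i in front}
    for m in range(F.shape[1]):
        order = sorted(front, key=lambda i: F[i, m])
        f_min, f_max = F[order[0], m], F[order[-1], m]
        d[order[0]] = d[order[-1]] = np.inf  # boundary solutions are always kept
        if f_max == f_min:
            continue
        for k in range(1, len(order) - 1):
            d[order[k]] += (F[order[k + 1], m] - F[order[k - 1], m]) / (f_max - f_min)
    return d

def select_next_parents(F, N):
    """Fill P_{t+1} front by front from R_t; break ties by crowding distance."""
    chosen = []
    for front in non_dominated_fronts(F):
        if len(chosen) + len(front) <= N:
            chosen.extend(front)
        else:
            d = crowding_distance(F, front)
            chosen.extend(sorted(front, key=lambda i: -d[i])[:N - len(chosen)])
            break
    return chosen
```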

3.5.2. Coot Optimization Algorithm

The coot optimization algorithm is a population-based metaheuristic optimization algorithm. The coot is a type of water bird, and the collective behavior of coots foraging for food is the source of inspiration for the algorithm [25].
Similar to other population-based optimization algorithms, the algorithm’s initial population is randomly generated. Then, the initial population is divided into two groups, including group leaders and ordinary coots. Afterwards, the fitness of each solution is calculated, and subsequently, the best fitness values of group leaders and coots are stored in two different variables. Also, the best fitness value of both group leaders and coots is stored in a variable representing the best solution. If stop conditions are not met, the next step is the random movements of coots. Random movement is followed by chain movement. The average position of two coots can be used to implement chain movement. More specifically, the new position of coot i, based on chain movement, is the average position of coot i and coot i − 1. In the next step, coots must adjust their positions based on the group leaders. To this end, several groups based on the number of leaders are formed, and each coot chooses one leader and moves toward the group led by the chosen leader. Finally, all leaders move toward the optimal area. Figure 3 represents the coot optimization algorithm flowchart.
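The movements described above can be sketched as follows; the branching probabilities and the leader-following rule follow the general form reported in [25], but the exact constants here are illustrative assumptions:

```python
import numpy as np

def coot_step(positions, leaders, bounds, rng):
    """One illustrative iteration of the coot movements: random movement,
    chain movement, and moving toward a chosen group leader."""
    low, high = bounds
    n, dim = positions.shape
    for i in range(n):
        r = rng.random()
        if r < 0.3:  # random movement toward a random point in the search space
            q = rng.uniform(low, high, dim)
            positions[i] += rng.random() * (q - positions[i])
        elif r < 0.6 and i > 0:  # chain movement: average of coot i and coot i-1
            positions[i] = 0.5 * (positions[i] + positions[i - 1])
        else:  # follow the group leader assigned to this coot
            leader = leaders[i % len(leaders)]
            positions[i] = leader + 2 * rng.random() * np.cos(
                2 * np.pi * rng.uniform(-1, 1)) * (leader - positions[i])
    return np.clip(positions, low, high)

def leader_step(leaders, best, bounds, rng):
    """Leaders search around the best position found so far."""
    low, high = bounds
    for k in range(len(leaders)):
        leaders[k] = best + 2 * rng.random() * np.cos(
            2 * np.pi * rng.uniform(-1, 1)) * (best - leaders[k])
    return np.clip(leaders, low, high)
```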

4. Simulation and Discussion of Results

In this study, after data normalization and the construction of the input and target matrices, two-step feature selection was employed. The feature selection approach utilizes two key parameters, TH1 and TH2, which significantly impact the prediction accuracy. Following feature selection, the preprocessed data are fed into the MLPs. The number of hidden layer nodes (NH) in each MLP can vary considerably, and this value also significantly affects the model's prediction accuracy. To find the optimal values of TH1, TH2, and NH, we applied the coot optimization algorithm, which minimizes the MAPE of the test data.
To verify the efficiency of the proposed model, we constructed several benchmark models, including FS-HNN-NSGA_II, FS-MLP-NSGA_II, FS-MLP-COOT, FS-HNN, and FS-MLP. The goal of using the NSGA-II optimization algorithm in the FS-HNN-NSGA_II model is to find solutions that simultaneously minimize the MAPE of the test data and the RMSE of the training data. In fact, we wanted to compare the effectiveness of NSGA-II with the coot optimization algorithm. Figure 4 represents the process of finding the optimal values of TH1, TH2, and NH with NSGA-II and the coot optimization algorithm.
As illustrated in Figure 4, in the first step of the proposed method, we initialized TH1, TH2, and NH randomly. Then, we ran two-step feature selection and fed its outputs to the hybrid neural network. The hybrid neural network generates a prediction, and the MAPE and RMSE are calculated to evaluate the prediction accuracy. This process continues iteratively until a stopping criterion is met. The stopping criteria can be reaching the maximum number of iterations allowed or identifying the optimal values for TH1, TH2, and NH. The overall flowchart of the proposed method is illustrated in Figure 5.
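Under these settings, the objective evaluated for each candidate solution can be sketched as a wrapper of the following kind; two_step_fs refers to the feature-selection sketch in Section 3.3, and train_hnn is a hypothetical helper wrapping the three-stage MLP training of Section 3.4:

```python
import numpy as np

def fitness(params, X_train, y_train, X_test, y_test):
    """Decode one candidate (TH1, TH2, NH) and return the test MAPE that the
    coot optimization algorithm minimizes."""
    th1, th2 = params[0], params[1]
    nh = int(round(params[2]))                    # hidden-layer size must be an integer
    cols = two_step_fs(X_train, y_train, th1, th2)
    if not cols:                                  # no feature survived the thresholds
        return np.inf
    # train_hnn is hypothetical: it would run the LM -> BFGS -> LM training chain.
    model = train_hnn(X_train[:, cols], y_train, n_hidden=nh)
    pred = model.predict(X_test[:, cols])
    return 100.0 * np.mean(np.abs((y_test - pred) / y_test))  # MAPE, Equation (12)
```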

Results

In this study, we leveraged air pollutant data, including NO2 and SO2, in conjunction with meteorological variables such as wind speed and air temperature to predict future values of NO2 and SO2. Table 3, Table 4, Table 5, Table 6, Table 7 and Table 8 provide experimental results of applying the proposed model and benchmark models on dataset A and dataset B.
Analyzing the results reveals an interesting finding. As the results show, the MAPE of FS-MLP-NSGA_II for NO2 and SO2 is 21.06% and 18.41%, respectively, whereas the MAPE of FS-MLP-COOT for NO2 and SO2 is 21% and 17.91%, respectively. This comparison demonstrates the advantage of the coot optimization algorithm. On the other hand, the MAPE of the FS-HNN-NSGA_II model for NO2 and SO2 is 18.644% and 15.159%, while the MAPE of the proposed model is 4.215% and 12.534%. Interestingly, although NSGA-II was designed to optimize two objective functions for parameter tuning, the COOT optimization algorithm consistently achieved superior model performance for both datasets across the various test months. In the comparison between the hybrid neural network (HNN) and the multilayer perceptron (MLP), the results show that FS-HNN outperforms FS-MLP for NO2 prediction in terms of MAPE. However, the opposite holds for SO2 prediction, where FS-MLP achieves better accuracy based on MAPE. As the numerical results demonstrate, FS-HNN-COOT outperformed the other models. This improved prediction accuracy is attributed to two main factors: the combination of the two-step feature selection method with COOT, and the use of the HNN. First, the two-step feature selection method ranks each input feature using mutual information, which measures the dependency between the feature and the decision variable. This ranking is expressed as numerical values that indicate the relevance of each feature. While this feature selection approach is highly powerful, it raises the important question of which features should be selected as the most relevant and which should be eliminated as redundant. This is where COOT enhances the effectiveness of the feature selection process: by determining the optimal threshold values for selecting the most relevant input features and eliminating redundancies, this approach demonstrated its superiority compared to other feature selection models. Second, the application of the HNN ensures that, in each step, optimized weights are transferred to the next step, which effectively enhances the prediction accuracy.
In addition, Figure 6 provides a visual representation of the results, showcasing two distinct sections. The first section presents a comprehensive overview of the actual air pollution data alongside the corresponding predictions. Meanwhile, the second section offers a magnified view specifically focused on the test data and its corresponding predictions.
Furthermore, Figure 7 illustrates the Pareto front for NO2 and SO2 values within dataset A for the same month. These visualizations contribute significantly to enhancing our understanding of the forecasting capabilities of the proposed model, providing valuable insights into the trade-offs between NO2 and SO2 predictions.
In summary, using coot optimization algorithms, integrated with a two-step feature selection method and a hybrid neural network architecture, demonstrates promising results in forecasting air pollution levels. The presented tables and figures offer a comprehensive evaluation and visualization of the model’s performance on datasets A and B, substantiating its effectiveness in predicting NO2 and SO2 concentrations.

5. Limitations

While this study offers valuable insights into the dynamics of NO2 and SO2 pollution and their impacts, it is important to acknowledge several limitations. A more comprehensive analysis would also consider the role of other air pollutants; unfortunately, data on other pollutants were not available for this study. This constraint highlights the need for more extensive data collection in future research to account for the multifaceted nature of air pollution.

6. Conclusions

Air pollution is a growing global crisis, posing significant threats to public health and the environment. Accurate air pollution prediction can not only prevent environmental risks to public health but also inform policymakers in developing effective strategies for air pollution control. In this study, a hybrid air pollution prediction method was proposed, comprising a two-step feature selection approach, three MLPs with different learning algorithms, and the coot optimization algorithm. Furthermore, this study constructed five benchmark models, including FS-HNN-NSGA_II, FS-MLP-NSGA_II, FS-MLP-COOT, FS-HNN, and FS-MLP, to predict the concentrations of major air pollutants using meteorological data collected from the Kerman Combined Cycle Power Plant. A comparison between the results of the proposed model and the benchmark models demonstrated the superiority of the proposed model in air pollution prediction.

Author Contributions

Conceptualization, all authors; methodology, H.J. and F.A.; software, H.J. and F.A.; validation, H.J. and F.A.; writing—original draft preparation, H.J. and F.A.; writing—review and editing, all authors; supervision, F.K. and A.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Nakhjiri, A.; Kakroodi, A.A. Air pollution in industrial clusters: A comprehensive analysis and prediction using multi-source data. Ecol. Inform. 2024, 80, 102504. [Google Scholar] [CrossRef]
  2. Maciąg, P.S.; Bembenik, R.; Piekarzewicz, A.; Del Ser, J.; Lobo, J.L.; Kasabov, N.K. Effective air pollution prediction by combining time series decomposition with stacking and bagging ensembles of evolving spiking neural networks. Environ. Model. Softw. 2023, 170, 105851. [Google Scholar] [CrossRef]
  3. Ding, Z.; Chen, H.; Zhou, L.; Wang, Z. A forecasting system for deterministic and uncertain prediction of air pollution data. Expert Syst. Appl. 2022, 208, 118123. [Google Scholar] [CrossRef]
  4. Bai, L.; Liu, Z.; Wang, J. Novel hybrid extreme learning machine and multi-objective optimization algorithm for air pollution prediction. Appl. Math. Model. 2022, 106, 177–198. [Google Scholar] [CrossRef]
  5. Gu, Y.; Li, B.; Meng, Q. Hybrid interpretable predictive machine learning model for air pollution prediction. Neurocomputing 2022, 468, 123–136. [Google Scholar] [CrossRef]
  6. Wu, F.; Min, P.; Jin, Y.; Zhang, K.; Liu, H.; Zhao, J.; Li, D. A novel hybrid model for hourly PM2.5 prediction considering air pollution factors, meteorological parameters, and GNSS-ZTD. Environ. Model. Softw. 2023, 167, 105780. [Google Scholar] [CrossRef]
  7. Shakya, D.; Deshpande, V.; Goyal, M.K.; Agarwal, M. PM2.5 air pollution prediction through deep learning using meteorological, vehicular, and emission data: A case study of New Delhi, India. J. Clean. Prod. 2023, 427, 139278. [Google Scholar] [CrossRef]
  8. Asaei-Moamam, Z.-S.; Safi-Esfahani, F.; Mirjalili, S.; Mohammadpour, R.; Nadimi-Shahraki, M.-H. Air quality particulate-pollution prediction applying GAN network and the Neural Turing Machine. Appl. Soft Comput. 2023, 147, 110723. [Google Scholar] [CrossRef]
  9. Tao, H.; Jawad, A.H.; Shather, A.; Al-Khafaji, Z.; Rashid, T.A.; Ali, M.; Al-Ansari, N.; Marhoon, H.A.; Shahid, S.; Yaseen, Z.M. Machine learning algorithms for high-resolution prediction of spatiotemporal distribution of air pollution from meteorological and soil parameters. Environ. Int. 2023, 175, 107931. [Google Scholar] [CrossRef] [PubMed]
  10. Drewil, G.I.; Al-Bahadili, R.J. Air pollution prediction using LSTM deep learning and metaheuristics algorithms. Meas. Sens. 2022, 24, 100546. [Google Scholar] [CrossRef]
  11. Leong, W.C.; Kelani, R.O.; Ahmad, Z. Prediction of air pollution index (API) using support vector machine (SVM). J. Environ. Chem. Eng. 2020, 8, 103208. [Google Scholar] [CrossRef]
  12. Jia, T.; Cheng, G.; Chen, Z.; Yang, J.; Li, Y. Forecasting urban air pollution using multi-site spatiotemporal data fusion method (Geo-BiLSTMA). Atmos. Pollut. Res. 2024, 15, 102107. [Google Scholar] [CrossRef]
  13. Bekkar, A.; Hssina, B.; Douzi, S.; Douzi, K. Air-pollution prediction in smart city, deep learning approach. J. Big Data 2021, 8, 161. [Google Scholar] [CrossRef]
  14. Xayasouk, T.; Lee, H.; Lee, G. Air pollution prediction using long short-term memory (LSTM) and deep autoencoder (DAE) models. Sustainability 2020, 12, 2570. [Google Scholar] [CrossRef]
  15. Sinnott, R.O.; Guan, Z. Prediction of Air Pollution through Machine Learning Approaches on the Cloud. In Proceedings of the 2018 IEEE/ACM 5th International Conference on Big Data Computing Applications and Technologies (BDCAT), Zurich, Switzerland, 17–20 December 2018. [Google Scholar]
  16. Delavar, M.R.; Gholami, A.; Shiran, G.R.; Rashidi, Y.; Nakhaeizadeh, G.R.; Fedra, K.; Afshar, S.H. A novel method for improving air pollution prediction based on machine learning approaches: A case study applied to the capital city of Tehran. ISPRS Int. J. Geo-Inf. 2019, 8, 99. [Google Scholar] [CrossRef]
  17. Mihirani, M.; Yasakethu, L.; Balasooriya, S. Machine Learning-based Air Pollution Prediction Model. In Proceedings of the 2023 IEEE IAS Global Conference on Emerging Technologies (GlobConET), London, UK, 19–21 May 2023. [Google Scholar]
  18. Srivastava, H.; Sahoo, G.K.; Das, S.K.; Singh, P. Performance Analysis of Machine Learning Models for Air Pollution Prediction. In Proceedings of the 2022 International Conference on Smart Generation Computing, Communication and Networking (SMART GENCON), Bangalore, India, 23–25 December 2022. [Google Scholar]
  19. Maciąg, P.S.; Kasabov, N.; Kryszkiewicz, M.; Bembenik, R. Air pollution prediction with clustering-based ensemble of evolving spiking neural networks and a case study for London area. Environ. Model. Softw. 2019, 118, 262–280. [Google Scholar] [CrossRef]
  20. Pande, C.B.; Kushwaha, N.L.; Alawi, O.A.; Sammen, S.S.; Sidek, L.M.; Yaseen, Z.M.; Pal, S.C.; Katipoğlu, O.M. Daily scale air quality index forecasting using bidirectional recurrent neural networks: Case study of Delhi, India. Environ. Pollut. 2024, 351, 124040. [Google Scholar] [CrossRef]
  21. Rabie, R.; Asghari, M.; Nosrati, H.; Niri, M.E.; Karimi, S. Spatially resolved air quality index prediction in megacities with a CNN-Bi-LSTM hybrid framework. Sustain. Cities Soc. 2024, 109, 105537. [Google Scholar] [CrossRef]
  22. Jiménez-Navarro, M.J.; Martínez-Ballesteros, M.; Martínez-Álvarez, F.; Asencio-Cortés, G. Explaining deep learning models for ozone pollution prediction via embedded feature selection. Appl. Soft Comput. 2024, 157, 111504. [Google Scholar] [CrossRef]
  23. Amjady, N.; Keynia, F. A new prediction strategy for price spike forecasting of day-ahead electricity markets. Appl. Soft Comput. 2011, 11, 4246–4256. [Google Scholar] [CrossRef]
  24. Deb, K.; Pratap, A.; Agarwal, S.; Meyarivan, T. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans. Evol. Comput. 2002, 6, 182–197. [Google Scholar] [CrossRef]
  25. Naruei, I.; Keynia, F. A new optimization method based on COOT bird natural life model. Expert Syst. Appl. 2021, 183, 115352. [Google Scholar] [CrossRef]
Figure 1. The general process of two-step feature selection.
Figure 2. Schematic diagram of the NSGA-II algorithm.
Figure 3. Coot optimization algorithm.
Figure 4. The structure of the NSGA-II and coot optimization algorithms.
Figure 5. The process of finding the optimal values of TH1, TH2, and NH.
Figure 6. Training, test, and predicted NO2 and SO2 values of dataset A in May 2019.
Figure 7. Pareto front for NO2 and SO2 values of dataset A in May 2019.
Table 1. Statistical data of dataset A.

| Month | Air Pollutant | Average | Minimum | Maximum | Standard Deviation |
|---|---|---|---|---|---|
| May | NO2 | 1.2174 | 0.73 | 2.02 | 0.2193 |
| May | SO2 | 0.2672 | 0.11 | 0.43 | 0.0791 |
| June | NO2 | 1.2210 | 0.4 | 1.84 | 0.2004 |
| June | SO2 | 0.2780 | 0.11 | 0.48 | 0.0772 |
| July | NO2 | 1.0176 | 0.26 | 1.73 | 0.2333 |
| July | SO2 | 0.2812 | 0.11 | 0.43 | 0.0756 |
| August | NO2 | 1.1163 | 0.69 | 1.47 | 0.1653 |
| August | SO2 | 0.2776 | 0.06 | 0.43 | 0.0795 |
| September | NO2 | 0.1556 | 0.74 | 1.68 | 0.1946 |
| September | SO2 | 0.0549 | 0.06 | 0.43 | 0.0693 |
Table 2. Statistical data of dataset B.

| Month | Air Pollutant | Average | Minimum | Maximum | Standard Deviation |
|---|---|---|---|---|---|
| May | NO2 | 1.4680 | 0.01 | 8.76 | 1.4308 |
| May | SO2 | 0.2748 | 0.04 | 1.91 | 0.1435 |
| June | NO2 | 1.1494 | 0.1 | 1.9 | 0.2015 |
| June | SO2 | 0.2767 | 0.06 | 0.48 | 0.0767 |
| July | NO2 | 1.0363 | 0.21 | 2.19 | 0.2601 |
| July | SO2 | 0.3268 | 0.06 | 2.26 | 0.2784 |
| August | NO2 | 1.1126 | 0.71 | 1.55 | 0.1712 |
| August | SO2 | 0.2792 | 0.11 | 0.43 | 0.0731 |
| September | NO2 | 1.0782 | 0.01 | 1.66 | 0.3094 |
| September | SO2 | 0.2775 | 0.11 | 0.48 | 0.0689 |
Table 3. MSE, MAE, and RMSE error of training data versus MAPE error of test data using FS-HNN-NSGA_II on dataset A. Input variables: wind speed, air temperature, the value of NO2 one hour ago, and the value of SO2 one hour ago.

| Output Variable | Month | MSE | MAE | RMSE | MAPE |
|---|---|---|---|---|---|
| NO2 | May | 0.0037 | 0.0468 | 0.0615 | 10.5709 |
| NO2 | June | 0.0030 | 0.0460 | 0.0549 | 7.7062 |
| NO2 | July | 0.0053 | 0.0580 | 0.0734 | 10.55 |
| NO2 | August | 0.0086 | 0.0648 | 0.0929 | 9.6795 |
| NO2 | September | 0.0152 | 0.0887 | 0.1234 | 18.6441 |
| SO2 | May | 0.0132 | 0.0697 | 0.1149 | 13.8924 |
| SO2 | June | 0.0112 | 0.0770 | 0.1061 | 19.2530 |
| SO2 | July | 0.0197 | 0.0996 | 0.1404 | 18.0187 |
| SO2 | August | 0.0268 | 0.1237 | 0.1638 | 22.2917 |
| SO2 | September | 0.1161 | 0.0908 | 0.1161 | 15.1599 |
Table 4. MSE, MAE, and RMSE error of training data versus MAPE error of test data using FS-MLP-NSGA_II on dataset A. Input variables: wind speed, air temperature, the value of NO2 one hour ago, and the value of SO2 one hour ago.

| Output Variable | Month | MSE | MAE | RMSE | MAPE |
|---|---|---|---|---|---|
| NO2 | May | 0.0057 | 0.0475 | 0.0701 | 14.50 |
| NO2 | June | 0.0044 | 0.0499 | 0.0669 | 9.64 |
| NO2 | July | 0.0071 | 0.0593 | 0.0802 | 11.34 |
| NO2 | August | 0.0092 | 0.0689 | 0.0991 | 12.36 |
| NO2 | September | 0.0172 | 0.0907 | 0.1934 | 21.06 |
| SO2 | May | 0.0161 | 0.0921 | 0.1232 | 14.92 |
| SO2 | June | 0.0133 | 0.0991 | 0.1436 | 23.91 |
| SO2 | July | 0.0199 | 0.1216 | 0.1623 | 20.11 |
| SO2 | August | 0.0281 | 0.1657 | 0.1782 | 23.31 |
| SO2 | September | 0.1201 | 0.1778 | 0.1389 | 18.41 |
Table 5. MSE, MAE, and RMSE error of training data versus MAPE error of test data using FS-MLP-COOT on dataset A. Input variables: wind speed, air temperature, the value of NO2 one hour ago, and the value of SO2 one hour ago.

| Output Variable | Month | MSE | MAE | RMSE | MAPE |
|---|---|---|---|---|---|
| NO2 | May | 0.0047 | 0.0465 | 0.0700 | 13.50 |
| NO2 | June | 0.0039 | 0.0474 | 0.0654 | 9.34 |
| NO2 | July | 0.0066 | 0.0592 | 0.0791 | 11.21 |
| NO2 | August | 0.0087 | 0.0676 | 0.0988 | 12.31 |
| NO2 | September | 0.0112 | 0.0934 | 0.1911 | 21.00 |
| SO2 | May | 0.0134 | 0.0916 | 0.1211 | 14.77 |
| SO2 | June | 0.0116 | 0.0988 | 0.1422 | 23.81 |
| SO2 | July | 0.0100 | 0.1211 | 0.1589 | 20.00 |
| SO2 | August | 0.0266 | 0.1599 | 0.1779 | 23.01 |
| SO2 | September | 0.1160 | 0.1777 | 0.1374 | 17.91 |
Table 6. MSE, MAE, and RMSE error of training data versus MAPE error of test data using FS-HNN on dataset A. Input variables: wind speed, air temperature, the value of NO2 one hour ago, and the value of SO2 one hour ago.

| Output Variable | Month | MSE | MAE | RMSE | MAPE |
|---|---|---|---|---|---|
| NO2 | May | 0.0073 | 0.0518 | 0.0755 | 17.01 |
| NO2 | June | 0.0081 | 0.0559 | 0.0689 | 10.31 |
| NO2 | July | 0.0165 | 0.0664 | 0.0841 | 12.34 |
| NO2 | August | 0.0145 | 0.0679 | 0.0963 | 15.58 |
| NO2 | September | 0.0186 | 0.1156 | 0.1944 | 23.26 |
| SO2 | May | 0.0156 | 0.0988 | 0.1277 | 15.99 |
| SO2 | June | 0.0142 | 0.1457 | 0.1465 | 26.35 |
| SO2 | July | 0.0189 | 0.1276 | 0.1687 | 25.15 |
| SO2 | August | 0.0232 | 0.1721 | 0.1788 | 27.11 |
| SO2 | September | 0.1255 | 0.1782 | 0.1400 | 21.06 |
Table 7. MSE, MAE, and RMSE error of training data versus MAPE error of test data using FS-MLP on dataset A. Input variables: wind speed, air temperature, the value of NO2 one hour ago, and the value of SO2 one hour ago.

| Output Variable | Month | MSE | MAE | RMSE | MAPE |
|---|---|---|---|---|---|
| NO2 | May | 0.0093 | 0.0523 | 0.0767 | 17.50 |
| NO2 | June | 0.0082 | 0.0568 | 0.0699 | 11.34 |
| NO2 | July | 0.0134 | 0.0667 | 0.0843 | 13.34 |
| NO2 | August | 0.0167 | 0.0699 | 0.0978 | 16.38 |
| NO2 | September | 0.0194 | 0.1127 | 0.1965 | 25.06 |
| SO2 | May | 0.0189 | 0.0987 | 0.1282 | 17.64 |
| SO2 | June | 0.0151 | 0.1453 | 0.1488 | 27.94 |
| SO2 | July | 0.0211 | 0.1289 | 0.1699 | 26.17 |
| SO2 | August | 0.0299 | 0.1691 | 0.1791 | 28.77 |
| SO2 | September | 0.1243 | 0.1797 | 0.1411 | 20.66 |
Table 8. MSE, MAE, and RMSE error of training data versus MAPE error of test data using FS-HNN-COOT on dataset B. Input variables: wind speed, air temperature, the value of NO2 one hour ago, and the value of SO2 one hour ago.

| Output Variable | Month | MSE | MAE | RMSE | MAPE |
|---|---|---|---|---|---|
| NO2 | May | 0.0002 | 0.0160 | 0.0142 | 8.6992 |
| NO2 | June | 0.0016 | 0.0331 | 0.0405 | 5.6254 |
| NO2 | July | 0.0011 | 0.0492 | 0.0333 | 7.0860 |
| NO2 | August | 0.0054 | 0.0439 | 0.0736 | 8.3494 |
| NO2 | September | 0.0007 | 0.0241 | 0.0273 | 4.2154 |
| SO2 | May | 0.0006 | 0.0645 | 0.0247 | 11.0186 |
| SO2 | June | 0.0049 | 0.0526 | 0.0706 | 9.0310 |
| SO2 | July | 0.0001 | 0.0108 | 0.0137 | 11.1684 |
| SO2 | August | 0.0137 | 0.0538 | 0.1173 | 15.9380 |
| SO2 | September | 0.0042 | 0.0908 | 0.0649 | 12.5348 |