Article

Research and Application of an Air Quality Early Warning System Based on a Modified Least Squares Support Vector Machine and a Cloud Model

School of Statistics, Dongbei University of Finance and Economics, Dalian 116025, China
*
Author to whom correspondence should be addressed.
Int. J. Environ. Res. Public Health 2017, 14(3), 249; https://doi.org/10.3390/ijerph14030249
Submission received: 30 November 2016 / Revised: 11 February 2017 / Accepted: 24 February 2017 / Published: 2 March 2017
(This article belongs to the Section Global Health)

Abstract

Worsening atmospheric pollution increases the need for air quality early warning systems (EWSs). Although numerous researchers have investigated EWSs in theory and in practice, studies on the quantification of uncertain information and on comprehensive evaluation are still lacking, which impedes further development in the area. In this paper, a comprehensive warning system is proposed that consists of two indispensable modules: effective forecasting and scientific evaluation. For the forecasting module, a novel hybrid model combining data preprocessing and numerical optimization is developed to forecast air pollutant concentrations. In particular, to further enhance the accuracy and robustness of the warning system, interval forecasting is implemented to quantify the uncertainties of the point forecasts, which provides decision-makers with significant risk signals. For the evaluation module, a cloud model, based on probability and fuzzy set theory, is developed to perform comprehensive evaluations of air quality; it realizes the transformation between qualitative concepts and quantitative data. To verify the effectiveness and efficiency of the warning system, extensive simulations based on air pollutant data from Dalian, China, were carried out. The results illustrate that the warning system not only performs remarkably well but is also widely applicable.

1. Introduction

1.1. Motivation

With the rapid growth of the industrial economy over the past decades, atmospheric pollution has been acknowledged as one of the most serious environmental issues, because it not only threatens environmental security but also has adverse effects on health [1,2]. Additionally, particulate matter (PM) can cause many environmental problems such as corrosion, soiling, damage to vegetation and reduced visibility [3]. Accordingly, modeling, forecasting and evaluating air quality play a pivotal part in early management and warning. However, despite their importance, studies on air quality forecasting and evaluation are still insufficient. Efficient air quality forecasting can help the public take effective measures against air pollution, which can reduce the risk of falling ill and improve living standards. Additionally, scientific evaluation of forecasting results is an effective means of foreseeing changes in air quality levels. The assessment of air quality is a multiple-criteria decision-making process, which achieves a qualitative evaluation by processing quantitative information. Effective forecasting and evaluation of air quality can also provide valuable information to government policymakers for drawing up scientific emission policies. Given the above, a scientific early warning system (EWS) for air quality is urgently needed.

1.2. Literature Review

A variety of tools are used to forecast air pollutant concentrations, and they can be classified into two major categories: deterministic models and empirical models [4]. The prevalent deterministic models are chemical transport models (CTMs), which simulate the specific mechanisms of atmospheric physics and chemistry. The primary studies on CTMs concentrate on the analysis of pollution sources and the transport of chemical species. Different chemical mechanisms, chemical kinetic expressions, reaction rate coefficients, chemical species and gas phase reactions are usually incorporated into very complex models [5]. The accuracy of CTMs is sensitive to the scale and quality of the emissions data used [6], largely because of incomplete knowledge of the sources, the dispersion of PM, transport processes and atmospheric chemicals [4]. Accordingly, CTMs are less accurate than empirical models. Empirical models, which are widely applied in air pollutant forecasting, mainly include multiple linear regression (MLR), the autoregressive integrated moving average (ARIMA) model, the hidden Markov model and artificial intelligence models [5,7,8,9]. Among these, models based on artificial intelligence are the most prevalent in air pollutant forecasting because they are efficient and accurate in practical applications.
Artificial intelligence models used to forecast air pollutant concentrations mainly include artificial neural networks and intelligent optimization algorithms, which are popular because of their strong capacity for learning nonlinear patterns such as those of air pollution. Song et al. applied a cuckoo search optimization algorithm to optimize the parameters of the Weibull, Rayleigh, Lognormal and Gamma distribution functions, which facilitates subsequent interval forecasting. In addition, that paper applied an adaptive neuro-fuzzy model to perform deterministic forecasting of PM2.5 and PM10 for three cities in China [9]. Kanchan Prasad et al. developed an adaptive neuro-fuzzy inference system to forecast the daily concentrations of five air pollutants, namely SO2, NO2, CO, O3 and PM10. In order to reduce the computational cost, a forward selection method was used to choose optimal subsets of the input dataset [10]. Hybrid artificial intelligence models are more effective and robust than single models. Qin et al. built a hybrid model combining ensemble empirical mode decomposition (EEMD), cuckoo search (CS) and a back-propagation artificial neural network to implement PM forecasting, and the simulation revealed that the hybrid model outperformed the benchmark models considered in the paper [11]. Niu et al. proposed a novel hybrid decomposition-and-ensemble model based on complementary ensemble empirical mode decomposition (CEEMD), a grey wolf optimizer and support vector regression (SVR) to perform PM2.5 forecasting, and the empirical study illustrated that the proposed hybrid forecasting model was significantly superior to the benchmark models used in the paper [12]. Zhou et al. presented a general regression neural network (GRNN) model combined with EEMD, in which EEMD decomposes the raw PM2.5 data into several intrinsic mode functions (IMFs) and the GRNN forecasts each IMF. The simulations showed that the hybrid EEMD-GRNN model outperformed a single GRNN model without EEMD, an MLR model, a principal component regression model and an ARIMA model [13].
The aforementioned literature on air pollutant forecasting focused mainly on PM2.5 and PM10, and none of these studies performs comprehensive air pollutant forecasting. In this paper, comprehensive air pollutant forecasting covering PM2.5, PM10, O3, CO, NO2 and SO2 was carried out. Additionally, most of the aforementioned literature focused on deterministic forecasting by individual or hybrid models, while few studies implement interval forecasting for air pollutants.
Although the forecasting module is vital for an EWS, the evaluation system also plays an important part in air quality EWSs. Scientific evaluation of air pollutants provides valid information for supervisory departments and helps them to formulate scientific policies. Many researchers have focused on effective assessment models for air quality. Amit et al. explored a fuzzy analytic hierarchy process (AHP) model for fuzzy air quality health indexes, which can be used as a signal to reduce health risk [14]. Zhao et al. put forward a fuzzy comprehensive model combined with entropy theory for air quality evaluation, and the model was used to address the issue of air quality assessment in Fuxin city [15]. Olvera-García et al. proposed a novel assessment model using fuzzy inference integrated with an analytic hierarchy process, contributing a new air quality index. Simulation results illustrated that the presented air quality index provided a better evaluation than those in previous studies [16]. The aforementioned evaluation methods cannot take the quantification of evaluation factors and the randomness and fuzziness of the hierarchy into consideration simultaneously, which reduces the accuracy of the evaluation results. In contrast, a cloud model can unify the randomness in air quality evaluation with the fuzziness of qualitative linguistic expression.

1.3. Aim and Contributions

In the EWS, we designed two novel models to implement point forecasting and interval forecasting for six air pollutants. For point forecasting, a hybrid model based on complementary ensemble empirical mode decomposition (CEEMD) and a least squares support vector machine (LSSVM) optimized by a modified biogeography-based optimization (BBO) algorithm is proposed, which is designated CEEMD-BBODE-LSSVM (BBODE being a combination of the BBO and DE algorithms). For interval forecasting, a novel model based on bias and variance estimation and LSSVM regression was developed, which takes the uncertainty of future air pollutant levels into account and greatly reduces the probability of improper decision-making. Additionally, most papers address either forecasting or assessment of air quality, whereas studies concerning both forecasting and comprehensive evaluation are very scarce. This paper not only implements air pollutant forecasting but also performs a comprehensive evaluation applying probability and fuzzy set theory, forming a novel air quality warning system. The workflow of the proposed EWS can be divided into three steps. Firstly, as shown in Figure 1, the original data are decomposed into several intrinsic mode functions (IMFs) by CEEMD, and the first IMF (IMF1), which has the characteristics of noise, is removed. The preprocessed data are then split into a training set and a validation set. Secondly, the CEEMD-BBODE-LSSVM model and the interval forecasting model are tested on the aforementioned training and validation sets. Finally, a cloud model is established on the basis of the air quality index and its tiered standards, and the point forecasting results for the six air pollutants are regarded as an evaluation sample for the cloud model. After 2000 numerical simulations, the final degree of certainty that a sample belongs to a given air quality rating is determined by averaging the degrees of certainty generated by the 2000 simulations. In summary, the main contributions of this paper are as follows:
(1) A comprehensive warning system is developed, which consists of a forecasting module and an evaluation module. Extensive numerical experiments show it to be a remarkably effective and high-performance warning system;
(2) In the forecasting module, interval forecasting, which can provide more effective and credible information than point forecasting, is implemented;
(3) A modified optimization algorithm based on the theory of biogeography is used to determine the optimal parameters of the LSSVM in order to achieve excellent forecasting performance in the warning system;
(4) A comprehensive evaluation based on probability and fuzzy set theory is implemented in the EWS, which realizes the transformation between qualitative concepts and quantitative data.
The remainder of the paper is organized as follows: Section 2 introduces the methodology used in this paper. Section 3 reports the modeling preparation and a detailed case study that includes point forecasting, interval forecasting and comprehensive evaluation of air quality. The forecasting effectiveness, implications and future considerations for the EWS are discussed in Section 4. Finally, the conclusions are presented in the final section.

2. Methodology

In this section, the methodologies of the comprehensive warning system are introduced. A modified optimization algorithm based on the theory of biogeography is used to estimate the parameters of five distributions for the six air pollutants. For the forecasting module, a hybrid model combining a novel decomposition method, the modified optimization algorithm and a classical LSSVM model is developed to implement point and interval forecasting for air pollutants. Additionally, in order to obtain qualitative conclusions from the forecasting results, we apply an evaluation based on probability and fuzzy set theory to perform an overall assessment of air quality.

2.1. Distribution Functions

Statistical distribution functions were used to determine the basic characteristics of air pollutant concentrations, from which we can gain insight into the uncertainty of air pollutants. Five distribution functions, namely Weibull, Gamma, Lognormal, Log-logistic and Inverse Gaussian, were used to study the statistical properties of six air pollutants: PM2.5, PM10, O3, CO, NO2 and SO2. The probability density functions (PDFs) and the cumulative distribution functions (CDFs) of these distributions are given in Appendix A.

2.2. CEEMD

The empirical mode decomposition (EMD) is an adaptive time-frequency data analysis method designed for nonlinear and nonstationary signal analysis [17]. However, the mode mixing problem, a serious deficiency of the EMD, limits its practical application. As a consequence, many modified EMD methods for signal decomposition have been developed [18,19,20,21,22]. The ensemble EMD (EEMD) was developed as a noise-assisted method that addresses the mode mixing problem effectively. However, EEMD is time-consuming because it requires a large ensemble, and residual added white noise remains in its results. In order to remove these inherent defects of EEMD and improve its computational efficiency, CEEMD was established by Yeh et al. [23]. As a noise-improved method, CEEMD not only overcomes the mode mixing problem, but also eliminates the residual added white noise persisting in the IMFs and improves on the computational efficiency of the EEMD method [24]. To this end, CEEMD adds white Gaussian noise to the original signal in pairs of opposite sign, which saves computing time and reduces the final white noise residue at the same time. The essential steps of CEEMD are as follows:
(1) Given that a single white noise series is not sufficient to handle all intermittent signals, a positive mixture $f_1(t)$ and a negative mixture $f_2(t)$ are established by adding a pair of white noise series $\pm\varepsilon_n(t)$ to the original signal $f(t)$:
$$\begin{cases} f_1(t) = f(t) + \varepsilon_n(t) \\ f_2(t) = f(t) - \varepsilon_n(t) \end{cases} \tag{1}$$
(2) The positive and negative mixtures are then decomposed by the EMD, which yields two ensembles of IMFs, $k_{ij}^{+}$ and $k_{ij}^{-}$, where $k_{ij}^{+}$ (or $k_{ij}^{-}$) is the $j$th IMF obtained after adding the $i$th positive (or negative) noise.
(3) Then, the final IMF is computed by:
$$\mathrm{IMF}_j = \frac{1}{2N}\sum_{i=1}^{N}\left[k_{ij}^{+}(t) + k_{ij}^{-}(t)\right] \tag{2}$$
(4) Accordingly, the original signal $f(t)$ can be expressed as:
$$f(t) = \sum_{j=1}^{N}\mathrm{IMF}_j(t) + r_n(t) \tag{3}$$
where $r_n(t)$ is the $n$th residue (i.e., the local trend).
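The paired-noise averaging in steps (1)–(3) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `toy_decompose` is a hypothetical stand-in (a moving-average split, not real EMD), the noise amplitude `eps`, the number of pairs `n_pairs` and the requirement that every realization returns the same number of modes are all choices made here, not details taken from the paper.

```python
import random

def toy_decompose(signal, window=5):
    """Stand-in for EMD (NOT real EMD): one 'detail' mode plus a moving-average residue."""
    n = len(signal)
    half = window // 2
    trend = []
    for t in range(n):
        seg = signal[max(0, t - half):min(n, t + half + 1)]
        trend.append(sum(seg) / len(seg))
    detail = [s - m for s, m in zip(signal, trend)]
    return [detail], trend  # (list of modes, residue)

def ceemd_average(signal, decompose, n_pairs=50, eps=0.2, seed=0):
    """Average the modes obtained from signal + noise and signal - noise (steps 1-3).

    Assumes `decompose` always returns the same number of modes.
    """
    rng = random.Random(seed)
    n = len(signal)
    mode_sum = None
    residue_sum = [0.0] * n
    for _ in range(n_pairs):
        noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for sign in (+1, -1):  # positive mixture f1, then negative mixture f2
            mixture = [s + sign * eps * w for s, w in zip(signal, noise)]
            modes, residue = decompose(mixture)
            if mode_sum is None:
                mode_sum = [[0.0] * n for _ in modes]
            for j, mode in enumerate(modes):
                for t in range(n):
                    mode_sum[j][t] += mode[t]
            for t in range(n):
                residue_sum[t] += residue[t]
    scale = 1.0 / (2 * n_pairs)  # the 1/(2N) factor of the averaging step
    imfs = [[v * scale for v in m] for m in mode_sum]
    residue = [v * scale for v in residue_sum]
    return imfs, residue

signal = [float(t % 7) for t in range(40)]
imfs, residue = ceemd_average(signal, toy_decompose)
```

Because each noise series is added once with a positive and once with a negative sign, the noise cancels in the ensemble average, so the averaged modes plus the residue add back up to the original signal. With a real EMD routine in place of `toy_decompose`, the cancellation in the individual modes is only approximate.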

2.3. The Modified BBO Algorithm

Biogeography-based optimization (BBO) was originally proposed by Simon [25]. The algorithm is inspired by a natural process and has been used to address optimization problems in many fields, including sensor selection [25], power system optimization [26,27], groundwater detection [28] and satellite image classification [29]. The BBO algorithm builds a probabilistic habitat migration pattern according to the geographical distribution characteristics of species, in which individuals share information probabilistically based on a habitat suitability index, and inferior individuals can be improved by obtaining information from superior individuals. BBO is a global optimization algorithm that possesses powerful exploration capability for the current population, while its global exploitation capability is poor. In contrast, differential evolution (DE) possesses commendable exploitation capability, searches the decision variable space effectively and can avoid local convergence. To enhance the global exploitation capability of the BBO algorithm, this work proposes a modified BBO algorithm in which a DE step is added to the BBO algorithm whenever the iteration number is even. We designate the modified algorithm BBODE; it is essentially a combination of a BBO algorithm and a DE algorithm. The detailed pseudo-code of our BBODE algorithm can be seen in Appendix A.
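To make the idea of a DE step on even iterations concrete, the following Python sketch combines BBO-style migration with a DE/rand/1/bin step. It is not the authors' pseudo-code from Appendix A: the linear migration rates, the DE variant, the elitism rule, the omission of BBO mutation and all parameter values (`pop_size`, `F`, `CR`, bounds) are assumptions made for illustration.

```python
import random

def sphere(x):
    """Sphere test function; global minimum 0 at the origin."""
    return sum(v * v for v in x)

def bbode(fitness, dim, bounds=(-5.0, 5.0), pop_size=20, iters=100,
          I=1.0, E=1.0, F=0.5, CR=0.9, seed=0):
    """Hypothetical BBO + DE hybrid: migration every iteration, DE on even ones."""
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    n = pop_size
    for it in range(1, iters + 1):
        pop.sort(key=fitness)  # best (lowest cost) habitat first
        # Linear migration model: the best habitat hosts the most species (k = n).
        species = [n - r for r in range(n)]
        lam = [I * (1 - k / n) for k in species]  # immigration rates
        mu = [E * k / n for k in species]         # emigration rates
        new_pop = [h[:] for h in pop]
        # BBO migration: poor habitats import features from good ones.
        for i in range(n):
            for d in range(dim):
                if rng.random() < lam[i]:
                    j = rng.choices(range(n), weights=mu)[0]
                    new_pop[i][d] = pop[j][d]
        # DE/rand/1/bin step, applied only when the iteration number is even.
        if it % 2 == 0:
            for i in range(n):
                a, b, c = rng.sample([j for j in range(n) if j != i], 3)
                jrand = rng.randrange(dim)
                trial = new_pop[i][:]
                for d in range(dim):
                    if rng.random() < CR or d == jrand:
                        v = pop[a][d] + F * (pop[b][d] - pop[c][d])
                        trial[d] = min(hi, max(lo, v))
                if fitness(trial) <= fitness(new_pop[i]):
                    new_pop[i] = trial
        # Elitism (an assumption): the previous best replaces the current worst.
        new_pop.sort(key=fitness)
        new_pop[-1] = pop[0][:]
        pop = new_pop
    return min(pop, key=fitness)

best = bbode(sphere, dim=2, iters=60)
```

The elitism rule guarantees that the best cost never increases from one iteration to the next, which makes runs with different settings easy to compare. A closer reproduction would follow the pseudo-code in Appendix A.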
Additionally, there are four migration strategies among single islands in the BBODE algorithm, namely the cosine, quadratic, exponential and linear models. The linear model is the most commonly used one in practice. In the algorithm test section we discuss which strategy performs best in the global optimization process. The computational formulas of the four migration strategies are given in Equations (4)–(7), respectively:
Cosine model:
$$\begin{cases} \lambda_k = 0.5\,I\left(1 + \cos\left(\frac{k\pi}{n}\right)\right) \\ \mu_k = 0.5\,E\left(1 - \cos\left(\frac{k\pi}{n}\right)\right) \end{cases} \tag{4}$$
Quadratic model:
$$\begin{cases} \lambda_k = I\left(\frac{k}{n} - 1\right)^2 \\ \mu_k = E\left(\frac{k}{n}\right)^2 \end{cases} \tag{5}$$
Exponential model:
$$\begin{cases} \lambda_k = I\exp\left(-\frac{k}{n}\right) \\ \mu_k = E\exp\left(\frac{k}{n} - 1\right) \end{cases} \tag{6}$$
Linear model:
$$\begin{cases} \lambda_k = I\left(1 - \frac{k}{n}\right) \\ \mu_k = E\,\frac{k}{n} \end{cases} \tag{7}$$
where I denotes maximum possible immigration rate, which will occur when there are no species in the habitat. E represents maximum possible emigration rate, which will happen when the habitat reaches its maximum environment capacity. The terms λ and μ express the probability of immigration and emigration, respectively. n denotes the maximum number of species, and k represents the number of species on the kth island.
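As a quick check of how the four migration models behave, the rates can be evaluated with a small helper. The function name `migration_rates` and its defaults are hypothetical; the exponential model is taken here as λ_k = I·exp(−k/n) and μ_k = E·exp(k/n − 1), the usual form in the BBO literature, which is an assumption wherever the extracted formula is ambiguous about signs.

```python
import math

def migration_rates(k, n, I=1.0, E=1.0, model="cosine"):
    """Return (immigration rate, emigration rate) for a habitat with k of n species."""
    r = k / n
    if model == "cosine":
        return (0.5 * I * (1 + math.cos(math.pi * r)),
                0.5 * E * (1 - math.cos(math.pi * r)))
    if model == "quadratic":
        return I * (r - 1) ** 2, E * r ** 2
    if model == "exponential":
        return I * math.exp(-r), E * math.exp(r - 1)
    if model == "linear":
        return I * (1 - r), E * r
    raise ValueError(f"unknown migration model: {model}")

# Rates for an empty habitat (k = 0) and a saturated one (k = n) under each model.
for name in ("cosine", "quadratic", "exponential", "linear"):
    print(name, migration_rates(0, 10, model=name), migration_rates(10, 10, model=name))
```

In every model the immigration rate falls and the emigration rate rises as the species count k grows; the models differ only in how quickly the rates change between the two extremes.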

2.4. LSSVM

The support vector machine (SVM), a significant branch of machine learning, was proposed by Vapnik [30] on the basis of statistical learning theory, and is an effective means of addressing pattern recognition and classification tasks. The LSSVM, based on the structural risk minimization principle, is an extension of the SVM that applies a linear least squares criterion to the loss function instead of inequality constraints [31]. In practice, the LSSVM requires less computation time than the SVM and performs effectively in forecasting. More details on LSSVM can be found in [32].
It is noteworthy that different types of Mercer kernel function generate different LSSVM models. The sigmoid, polynomial and radial basis function (RBF) kernels are frequently used in LSSVM models. According to [33], the RBF is a prevalent choice of kernel function on account of its few parameters and superior performance in applications. Accordingly, this work adopted the RBF as the kernel function:
$$K(x_i, x_j) = \exp\left\{-\frac{\|x_j - x_i\|^2}{2\sigma^2}\right\} \tag{8}$$
Consequently, in this paper the parameters of the LSSVM model (i.e., the kernel width σ and the regularization parameter γ) were optimized by our modified BBO algorithm to achieve high-performance forecasting.
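The role of the two tuned parameters can be illustrated with a minimal LSSVM regression in pure Python. The sketch solves the standard LSSVM linear system (a bias row of ones plus the kernel matrix with 1/γ added to its diagonal) by Gaussian elimination; the function names and the toy data are hypothetical, and this is not the MATLAB toolbox used in the paper.

```python
import math

def rbf_kernel(xi, xj, sigma):
    """RBF kernel exp(-||xj - xi||^2 / (2 sigma^2))."""
    sq = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-sq / (2.0 * sigma ** 2))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def lssvm_fit(X, y, sigma, gamma):
    """Return (bias b, multipliers alpha) of an LSSVM regressor."""
    n = len(X)
    A = [[0.0] + [1.0] * n]  # constraint row: the multipliers sum to zero
    for i in range(n):
        row = [1.0] + [rbf_kernel(X[i], X[j], sigma) for j in range(n)]
        row[i + 1] += 1.0 / gamma  # regularization enters on the diagonal
        A.append(row)
    sol = solve(A, [0.0] + list(y))
    return sol[0], sol[1:]

def lssvm_predict(x, X, b, alpha, sigma):
    """Forecast for a new input x from the fitted bias and multipliers."""
    return b + sum(a * rbf_kernel(x, xi, sigma) for a, xi in zip(alpha, X))

X = [[0.0], [1.0], [2.0], [3.0], [4.0]]
y = [0.0, 1.0, 2.0, 3.0, 4.0]
b, alpha = lssvm_fit(X, y, sigma=1.0, gamma=1e6)
```

A large γ makes the model follow the training targets closely, while a small γ smooths the fit; σ controls how far the influence of each training point reaches. This is why an optimizer such as BBODE has to search over both parameters jointly.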

2.5. Interval Forecasting Based on LSSVM

The LSSVM tool not only implements effective point forecasting, but also performs well in interval forecasting, which can quantify the uncertainty of point forecasts. In this paper, the LSSVM toolbox in MATLAB provided by De et al. (http://www.esat.kuleuven.be/sista/lssvmlab/) was utilized to carry out interval forecasting for air pollutants. The construction of the forecasting intervals is based on the central limit theorem for linear smoothers combined with bias correction and variance estimation. Details of the LSSVM code for interval forecasting can be obtained from the aforementioned website; accordingly, here we only give a brief description of its steps. Step 1: use the original data to train the LSSVM model with an RBF kernel. Step 2: calculate the smoother matrix of the LSSVM. Step 3: compute the conditional bias and conditional variance. Step 4: set the significance level. Step 5: obtain the forecasting intervals for this fixed significance level. More details about interval forecasting using LSSVM can be found in [34].
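Steps 4 and 5 can be illustrated with a short Python sketch. It assumes that the bias and variance estimates from Step 3 are already available (in the toolbox they are derived from the smoother matrix, which is not reproduced here) and uses a normal-approximation interval; the function name and the numbers in the example are hypothetical.

```python
from statistics import NormalDist

def forecast_interval(point, bias, variance, alpha=0.05):
    """Bias-corrected interval: (point - bias) +/- z_{1-alpha/2} * sqrt(variance)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided normal quantile
    center = point - bias                    # bias-corrected point forecast
    half_width = z * variance ** 0.5
    return center - half_width, center + half_width

# Hypothetical values: point forecast 50, estimated bias 2, estimated variance 9.
lower, upper = forecast_interval(50.0, 2.0, 9.0, alpha=0.05)
```

A smaller significance level widens the interval, so there is a trade-off between coverage and informativeness.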

2.6. Normal Cloud Model Applied for Air Quality Evaluation

The cloud model, presented by Li et al. [35], integrates randomness and fuzziness on the basis of probability theory and fuzzy set theory. It is an effective cognitive model for the conversion between qualitative concepts and quantitative data and has been applied in many fields. Both randomness and fuzziness generally need to be considered in an evaluation. Because the cloud model possesses the joint properties of randomness and fuzziness, it is more effective and comprehensive than a model based on randomness or fuzziness alone [36]. In Figure 2, the x-axis and y-axis of the normal cloud denote one kind of air pollutant and the certainty degree for a level of air quality, respectively.
Ex denotes the expectation of the quantitative values representing the level of air quality. En indicates the range of the universe that can be accepted by the level of air quality. He measures the variation of the certainty degree in evaluations. The comprehensive workflow of the cloud model for air quality evaluation is illustrated in Figure 3, and includes five steps.
The first step is to determine the air quality criteria (i.e., PM2.5, PM10, O3, CO, NO2, SO2). The second step is to determine the parameters (i.e., Ex, En, He) of the cloud model. The third step is to compute the hybrid entropy-AHP (analytic hierarchy process) weights. The fourth step is to transform the observed data into cloud models repeatedly to obtain the distributions of the certainty degrees. The fifth step is to calculate the mean of the certainty degrees and obtain the final air quality level.
The evaluation of air quality is a multi-criteria decision-making process, and the air quality criteria are shown in Table 1. How to properly address steps 2–5 is our primary concern. In this paper, we adopt Equation (9) to compute the cloud model parameters:
$$\begin{cases} E_x = (B_{max} + B_{min})/2 \\ E_n = (B_{max} - B_{min})/3 \\ H_e = k \cdot E_n \end{cases} \tag{9}$$
where $B_{max}$ and $B_{min}$ represent the upper and lower bounds of a qualitative concept, which is essentially a grade of an air pollutant criterion. The parameter $k$ determines the degree of atomization of a normal cloud. Herein, $k$ is set to 0.1 to achieve a balance between variation and robustness in the evaluation. Note that no $B_{max}$ exists for PM2.5, PM10, O3, CO, NO2 and SO2 at level VI. Herein, we used a polynomial regression to obtain pseudo-bounds.
It should be emphasized that the half normal cloud model, which is one half of a normal cloud model, was used for the highest and lowest levels of all criteria, as the certainty degree in these intervals is monotonic. When the observed value lies beyond the pseudo-bound, the corresponding certainty degree is 1.
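The parameter formulas and the repeated simulation of certainty degrees can be sketched in Python as follows. The sketch uses the standard forward normal cloud generator (draw a perturbed entropy En' from a normal distribution with mean En and standard deviation He, then evaluate a Gaussian membership) and averages over 2000 draws. The grade bounds of 35 and 75 are illustrative values only, and the half-cloud treatment of the extreme levels is omitted.

```python
import math
import random

def cloud_parameters(b_min, b_max, k=0.1):
    """Ex, En, He from the bounds of one grade, following the parameter formulas."""
    ex = (b_max + b_min) / 2
    en = (b_max - b_min) / 3
    he = k * en
    return ex, en, he

def certainty_degree(x, ex, en, he, n_sim=2000, seed=0):
    """Mean certainty degree of observation x over n_sim normal cloud drops."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sim):
        en_prime = rng.gauss(en, he)  # randomly perturbed entropy
        while en_prime == 0.0:        # guard against division by zero
            en_prime = rng.gauss(en, he)
        total += math.exp(-((x - ex) ** 2) / (2 * en_prime ** 2))
    return total / n_sim

# Illustrative grade bounds (hypothetical): 35 to 75 concentration units.
ex, en, he = cloud_parameters(35.0, 75.0)
u_center = certainty_degree(55.0, ex, en, he)
u_edge = certainty_degree(75.0, ex, en, he)
```

An observation at the center of the grade receives a certainty degree of 1, and the degree decays toward the grade boundaries; He controls how much the individual drops scatter around that curve.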
The AHP method is widely applied in multi-criteria decision-making processes. Olvera et al. applied the AHP method to estimate the weights ($z_i$) of PM2.5, PM10, O3, CO, NO2 and SO2 in the evaluation of air quality in Mexico City, which are 0.3, 0.3, 0.233, 0.1, 0.033 and 0.033, respectively [17]. However, the AHP method is inherently sensitive to subjective uncertainty. In order to mitigate the influence of subjective uncertainty in AHP and of regional differences, a hybrid method of computing weights that integrates entropy is presented. In the assessment of air quality, the entropy of the air pollutant data for the $i$th criterion ($e_i$) can be computed by Equation (10). Then, the entropy-based weight of the $i$th criterion, $\omega_i$, can be obtained on the basis of the normalized entropy ($E_i$) [37]. $E_i$ and $\omega_i$ are computed by Equations (11) and (12), respectively:
$$e_i = -\sum_{t=1}^{T} F_t \ln F_t \tag{10}$$
$$E_i = \frac{e_i}{\ln T} \tag{11}$$
$$\omega_i = \frac{1 - E_i}{C - \sum_{i=1}^{C} E_i} \tag{12}$$
where $F_t$ denotes the frequency of the $t$th interval, $e_i$ (the entropy) represents the uncertainty of the observed data for a criterion with $T$ intervals, and $C$ represents the number of criteria.
To balance the latent subjective uncertainty in the AHP method, a novel entropy-AHP weight is proposed, which can be calculated via Equation (13). Then, the overall certainty degree $U$ for a given level can be obtained using Equation (14):
$$W_i = \frac{z_i\,\omega_i}{\sum_{i=1}^{C} z_i\,\omega_i} \tag{13}$$
$$U = \sum_{i=1}^{C} W_i\,\mu_i \tag{14}$$
where $\mu_i$ denotes the certainty degree computed by the cloud model for each criterion.
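The weight computation can be traced with a small Python example. The AHP weights `z` are the values quoted in the text; the interval frequencies in `freqs` and the certainty degrees in `mu` are invented purely for illustration, and the function names are hypothetical. The entropy is taken with the usual negative sign, so that it is non-negative.

```python
import math

def normalized_entropy(freqs):
    """E_i = e_i / ln T, with e_i = -sum(F_t ln F_t) over T intervals."""
    T = len(freqs)
    e = -sum(f * math.log(f) for f in freqs if f > 0)
    return e / math.log(T)

def entropy_weights(E):
    """omega_i = (1 - E_i) / (C - sum(E)); undefined if every E_i equals 1."""
    C = len(E)
    denom = C - sum(E)
    return [(1 - Ei) / denom for Ei in E]

def combined_weights(z, omega):
    """W_i = z_i * omega_i / sum(z_i * omega_i)."""
    prod = [zi * wi for zi, wi in zip(z, omega)]
    total = sum(prod)
    return [p / total for p in prod]

def overall_certainty(W, mu):
    """U = sum(W_i * mu_i)."""
    return sum(w * m for w, m in zip(W, mu))

z = [0.3, 0.3, 0.233, 0.1, 0.033, 0.033]  # AHP weights quoted in the text
freqs = [                                 # hypothetical interval frequencies
    [0.40, 0.30, 0.20, 0.10], [0.50, 0.25, 0.15, 0.10],
    [0.70, 0.10, 0.10, 0.10], [0.30, 0.30, 0.20, 0.20],
    [0.60, 0.20, 0.10, 0.10], [0.55, 0.25, 0.10, 0.10],
]
E = [normalized_entropy(f) for f in freqs]
omega = entropy_weights(E)
W = combined_weights(z, omega)
mu = [0.8, 0.6, 0.9, 0.7, 0.5, 0.4]       # hypothetical certainty degrees
U = overall_certainty(W, mu)
```

Criteria whose data are concentrated in a few intervals have low entropy and therefore receive a larger entropy weight, which then rescales the subjective AHP weight of that criterion.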

3. Simulation Modeling and Analysis

In this section, modeling preparations are briefly introduced. A function test is implemented to verify the performance of the BBO and BBODE algorithms. The distribution function parameters for six air pollutants are estimated using BBO and BBODE, respectively. Point and interval forecasting are performed to infer the trends of air pollutants in the future. A comprehensive air quality evaluation is implemented by applying the cloud model.

3.1. Modeling Preparations

In this section, the study site, data source and fitness function are briefly described. Six metrics are employed to evaluate the performance of point forecasting and interval forecasting. Finally, a D-M test is used to test the forecasting performance.

3.1.1. Study Site and Data Source

In this paper, the Chinese city of Dalian (longitude 120°58′–123°31′ E, latitude 38°43′–40°10′ N) was selected as the study site for the EWS. It is located at the southern tip of the Liaodong Peninsula. The area of Dalian is 12,573.85 square kilometers. The population of the city is 6.6904 million, and the population density is 464 per square kilometer. In recent years, with the rapid development of the city's industrial economy, air pollution has worsened and has become a growing public concern. The deteriorating air quality has increased the incidence of cardiovascular disease, asthma and lung disease among the public, especially among the elderly and children, which has increased the need for an air quality EWS. The existing air quality EWS in the city focuses on monitoring and lacks effective forecasting and comprehensive pollution evaluation, which hinders the development of an effective air quality EWS. Additionally, there is little research on air quality EWSs in Dalian, and the existing literature puts particular emphasis on cause analysis and air quality indexes. Therefore, we chose Dalian as the study site for air quality EWS design.
The hourly air pollutant data were collected from a website (http://wat.epmap.org/) that collects environmental data. Data on six common air pollutants, namely particulate matter (PM2.5, PM10), ozone (O3), carbon monoxide (CO), nitrogen dioxide (NO2) and sulfur dioxide (SO2), were collected for Dalian from this website and were used to validate the performance of the forecasting models and to implement a comprehensive air quality evaluation for the city. Figure 4 shows the study data for the six air pollutants in Dalian, which were divided into a training subset and a testing subset.

3.1.2. The Fitness Function for the CEEMD-BBODE-LSSVM Model

Establishing a proper fitness function is crucial for the BBODE algorithm, because it connects the LSSVM model with the BBODE algorithm and improves the performance of the LSSVM by guiding the search for the optimal LSSVM parameters. The fitness function represents the mean forecasting error, which gradually decreases during the search for the optimal LSSVM parameters until the fitness value satisfies the end condition. In this paper, the fitness function was defined as follows:
$$F = \mathrm{MSE}\left(|y - \hat{y}|\right) \tag{15}$$
where MSE denotes the mean square error between the target and forecast values, and $y$ and $\hat{y}$ represent the target values and forecast values, respectively.

3.1.3. The Performance Metric

Determining quantitatively which forecasting model is optimal is our main concern. In this paper, six statistical criteria were used to investigate the accuracy and efficiency of point and interval forecasting. Four metrics, shown in Table 2, were used to evaluate the accuracy of point forecasting: the mean absolute error (MAE), mean absolute percentage error (MAPE), root mean square error (RMSE) and goodness of fit (R2). Two criteria were adopted to validate the effectiveness of interval forecasting: the coverage probability (CP) and the average width (AW).
CP is a vital metric for interval forecasting, which is evaluated by counting the target points that fall within the constructed forecasting intervals. It verifies the effectiveness of interval forecasting at the corresponding significance level ($\alpha$). Theoretically, the forecasting intervals are valid if CP ≥ (1 − $\alpha$) × 100%, with CP expressed as a percentage; otherwise, the interval forecasting is invalid. AW measures the informativeness of interval forecasting. In theory, a narrower AW provides more information than a wider one:
$$\mathrm{CP} = \frac{1}{N}\sum_{i=1}^{N} C_i, \quad C_i = \begin{cases} 1, & \text{if } y_i \in [L_i, U_i] \\ 0, & \text{otherwise} \end{cases} \tag{16}$$
$$\mathrm{AW} = \frac{1}{N}\sum_{i=1}^{N}\left(U_i - L_i\right) \tag{17}$$
where $L_i$ and $U_i$ represent the lower and upper bounds of the $i$th forecasting interval, respectively, $y_i$ denotes the $i$th target point and $N$ is the number of target points.
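Both interval metrics are straightforward to compute. The following Python sketch uses hypothetical function names and toy numbers, and returns CP as a fraction rather than a percentage.

```python
def coverage_probability(y, lower, upper):
    """Fraction of target points that fall inside their forecasting interval."""
    hits = sum(1 for yi, lo, up in zip(y, lower, upper) if lo <= yi <= up)
    return hits / len(y)

def average_width(lower, upper):
    """Mean width of the forecasting intervals."""
    return sum(up - lo for lo, up in zip(lower, upper)) / len(lower)

# Toy data: the third target lies outside its interval.
y = [1.0, 2.0, 3.0, 4.0]
lower = [0.0, 1.5, 3.5, 3.0]
upper = [2.0, 2.5, 4.5, 5.0]
cp = coverage_probability(y, lower, upper)  # 3 of 4 targets covered -> 0.75
aw = average_width(lower, upper)            # widths 2, 1, 1, 2 -> 1.5
```

The two metrics pull in opposite directions: widening every interval raises CP but also raises AW, so they are best read together.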

3.1.4. D-M Test

The D-M test, first proposed by Diebold and Mariano [38], can be used to determine whether there is a significant difference between the forecasts of two competing models. The D-M statistic is defined as follows:
$$DM = \frac{\sum_{t=1}^{T}\left(F(\varepsilon_t^{(1)}) - F(\varepsilon_t^{(2)})\right)/T}{\sqrt{S^2/T}} \tag{18}$$
where $\varepsilon_t^{(1)}$ and $\varepsilon_t^{(2)}$ denote the forecasting errors of the two competing models and $T$ is the number of forecasts. The accuracy of each forecast is evaluated via an appropriate loss function $F$; the prevalent loss functions are the square error function and the absolute deviation function [39]. $S^2$ is a variance estimator of $V_t = F(\varepsilon_t^{(1)}) - F(\varepsilon_t^{(2)})$.
The null hypothesis and alternative hypothesis of the D-M test are as follows:
$$H_0: E(V_t) = 0, \qquad H_1: E(V_t) \neq 0 \tag{19}$$
Under the null hypothesis, DM follows the standard normal distribution N(0, 1). The null hypothesis is rejected if $|DM| > z_{\alpha/2}$, which means that there is a significant difference between the two sets of forecasts.
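A minimal Python version of the test is sketched below. It uses the standard form of the statistic (mean loss differential divided by the square root of S²/T), takes the plain sample variance of the loss differentials as S² (an assumption that ignores any autocorrelation correction) and defaults to the squared error loss. The error series are invented for illustration.

```python
from statistics import NormalDist, mean, variance

def dm_test(errors1, errors2, loss=lambda e: e * e, alpha=0.05):
    """Return (DM statistic, reject H0?) for two series of forecasting errors."""
    v = [loss(a) - loss(b) for a, b in zip(errors1, errors2)]  # loss differentials V_t
    T = len(v)
    s2 = variance(v)  # sample variance of V_t, used as the estimator S^2
    dm = mean(v) / (s2 / T) ** 0.5
    z = NormalDist().inv_cdf(1 - alpha / 2)  # critical value z_{alpha/2}
    return dm, abs(dm) > z

# Hypothetical errors: model 1 is clearly more accurate than model 2.
e1 = [0.10, -0.20, 0.15, -0.05, 0.10, -0.12]
e2 = [1.00, -1.20, 0.90, -1.10, 1.30, -0.80]
dm, reject = dm_test(e1, e2)
```

A negative statistic means that the first model has the smaller average loss; whether the difference is significant depends on the chosen α. The statistic is undefined when all loss differentials are identical, because S² is then zero.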

3.2. Numerical Analysis of the BBO and BBODE Algorithms

An excellent optimization algorithm should be capable of both global exploration and local exploitation. To enhance the efficiency of the BBO algorithm, the BBODE algorithm was proposed in this paper. The function tests carried out to investigate the performance of the BBODE algorithm are described in this section. Six functions, listed in Appendix A, were used to validate the exploration and exploitation capabilities of the BBO and BBODE algorithms. In order to make the comparison between the BBO and BBODE algorithms effective and fair, each test function was optimized independently 20 times, and the random populations were initialized in the same way for both algorithms. The average and the standard deviation of the optimal values over the experiments were then computed. All numerical simulations were performed in MATLAB R2014b on Windows 7 with a 3.30 GHz Intel Core i5 64-bit CPU and 8 GB RAM. The experimental parameters of BBO and BBODE are shown in Table 3.
Table 4 shows the results for the different test functions with different dimensions, which indicate that the BBODE algorithm is generally significantly superior to the BBO algorithm.
According to the detailed information in Table 4, the BBODE algorithm finds the optimal solution for the sphere function with dimensions of 5 and 10, the Rosenbrock function with a dimension of 2, the Rastrigin function with dimensions of 2 and 5, the Shaffer function with a dimension of 2, and the Griewank function with a dimension of 2. In terms of elapsed time, BBODE is slightly more time-consuming than BBO. However, when elapsed time, accuracy and standard deviation are considered together, BBODE is still superior to BBO. Accordingly, the BBODE algorithm was shown to be an efficient and robust optimization algorithm.
Additionally, the four migration strategies (i.e., the cosine, quadratic, exponential and linear models) of the BBODE algorithm are discussed in this section. Six test functions with different dimensions, as shown in Figure 5, are used to validate the efficiency of the four strategies. Figure 5 shows the performance and convergence speed of the four strategies in the migration process, from which it is evident that the cosine model performs best. Consequently, the cosine model was adopted as the migration strategy in our BBODE algorithm.

3.3. The Distributional Characteristics of the Air Pollutants

Studying the distributional characteristics of air pollutants is an important task, as it can reveal the nature and statistical properties of air pollutant data. Six distributions, shown in Appendix A, were adopted to analyze the distributional characteristics of the air pollutants.
The distribution function parameters are commonly estimated by minimum least squares (MLS) or maximum likelihood estimation (MLE). The experimental results in [9] show that artificial intelligence optimization is superior to MLS and MLE in searching for optimal distribution parameters. Accordingly, in this paper, we used artificial intelligence optimization to search for the optimal distribution function parameters.
As shown in the function test section, the BBODE algorithm performs well in parameter optimization. Here, the BBODE and BBO algorithms were both used to search for the optimal distribution function parameters, and their performance was compared. Table 5 lists the estimated distribution function parameters obtained for the six air pollutants using the BBODE and BBO algorithms. Goodness of fit (R2) was adopted to evaluate the fitting performance of the different distribution functions and optimization methods; a larger value indicates better fitting performance. Table 6 presents the R2 values obtained with the two optimization methods, from which it can be concluded that fitting with BBODE outperforms fitting with BBO. Figure 6 shows the frequency histograms together with the fitted distributions for the six air pollutants. The Inverse Gaussian distribution provides the best fit for PM2.5, PM10 and SO2, because its R2 is larger than that of the other distributions. For the same reason, the Gamma distribution is suitable for fitting O3 and NO2, and the Log-logistic distribution is appropriate for fitting the CO data.

3.4. The Point Forecasting for Air Pollutants

In this section, the proposed hybrid CEEMD-BBODE-LSSVM model was used to implement point forecasting. CEEMD, a novel ensemble decomposition methodology, was adopted to decompose the original air pollutant data into several IMFs. The CEEMD parameters were set as follows: the total number of IMFs and residuals is 8, the standard deviation of the white noise added in each ensemble is 0.4, and the ensemble number is 200. In practice, the first IMF is removed, and the remaining IMFs are summed to construct a new dataset that is used for training and testing the model. The performance of LSSVM is very sensitive to its parameters (i.e., σ, γ). Therefore, the BBODE algorithm was applied to optimize the LSSVM parameters in order to obtain high forecasting accuracy. The forecasting itself was carried out by LSSVM, which is an excellent forecasting tool in many fields. The air pollutant data from Dalian, divided into a training subset and a testing subset as shown in Figure 4, were used to test the performance of the proposed hybrid model.
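The reconstruction step described above (discard the first IMF, sum the rest) can be sketched in a few lines of Python. The sketch assumes that the decomposition has already been computed elsewhere and is available as a list of equally long component series ordered from highest to lowest frequency; the function name `reconstruct_without_first_imf` and the toy numbers are hypothetical and are not taken from the paper.

```python
def reconstruct_without_first_imf(components):
    """Sum all decomposed components except the first (highest-frequency) IMF.

    `components` is a list of equally long series: IMF1, IMF2, ..., residual.
    """
    if len(components) < 2:
        raise ValueError("need at least two components")
    n = len(components[0])
    if any(len(c) != n for c in components):
        raise ValueError("all components must have the same length")
    # zip(*...) walks through the series sample by sample.
    return [sum(sample) for sample in zip(*components[1:])]


# Toy decomposition with three components (illustrative values only).
toy_components = [
    [0.1, -0.1, 0.1],     # IMF1: treated as noise and dropped
    [1.0, 2.0, 3.0],      # IMF2
    [10.0, 10.0, 10.0],   # residual trend
]
print(reconstruct_without_first_imf(toy_components))
```

The reconstructed series would then be split into training and testing subsets in the same way as the original data.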
Table 7 and Table 8 report the forecasting performance of all benchmark models for the air pollutants from July to October 2015. Four metrics (i.e., MAE, MAPE, RMSE, R2) were employed for model assessment and comparison. For PM2.5 in Table 7 and Table 8, the MAE, MAPE and RMSE decrease overall across LSSVM, EEMD-LSSVM, CEEMD-LSSVM and CEEMD-BBODE-LSSVM, which indicates that CEEMD-BBODE-LSSVM performs better than the benchmark models considered. The R2 for PM2.5 forecasting increases progressively across the same four models, which illustrates that the proposed hybrid model CEEMD-BBODE-LSSVM has better forecasting capability than the other benchmark models. Similarly, the forecasting performance of CEEMD-BBODE-LSSVM for PM10, O3, CO, NO2 and SO2 is superior to that of the other benchmark models. As for the decomposition method, the models with CEEMD show significant improvements over the models without CEEMD, which illustrates that CEEMD is an excellent tool for de-noising. For example, in the forecasting of PM2.5, PM10, O3, CO, NO2 and SO2 in July in Table 7, the MAPE of CEEMD-LSSVM shows improvements of 9.48%, 7.41%, 4.54%, 3.29%, 8.17% and 10.46%, respectively, compared with LSSVM, and improvements of 2.77%, 1.71%, 1.29%, 0.75%, 1.96% and 2.21%, respectively, compared with EEMD-LSSVM. As for optimization, a comparison between CEEMD-LSSVM and CEEMD-BBODE-LSSVM for the six air pollutants in Table 7 and Table 8 shows that CEEMD-BBODE-LSSVM improves on the forecasting accuracy of CEEMD-LSSVM, which indicates that the BBODE algorithm performs well in searching for optimal solutions for forecasting models. The comparative analysis above demonstrates that the CEEMD-BBODE-LSSVM model is superior to the benchmark models mentioned in this section.
To illustrate the forecasting performance of all models more clearly, we selected the first three days in July for visualization, which contain 35 test samples for each air pollutant. Figure 7 compares the forecasting values of all models and shows that the proposed hybrid CEEMD-BBODE-LSSVM model is more accurate and robust. Figure 7 also suggests a strong correlation between PM2.5 and PM10, given the similarity of their forecasting results. From the black dotted line in Figure 7, it can be concluded that the CEEMD-BBODE-LSSVM model has an outstanding capacity for outlier forecasting. Given the superior performance of the hybrid model in different forecasting environments, we conclude that the hybrid forecasting model has wide applicability, effectiveness and compatibility.
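For readers who wish to reproduce this kind of comparison, the four metrics can be computed as in the following Python sketch. The function name `forecast_metrics` is hypothetical, MAPE is expressed in percent, and R2 is assumed here to be the usual coefficient of determination (one minus the ratio of the residual sum of squares to the total sum of squares), which may differ from the exact definition used for Table 7 and Table 8.

```python
import math


def forecast_metrics(actual, forecast):
    """Return MAE, MAPE (in percent), RMSE and R2 for two equally long series."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    n = len(actual)
    errors = [a - f for a, f in zip(actual, forecast)]
    mae = sum(abs(e) for e in errors) / n
    # MAPE is undefined when an actual value is zero; this sketch assumes none are.
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mean_actual = sum(actual) / n
    sst = sum((a - mean_actual) ** 2 for a in actual)
    # Assumption: R2 is the coefficient of determination, 1 - SSE / SST.
    r2 = 1.0 - sum(e * e for e in errors) / sst
    return {"MAE": mae, "MAPE": mape, "RMSE": rmse, "R2": r2}


# Toy concentrations (illustrative values only, not data from this paper).
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 36.0]
print(forecast_metrics(observed, predicted))
```

Smaller MAE, MAPE and RMSE and a larger R2 indicate a better forecast, which is how the models are ranked in the discussion above.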

3.5. The Interval Forecasting for Air Pollutants

The quantification of uncertainty, namely interval forecasting, plays a significant part in air quality EWSs, as it can provide more credible and dynamic forecasting results. In this paper, nonsymmetrical forecasting intervals were generated by LSSVM, since point forecasting has a weak capability to address the uncertainties in the forecasting process. Quantitative measures (i.e., AW, CP), which are affected by the chosen significance level, are commonly used to evaluate the performance of interval forecasting.
In theory, a constructed forecasting interval is valid if its CP is larger than or equal to the corresponding confidence level. Table 9 reports the numerical results of interval forecasting in terms of CP and AW.
From Table 9, the CP is larger than the corresponding confidence level for most constructed intervals, which demonstrates that the constructed intervals are valid. Notably, the interval forecasting width becomes smaller as the significance level increases, as displayed schematically in Figure 8 as an illustrative example. Conversely, the smaller the significance level is, the larger the interval forecasting width is. It can be observed that interval forecasting has the best performance when the significance level is 0.05. However, in this situation, it is hard to determine precise forecasting values because the interval forecasting width is large. The effectiveness of interval forecasting declines as the significance level increases. Theoretically, the optimal interval forecasting depends on the actual application and the meteorological conditions. For example, the AW can be narrowed if the weather is stable and widened if the weather is unstable.
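The validity check and the width measure discussed above can be illustrated with a short Python sketch. The names `interval_metrics` and `interval_is_valid` are hypothetical, and AW is assumed here to be the plain mean of the interval widths; the definition used for Table 9 may include a normalization that is not reproduced here.

```python
def interval_metrics(actual, lower, upper):
    """Return (CP, AW): coverage probability and average interval width."""
    if not (len(actual) == len(lower) == len(upper)) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    n = len(actual)
    covered = sum(1 for a, lo, up in zip(actual, lower, upper) if lo <= a <= up)
    cp = covered / n
    # Assumption: AW is the unnormalized mean width of the intervals.
    aw = sum(up - lo for lo, up in zip(lower, upper)) / n
    return cp, aw


def interval_is_valid(cp, significance_level):
    """An interval forecast is valid when CP >= confidence level = 1 - alpha."""
    return cp >= 1.0 - significance_level


# Toy intervals (illustrative values only, not data from this paper).
observed = [5.0, 7.0, 9.0, 11.0]
lower_bounds = [4.0, 6.0, 10.0, 9.0]
upper_bounds = [6.0, 9.0, 12.0, 12.0]
cp, aw = interval_metrics(observed, lower_bounds, upper_bounds)
print(cp, aw, interval_is_valid(cp, 0.2))
```

In this toy case three of the four observations are covered, so the intervals would not be valid at a significance level of 0.2, although they would be at 0.3.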
In order to clearly illustrate the interval forecasting results, we adopted the first 100 test samples in July and August to create a visualization, which can be seen in Figure 9.
In Figure 9, given the correctness evaluated by CP and the informativeness assessed by AW in Table 9, we used significance levels of 0.2, 0.2, 0.2, 0.1, 0.1 and 0.1 for PM2.5, PM10, O3, CO, NO2 and SO2 in July, respectively, to implement interval forecasting. From Figure 9, it can be observed that most of the actual values are located within the forecasting intervals, which indicates that the interval forecasts are valid. Because the uncertainties of forecasting are quantified within the forecasting intervals, they give decision-makers a reference for the risk of relying on point forecasting. Accordingly, the proposed interval forecasting model can provide a tradeoff between effectiveness and informativeness, which is of great importance for formulating scientific policy on early air quality warnings.

3.6. Comprehensive Evaluation Implementation

Air quality evaluation is a multiple criteria decision-making process, and the cloud model has outstanding capability to address the fuzziness and randomness in the evaluation process. In this section, a comprehensive evaluation using the cloud model is effectively performed. In the evaluation process, the forecasting values generated by CEEMD-BBODE-LSSVM were regarded as samples to participate in the evaluation, which plays a vital part in EWS.

3.6.1. Evaluation Preparation

Before evaluation, several essential elements need to be prepared: the criteria for air quality, the pseudo-boundaries for all criteria, the parameters of the cloud model, and the weights. The criteria for air quality evaluation are shown in Table 1. The parameters of the cloud model were calculated by Equation (9) and can be seen in Table 10. It is worth noting that Bmax is missing for all level VI criteria, so in this paper we used polynomial regression to obtain it. The detailed information on the polynomial regression for Bmax in level VI for all criteria is shown in Table 11. The weights generated by the hybrid entropy-AHP method for all criteria are reported in Table 12.

3.6.2. Evaluation Implementation

After the preparation of the cloud model, a comprehensive assessment was implemented. For the sake of simplicity, we extracted nine samples from the testing subset, shown in Table 13, to perform a comprehensive assessment using the cloud model.
To enhance accuracy and robustness, each sample was evaluated over 2000 times, and the mean of the distribution of the certainty degree was adopted as the final certainty degree. The final air quality level is the one with the maximum certainty degree, which represents the most probable membership. The final evaluation results for all cases are reported in Table 14. According to Table 2, air quality can be classified into six levels, namely excellent, good, light pollution, moderate pollution, heavy pollution and serious pollution. From Table 14, the air quality of A1, A3, A7 and A8 is at level I, A2, A4 and A9 belong to level II, and A5 and A6 belong to levels IV and V, respectively. It is worth noting that a certainty degree of 0 in Table 14 indicates that there is no membership at that level.
To illustrate the distribution pattern, we take case A4 in Table 13 as an example. Figure 10 shows the certainty degrees, with their different distribution patterns, at each level for case A4. The certainty degree is largest at level II, which indicates that case A4 belongs to level II. Additionally, when cases that belong to the same level are compared, the certainty degree provides more information than the final level alone. For example, although cases A2, A4 and A9 all belong to level II, their certainty degrees differ: the certainty degrees of belonging to level II for cases A2, A4 and A9 are 0.5421, 0.6136 and 0.3589, respectively, which leads to the conclusion that case A4 is more likely to be at level II than cases A2 and A9. The discussion above shows that the cloud model can not only determine the air quality level, but also express the relative severity of air quality within the same level.
Table 15 shows the D-M test results based on the MAE loss function, which can be summarized as follows: in the forecasting of all pollutants, the D-M values for LSSVM and EEMD-LSSVM are larger than the upper bound at the 1% significance level, which illustrates that CEEMD-BBODE-LSSVM is significantly superior to the LSSVM and EEMD-LSSVM models. Additionally, the D-M values for CEEMD-LSSVM are generally larger than the upper bound at the 5% significance level, which indicates that the proposed CEEMD-BBODE-LSSVM hybrid model performs better than CEEMD-LSSVM in most cases. Overall, the proposed hybrid model generally outperforms the other benchmark models.
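The repeated evaluation and the maximum-certainty rule described above can be sketched for a single pollutant as follows. This is only an illustration under stated assumptions: it uses the standard normal cloud generator with parameters (Ex, En, He), it omits the entropy-AHP weighting across criteria, and the level parameters below are hypothetical placeholders rather than the values in Table 10. The function names `mean_certainty` and `classify` are likewise hypothetical.

```python
import math
import random


def mean_certainty(x, ex, en, he, runs=2000, rng=None):
    """Average certainty degree of x for one normal cloud (Ex, En, He)."""
    rng = rng or random.Random(0)
    total = 0.0
    for _ in range(runs):
        en_prime = rng.gauss(en, he)  # randomized entropy En' ~ N(En, He^2)
        if en_prime == 0.0:
            total += 1.0 if x == ex else 0.0  # degenerate cloud drop
        else:
            total += math.exp(-((x - ex) ** 2) / (2.0 * en_prime ** 2))
    return total / runs


def classify(x, level_clouds, runs=2000, seed=0):
    """Return the level with the largest mean certainty degree, and all scores."""
    rng = random.Random(seed)
    scores = {
        level: mean_certainty(x, ex, en, he, runs, rng)
        for level, (ex, en, he) in level_clouds.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical cloud parameters (Ex, En, He) for three levels of one pollutant.
example_clouds = {
    "I": (25.0, 8.0, 0.5),
    "II": (75.0, 8.0, 0.5),
    "III": (125.0, 8.0, 0.5),
}
level, scores = classify(70.0, example_clouds)
print(level, scores)
```

Because the certainty degrees are kept alongside the final level, two samples assigned to the same level can still be compared by how strongly they belong to it.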

4. Discussion

4.1. The Forecasting Effectiveness Based on D-M Test

In this paper, a D-M test was used to assess the difference between the error series generated by a benchmark forecasting model and a target forecasting model, which makes it possible to verify the point forecasting performance of different forecasting models.

4.2. The Public Health Implications of the EWS

There are few air quality EWS studies in China, and those that exist mainly depend on weather research and forecasting models (WRFs). However, WRFs face many challenges in current applications, such as high costs, a heavy workload and the difficulty of debugging models in a short time. Additionally, WRFs are usually implemented in the form of grids, and their local forecasting capability is poor. The proposed EWS for Dalian is based on artificial intelligence theory. Its high precision and scientific evaluation in practical application were verified via the aforementioned numerical simulations. The forecasting and evaluation modules of the proposed EWS can be integrated into the existing air monitoring system in Dalian, which will promote the development of air quality EWSs and provide more warning information for the public. Furthermore, effective warnings about air quality are conducive to lowering the incidence of diseases of public health concern, such as lung disease, asthma or cardiovascular disease.

4.3. Future Considerations for the Air Quality EWS

In the comparison of EWSs, the factors of effectiveness, efficiency, cost and precision are frequently considered. Although the proposed EWS shows admirable performance in the tasks of forecasting and evaluation, the presented system merely involves empirical models and does not involve the deterministic models mentioned in literature reviews and WRFs. In order to get better performance in the EWS, integration of empirical models, deterministic models and WRFs is necessary in the future, which will combine the respective merits of the three models as much as possible and strengthen the scientific basis of the EWS. Additionally, in order to enhance the practicability of the EWS, it is necessary to establish an information platform on the EWS.

5. Conclusions

Establishing a comprehensive air quality warning system is particularly important given the increasing levels of atmospheric pollution. However, establishing an effective warning system with the best performance is not only a challenging technical task, but also a notable concern for the public. In this paper, a comprehensive warning system was developed, which consists of effective forecasting and scientific evaluation. For the forecasting module, a novel hybrid forecasting model, namely CEEMD-BBODE-LSSVM, is proposed for point forecasting. To reduce the complexity of the original data, the series of air pollutants are decomposed into several IMFs using CEEMD and then reconstructed by removing the high-frequency signals. However, no theory can yet determine the proper number of IMFs, which may be an aspect for future investigation. The BBODE algorithm, a modified BBO algorithm, is used to search for the optimal LSSVM parameters in order to achieve a desirable forecasting performance. The simulation results reveal that the hybrid model is superior to all benchmark models considered on the basis of four metrics (MAE, MAPE, RMSE, R2). However, point forecasting cannot directly provide uncertainty information, which means that the decision-maker must bear great risk when using point forecasting. Accordingly, to improve the accuracy and robustness of the forecasting performance, interval forecasting is implemented to quantify the inherent uncertainties, which provides flexible information about the future trends of pollutants. It is therefore important to integrate point forecasting and interval forecasting, which is essential for optimally regulating air quality.
For the evaluation module, air quality is evaluated comprehensively by applying a normal cloud model based on entropy–AHP theory, which also plays a vital part in this warning system. Additionally, a multi-dimensional cloud model, as an extension of the one-dimensional cloud model, is a promising evaluation method and a worthy topic for future study. The study of an EWS for air quality in this paper is still at an early stage and involves only one-step-ahead forecasting. More exploration of multi-step-ahead forecasting and combination forecasting, in both theory and practice, should be carried out in the future.

Acknowledgments

This research was supported by the National Natural Science Foundation of China under Grant 71671029.

Author Contributions

Jianzhou Wang designed the experiment of the warning system for air quality and wrote the manuscript. Tong Niu wrote the program in MATLAB and analyzed the data. Rui Wang provided critical review and manuscript editing. All authors read and approved the final manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

CTM: chemical transport model
MLR: multiple linear regression
ARIMA: autoregressive integrated moving average model
GRNN: general regression neural network
LSSVM: least squares support vector machine
SVM: support vector machine
AHP: analytical hierarchical process
EMD: empirical mode decomposition
EEMD: ensemble empirical mode decomposition
CEEMD: complementary ensemble empirical mode decomposition
IMF: intrinsic mode function
DE: differential evolution
BBO: biogeography-based optimization
PDF: probabilistic distribution function
CDF: cumulative distribution function
AW: average width
CP: coverage probability
AQI: air quality index
MAE: mean absolute error
MAPE: mean absolute percentage error
RMSE: root mean square error
R2: goodness of fit
Std.: standard deviation

Appendix A

Appendix 1. The PDF and CDF Functions for Weibull, Gamma, Lognormal, Log-Logistic, Inverse Gaussian

Table A1. The PDF and CDF of five kinds of distributions.
| Distribution | PDF/CDF | Parameters |
| --- | --- | --- |
| Weibull | $f(x;a,b) = \frac{b}{a} \left( \frac{x}{a} \right)^{b-1} \exp\left[ -\left( \frac{x}{a} \right)^{b} \right], \; x \geq 0$ | $a > 0$ scale parameter; $b > 0$ shape parameter |
| | $F(x;a,b) = 1 - \exp\left[ -\left( \frac{x}{a} \right)^{b} \right]$ | |
| Gamma | $f(x;a,b) = \frac{x^{a-1} \exp\left( -\frac{x}{b} \right)}{b^{a} \Gamma(a)}, \; x > 0$ | $a > 0$ shape parameter; $b > 0$ scale parameter |
| | $F(x;a,b) = \frac{1}{b^{a} \Gamma(a)} \int_{0}^{x} t^{a-1} \exp\left( -\frac{t}{b} \right) dt, \; x > 0$ | |
| Lognormal | $f(x;a,b) = \frac{1}{x a \sqrt{2\pi}} \exp\left( -\frac{(\ln x - b)^{2}}{2a^{2}} \right), \; x > 0$ | $a > 0$ scale parameter; $b > 0$ location parameter |
| | $F(x;a,b) = \frac{1}{a \sqrt{2\pi}} \int_{0}^{x} \frac{1}{t} \exp\left( -\frac{(\ln t - b)^{2}}{2a^{2}} \right) dt$ | |
| Log-logistic | $f(x;a,b) = \frac{(b/a)(x/a)^{b-1}}{\left( 1 + (x/a)^{b} \right)^{2}}, \; x > 0$ | $a > 0$ scale parameter; $b > 0$ shape parameter |
| | $F(x;a,b) = \frac{x^{b}}{a^{b} + x^{b}}$ | |
| Inverse Gaussian | $f(x;a,b) = \left[ \frac{b}{2\pi x^{3}} \right]^{1/2} \exp\left[ -\frac{b (x - a)^{2}}{2a^{2} x} \right], \; x > 0$ | $a > 0$ scale parameter; $b > 0$ shape parameter |
| | $F(x;a,b) = \Phi\left[ \sqrt{\frac{b}{x}} \left( \frac{x}{a} - 1 \right) \right] + \exp\left( \frac{2b}{a} \right) \Phi\left[ -\sqrt{\frac{b}{x}} \left( \frac{x}{a} + 1 \right) \right]$ | |
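As a quick consistency check on Table A1, the following Python sketch implements the PDF and CDF of two of the listed distributions (Weibull and Log-logistic) and verifies numerically that each PDF matches the derivative of its CDF. The function names and the evaluation points are hypothetical choices made for illustration; the parameter meanings follow the table (a is the scale parameter and b is the shape parameter).

```python
import math


def weibull_pdf(x, a, b):
    return (b / a) * (x / a) ** (b - 1) * math.exp(-((x / a) ** b))


def weibull_cdf(x, a, b):
    return 1.0 - math.exp(-((x / a) ** b))


def loglogistic_pdf(x, a, b):
    return (b / a) * (x / a) ** (b - 1) / (1.0 + (x / a) ** b) ** 2


def loglogistic_cdf(x, a, b):
    return x ** b / (a ** b + x ** b)


def numerical_derivative(cdf, x, a, b, h=1e-6):
    """Central finite difference of a CDF, used to cross-check the PDF."""
    return (cdf(x + h, a, b) - cdf(x - h, a, b)) / (2.0 * h)


# Compare each PDF with the slope of its CDF at an arbitrary point.
for pdf, cdf in [(weibull_pdf, weibull_cdf), (loglogistic_pdf, loglogistic_cdf)]:
    gap = abs(numerical_derivative(cdf, 1.5, 2.0, 1.7) - pdf(1.5, 2.0, 1.7))
    print(pdf.__name__, gap < 1e-6)
```

The same check can be applied to the other distributions once their CDFs are implemented, for example with numerical integration for the Gamma and Lognormal cases.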

Appendix 2. The Test Functions in This Paper for BBO and BBODE Algorithm

Table A2. Test functions.
| Function Name | Test Function | Variable Domain | Global Optimum |
| --- | --- | --- | --- |
| Sphere | $f(x) = \sum_{i=1}^{d} x_i^{2}$ | $x_i \in [-100, 100]$ | $f_{min}(0, 0, \ldots, 0) = 0$ |
| Rosenbrock | $f(x) = \sum_{i=1}^{d-1} \left[ 100 (x_i^{2} - x_{i+1})^{2} + (x_i - 1)^{2} \right]$ | $x_i \in [-30, 30]$ | $f_{min}(1, 1, \ldots, 1) = 0$ |
| Rastrigin | $f(x) = \sum_{i=1}^{d} \left( x_i^{2} - 10 \cos(2\pi x_i) + 10 \right)$ | $x_i \in [-5.12, 5.12]$ | $f_{min}(0, 0, \ldots, 0) = 0$ |
| Shaffer | $f(x) = 0.5 + \frac{\left( \sin \sqrt{\sum_{i=1}^{d} x_i^{2}} \right)^{2} - 0.5}{\left( 1 + 0.001 \sum_{i=1}^{d} x_i^{2} \right)^{2}}$ | $x_i \in [-100, 100]$ | $f_{min}(0, 0, \ldots, 0) = 0$ |
| Griewank | $f(x) = \frac{1}{4000} \sum_{i=1}^{d} x_i^{2} - \prod_{i=1}^{d} \cos\left( \frac{x_i}{\sqrt{i}} \right) + 1$ | $x_i \in [-600, 600]$ | $f_{min}(0, 0, \ldots, 0) = 0$ |
| Ackley | $f(x) = -a \exp\left( -b \sqrt{\frac{1}{d} \sum_{i=1}^{d} x_i^{2}} \right) - \exp\left( \frac{1}{d} \sum_{i=1}^{d} \cos(2\pi x_i) \right) + a + e$, with $a = 20$, $b = 0.2$, $e = 2.71828$ | $x_i \in [-32, 32]$ | $f_{min}(0, 0, \ldots, 0) = 0$ |
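The six benchmark functions in Table A2 can be written compactly in Python, as in the sketch below. This is an illustrative transcription rather than the authors' MATLAB code; the function names are hypothetical, each function accepts a list of d coordinates, and e in the Ackley function is taken to be Euler's number (`math.e`).

```python
import math


def sphere(x):
    return sum(v * v for v in x)


def rosenbrock(x):
    return sum(100.0 * (x[i] ** 2 - x[i + 1]) ** 2 + (x[i] - 1.0) ** 2
               for i in range(len(x) - 1))


def rastrigin(x):
    return sum(v * v - 10.0 * math.cos(2.0 * math.pi * v) + 10.0 for v in x)


def shaffer(x):
    s = sum(v * v for v in x)
    return 0.5 + (math.sin(math.sqrt(s)) ** 2 - 0.5) / (1.0 + 0.001 * s) ** 2


def griewank(x):
    s = sum(v * v for v in x) / 4000.0
    p = math.prod(math.cos(v / math.sqrt(i)) for i, v in enumerate(x, start=1))
    return s - p + 1.0


def ackley(x, a=20.0, b=0.2):
    d = len(x)
    root_mean_square = math.sqrt(sum(v * v for v in x) / d)
    mean_cos = sum(math.cos(2.0 * math.pi * v) for v in x) / d
    return -a * math.exp(-b * root_mean_square) - math.exp(mean_cos) + a + math.e


# Each function should be (numerically) zero at its global optimum.
origin = [0.0] * 5
for func in (sphere, rastrigin, shaffer, griewank, ackley):
    print(func.__name__, abs(func(origin)) < 1e-12)
print("rosenbrock", rosenbrock([1.0] * 5) == 0.0)
```

Evaluating each function at its stated optimum, as done in the last lines, is a simple way to catch transcription errors before running an optimizer on them.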

Appendix 3. Pseudo-Code of the BBODE Algorithm

The detailed pseudo-code of the BBODE algorithm used in this paper can be summarized as follows:
Parameters
t: the iteration number       Iter_Max: the maximum number of iterations
n: the maximum number of species    size: the size of the population
rand: a random number in [0,1]      Vi: the mixed hybrid operator
MaxXi: the maximum individual     MinXi: the minimum individual
C: the probability of mutation     F: the difference operator
/* Parameter setup */
/* Initialize population Pi */
/* Compute the fitness function Fi of each habitat, sort Fi */
$F_i = \mathrm{MSE}(|y_{real} - \hat{y}_{forecast}|)$
/* Obtain elitist population */
/* Initialize probability of population in habitat */
FOR t < Iter_Max DO
  IF t is odd THEN
    /* Compute the number of population k */
    /* Compute the rate of immigration λi and emigration μi for each habitat */
    $\lambda_i = 0.5 I \left( 1 + \cos\left( \frac{k\pi}{n} \right) \right); \quad \mu_i = 0.5 E \left( 1 - \cos\left( \frac{k\pi}{n} \right) \right)$
    /* Normalize the immigration rate λscale */
    $\lambda_{scale} = \lambda_{lower} + (\lambda_{upper} - \lambda_{lower}) * (\lambda_i - \min(\lambda_i)) / (\max(\lambda_i) - \min(\lambda_i))$
    /* Operation of migration */
    Transform new information to habitat i
  ELSE
    FOR i = 1:size
      Choose indexes $r_1 \neq r_2 \neq i$
      /* Generate difference operator */
      $V_i = P_i + F * (P_{i(1)} - P_i) + F * (P_{i(r_1)} - P_{i(r_2)})$
      IF rand < C THEN
        /* Mutation operation */
        $V_i = MinX_i + (MaxX_i - MinX_i) * rand$
      END IF
    END FOR
  END IF
END FOR
/* Deassign for samples beyond the range */
/* Deassign for the same sample */
/* Compute fitness Fi for new population and sort Fi */
Obtain optimal solution
Postprocess results and visualization
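To make the control flow of the pseudo-code more concrete, a minimal Python sketch is given below. It should be read as one possible interpretation and not as the authors' MATLAB implementation. Several details are assumptions: the function name `bbode_minimize`, the default parameter values, the use of the sphere function in place of the forecasting MSE, n taken equal to the population size, the species count derived from the fitness rank, roulette-wheel selection of the emigrating habitat, clamping of out-of-range values, and elitism implemented by restoring the previous best habitat. The removal of duplicate samples is omitted.

```python
import math
import random


def sphere(x):
    return sum(v * v for v in x)


def bbode_minimize(fitness, dim, lower, upper, size=20, iter_max=100,
                   F=0.5, C=0.05, I=1.0, E=1.0, seed=0):
    """Alternate cosine-model migration (odd t) and a DE-style operator (even t)."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(size)]
    pop.sort(key=fitness)  # best habitat first
    n = size  # assumption: maximum number of species equals the population size
    for t in range(1, iter_max + 1):
        elite = list(pop[0])
        new_pop = []
        if t % 2 == 1:
            # Species count k: the best habitat holds the most species.
            k = [n - rank for rank in range(size)]
            lam = [0.5 * I * (1.0 + math.cos(ki * math.pi / n)) for ki in k]
            mu = [0.5 * E * (1.0 - math.cos(ki * math.pi / n)) for ki in k]
            lam_min, lam_max = min(lam), max(lam)
            # Normalized immigration rate with lambda_lower = 0, lambda_upper = 1.
            lam_scale = [(v - lam_min) / (lam_max - lam_min) for v in lam]
            for i in range(size):
                habitat = list(pop[i])
                for j in range(dim):
                    if rng.random() < lam_scale[i]:
                        # Roulette-wheel choice of the emigrating habitat.
                        src = rng.choices(range(size), weights=mu)[0]
                        habitat[j] = pop[src][j]
                new_pop.append(habitat)
        else:
            for i in range(size):
                r1, r2 = rng.sample([r for r in range(size) if r != i], 2)
                v = [pop[i][j] + F * (pop[0][j] - pop[i][j])
                     + F * (pop[r1][j] - pop[r2][j]) for j in range(dim)]
                if rng.random() < C:
                    # Mutation: re-draw the habitat uniformly inside the bounds.
                    v = [lower + (upper - lower) * rng.random() for _ in range(dim)]
                new_pop.append(v)
        # Assumption: values beyond the range are clamped to the bounds.
        new_pop = [[min(max(x, lower), upper) for x in h] for h in new_pop]
        new_pop.sort(key=fitness)
        if fitness(elite) < fitness(new_pop[0]):
            new_pop[-1] = elite  # elitism: never lose the best habitat found so far
            new_pop.sort(key=fitness)
        pop = new_pop
    return pop[0], fitness(pop[0])


best, value = bbode_minimize(sphere, dim=5, lower=-100.0, upper=100.0, seed=1)
print(best, value)
```

With the cosine model, the best habitat receives an immigration rate of zero and the largest emigration rate, so good solutions mostly share their features rather than being overwritten. In the forecasting setting, `fitness` would be replaced by the MSE of an LSSVM trained with the candidate (σ, γ) pair.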

References

  1. Du, X.; Kong, Q.; Ge, W.; Zhang, S.; Fu, L. Characterization of personal exposure concentration of fine particles for adults and children exposed to high ambient concentrations in Beijing, China. J. Environ. Sci. China 2010, 22, 1757–1764.
  2. Qiu, H.; Yu, I.T.S.; Wang, X.; Tian, L.; Tse, L.A.; Wong, T.W. Differential effects of fine and coarse particles on daily emergency cardiovascular hospitalizations in Hong Kong. Atmos. Environ. 2013, 64, 296–302.
  3. MECC (Ministry of the Environment and Climate Change). Fine Particulate Matter 2012; Ministry of the Environment and Climate Change: Scarborough, ON, Canada, 2012.
  4. Cobourn, W.G. An enhanced PM2.5 air quality forecast model based on nonlinear regression and back-trajectory concentrations. Atmos. Environ. 2010, 44, 3015–3023.
  5. Sun, W.; Zhang, H.; Palazoglu, A.; Singh, A.; Zhang, W.; Liu, S. Prediction of 24-hour-average PM2.5 concentrations using a hidden Markov model with different emission distributions in Northern California. Sci. Total Environ. 2013, 443, 93–103.
  6. Han, Z.; Ueda, H.; An, J. Evaluation and intercomparison of meteorological predictions by five MM5-PBL parameterizations in combination with three land-surface models. Atmos. Environ. 2008, 42, 233–249.
  7. Konovalov, I.B.; Beekmann, M.; Meleux, F.; Dutot, A.; Foret, G. Combining deterministic and statistical approaches for PM10 forecasting in Europe. Atmos. Environ. 2009, 43, 6425–6434.
  8. Wongsathan, R.; Seedadan, I. A Hybrid ARIMA and Neural Networks Model for PM-10 Pollution Estimation: The Case of Chiang Mai City Moat Area. Procedia Comput. Sci. 2016, 86, 273–276.
  9. Song, Y.; Qin, S.; Qu, J.; Liu, F. The forecasting research of early warning systems for atmospheric pollutants: A case in Yangtze River Delta region. Atmos. Environ. 2015, 118, 58–69.
  10. Prasad, K.; Gorai, A.K.; Goyal, P. Development of ANFIS models for air quality forecasting and input optimization for reducing the computational cost and time. Atmos. Environ. 2016, 128, 246–262.
  11. Qin, S.; Liu, F.; Wang, J.; Sun, B. Analysis and forecasting of the particulate matter (PM) concentration levels over four major cities of China using hybrid models. Atmos. Environ. 2014, 98, 665–675.
  12. Niu, M.; Wang, Y.; Sun, S.; Li, Y. A novel hybrid decomposition-and-ensemble model based on CEEMD and GWO for short-term PM concentration forecasting. Atmos. Environ. 2016, 134, 168–180.
  13. Zhou, Q.; Jiang, H.; Wang, J.; Zhou, J. A hybrid model for PM2.5 forecasting based on ensemble empirical mode decomposition and a general regression neural network. Sci. Total Environ. 2014, 496C, 264–274.
  14. Gorai, A.K.; Upadhyay, A.; Tuluri, F.; Goyal, P.; Tchounwou, P.B. An innovative approach for determination of air quality health index. Sci. Total Environ. 2015, 533, 495–505.
  15. Zhao, X.; Qi, Q.; Li, R. The establishment and application of fuzzy comprehensive model with weight based on entropy technology for air quality assessment. Procedia Eng. 2010, 7, 217–222.
  16. Olvera-García, M.Á.; Carbajal-Hernández, J.J.; Sánchez-Fernández, L.P.; Hernández-Bautista, I. Air quality assessment using a weighted Fuzzy Inference System. Ecol. Inform. 2016, 33, 57–74.
  17. Huang, N.E.; Shen, Z.; Long, S.R.; Wu, M.C.; Shih, H.H.; Zheng, Q.; Yen, N.C.; Tung, C.C.; Liu, H.H. The Empirical Mode Decomposition and the Hilbert Spectrum for Nonlinear and Non-stationary Time Series Analysis. Proc. R. Soc. A Math. Phys. Eng. Sci. 1998, 454, 903–995.
  18. Huang, N.E.; Wu, M.L.C.; Long, S.R.; Shen, S.S.; Qu, W.; Gloersen, P.; Fan, K.L. A confidence limit for the empirical mode decomposition and Hilbert spectral analysis. Proc. R. Soc. A 2003, 459, 2317–2345.
  19. Oberlin, T.; Meignen, S.; Perrier, V. An Alternative Formulation for the Empirical Mode Decomposition. IEEE Trans. Signal Process. 2012, 60, 2236–2246.
  20. Chen, Q.; Huang, N.; Riemenschneider, S.; Xu, Y. A B-spline approach for empirical mode decompositions. Adv. Comput. Math. 2006, 24, 171–195.
  21. Sharpley, R.C.; Vatchev, V. Analysis of the Intrinsic Mode Functions. Constr. Approx. 2006, 24, 17–47.
  22. Peng, Z.K.; Tse, P.W.; Chu, F.L. A comparison study of improved Hilbert–Huang transform and wavelet transform: Application to fault diagnosis for rolling bearing. Mech. Syst. Signal Process. 2005, 19, 974–988.
  23. Yeh, J.; Shieh, J.; Norden, E.; Huang, N.E. Complementary ensemble empirical mode decomposition: A novel noise enhanced data analysis method. Adv. Adapt. Data Anal. 2010, 2, 135–156.
  24. Imaouchen, Y.; Kedadouche, M.; Alkama, R.; Thomas, M. A Frequency-Weighted Energy Operator and complementary ensemble empirical mode decomposition for bearing fault detection. Mech. Syst. Signal Process. 2016, 82, 103–116.
  25. Simon, D. Biogeography-Based Optimization. IEEE Trans. Evolut. Comput. 2008, 12, 702–713.
  26. Rarick, R.; Simon, D.; Villaseca, F.E.; Vyakaranam, B. Biogeography-based optimization and the solution of the power flow problem. In Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, San Antonio, TX, USA, 11–14 October 2009; pp. 1003–1008.
  27. Roy, P.K.; Ghoshal, S.P.; Thakur, S.S. Biogeography-based Optimization for Economic Load Dispatch Problems. Electr. Power Compon. Syst. 2010, 38, 166–181.
  28. Kundra, H.; Kaur, A.; Panchal, V. An integrated approach to biogeography based optimization with case based reasoning for retrieving groundwater possibility. In Proceedings of the 8th Annual Asian Conference and Exhibition on Geospatial Information, Technology and Applications, Singapore, 18–20 August 2009.
  29. Panchal, V.K.; Singh, P.; Kaur, N.; Kundra, H. Biogeography based Satellite Image Classification. Comput. Sci. 2009, 6, 269–274.
  30. Vapnik, V.N. The Nature of Statistical Learning Theory; Springer: Berlin, Germany, 2000; pp. 988–999.
  31. Suykens, J.A.K.; Vandewalle, J. Least Squares Support Vector Machine Classifiers. Neural Process. Lett. 1999, 9, 293–300.
  32. Wang, J.; Hu, J. A robust combination approach for short-term wind speed forecasting and analysis—Combination of the ARIMA, ELM, SVM and LSSVM forecasts using a GPR model. Energy 2015, 93, 41–56. [Google Scholar] [CrossRef]
  33. Keerthi, S.S.; Lin, C.J. Asymptotic behaviors of support vector machines with gaussian kernel. Neural Comput. 2003, 15, 1667–1689.
  34. De Brabanter, K.; De Brabanter, J.; Suykens, J.A.; De Moor, B. Approximate Confidence and Prediction Intervals for Least Squares Support Vector Regression. IEEE Trans. Neural Netw. 2011, 22, 110–120.
  35. Li, D.; Liu, C.; Gan, W. A new cognitive model: Cloud model. Int. J. Intell. Syst. 2009, 24, 357–375.
  36. Wang, D.; Liu, D.; Ding, H.; Singh, V.P.; Wang, Y.; Zeng, X.; Wu, J.; Wang, L. A cloud model-based approach for water quality assessment. Environ. Res. 2016, 148, 24–35.
  37. Zhang, J.G.; Singh, V.P. Entropy-Theory and Application; China Water & Power Press: Beijing, China, 2012; pp. 79–80.
  38. Diebold, F.X.; Mariano, R.S. Comparing Predictive Accuracy. NBER Tech. Work. Pap. 1994, 13, 134–144.
  39. Xiao, L.; Wang, J.; Hou, R.; Wu, J. A combined model based on data pre-analysis and weight coefficients optimization for electrical load forecasting. Energy 2015, 82, 524–549.
Figure 1. Data preprocessing for EWS.
Figure 2. Normal cloud.
Figure 3. The cloud modeling workflow.
Figure 4. Training and testing subsets for the forecasting model.
Figure 5. Comparison of convergence speed for four migration strategies in BBODE. (A) The four migration strategies (cosine, quadratic, exponential, and linear models) were tested on the Griewank function with dimension 5; the fitness curve shows that the quadratic model has the better convergence speed and accuracy. (B) The strategies were tested on the Rosenbrock function with dimension 2; considering convergence speed and accuracy, the cosine model performs best. (C–F) The strategies were tested on the Rastrigin function with dimension 5, the Sphere function with dimension 10, the Schaffer function with dimension 5, and the Ackley function with dimension 5, respectively; in each case the cosine model converges faster and more accurately than the quadratic, exponential, and linear models. In summary, the cosine model is the superior migration strategy in the BBODE algorithm.
Figure 6. The statistical distribution characteristics of the air pollutant concentrations.
Figure 7. The comparison of forecasting performance for air pollutants in July.
Figure 8. The width of interval forecasting with different significance levels (α).
Figure 9. The performance of interval forecasting for all air pollutants in July.
Figure 10. Distributional patterns of certainty degrees in each level of Case A4.
Table 1. Quantitative boundaries of air pollution levels of all criteria. Air quality criteria are given in µg/m3.

| Levels | PM2.5 | PM10 | O3 | CO | NO2 | SO2 |
|---|---|---|---|---|---|---|
| I | ≤35 | ≤50 | ≤10 | ≤2 | ≤40 | ≤50 |
| II | ≤75 | ≤150 | ≤160 | ≤4 | ≤80 | ≤150 |
| III | ≤115 | ≤250 | ≤215 | ≤14 | ≤180 | ≤250 |
| IV | ≤150 | ≤350 | ≤265 | ≤24 | ≤280 | ≤475 |
| V | ≤250 | ≤420 | ≤800 | ≤36 | ≤565 | ≤800 |
| VI | >250 | >420 | >800 | >36 | >565 | >800 |
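The boundaries in Table 1 can be read as a simple threshold lookup. The sketch below is only a crisp classification of a single pollutant, assuming each level includes its upper boundary as the "≤" signs indicate; it is not the cloud-model evaluation used for the final results, and the names `UPPER_BOUNDS` and `crisp_level` are hypothetical.

```python
import bisect

LEVELS = ("I", "II", "III", "IV", "V", "VI")

# Upper boundaries of levels I-V, transcribed from Table 1.
UPPER_BOUNDS = {
    "PM2.5": (35, 75, 115, 150, 250),
    "PM10": (50, 150, 250, 350, 420),
    "O3": (10, 160, 215, 265, 800),
    "CO": (2, 4, 14, 24, 36),
    "NO2": (40, 80, 180, 280, 565),
    "SO2": (50, 150, 250, 475, 800),
}


def crisp_level(pollutant, concentration):
    """Return the Table 1 level whose range contains the concentration."""
    # bisect_left keeps a value equal to a boundary in the lower level,
    # which matches the "<=" convention of the table.
    idx = bisect.bisect_left(UPPER_BOUNDS[pollutant], concentration)
    return LEVELS[idx]
```

For example, `crisp_level("PM2.5", 35)` falls in level I, while any PM2.5 value above 250 falls in level VI.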
Table 2. Three metric rules for point forecasting.

| Metric | Definition | Equation |
|---|---|---|
| MAE | Mean absolute error | $\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert$ |
| MAPE | Mean absolute percentage error | $\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n} \left\lvert \frac{y_i - \hat{y}_i}{y_i} \right\rvert \times 100\%$ |
| RMSE | Root mean square error | $\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$ |
| R2 | Goodness of fit | $R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$ |

$y_i$ and $\hat{y}_i$ denote the actual and forecast values, respectively, and $\bar{y}$ is the average of the actual values. R2 was also used to evaluate goodness of fit in distribution fitting, where $y_i$, $\hat{y}_i$, and $\bar{y}$ denote the observed cumulative probability, the estimated cumulative probability, and the average of the observed cumulative probability, respectively.
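The metrics in Table 2 can be computed directly from paired actual and forecast values. The following is a minimal sketch using the standard definitions of MAE, MAPE, RMSE, and R2; the function name `point_forecast_metrics` and the sample numbers are illustrative and do not come from the article's data.

```python
import math


def point_forecast_metrics(actual, forecast):
    """Return MAE, MAPE (in %), RMSE, and R2 for two equal-length sequences."""
    n = len(actual)
    errors = [y - f for y, f in zip(actual, forecast)]
    mae = sum(abs(e) for e in errors) / n
    # MAPE is undefined when an actual value is zero.
    mape = sum(abs(e / y) for e, y in zip(errors, actual)) / n * 100.0
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_y = sum(actual) / n
    ss_tot = sum((y - mean_y) ** 2 for y in actual)
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "MAPE": mape, "RMSE": rmse, "R2": r2}


# Illustrative values only.
metrics = point_forecast_metrics([10.0, 20.0, 30.0], [12.0, 18.0, 33.0])
```

MAE and RMSE carry the unit of the data (µg/m3 in this study), whereas MAPE and R2 are unit-free, which is why the later tables report them side by side.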
Table 3. The experiment parameters of BBO and BBODE.

| Parameter Setting | BBO | BBODE |
|---|---|---|
| Maximum iteration | 5000 | 5000 |
| Population size | 50 | 50 |
| The number of elite kept | 3 | 3 |
| Maximum emigration rate | 1 | 1 |
| Minimum emigration rate | 0 | 0 |
| Maximum immigration rate | 1 | 1 |
| Minimum immigration rate | 0 | 0 |
| Mutation probability | 0.05 | 0.4 |
| Difference operator | - | 0.6 |
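Table 4. Test results of BBO and BBODE.

| Test Function | Dimension | Algorithm | Optimal/Worst Solution | Mean/Std. | Elapsed Time (s) |
|---|---|---|---|---|---|
| Sphere | 5 | BBO | 3.83 × 10^−3/1.87 × 10^−2 | 1.21 × 10^−2/6.09 × 10^−3 | 24.5293 |
| Sphere | 5 | BBODE | 0/0 | 0/0 | 25.1026 |
| Sphere | 10 | BBO | 1.05 × 10^−2/3.42 × 10^−1 | 8.06 × 10^−2/3.14 × 10^−2 | 27.1782 |
| Sphere | 10 | BBODE | 0/0 | 0/0 | 28.0055 |
| Rosenbrock | 2 | BBO | 1.05 × 10^−2/6.19 × 10^−1 | 2.65 × 10^−1/2.48 × 10^−1 | 21.5151 |
| Rosenbrock | 2 | BBODE | 0/0 | 0/0 | 38.8187 |
| Rastrigin | 2 | BBO | 1.56 × 10^−4/3.71 × 10^−3 | 1.70 × 10^−3/1.46 × 10^−3 | 22.1951 |
| Rastrigin | 2 | BBODE | 0/0 | 0/0 | 23.0743 |
| Rastrigin | 5 | BBO | 3.97 × 10^−3/2.05 × 10^−2 | 1.15 × 10^−2/6.43 × 10^−3 | 24.4002 |
| Rastrigin | 5 | BBODE | 0/0 | 0/0 | 24.1739 |
| Schaffer | 2 | BBO | 9.72 × 10^−3/3.33 × 10^−2 | 1.45 × 10^−2/1.06 × 10^−2 | 22.4175 |
| Schaffer | 2 | BBODE | 0/0 | 0/0 | 23.3923 |
| Schaffer | 5 | BBO | 9.72 × 10^−3/7.82 × 10^−2 | 3.99 × 10^−2/2.45 × 10^−2 | 24.3080 |
| Schaffer | 5 | BBODE | 9.72 × 10^−3/9.72 × 10^−3 | 9.70 × 10^−3/9.23 × 10^−11 | 29.3161 |
| Griewank | 2 | BBO | 3.60 × 10^−3/6.80 × 10^−2 | 2.06 × 10^−2/2.67 × 10^−2 | 22.2211 |
| Griewank | 2 | BBODE | 0/7.40 × 10^−3 | 3.00 × 10^−3/4.05 × 10^−3 | 22.4311 |
| Ackley | 2 | BBO | 2.61 × 10^−2/8.12 × 10^−2 | 5.24 × 10^−2/2.57 × 10^−2 | 22.4061 |
| Ackley | 2 | BBODE | 8.88 × 10^−16/8.88 × 10^−16 | 0/0 | 22.9809 |
| Ackley | 5 | BBO | 2.78 × 10^−2/2.90 × 10^−1 | 1.20 × 10^−1/1.07 × 10^−1 | 24.3998 |
| Ackley | 5 | BBODE | 8.88 × 10^−16/8.88 × 10^−16 | 0/0 | 25.2199 |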
Table 5. Parameters of the different distributions fitted with the different optimization algorithms. a and b represent the scale and shape parameters of the distribution functions, respectively.

| Indexes | Optimized Algorithm | Weibull a | Weibull b | Gamma a | Gamma b | Lognormal a | Lognormal b | Log-Logistic a | Log-Logistic b | Inverse Gaussian a | Inverse Gaussian b |
|---|---|---|---|---|---|---|---|---|---|---|---|
| PM2.5 | BBO | 45.6353 | 1.1494 | 1.1219 | 39.3960 | 0.9590 | 3.3423 | 1.9971 | 31.9631 | 46.2247 | 47.6059 |
| PM2.5 | BBODE | 45.7883 | 1.1754 | 1.3769 | 31.5937 | 0.8675 | 3.4642 | 1.9865 | 32.0344 | 46.1661 | 47.6248 |
| PM10 | BBO | 82.0580 | 1.3799 | 4.5728 | 15.1932 | 0.7351 | 3.9522 | 2.6477 | 62.4040 | 77.6225 | 137.8372 |
| PM10 | BBODE | 82.1414 | 1.5152 | 2.2187 | 33.7894 | 0.6939 | 4.1205 | 2.4676 | 61.5932 | 78.0797 | 137.2609 |
| O3 | BBO | 81.9614 | 1.8979 | 3.7011 | 19.7425 | 0.6156 | 4.0754 | 3.0971 | 65.1357 | 76.2749 | 199.8628 |
| O3 | BBODE | 82.0189 | 1.8920 | 3.0735 | 24.0945 | 0.5648 | 4.1715 | 3.0136 | 64.9673 | 75.8027 | 213.8432 |
| SO2 | BBO | 27.1355 | 1.1097 | 0.9696 | 28.5482 | 2.8314 | 1.0833 | 1.6724 | 17.3678 | 27.1713 | 19.4500 |
| SO2 | BBODE | 26.4620 | 0.9312 | 0.9194 | 29.4822 | 2.8378 | 1.0554 | 1.6357 | 17.1035 | 29.3444 | 18.3036 |
| NO2 | BBO | 35.1287 | 2.1449 | 3.8436 | 8.2904 | 0.5596 | 3.2623 | 3.2489 | 28.1741 | 32.5209 | 112.2516 |
| NO2 | BBODE | 35.2476 | 2.1360 | 3.8948 | 8.1566 | 0.5075 | 3.3501 | 3.3673 | 28.5397 | 32.3743 | 114.6579 |
| CO | BBO | 0.8547 | 2.2424 | 1.1233 | 0.7854 | 0.4562 | −0.3797 | 3.8337 | 0.7208 | 1.0952 | 1.0957 |
| CO | BBODE | 0.8229 | 2.3983 | 4.4389 | 0.1689 | 0.4558 | −0.3789 | 3.7101 | 0.6832 | 0.7589 | 3.3818 |
Table 6. R2 of the different distributions fitted with the different optimization algorithms. Values in bold are the largest in each row and represent the optimal R2 of distribution fitting.

| Indexes | Optimized Algorithm | Weibull | Gamma | Lognormal | Log-Logistic | Inverse Gaussian |
|---|---|---|---|---|---|---|
| PM2.5 | BBO | 0.9918 | 0.9904 | 0.9919 | 0.9980 | **0.9998** |
| PM2.5 | BBODE | 0.9919 | 0.9937 | 0.9994 | 0.9980 | **0.9998** |
| PM10 | BBO | 0.9879 | 0.9634 | 0.9760 | 0.9970 | **0.9991** |
| PM10 | BBODE | 0.9898 | 0.9937 | 0.9989 | 0.9982 | **0.9993** |
| O3 | BBO | **0.9984** | 0.9979 | 0.9870 | 0.9950 | 0.9963 |
| O3 | BBODE | 0.9994 | **0.9997** | 0.9968 | 0.9952 | 0.9966 |
| SO2 | BBO | 0.9747 | 0.9820 | **0.9944** | 0.9916 | 0.9942 |
| SO2 | BBODE | 0.9838 | 0.9827 | 0.9947 | 0.9918 | **0.9970** |
| NO2 | BBO | 0.9971 | 0.9990 | 0.9901 | 0.9970 | **0.9991** |
| NO2 | BBODE | 0.9971 | **0.9993** | 0.9990 | 0.9974 | 0.9991 |
| CO | BBO | 0.9774 | 0.8257 | **0.9962** | 0.9894 | 0.8309 |
| CO | BBODE | 0.9811 | 0.9899 | 0.9962 | **0.9968** | 0.9963 |
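Table 7. Performance evaluations of all forecasting models for air pollutants in July and August. MAE and RMSE are in µg/m3; MAPE is in %.

| Jul. | LSSVM MAE | LSSVM MAPE | LSSVM RMSE | LSSVM R2 | EEMD-LSSVM MAE | EEMD-LSSVM MAPE | EEMD-LSSVM RMSE | EEMD-LSSVM R2 | CEEMD-LSSVM MAE | CEEMD-LSSVM MAPE | CEEMD-LSSVM RMSE | CEEMD-LSSVM R2 | CEEMD-BBODE-LSSVM MAE | CEEMD-BBODE-LSSVM MAPE | CEEMD-BBODE-LSSVM RMSE | CEEMD-BBODE-LSSVM R2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PM2.5 | 2.7493 | 13.72 | 4.6403 | 0.9190 | 1.5257 | 7.01 | 2.8392 | 0.9697 | 0.9223 | 4.24 | 1.7455 | 0.9885 | 0.8377 | 3.86 | 1.5264 | 0.9912 |
| PM10 | 4.7844 | 10.87 | 7.6946 | 0.9228 | 2.3329 | 5.17 | 3.7108 | 0.9821 | 1.5476 | 3.46 | 2.5508 | 0.9915 | 1.5004 | 3.34 | 2.4581 | 0.9921 |
| O3 | 5.7668 | 6.81 | 7.9288 | 0.9451 | 3.0425 | 3.56 | 4.4377 | 0.9828 | 1.9619 | 2.27 | 2.7241 | 0.9935 | 1.7602 | 2.04 | 2.4161 | 0.9949 |
| CO | 0.0282 | 4.93 | 0.0461 | 0.9021 | 0.0137 | 2.39 | 0.0225 | 0.9766 | 0.0094 | 1.64 | 0.0153 | 0.9892 | 0.0093 | 1.64 | 0.0150 | 0.9896 |
| NO2 | 2.4432 | 12.82 | 3.6164 | 0.8298 | 1.2705 | 6.61 | 1.8767 | 0.9542 | 0.8877 | 4.65 | 1.3733 | 0.9755 | 0.8138 | 4.29 | 1.2850 | 0.9785 |
| SO2 | 1.3173 | 17.17 | 1.9538 | 0.7346 | 0.6607 | 8.92 | 0.9071 | 0.9428 | 0.5091 | 6.71 | 0.7596 | 0.9599 | 0.4762 | 6.42 | 0.7222 | 0.9637 |

| Aug. | LSSVM MAE | LSSVM MAPE | LSSVM RMSE | LSSVM R2 | EEMD-LSSVM MAE | EEMD-LSSVM MAPE | EEMD-LSSVM RMSE | EEMD-LSSVM R2 | CEEMD-LSSVM MAE | CEEMD-LSSVM MAPE | CEEMD-LSSVM RMSE | CEEMD-LSSVM R2 | CEEMD-BBODE-LSSVM MAE | CEEMD-BBODE-LSSVM MAPE | CEEMD-BBODE-LSSVM RMSE | CEEMD-BBODE-LSSVM R2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PM2.5 | 2.8102 | 10.12 | 4.2173 | 0.9718 | 1.3468 | 4.72 | 2.0854 | 0.9931 | 0.9601 | 3.35 | 1.4169 | 0.9968 | 0.8584 | 3.00 | 1.2814 | 0.9974 |
| PM10 | 4.6826 | 8.27 | 7.9020 | 0.9517 | 4.6866 | 8.32 | 7.8981 | 0.9518 | 1.4682 | 2.55 | 2.4766 | 0.9953 | 1.4201 | 2.47 | 2.4687 | 0.9953 |
| O3 | 6.8965 | 7.19 | 9.2676 | 0.9454 | 4.3948 | 4.66 | 5.7059 | 0.9793 | 2.1932 | 2.30 | 2.9572 | 0.9944 | 1.9939 | 2.05 | 2.6971 | 0.9954 |
| CO | 0.0382 | 4.83 | 0.0628 | 0.9453 | 0.0196 | 2.52 | 0.0350 | 0.9830 | 0.0126 | 1.65 | 0.0196 | 0.9946 | 0.0123 | 1.63 | 0.0192 | 0.9949 |
| NO2 | 2.6935 | 12.67 | 3.8906 | 0.7507 | 1.7882 | 8.44 | 2.5133 | 0.8960 | 1.1073 | 5.11 | 1.5708 | 0.9594 | 0.9906 | 4.62 | 1.4239 | 0.9666 |
| SO2 | 1.4433 | 16.43 | 2.0774 | 0.7876 | 0.7174 | 8.49 | 0.9874 | 0.9520 | 0.5274 | 6.17 | 0.7691 | 0.9709 | 0.5124 | 6.05 | 0.7568 | 0.9718 |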
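Table 8. Performance evaluations of all forecasting models for air pollutants in September and October. MAE and RMSE are in µg/m3; MAPE is in %.

| Sept. | LSSVM MAE | LSSVM MAPE | LSSVM RMSE | LSSVM R2 | EEMD-LSSVM MAE | EEMD-LSSVM MAPE | EEMD-LSSVM RMSE | EEMD-LSSVM R2 | CEEMD-LSSVM MAE | CEEMD-LSSVM MAPE | CEEMD-LSSVM RMSE | CEEMD-LSSVM R2 | CEEMD-BBODE-LSSVM MAE | CEEMD-BBODE-LSSVM MAPE | CEEMD-BBODE-LSSVM RMSE | CEEMD-BBODE-LSSVM R2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PM2.5 | 1.8597 | 9.26 | 2.6744 | 0.9632 | 0.8473 | 3.91 | 1.2725 | 0.9917 | 0.5845 | 2.73 | 0.8504 | 0.9963 | 0.5329 | 2.55 | 0.7836 | 0.9968 |
| PM10 | 3.1881 | 7.25 | 4.6004 | 0.9544 | 1.4800 | 3.29 | 2.0462 | 0.9910 | 0.9906 | 2.15 | 1.4081 | 0.9957 | 0.9464 | 2.10 | 1.3310 | 0.9962 |
| O3 | 6.1037 | 7.12 | 8.4823 | 0.9444 | 3.8598 | 4.52 | 5.5017 | 0.9766 | 1.9423 | 2.30 | 2.6734 | 0.9945 | 1.7936 | 2.10 | 2.4833 | 0.9952 |
| CO | 0.0343 | 4.80 | 0.0501 | 0.9231 | 0.0337 | 4.70 | 0.0501 | 0.9229 | 0.0108 | 1.51 | 0.0159 | 0.9922 | 0.0106 | 1.49 | 0.0155 | 0.9926 |
| NO2 | 3.3152 | 11.05 | 4.7473 | 0.8577 | 2.5733 | 9.40 | 3.6797 | 0.9145 | 1.3327 | 4.44 | 1.9169 | 0.9768 | 1.1959 | 3.99 | 1.7156 | 0.9814 |
| SO2 | 1.5666 | 14.18 | 2.1999 |  | 0.6901 | 6.46 | 0.9278 | 0.9611 | 0.5627 | 5.08 | 0.8056 | 0.9707 | 0.5429 | 4.87 | 0.7764 | 0.9728 |

| Oct. | LSSVM MAE | LSSVM MAPE | LSSVM RMSE | LSSVM R2 | EEMD-LSSVM MAE | EEMD-LSSVM MAPE | EEMD-LSSVM RMSE | EEMD-LSSVM R2 | CEEMD-LSSVM MAE | CEEMD-LSSVM MAPE | CEEMD-LSSVM RMSE | CEEMD-LSSVM R2 | CEEMD-BBODE-LSSVM MAE | CEEMD-BBODE-LSSVM MAPE | CEEMD-BBODE-LSSVM RMSE | CEEMD-BBODE-LSSVM R2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PM2.5 | 3.1429 | 14.75 | 5.2280 | 0.9632 | 2.0940 | 7.18 | 3.9670 | 0.9788 | 1.0080 | 4.02 | 1.8061 | 0.9956 | 0.9656 | 3.87 | 1.6485 | 0.9963 |
| PM10 | 5.5187 | 9.84 | 8.9370 | 0.9596 | 3.3163 | 5.39 | 5.7764 | 0.9831 | 1.7639 | 2.98 | 2.9540 | 0.9956 | 1.7107 | 2.64 | 2.8270 | 0.9960 |
| O3 | 5.2873 | 8.89 | 7.5633 | 0.9622 | 3.1799 | 5.05 | 4.6041 | 0.9860 | 1.7490 | 2.99 | 2.5731 | 0.9956 | 1.5749 | 2.70 | 2.3123 | 0.9965 |
| CO | 0.0491 | 6.26 | 0.0883 | 0.9185 | 0.0475 | 6.09 | 0.0846 | 0.9251 | 0.0173 | 2.22 | 0.0329 | 0.9887 | 0.0171 | 2.10 | 0.0318 | 0.9895 |
| NO2 | 3.2192 | 10.74 | 4.6240 | 0.9025 | 2.4256 | 8.04 | 3.7272 | 0.9366 | 1.2379 | 4.13 | 1.8160 | 0.9850 | 1.1440 | 3.81 | 1.6778 | 0.9872 |
| SO2 | 1.5912 | 14.01 | 2.2161 | 0.8578 | 0.7122 | 6.40 | 0.9919 | 0.9715 | 0.5605 | 5.01 | 0.8241 | 0.9803 | 0.5541 | 4.90 | 0.7995 | 0.9815 |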
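Table 9. The evaluation results of interval forecasting using CP and AW.

| Month | α | PM2.5 CP | PM2.5 AW | PM10 CP | PM10 AW | O3 CP | O3 AW | CO CP | CO AW | NO2 CP | NO2 AW | SO2 CP | SO2 AW |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Jul. | 0.1 | 94.26% | 13.6618 | 90.48% | 31.0273 | 93.12% | 26.7807 | 98.60% | 0.2079 | 92.72% | 9.9325 | 90.20% | 8.4259 |
| Jul. | 0.2 | 90.62% | 10.4174 | 89.92% | 24.4988 | 89.22% | 20.7596 | 97.06% | 0.1606 | 82.21% | 7.4009 | 88.80% | 6.6628 |
| Jul. | 0.3 | 84.45% | 8.3868 | 86.30% | 22.4904 | 81.79% | 16.7552 | 94.68% | 0.1292 | 76.19% | 6.2515 | 84.03% | 5.1160 |
| Jul. | 0.4 | 76.47% | 6.9376 | 78.01% | 15.5500 | 74.37% | 13.7277 | 90.03% | 0.0911 | 71.43% | 4.9870 | 79.61% | 4.2380 |
| Aug. | 0.1 | 94.04% | 16.1549 | 96.78% | 37.3582 | 91.27% | 28.6477 | 98.34% | 0.2980 | 91.83% | 10.6406 | 91.27% | 9.8413 |
| Aug. | 0.2 | 89.06% | 10.4853 | 95.24% | 29.4140 | 84.90% | 22.1802 | 96.81% | 0.2318 | 83.52% | 8.2517 | 89.61% | 7.6514 |
| Aug. | 0.3 | 84.49% | 10.1968 | 91.60% | 23.3305 | 76.45% | 17.9500 | 94.74% | 0.1849 | 78.39% | 6.9446 | 86.70% | 6.1226 |
| Aug. | 0.4 | 80.03% | 8.1918 | 88.39% | 19.2456 | 68.42% | 14.6038 | 91.74% | 0.1390 | 73.14% | 5.0025 | 82.57% | 4.7893 |
| Sept. | 0.1 | 97.36% | 10.8554 | 89.52% | 27.8707 | 91.79% | 27.5385 | 99.27% | 0.2646 | 92.82% | 13.6887 | 90.03% | 10.6070 |
| Sept. | 0.2 | 95.60% | 8.8471 | 88.08% | 22.5034 | 86.22% | 21.4810 | 98.24% | 0.2052 | 81.97% | 10.5548 | 88.94% | 8.0384 |
| Sept. | 0.3 | 91.94% | 7.0965 | 87.50% | 18.5714 | 80.21% | 17.4530 | 94.13% | 0.1627 | 79.62% | 8.6191 | 86.07% | 6.5208 |
| Sept. | 0.4 | 88.21% | 6.1263 | 86.13% | 14.7516 | 72.73% | 14.1431 | 90.38% | 0.1328 | 76.93% | 5.9972 | 83.16% | 3.9879 |
| Oct. | 0.1 | 94.15% | 17.7601 | 92.72% | 39.6436 | 92.89% | 23.2962 | 96.03% | 0.3322 | 93.57% | 14.7220 | 94.39% | 11.6736 |
| Oct. | 0.2 | 90.97% | 13.1872 | 89.64% | 31.1285 | 88.74% | 18.0237 | 92.89% | 0.2587 | 83.88% | 11.4790 | 92.75% | 9.2227 |
| Oct. | 0.3 | 87.82% | 10.9541 | 83.38% | 24.9457 | 79.34% | 14.6879 | 90.83% | 0.2106 | 81.53% | 9.2536 | 90.01% | 7.2835 |
| Oct. | 0.4 | 84.43% | 7.4732 | 80.13% | 20.6901 | 72.09% | 11.9308 | 88.86% | 0.1884 | 78.64% | 7.0376 | 87.49% | 5.0129 |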
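This part of the article does not define CP and AW. The sketch below assumes the usual reading: CP is the percentage of actual values that fall inside the forecast interval, and AW is the mean interval width. The function name `coverage_and_width` and the sample numbers are hypothetical.

```python
def coverage_and_width(actual, lower, upper):
    """Return (CP in %, AW) for forecast intervals [lower_i, upper_i].

    Assumes CP = share of actual values inside their interval and
    AW = mean of (upper_i - lower_i); both are assumptions, not taken
    from the article's equations.
    """
    n = len(actual)
    covered = sum(1 for y, lo, hi in zip(actual, lower, upper) if lo <= y <= hi)
    cp = covered / n * 100.0
    aw = sum(hi - lo for lo, hi in zip(lower, upper)) / n
    return cp, aw


# Illustrative values only: the second observation falls below its interval.
cp, aw = coverage_and_width(
    actual=[1.0, 2.0, 3.0, 4.0],
    lower=[0.0, 2.5, 2.0, 3.0],
    upper=[2.0, 3.0, 4.0, 5.0],
)
```

Under this reading, the pattern in Table 9 is expected: a larger α gives narrower intervals (smaller AW) at the cost of lower coverage (smaller CP).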
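Table 10. The parameters of the cloud model for all criteria.

| Levels | PM2.5 Ex | PM2.5 En | PM2.5 He | PM10 Ex | PM10 En | PM10 He | O3 Ex | O3 En | O3 He |
|---|---|---|---|---|---|---|---|---|---|
| I | 17.5 | 11.67 | 1.17 | 25 | 16.67 | 1.67 | 5 | 3.33 | 0.33 |
| II | 55 | 13.33 | 1.33 | 100 | 33.33 | 3.33 | 85 | 50 | 5 |
| III | 95 | 13.33 | 1.33 | 200 | 33.33 | 3.33 | 187.5 | 18.33 | 1.83 |
| IV | 132.5 | 11.67 | 1.17 | 300 | 33.33 | 3.33 | 240 | 16.67 | 1.67 |
| V | 200 | 33.33 | 3.33 | 385 | 23.33 | 2.33 | 532.5 | 178.33 | 17.83 |
| VI | 291.99 | 28.00 | 2.80 | 457.95 | 25.30 | 2.53 | 988.8 | 125.86 | 12.59 |

| Levels | CO Ex | CO En | CO He | NO2 Ex | NO2 En | NO2 He | SO2 Ex | SO2 En | SO2 He |
|---|---|---|---|---|---|---|---|---|---|
| I | 1 | 0.67 | 0.07 | 20 | 13.33 | 1.33 | 25 | 16.67 | 1.67 |
| II | 3 | 0.67 | 0.07 | 60 | 13.33 | 1.33 | 100 | 33.33 | 3.33 |
| III | 9 | 3.33 | 0.33 | 130 | 33.33 | 3.33 | 200 | 33.33 | 3.33 |
| IV | 19 | 3.33 | 0.33 | 230 | 33.33 | 3.33 | 362.5 | 75 | 7.5 |
| V | 30 | 4 | 0.4 | 422.5 | 95 | 9.5 | 637.5 | 108.33 | 10.83 |
| VI | 44.21 | 5.47 | 0.55 | 707 | 94.67 | 9.47 | 989.97 | 126.65 | 12.67 |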
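The entries of Table 10 appear to follow from the level boundaries of Table 1, with the level VI upper boundary taken from Table 11. The sketch below assumes Ex = (Bmin + Bmax)/2, En = (Bmax − Bmin)/3, and He = 0.1 × En, with a lower boundary of 0 for level I. The divisor 3 and the factor 0.1 are inferred from the tabulated values rather than quoted from the article's equations, and the function name `cloud_parameters` is hypothetical.

```python
def cloud_parameters(b_min, b_max, k=0.1):
    """Return (Ex, En, He) for one level with boundaries [b_min, b_max].

    The formulas are an inference from the numbers in Table 10:
    Ex is the midpoint, En is one third of the range, He is k * En.
    """
    ex = (b_min + b_max) / 2.0
    en = (b_max - b_min) / 3.0
    he = k * en
    return ex, en, he


# PM2.5 level II (35 to 75) and level VI (250 to the Bmax of 333.99).
pm25_level_ii = cloud_parameters(35, 75)
pm25_level_vi = cloud_parameters(250, 333.99)
```

With these assumptions, PM2.5 level II gives approximately (55, 13.33, 1.33) and level VI approximately (291.99, 28.00, 2.80), in line with the first block of Table 10.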
Table 11. Polynomial regression for Bmax of all criteria with level VI.

| Indices | Polynomial Regression | Bmax of Level VI |
|---|---|---|
| PM2.5 | f(x) = 8.21x^2 + 1.21x + 31 | 333.99 |
| PM10 | f(x) = −4.29x^2 + 119.7x − 68 | 495.91 |
| O3 | f(x) = 54.64x^2 − 159.4x + 167 | 1177.6 |
| CO | f(x) = 1.43x^2 + 0.23x − 0.4 | 52.42 |
| NO2 | f(x) = 35x^2 − 85x + 99 | 849 |
| SO2 | f(x) = 41.07x^2 − 63.93x + 85 | 1179.94 |
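The Bmax values in Table 11 are consistent with evaluating each polynomial at x = 6, that is, treating x as the level index (I = 1, ..., VI = 6). That reading is an assumption based on the numbers, not a statement from this section. The sketch below uses the rounded coefficients as printed, so it reproduces the tabulated Bmax only approximately for some pollutants; the names `POLYNOMIALS` and `b_max_level_vi` are hypothetical.

```python
# Coefficients (a, b, c) of f(x) = a*x^2 + b*x + c, transcribed from Table 11.
POLYNOMIALS = {
    "PM2.5": (8.21, 1.21, 31.0),
    "PM10": (-4.29, 119.7, -68.0),
    "O3": (54.64, -159.4, 167.0),
    "CO": (1.43, 0.23, -0.4),
    "NO2": (35.0, -85.0, 99.0),
    "SO2": (41.07, -63.93, 85.0),
}


def b_max_level_vi(pollutant, level_index=6):
    """Evaluate the fitted polynomial at the level index (assumed x = 6 for level VI)."""
    a, b, c = POLYNOMIALS[pollutant]
    x = level_index
    return a * x * x + b * x + c
```

For NO2 the result is exactly 849, while for PM2.5 the rounded coefficients give about 333.8 rather than the tabulated 333.99.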
Table 12. AHP-entropy weights for all criteria.

| Criteria | AHP Weight z | Entropy | Entropy Weight ω | Entropy-AHP Weight W |
|---|---|---|---|---|
| PM2.5 | 0.3 | 4.6692 | 0.2348 | 0.4292 |
| PM10 | 0.3 | 5.0828 | 0.1917 | 0.3505 |
| O3 | 0.233 | 5.1281 | 0.0621 | 0.0881 |
| CO | 0.1 | 6.9810 | 0.0721 | 0.0439 |
| NO2 | 0.033 | 4.0407 | 0.1730 | 0.0348 |
| SO2 | 0.033 | 4.1733 | 0.2662 | 0.0535 |
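The last column of Table 12 matches a normalized product of the two weight columns, W_i = z_i ω_i / Σ_j z_j ω_j. This combination rule is inferred from the tabulated numbers, so treat the sketch below as an illustration under that assumption; the names `AHP_WEIGHTS`, `ENTROPY_WEIGHTS`, and `combine_weights` are hypothetical.

```python
# Values transcribed from Table 12.
AHP_WEIGHTS = {"PM2.5": 0.3, "PM10": 0.3, "O3": 0.233,
               "CO": 0.1, "NO2": 0.033, "SO2": 0.033}
ENTROPY_WEIGHTS = {"PM2.5": 0.2348, "PM10": 0.1917, "O3": 0.0621,
                   "CO": 0.0721, "NO2": 0.1730, "SO2": 0.2662}


def combine_weights(ahp, entropy):
    """Multiply the two weights per criterion and normalize the products to sum to 1."""
    products = {name: ahp[name] * entropy[name] for name in ahp}
    total = sum(products.values())
    return {name: value / total for name, value in products.items()}


combined = combine_weights(AHP_WEIGHTS, ENTROPY_WEIGHTS)
```

Rounded to four decimals, `combined` agrees with the Entropy-AHP column to within rounding of the printed inputs, for example about 0.4292 for PM2.5.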
Table 13. The forecasting samples from test subset used for evaluation.

| Date | PM2.5 (µg/m3) | PM10 (µg/m3) | O3 (µg/m3) | CO (µg/m3) | NO2 (µg/m3) | SO2 (µg/m3) | Cases |
|---|---|---|---|---|---|---|---|
| 1 July 2015 1:00 | 28.7706 | 55.7602 | 67.3450 | 0.9697 | 25.8604 | 10.2004 | A1 |
| 1 July 2015 23:00 | 72.9066 | 107.0337 | 165.4987 | 1.1086 | 20.4798 | 11.1226 | A2 |
| 2 July 2015 9:00 | 8.4205 | 20.8968 | 82.3072 | 0.4267 | 20.5176 | 10.7273 | A3 |
| 2 August 2015 17:00 | 47.5483 | 69.4576 | 175.0633 | 0.7634 | 22.8088 | 6.7971 | A4 |
| 14 August 2015 20:00 | 127.6426 | 178.7458 | 217.2556 | 1.1993 | 25.2613 | 13.9264 | A5 |
| 15 August 2015 0:00 | 154.4614 | 211.3916 | 228.8058 | 1.3244 | 17.0485 | 16.1272 | A6 |
| 1 September 2015 1:00 | 17.9037 | 34.8418 | 78.1009 | 0.6857 | 24.9247 | 7.6599 | A7 |
| 1 October 2015 8:00 | 20.5887 | 37.0588 | 95.6089 | 1.0228 | 21.8282 | 4.3759 | A8 |
| 5 October 2015 13:00 | 75.7741 | 135.5992 | 157.9036 | 1.1049 | 25.8181 | 21.8493 | A9 |
Table 14. The final results of evaluation for air quality. Columns I to VI give the final certainty degree for each level.

| Cases | I | II | III | IV | V | VI | Final Air Quality Level |
|---|---|---|---|---|---|---|---|
| A1 | 0.4599 | 0.2930 | 0.0030 | 0.0000 | 0.0033 | 0.0000 | I |
| A2 | 0.1316 | 0.5421 | 0.1623 | 0.0000 | 0.0115 | 0.0000 | II |
| A3 | 0.9119 | 0.1142 | 0.0018 | 0.0000 | 0.0040 | 0.0000 | I |
| A4 | 0.1331 | 0.6136 | 0.0736 | 0.0001 | 0.0121 | 0.0000 | II |
| A5 | 0.1276 | 0.0308 | 0.3346 | 0.4278 | 0.0608 | 0.0000 | IV |
| A6 | 0.1272 | 0.0085 | 0.1554 | 0.1877 | 0.3408 | 0.0000 | V |
| A7 | 0.8518 | 0.1532 | 0.0025 | 0.0000 | 0.0038 | 0.0000 | I |
| A8 | 0.8138 | 0.1652 | 0.0029 | 0.0000 | 0.0048 | 0.0000 | I |
| A9 | 0.1284 | 0.3589 | 0.2323 | 0.0000 | 0.0108 | 0.0000 | II |
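In every row of Table 14 the final level is the one with the largest final certainty degree. The sketch below applies that maximum-certainty rule to the tabulated values; it does not recompute the certainty degrees themselves, and the names `CERTAINTY` and `final_level` are hypothetical.

```python
LEVELS = ("I", "II", "III", "IV", "V", "VI")

# Final certainty degrees for levels I-VI, transcribed from Table 14.
CERTAINTY = {
    "A1": (0.4599, 0.2930, 0.0030, 0.0000, 0.0033, 0.0000),
    "A2": (0.1316, 0.5421, 0.1623, 0.0000, 0.0115, 0.0000),
    "A3": (0.9119, 0.1142, 0.0018, 0.0000, 0.0040, 0.0000),
    "A4": (0.1331, 0.6136, 0.0736, 0.0001, 0.0121, 0.0000),
    "A5": (0.1276, 0.0308, 0.3346, 0.4278, 0.0608, 0.0000),
    "A6": (0.1272, 0.0085, 0.1554, 0.1877, 0.3408, 0.0000),
    "A7": (0.8518, 0.1532, 0.0025, 0.0000, 0.0038, 0.0000),
    "A8": (0.8138, 0.1652, 0.0029, 0.0000, 0.0048, 0.0000),
    "A9": (0.1284, 0.3589, 0.2323, 0.0000, 0.0108, 0.0000),
}


def final_level(certainties):
    """Return the level label with the highest certainty degree."""
    best = max(range(len(certainties)), key=lambda i: certainties[i])
    return LEVELS[best]


results = {case: final_level(values) for case, values in CERTAINTY.items()}
```

Cases A5 and A6 show why the full vector is worth keeping: their winning certainty degrees (0.4278 and 0.3408) are much closer to the runner-up than in cases such as A3.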
Table 15. The results of D-M test.

| Month | Benchmark Model | Target Model | PM2.5 | PM10 | O3 | CO | NO2 | SO2 |
|---|---|---|---|---|---|---|---|---|
| Jul. | LSSVM | CEEMD-BBODE-LSSVM | 4.28249 * | 6.99631 * | 11.55773 * | 6.69119 * | 8.35994 * | 7.56095 * |
| Jul. | EEMD-LSSVM | CEEMD-BBODE-LSSVM | 5.43788 * | 6.39938 * | 7.41868 * | 4.92945 * | 8.24263 * | 5.24603 * |
| Jul. | CEEMD-LSSVM | CEEMD-BBODE-LSSVM | 2.00715 ** | 1.81403 *** | 5.80117 * | 1.65674 *** | 2.72935 * | 2.51401 ** |
| Aug. | LSSVM | CEEMD-BBODE-LSSVM | 8.35825 * | 5.99979 * | 12.10402 * | 7.63585 * | 8.51850 * | 8.87180 * |
| Aug. | EEMD-LSSVM | CEEMD-BBODE-LSSVM | 6.60765 * | 5.96828 * | 14.35558 * | 4.55709 * | 9.98957 * | 6.79926 * |
| Aug. | CEEMD-LSSVM | CEEMD-BBODE-LSSVM | 5.06978 * | 0.13336 | 4.62010 * | 1.77389 *** | 5.05217 * | 0.77962 |
| Sep. | LSSVM | CEEMD-BBODE-LSSVM | 8.77114 * | 9.34361 * | 11.15465 * | 9.87993 * | 10.61179 * | 10.64809 * |
| Sep. | EEMD-LSSVM | CEEMD-BBODE-LSSVM | 5.63757 * | 9.63785 * | 8.88177 * | 9.19687 * | 9.92837 * | 4.73004 * |
| Sep. | CEEMD-LSSVM | CEEMD-BBODE-LSSVM | 3.93133 * | 3.29112 * | 3.60436 * | 2.21033 ** | 5.59378 * | 1.98392 ** |
| Oct. | LSSVM | CEEMD-BBODE-LSSVM | 5.26581 * | 6.48092 * | 9.63251 * | 5.52022 * | 9.57181 * | 10.60434 * |
| Oct. | EEMD-LSSVM | CEEMD-BBODE-LSSVM | 7.64110 * | 7.62847 * | 10.65943 * | 5.52252 * | 7.21038 * | 5.42034 * |
| Oct. | CEEMD-LSSVM | CEEMD-BBODE-LSSVM | 3.26291 * | 2.27028 ** | 4.95977 * | 1.48417 | 5.29233 * | 1.99318 ** |
* Denotes the 1% significance level; ** Denotes the 5% significance level; *** Denotes the 10% significance level.
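The star markers in Table 15 are consistent with comparing the absolute D-M statistic against two-sided standard normal critical values (2.576, 1.960, 1.645). The sketch below assumes a squared-error loss and one-step-ahead forecasts, so no autocovariance correction is applied; the loss function and horizon are assumptions because this section does not state them, and the names `dm_statistic` and `significance_marker` are hypothetical.

```python
import math


def dm_statistic(errors_benchmark, errors_target):
    """Diebold-Mariano statistic under squared-error loss, horizon 1 (assumed)."""
    d = [e1 * e1 - e2 * e2 for e1, e2 in zip(errors_benchmark, errors_target)]
    n = len(d)
    mean_d = sum(d) / n
    # Sample variance of the loss differential; no lag terms for horizon 1.
    var_d = sum((x - mean_d) ** 2 for x in d) / (n - 1)
    return mean_d / math.sqrt(var_d / n)


def significance_marker(statistic):
    """Map a D-M statistic to the star notation used in Table 15."""
    value = abs(statistic)
    if value > 2.576:   # 1% level, two-sided normal
        return "*"
    if value > 1.960:   # 5% level
        return "**"
    if value > 1.645:   # 10% level
        return "***"
    return ""


# Illustrative forecast errors only.
dm_example = dm_statistic([2.0, -2.0, 3.0, -1.0], [1.0, -1.0, 1.0, -1.0])
```

A positive statistic means the benchmark model has the larger average squared error, which is the direction reported throughout Table 15.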

Share and Cite

MDPI and ACS Style

Wang, J.; Niu, T.; Wang, R. Research and Application of an Air Quality Early Warning System Based on a Modified Least Squares Support Vector Machine and a Cloud Model. Int. J. Environ. Res. Public Health 2017, 14, 249. https://doi.org/10.3390/ijerph14030249