Article

Prediction of Ozone Hourly Concentrations Based on Machine Learning Technology

College of Economics and Management, Xi’an University of Posts & Telecommunications, Xi’an 710061, China
*
Author to whom correspondence should be addressed.
Sustainability 2022, 14(10), 5964; https://doi.org/10.3390/su14105964
Submission received: 31 March 2022 / Revised: 5 May 2022 / Accepted: 9 May 2022 / Published: 14 May 2022

Abstract

To optimize the accuracy of ozone (O3) concentration prediction, this paper proposes FC-LsOA-KELM, a combined model for predicting O3 hourly concentration that integrates multiple machine learning methods. The model has three parts. The first is feature construction (FC), which is based on correlation analysis and incorporates time-delay effect analysis to provide a valuable feature set. The second is the kernel extreme learning machine (KELM), which can establish a complex mapping between the feature set and the prediction target. The third is the lioness optimization algorithm (LsOA), which searches for the optimal parameter combination of KELM. We then use air pollution data from 11 cities on the Fenwei Plain in China from 2 January 2015 to 30 December 2019 to test the validity of FC-LsOA-KELM and compare it with other prediction methods. The experimental results show that FC-LsOA-KELM obtains more accurate predictions and outperforms the alternatives.

1. Introduction

In recent years, with the rapid development of China’s economy, the ozone (O3) content in the air has gradually increased, and ozone has become another serious air pollutant in addition to PM2.5 and PM10. The formation mechanism of ozone is complex; in short, it is produced by the photochemical reaction of nitrogen oxides (NOx) in the atmosphere with volatile organic compounds (VOCs) [1]. High wind speed, high temperature and low relative humidity also significantly promote the formation of ozone [2,3,4]. As ozone concentrations continue to rise, the resulting harm becomes increasingly serious. The study by Bell et al. [5] showed that high concentrations of ozone can affect human health, causing headache, chest pain, sore throat, cough and decreased lung function. Ozone is not only harmful to human health, but also has a significant negative impact on the human living environment. Mills et al. [6] found that if the ozone concentration in the air exceeds 40 ppbv for a long time, crops and ecosystems will be damaged. A prediction method that can track changes in ozone concentration in a timely and accurate manner would therefore help to alert the public, reduce the risk of exposure to high concentrations of ozone, and mitigate the harm of ozone to public health [7]. However, the formation of ozone requires a series of chemical reactions, which makes ozone prediction challenging.
As a typical time series, the O3 concentration is affected by a variety of external factors, making it difficult to predict. To address this, scholars have proposed some typical solutions, such as machine learning techniques and fuzzy set theory. These methods can provide valuable predictive results in some cases, but two major challenges remain. First, existing work on O3 concentration prediction mainly designs or improves a single algorithm to predict ozone concentration in a specific region; given the high complexity of the O3 concentration data itself, the performance of a single algorithm is limited. Second, algorithms that use multivariate prediction models often select predictive features without screening, which can increase computational complexity and even reduce accuracy.
To cope with these challenges, and to further improve the prediction accuracy of O3, this paper proposes an ozone prediction model that integrates multiple methods. The model consists of three parts. The first is the feature construction method, which is based on correlation analysis and incorporates time-delay effect analysis to provide a valuable feature set for subsequent prediction algorithms. The second is the kernel extreme learning machine (KELM), an improvement of the extreme learning machine that can be used for O3 prediction. KELM has strong nonlinear fitting ability and can establish a complex mapping between the feature set and the prediction target, but its fitting accuracy depends on several hyper-parameters. The third is the lioness optimization algorithm, a population-based meta-heuristic whose purpose here is to find the optimal parameter combination of KELM. We then used air pollution data from 11 cities on the Fenwei Plain in China from 2 January 2015 to 30 December 2019 to test the validity of the prediction model and compared it with other prediction methods.
We made the following contributions in this paper.
(1)
We proposed a feature construction algorithm which can analyze the interaction strength between ozone and itself and other atmospheric pollutants from the perspectives of time and space, and built a feature set for ozone prediction based on this algorithm.
(2)
We proposed an ozone concentration prediction model, FC-LsOA-KELM, which comprehensively uses the feature construction algorithm, the kernel extreme learning machine and the lioness optimization algorithm.
(3)
We evaluated the prediction performance of FC-LsOA-KELM using 2015–2019 air pollution data from 11 cities on the Fenwei Plain in China. The results showed that the proposed O3 prediction model obtains more accurate predictions than the other models considered.
The rest of this paper is organized as follows. Section 2 reviews related work on ozone concentration prediction. Section 3 presents the feature construction algorithm, the lioness optimization algorithm and the ozone concentration prediction model. Section 4 describes the research area and reports the experimental evaluation. Section 5 concludes the paper.

2. Related Work

Scholars have studied ozone prediction for many years, and a substantial body of results has accumulated, covering several different approaches. These methods can be divided into the following main categories:
(1)
Prediction method based on linear regression. Linear regression studies the linear relationship between the predictors (historical data of air pollutants) and the explained variable (future ozone concentration). The most commonly used linear regression methods in O3 prediction include ARIMA [8,9] and multiple linear regression (MLR) [10,11,12,13]. Although these statistical methods have been widely used for near-surface O3 concentration prediction, they have many limitations. For example, MLR requires a large number of predictors, which often suffer from multicollinearity [14]. In addition, the formation process of O3 is strongly nonlinear, and the concentration of ozone also depends on many other factors, such as meteorological conditions (temperature, relative humidity, etc.), atmospheric transport processes, and the concentrations of ozone precursor compounds (VOCs, NOx, etc.). As a result, the prediction accuracy of existing statistical models is often not ideal, and the error is especially large for extreme values [15].
(2)
Prediction method based on artificial neural network (ANN). ANN is one of the most commonly used machine learning methods for ozone prediction. The ANN, derived with limited prior knowledge, is a nonlinear prediction model [15,16]. Therefore, it can find the nonlinear relationship between meteorological and photochemical processes and ozone concentration at a particular site, and then realize the prediction of ozone concentration. By comparing the prediction results of ANN with MLR and ARIMA, scholars found that ANN is more effective than statistical models such as ARIMA and MLR [17,18,19,20].
(3)
Prediction method based on support vector machine (SVM). SVM is a machine learning technique that has been widely applied to regression and classification problems [21]. Like ANN, SVM is commonly used for ozone prediction [22,23,24]. Studies have found that SVR offers better accuracy than statistical prediction methods such as ARIMA and MLR [25]. However, some researchers [26] have pointed out that ANN and SVM are not perfect and still have certain limitations in ozone prediction: both are prone to overfitting and local minima, resulting in poor prediction stability.
(4)
Prediction method based on fuzzy set theory. In 1993, Song and Chissom proposed fuzzy time series (FTS) based on fuzzy set theory. Subsequently, scholars tried to apply this theory to O3 prediction. Domanska and Wojtylak [27] proposed a prediction method based on fuzzy set theory which uses a fuzzy time series model to predict O3, CO, NO and other pollutants. Although the prediction results were satisfactory, the lack of uncertainty and instability analysis in that work cast doubt on the reliability of the method [28].
(5)
Prediction method based on deterministic models. Deterministic models are based on mathematical equations describing chemical and physical processes in the atmosphere [29] and follow the principle of cause and effect [15]. When using a deterministic model for prediction, the O3 reaction equations must be established first, and then a large amount of ozone precursor and meteorological parameter data must be collected. Of these two tasks, the design of the reaction equations is key: if the equations are poorly designed, or the parameters are inappropriate, the accuracy of the prediction model will be greatly reduced.

3. Method Design

In this section, Section 3.1 discusses the design of the feature construction algorithm, Section 3.2 introduces the kernel extreme learning machine, Section 3.3 describes the lioness optimization algorithm, and Section 3.4 discusses a model for O3 hourly concentration prediction: FC-LsOA-KELM.

3.1. Design of Feature Construction Algorithm

Suppose there is a multivariate time series $V$ that contains $m$ subsequences, each with $n$ time observations, where $V_1$ is the prediction target. We hope to predict the future value of $V_1$ by mining the historical values of $V_1$, $V_2$, …, $V_m$:

$V = \begin{pmatrix} V_1 \\ V_2 \\ V_3 \\ \vdots \\ V_m \end{pmatrix} = \begin{pmatrix} v_{1,1} & v_{1,2} & v_{1,3} & \cdots & v_{1,n} \\ v_{2,1} & v_{2,2} & v_{2,3} & \cdots & v_{2,n} \\ v_{3,1} & v_{3,2} & v_{3,3} & \cdots & v_{3,n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ v_{m,1} & v_{m,2} & v_{m,3} & \cdots & v_{m,n} \end{pmatrix} \qquad (1)$
In the process of constructing the $V_1$ prediction, the feature set $F$ that may have a potential correlation with $V_1$ needs to be found first. To construct $F$, this paper proposes a feature construction algorithm for multivariate time series which integrates time-delay effect analysis and correlation coefficient sorting. The algorithm uses the correlation coefficient $c$ between a candidate feature $V_i$ and the prediction target $V_1$ as the criterion: when $c$ is greater than the specified threshold $\bar{c}$, the feature $V_i$ is included in the feature set $F$. In addition, to assist users in further refining and screening the feature set, the algorithm optimizes its structure and ranks $F$ by the correlation coefficient $c$; the higher the correlation coefficient, the higher the ranking of the feature $V_i$.
In the specific construction process, this method considers not only the degree of correlation between $V_2(t-\tau)$, …, $V_m(t-\tau)$ and $V_1$ at different time delays $\tau$, but also the autocorrelation of $V_1$ at a given time delay $\tau$, that is, the correlation between $V_1(t-\tau)$ and $V_1$ itself. Algorithm 1 is described as follows:
Algorithm 1. Feature construction algorithm
Inputs: the original multivariate time series $V$; the maximum time delay $\tau_{MAX}$;
    the correlation coefficient threshold $\bar{c}$; the feature set $F = \varnothing$
Outputs: feature set $F$
While $i \le \tau_{MAX}$ do        % i is the time-series delay value
  While $j \le m$ do             % m is the number of variables in V
    Calculate the correlation coefficient $c_{j,i}$ between $V_1$ and $V_{j,i}$
    if $c_{j,i} \ge \bar{c}$
      $C(k,1) = c_{j,i}$, $C(k,2) = i$, $C(k,3) = j$   % C records candidate feature information, one row k per candidate
    end if
  End while
End while
Sort $C$ in descending order by its first column      % sort by correlation coefficient
While $C(k,:) \ne NULL$
  Generate feature $V_{j,i}$ from the values of $C(k,2)$ and $C(k,3)$
  $F = F \cup V_{j,i}$
End while
Return $F$
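As a concrete illustration, the following Python sketch implements our reading of Algorithm 1 (the paper does not publish code, so the function name and data layout are assumptions). Each retained pair (i, j) names the lagged feature $V_j(t-i)$.

import numpy as np

def construct_features(V, tau_max, c_bar):
    """Sketch of Algorithm 1. V is an (m, n) array whose row 0 is the
    prediction target V_1; tau_max is the maximum delay and c_bar the
    correlation threshold. Returns (c, i, j) triples sorted by c."""
    m, n = V.shape
    C = []  # candidate feature information: (correlation, delay, series)
    for i in range(1, tau_max + 1):        # time-series delay value
        for j in range(m):                 # every variable, including the target itself
            c = np.corrcoef(V[j, : n - i], V[0, i:])[0, 1]
            if c >= c_bar:
                C.append((c, i, j))
    C.sort(key=lambda row: row[0], reverse=True)   # rank by correlation coefficient
    return C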

3.2. Kernel Extreme Learning Machine (KELM)

Extreme learning machine (ELM) is a fast learning method based on a single-hidden-layer feedforward neural network [30,31]. The method only requires specifying the number of hidden-layer nodes; the minimal 2-norm least-squares solution is then obtained by solving a linear system of equations, and this solution serves as the output weights of the hidden layer. ELM learns in a single pass, so compared with traditional neural networks, its generalization ability and learning speed are significantly improved. For a set $\{(x_i, y_i)\}_{i=1}^{Q}$ of $Q$ samples, where $x_i \in R^n$ is the input vector and $y_i \in R^m$ is the corresponding expected output vector, the mathematical equation of ELM is:
$y_i = \sum_{j=1}^{l} \beta_j \, g(\omega_j \cdot x_i + b_j), \quad i = 1, 2, \ldots, Q \qquad (2)$

where $\omega_j = [\omega_{1j}\ \omega_{2j}\ \cdots\ \omega_{nj}]^T$ is the weight vector connecting the $j$th hidden-layer node and the input nodes, $b_j$ is the bias of the $j$th hidden-layer node, $\beta_j = [\beta_{j1}\ \beta_{j2}\ \cdots\ \beta_{jm}]^T$ is the weight vector connecting the $j$th hidden-layer node and the output nodes, and $g(\cdot)$ is the activation function.
In fact, Equation (2) can also be rewritten in matrix form:

$Y = H\beta \qquad (3)$

where $Y$ is the output matrix of the output layer, $\beta$ is the output weight matrix, and $H$ is the output matrix of the hidden layer. These can be expressed as:

$H = \begin{bmatrix} g(\omega_1 \cdot x_1 + b_1) & g(\omega_2 \cdot x_1 + b_2) & \cdots & g(\omega_l \cdot x_1 + b_l) \\ g(\omega_1 \cdot x_2 + b_1) & g(\omega_2 \cdot x_2 + b_2) & \cdots & g(\omega_l \cdot x_2 + b_l) \\ \vdots & \vdots & \ddots & \vdots \\ g(\omega_1 \cdot x_Q + b_1) & g(\omega_2 \cdot x_Q + b_2) & \cdots & g(\omega_l \cdot x_Q + b_l) \end{bmatrix}_{Q \times l} \qquad (4)$

$\beta = [\beta_1\ \beta_2\ \cdots\ \beta_l]^T \qquad (5)$

$Y = [y_1\ y_2\ \cdots\ y_Q]^T \qquad (6)$
Based on the theory of ELM, the input weights and bias of the hidden layer are randomly generated, so only the output weights need to be determined during the training process. According to the minimum norm solution rule [32], the corresponding solution for β can be expressed as:
$\beta = H^{+} Y = H^{T}(H H^{T} + \eta I)^{-1} Y \qquad (7)$

where $H^{+}$ is the Moore–Penrose generalized inverse of the hidden-layer output matrix $H$. $H^{+}$ can be obtained analytically using the orthogonal projection method or the singular value decomposition method.
Since the kernel function mapping ϕ(x) in SVM is similar to the hidden-layer node mapping h(x) in ELM, Huang [33] proposed replacing h(x) in ELM with the kernel function mapping of a support vector machine to construct the KELM algorithm. This algorithm removes the need for ELM to choose the number of hidden-layer nodes and has better generalization performance. The kernel matrix in KELM is defined as:

$\Omega_{KELM} = H H^{T}, \quad \Omega_{KELM}(i, j) = h(x_i) \cdot h(x_j) = K(x_i, x_j) \qquad (8)$

The corresponding KELM output function can be expressed as:

$f(x) = h(x)\beta = [K(x, x_1)\ \cdots\ K(x, x_Q)]^T \left(\frac{I}{C} + \Omega_{KELM}\right)^{-1} Y \qquad (9)$

Since $\Omega_{KELM}$ takes the form of inner products, KELM does not need to set the number of hidden-layer nodes when computing the output function, nor does it need to set the initial weights and biases of the hidden layer.
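For illustration, a minimal Python implementation of KELM training and prediction following Equations (8) and (9), with an RBF kernel, might look as follows (the class interface and parameter defaults are ours, not from the paper):

import numpy as np

def rbf_kernel(X1, X2, gamma):
    # K(x, x') = exp(-gamma * ||x - x'||^2)
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)

class KELM:
    """Minimal kernel extreme learning machine (Equations (8) and (9))."""
    def __init__(self, C=1.0, gamma=0.1):
        self.C, self.gamma = C, gamma

    def fit(self, X, Y):
        self.X = X
        omega = rbf_kernel(X, X, self.gamma)              # kernel matrix Omega_KELM
        n = X.shape[0]
        # alpha = (I/C + Omega)^(-1) Y, computed by solving a linear system
        self.alpha = np.linalg.solve(np.eye(n) / self.C + omega, Y)
        return self

    def predict(self, X_new):
        # f(x) = [K(x, x_1), ..., K(x, x_Q)] (I/C + Omega)^(-1) Y
        return rbf_kernel(X_new, self.X, self.gamma) @ self.alpha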

3.3. Lioness Optimization Algorithm (LsOA)

Research shows that the fitting accuracy and generalization ability of KELM are affected by its kernel parameters, so a suitable optimization algorithm is needed to tune them. Existing studies mainly use the genetic algorithm [34] and particle swarm optimization [35] to optimize KELM’s parameters. Although these methods can find the optimal parameters, they still have some problems, such as a slow iteration rate and a tendency to fall into local optima. To overcome these problems, and to give KELM a better predictive performance, this paper proposes a novel population-based meta-heuristic optimization algorithm: the lioness optimization algorithm (LsOA). The algorithm takes the lioness hunting mechanism as its prototype and combines two hunting modes, team hunting and elite hunting. In addition, to avoid premature convergence, and again drawing on the behavior of hunting lionesses, this paper also introduces a phase-focused strategy into the elite hunting mechanism.
In LsOA, the prey is the global optimal solution, and the current optimal solution is considered to be lioness A or the top lioness. The lions mentioned are all candidate solutions, and they are all lionesses by default.
(1)
Team hunting mechanism. This mechanism models the lionesses of a pride hunting cooperatively. When hunting in groups, lionesses form a team: one part of the team (the “wings”) surrounds the prey, while another part (the “center”) moves relative to the positions of the wings and the prey. When the lionesses on the wings begin to charge towards the prey, the lioness in the center role cautiously approaches the target, hiding behind whatever cover is available. When the central lioness is close enough to the prey, it suddenly pounces on the target and catches it. The mathematical model is as follows:
$D = |C \cdot Prey - X(t)| \qquad (10)$

$X(t+1) = Prey - A \cdot D \qquad (11)$

where $t$ is the current iteration number, $X(t)$ is the position vector of the lioness, $D$ is the distance between the current lioness and the prey, $\cdot$ denotes the dot product, and $Prey$ is the position vector of the prey. The coefficient vectors $A$ and $C$ are calculated as follows:

$A = 2a \cdot r_1 - a \qquad (12)$

$C = 2 r_2 \qquad (13)$

where $a = 2 - \frac{2\,Iter}{Max\_iter} \in [0, 2]$ decreases as the number of iterations increases, and $r_1$, $r_2$ are random vectors in [0, 1].
When capturing prey, we assume that the location of the target prey is near the “central circle” of the hunting team. This central circle is composed of the four best candidate solutions in the population (lioness A, lioness B, lioness C and lioness D) and their average. The mathematical model is as follows:
$D_A = |C_1 \cdot X_A - X|, \quad D_B = |C_2 \cdot X_B - X|, \quad D_C = |C_3 \cdot X_C - X|, \quad D_D = |C_4 \cdot X_D - X| \qquad (14)$

$X_a = X_A - A_1 \cdot D_A, \quad X_b = X_B - A_2 \cdot D_B, \quad X_c = X_C - A_3 \cdot D_C, \quad X_d = X_D - A_4 \cdot D_D \qquad (15)$

$X_{ave} = \frac{X_a + X_b + X_c + X_d}{4} \qquad (16)$
where $A_i$ and $C_i$ ($i$ = 1, 2, 3, 4) are calculated as in Equations (12) and (13), $X_A$, $X_B$, $X_C$, $X_D$ are the positions of lioness A, lioness B, lioness C and lioness D, respectively, $X_{ave}$ is the average of $X_a$, $X_b$, $X_c$, $X_d$, and $X$ is the position of the current prey.
According to Equations (15) and (16), the positions of the lions at the center of the hunting team can be determined: these are $X_a$, $X_b$, $X_c$, $X_d$, and the last candidate is their average $X_{ave}$. These five positions constitute the hunting team’s “center circle”:

$E\_Team = [X_a;\ X_b;\ X_c;\ X_d;\ X_{ave}] \qquad (17)$
Figure 1 shows the construction process of the “central circle”. Since the position of the target prey in the search space is not known a priori, this paper assumes that the position of the target prey may be located near any of these five positions, and the probability of the five candidates being selected is the same, which is 0.2.
(2)
Elite hunting mechanism. In addition to team hunting, according to the principle of survival of the fittest, the top lioness (the most physically strong lioness) sometimes hunts alone, that is, using the elite hunting mechanism. In this case, the population follows the elite: the agents repeatedly probe the direction and position of the elite and finally capture the prey. To avoid the risk of falling into a local optimum caused by relying on a single elite position, we draw on the principle of triangular stability and determine the elite position not from the single best solution of the last round, as is common, but jointly from the top three agent positions by last-round fitness value. Figure 2 shows the construction process of the elite matrix.
$Top\_lioness\_pos = \frac{X_A + X_B + X_C}{3} \qquad (18)$
(3)
The march strategy. As the number of optimization iterations increases, the risk of the lions falling into a local optimum grows. To enable the lions to quickly explore a new area, this paper designed the march strategy: when $r_2 \le M$, all dimensions of the predator are unified to a single value; otherwise, each dimension randomly selects a value from $E\_Team$. The update formulas are:

$Prey(i, j) = E\_Team(randi(size(E\_Team, 1)), j), \quad \text{if } r_2 > M \qquad (19)$

$Prey(i, :) = E\_Team(randi(size(E\_Team, 1)), :), \quad \text{if } r_2 \le M \qquad (20)$

where $randi(\cdot)$ generates pseudo-random integers, $size(\cdot)$ returns the size of a vector, and $M$ indicates the degree of influence of the march strategy on the process. $Prey$ here is a matrix with the same dimensions as the elite matrix.
(4)
Phase-focused strategy. This strategy divides the lion hunting process into three phases: the early iteration, the middle iteration and the late iteration [36]. A different search mechanism is used in each phase. At the beginning of the iteration, the prey is energetic and moves fast; the lions disperse randomly in the search area and use Brownian motion to find the prey, so this phase focuses on exploration. When the number of iterations reaches one-third, the lions begin to narrow the encircling circle around the target prey. The chased prey uses Lévy flight to escape, and the individual elites of the lions also adopt Lévy flight to chase the prey; in this phase, exploration is as important as exploitation. At the end of the iteration, as the physical strength of the target prey decreases and its speed slows, it can no longer flee far. Meanwhile, the encircling circle of the lions becomes smaller and smaller, and the probability of capturing the prey increases greatly.
① The early iteration:
While $Iter < \frac{1}{3} Max\_iter$:

$step_i = RB \otimes (Elite\_lioness_i - RB \otimes Prey_i), \quad Prey_i = Prey_i + P \cdot R \otimes step_i, \quad i = 1, \ldots, n \qquad (21)$

where $Iter$ is the current iteration number and $Max\_iter$ is the maximum iteration number. $RB$ is a vector of random numbers representing Brownian motion, the symbol $\otimes$ denotes term-by-term multiplication, $P = 0.5$, and $R$ is a random vector in [0, 1]. $RB \otimes Prey_i$ simulates the movement of the prey.
② The middle iteration:
While $\frac{1}{3} Max\_iter < Iter < \frac{2}{3} Max\_iter$:

For the prey:

$step_i = RL \otimes (Elite\_lioness_i - RL \otimes Prey_i), \quad Prey_i = Prey_i + P \cdot R \otimes step_i, \quad i = 1, \ldots, n/2 \qquad (22)$

where $RL$ is a vector of random numbers representing Lévy flight, and $RL \otimes Prey_i$ simulates the movement of the prey. For the lions, this study assumes:

$step_i = RB \otimes (RB \otimes Elite\_lioness_i - Prey_i), \quad Prey_i = Elite\_lioness_i + P \cdot CF \otimes step_i, \quad i = n/2, \ldots, n \qquad (23)$

where $CF = \left(1 - \frac{Iter}{Max\_iter}\right)^{2\frac{Iter}{Max\_iter}}$ is an adaptive parameter that controls the step length of the lion’s movement. $RB \otimes Elite\_lioness_i$ simulates the movement of the predator (lioness).
③ The late iteration:
While $Iter > \frac{2}{3} Max\_iter$:

$step_i = RL \otimes (RL \otimes Elite\_lioness_i - Prey_i), \quad Prey_i = Elite\_lioness_i + P \cdot CF \otimes step_i, \quad i = 1, \ldots, n \qquad (24)$

where $RL \otimes Elite\_lioness_i$ simulates the movement of the predator (lioness).
In addition, it has been found that many animals in a state of starvation exhibit Brownian motion [36]: they turn suddenly while moving, and the time interval between turns is unpredictable. For this reason, we assumed that when a lioness observes nearby prey, it uses Brownian motion to contain the prey. However, if prey is lacking within the territory and the lioness needs to explore new territory, it abandons Brownian motion and adopts a Lévy flight strategy instead [36]. This behavior was also considered in the design of LsOA.
The pseudo code of the LsOA is as follows (Algorithm 2):
Algorithm 2. Lioness Optimization Algorithm
Initialize the search agent (Prey) population, i = 1, …, n
Assign free parameters: FADs = 0.2; P = 0.5; Q = 0.5; M = 0.9
While Iter < Max_iter
  Calculate the fitness of each search agent
  $X_A$ = the best search agent
  $X_B$ = the second-best search agent
  $X_C$ = the third-best search agent
  $X_D$ = the fourth-best search agent
  Update $CF$, $a$, $RL$ and $RB$
  If ($q \le Q$)      % Team hunting
    For each search agent
      Update $A$ and $C$ by Equations (12) and (13)
      Use $A$ and $C$ to calculate $D$
      Calculate $X_a$, $X_b$, $X_c$ and $X_d$ by Equation (15)
      $X_{ave} = (X_a + X_b + X_c + X_d)/4$
      Construct the “center circle”: $E\_Team = \{X_a, X_b, X_c, X_d, X_{ave}\}$
      If ($r_2 < M$)
        Update the position of the current search agent by Equation (19)
      else if ($r_2 \ge M$)
        Update the position of the current search agent by Equation (20)
      end if
    End for
  Else if ($q > Q$)      % Elite hunting
    $Top\_lioness\_pos = (X_A + X_B + X_C)/3$
    Construct the elite matrix and accomplish memory saving
    For each search agent
      If Iter < Max_iter/3
        Update the position of the current search agent by Equation (21)
      else if Max_iter/3 < Iter < 2·Max_iter/3
        For the first half of the population (i = 1, …, n/2)
          Update the position of the current search agent by Equation (22)
        End for
        For the other half of the population (i = n/2, …, n)
          Update the position of the current search agent by Equation (23)
        End for
      else if Iter > 2·Max_iter/3
        Update the position of the current search agent by Equation (24)
      end if
    End for
  End if
  Update Top_lioness_pos if there is a better solution
  Apply the FADs effect and update the position of the current search agent
  Iter = Iter + 1
End while
Return Top_lioness_pos
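To make the phase-focused updates concrete, the following Python sketch transcribes the elite-hunting position updates (Equations (21)–(24)). It is a sketch under our assumptions: the Lévy steps use Mantegna's algorithm, all names are ours, and the team-hunting branch, memory saving and the FADs effect are omitted for brevity.

import numpy as np

rng = np.random.default_rng(0)

def brownian(shape):
    return rng.normal(0.0, 1.0, shape)          # RB: Brownian-motion steps

def levy(shape, beta=1.5):
    # Mantegna's algorithm for Levy-stable steps (RL)
    from math import gamma, pi, sin
    sigma = (gamma(1 + beta) * sin(pi * beta / 2) /
             (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0.0, sigma, shape)
    v = rng.normal(0.0, 1.0, shape)
    return u / np.abs(v) ** (1 / beta)

def phase_focused_update(prey, elite, it, max_it, P=0.5):
    """Elite-hunting position update for one iteration (Eqs (21)-(24)).
    prey and elite are (n, d) matrices of candidate positions."""
    n, d = prey.shape
    R = rng.random((n, d))
    CF = (1 - it / max_it) ** (2 * it / max_it)   # adaptive step-length factor
    new = prey.copy()
    if it < max_it / 3:                           # early phase: exploration, Eq (21)
        RB = brownian((n, d))
        step = RB * (elite - RB * prey)
        new = prey + P * R * step
    elif it < 2 * max_it / 3:                     # middle phase, Eqs (22)-(23)
        half = n // 2
        RL = levy((half, d))
        step = RL * (elite[:half] - RL * prey[:half])
        new[:half] = prey[:half] + P * R[:half] * step
        RB = brownian((n - half, d))
        step = RB * (RB * elite[half:] - prey[half:])
        new[half:] = elite[half:] + P * CF * step
    else:                                         # late phase: exploitation, Eq (24)
        RL = levy((n, d))
        step = RL * (RL * elite - prey)
        new = elite + P * CF * step
    return new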

3.4. Design of the Prediction Model

Based on the methods introduced above, this paper proposes an hourly ozone concentration prediction model, FC-LsOA-KELM. The model structure is shown in Figure 3. First, the model used the feature construction algorithm (FC) on the historical air pollution data, with ozone concentration as the explained variable, to build the set of explanatory variables. Then, according to the needs of model training and prediction, the feature data set was divided into a training set $A$ and a test set $B$ (the specific division is given in Section 4.2). Finally, the model training set was fed into KELM to train the ozone concentration prediction model. Since the fitting accuracy and generalization ability of KELM are affected by its kernel parameters, this paper used LsOA to optimize them.
In addition, during this experiment it was found that KELM must read the entire model training set into memory at once to solve the linear system of equations and obtain the minimal 2-norm least-squares solution. This places a heavy demand on the memory of the training equipment, especially when the data set is large. To avoid memory overrun with a large amount of data, this paper randomly sampled the training set $A$, extracting 20% of the data each time as a model training set $A_1$. At the same time, to avoid model training bias and overfitting caused by improper sampling, this paper made the following two improvements:
(1)
Randomly select 10% of the data from the training set $A$, excluding $A_1$, as the model validation set $A_2$, which is used to test the performance of the trained KELM on data it has not seen. Its purpose is to test the generalization performance of the prediction model.
(2)
Instead of taking only the training error on the model training set as the optimization objective, as common optimization algorithms do, this paper redesigned the fitness function:

$fitness(\gamma_i) = \frac{f(A_1, KELM, \gamma_i) + f(A_2, KELM, \gamma_i)}{2} + \left| \frac{f(A_1, KELM, \gamma_i) - f(A_2, KELM, \gamma_i)}{2} \right| \qquad (25)$

where $fitness(\cdot)$ is the fitness function; $\gamma_i$ is the position of the lioness, which is also the kernel parameter of KELM; as above, $A_1$ is the model training set and $A_2$ is the model validation set; $|\cdot|$ denotes the absolute value; and $f(\cdot)$ is the indicator used to evaluate the fitting accuracy of KELM. In this paper, the MAPE (mean absolute percentage error) was used, calculated as follows:
$MAPE = \frac{1}{K} \sum_{t=1}^{K} \left| \frac{observed_t - predicted_t}{observed_t} \right| \times 100\% \qquad (26)$

In Equation (26), $observed$ represents the actual value, $predicted$ represents the predicted value, and $K$ is the total number of samples. The smaller the value of MAPE, the better the accuracy of the prediction model.
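In code, Equation (25) can be transcribed directly; note that the mean-plus-half-absolute-difference form is simply the larger of the two MAPEs, so the optimizer minimizes the worse of the training and validation errors. A minimal sketch (names and interfaces are ours):

import numpy as np

def mape(observed, predicted):
    # Equation (26): mean absolute percentage error
    observed, predicted = np.asarray(observed), np.asarray(predicted)
    return np.mean(np.abs((observed - predicted) / observed)) * 100

def fitness(gamma_i, kelm_factory, A1, A2):
    """Equation (25). gamma_i is a candidate kernel-parameter vector,
    kelm_factory builds and fits a KELM with those parameters, and
    A1/A2 are (X, y) tuples for the training and validation sets."""
    model = kelm_factory(gamma_i).fit(*A1)
    f1 = mape(A1[1], model.predict(A1[0]))   # f(A1, KELM, gamma_i)
    f2 = mape(A2[1], model.predict(A2[0]))   # f(A2, KELM, gamma_i)
    # (f1 + f2)/2 + |f1 - f2|/2 == max(f1, f2)
    return (f1 + f2) / 2 + abs(f1 - f2) / 2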
Figure 3 is a flow chart of the prediction model. Note that after the prediction results are obtained, this paper adds a “prediction result correction” step, which corrects unreasonable values in the prediction results. The specific method is to perform a statistical analysis of the historical data by month, statistically infer the maximum and minimum values of the predicted object, and then clamp any predicted value outside this range to the corresponding maximum or minimum.
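A minimal sketch of this correction step, under the assumption that the monthly bounds are taken as the historical minima and maxima of hourly O3 (the paper's statistical inference may be more elaborate):

import pandas as pd

def correct_predictions(pred, history):
    """Clip each predicted value to the [min, max] range observed for
    the same calendar month in the historical series. Both arguments
    are pandas Series indexed by timestamp."""
    bounds = history.groupby(history.index.month).agg(["min", "max"])
    lo = pred.index.month.map(bounds["min"]).to_numpy()
    hi = pred.index.month.map(bounds["max"]).to_numpy()
    return pred.clip(lower=lo, upper=hi)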

3.5. Comparison Methods

(1)
Multiple linear regression [37]. In regression analysis, regression with two or more independent variables under linear correlation is called multiple linear regression (MLR). Predicting the dependent variable with an optimal combination of several independent variables is usually more effective than using a single independent variable for prediction or estimation. The mathematical equation of MLR is as follows:
$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_n x_{in} \qquad (27)$

where $y_i$ is the observed value of the dependent variable, $i \in [1, m]$; $x_{ij}$ is the value of the $j$th input variable for the $i$th observation; and $\beta_j$, $j \in [1, n]$, is the regression coefficient of the input variable, estimated using ordinary least squares.
(2)
Gaussian process regression [38]. Gaussian process regression (GPR) is a nonparametric model that uses Gaussian process priors to perform regression analysis on data, which provides flexibility for modeling stochastic processes. Compared with other parametric models, GPR specifies a prior distribution over the function space, where the relationships among the data are encoded in the covariance function $k(x_1, x_2)$ of a multivariate Gaussian distribution. The squared exponential function, one of the most commonly used covariance functions, is shown in Equation (28):

$k(x_1, x_2) = \sigma_f^2 \exp\left(-\frac{(x_1 - x_2)^2}{2 l^2}\right) \qquad (28)$

where $\sigma_f^2$ is the variance, representing the noise level of the data, and $l$ is the characteristic length-scale parameter (the larger the value, the smoother the function).
(3)
Back propagation neural network [39]. The back propagation neural network (BPNN) is a multi-layer feedforward network trained according to error back-propagation. The basic idea of BPNN is the gradient descent method, which uses gradient search technology to achieve the minimum mean square error between the real output value and the expected output value of the network. It is the most widely used neural network.
(4)
Support vector regression [40]. SVM is a class of generalized linear classifiers that perform binary classification of data in a supervised learning manner. Support vector regression (SVR) is an application model of support vector machine in regression problems. The core idea is to find a hyperplane (hypersurface) that minimizes the expected risk.
(5)
Kernel ridge regression [41]. Ridge regression [42] is a well-known technique from multiple linear regression that implements a regularized form of least-squares regression. Kernel ridge regression (KRR) introduces the kernel function on the basis of ridge regression, realizes the mapping of low-dimensional data in high-dimensional space, and further constructs a linear ridge regression model in high-dimensional feature space to realize nonlinear regression [41]. At present, KRR is widely used in pattern recognition, data mining and other fields.
(6)
Decision tree. Decision tree (DT) is a non-parametric supervised learning method used for classification and regression. The goal is to create a model that learns simple decision rules from data features to predict the value of a target variable.
(7)
Stochastic gradient descent regression [43]. Stochastic gradient descent (SGD) is a simple but highly efficient method mainly used for the discriminative learning of linear classifiers under convex loss functions, such as (linear) support vector machines and logistic regression. SGD regression supports different loss functions and penalties to fit linear regression models.
(8)
GCN [44]. A graph convolutional network (GCN) is a feature extractor, like a convolutional neural network (CNN), but its object is graph data. It can be applied to data with rich topological structure, such as social networks, recommendation systems and transportation networks, which are characterized by irregular connections. GCN provides a clever way to extract features from graph data, so that these features can be used for node classification, graph classification and edge prediction, and embedded representations of the graph can be obtained along the way; it is widely used.
(9)
GAT [45]. Graph attention network (GAT) aggregates neighbor nodes through the attention mechanism and realizes the adaptive allocation of different neighbor weights. This is different from GCN. The weights of different neighbors in GCN are fixed, and they all come from the normalized Laplacian matrix. GAT greatly improves the expressive ability of the graph neural network model.
(10)
LsOA-KELM. The LsOA-KELM model uses the LsOA for the parameter optimization of KELM. Compared with the model in this paper, the data processed by the LsOA-KELM model are all original data without feature construction. When using the LsOA-KELM model, the population size of LsOA was 30, and the number of iterations was set as 50. The kernel function of KELM was set to ‘RBF’, and the other parameters were the default values of the original algorithm.
(11)
FC-LsOA-KELM--. The FC-LsOA-KELM-- model adds a feature construction step to LsOA-KELM: the original data are reconstructed and then handed to LsOA-KELM for training and prediction. The main difference between this model and FC-LsOA-KELM is the absence of the correction step for the predicted values. The parameter settings of FC-LsOA-KELM-- were consistent with those of LsOA-KELM.
We conducted many experiments on the parameter settings of the above-mentioned methods. The final parameter values and the differences between the methods are as follows (Table 1):
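Most of the classical baselines above have standard implementations in scikit-learn. As a hedged illustration of how such a comparison could be set up (the hyper-parameter values here are placeholders, not the tuned values reported in Table 1):

from sklearn.linear_model import LinearRegression, SGDRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR
from sklearn.kernel_ridge import KernelRidge
from sklearn.tree import DecisionTreeRegressor

def fit_baselines(X_train, y_train, X_test):
    """Fit the classical comparison models and return their predictions."""
    baselines = {
        "MLR": LinearRegression(),
        "GPR": GaussianProcessRegressor(),
        "BPNN": MLPRegressor(hidden_layer_sizes=(50,), max_iter=500),
        "SVR": SVR(kernel="rbf", C=1.0),
        "KRR": KernelRidge(kernel="rbf", alpha=1.0),
        "DT": DecisionTreeRegressor(),
        "SGD": SGDRegressor(penalty="l2"),
    }
    return {name: m.fit(X_train, y_train).predict(X_test)
            for name, m in baselines.items()}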

3.6. Evaluation Indicators

At present, many error indicators can be used to evaluate the performance of regression prediction models. Common indicators include RMSE (root mean squared error), MAPE (mean absolute percentage error), R2 (R squared), MAE (mean absolute error) and MSE (mean squared error). In this paper, the three most representative indicators, RMSE, MAPE and $\bar{R}^2$, were selected, where $\bar{R}^2$ is a simple transformation of R2 that facilitates the subsequent statistical analysis. The calculation of MAPE is shown in Equation (26); the formulas for RMSE and $\bar{R}^2$ are as follows:
$RMSE = \sqrt{\frac{1}{K} \sum_{t=1}^{K} (observed_t - predicted_t)^2} \qquad (29)$

$R^2 = 1 - \frac{\sum_{t=1}^{K} (observed_t - predicted_t)^2}{\sum_{t=1}^{K} (observed_t - \overline{observed})^2} \qquad (30)$

$\bar{R}^2 = 1 - R^2 \qquad (31)$

where $observed$ represents the actual value, $\overline{observed}$ the mean of the actual values, $predicted$ the predicted value, and $K$ the total number of samples. The more accurate the prediction method, the closer MAPE, RMSE and $\bar{R}^2$ are to 0.
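These indicators translate directly into Python; a short sketch complementing the MAPE function given earlier:

import numpy as np

def rmse(observed, predicted):
    # Equation (29)
    observed, predicted = np.asarray(observed), np.asarray(predicted)
    return np.sqrt(np.mean((observed - predicted) ** 2))

def r2_bar(observed, predicted):
    # Equations (30) and (31): 1 - R^2, so smaller is better
    observed, predicted = np.asarray(observed), np.asarray(predicted)
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return ss_res / ss_tot          # algebraically equal to 1 - R^2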

4. Research Results

4.1. Research Area

The Fenwei Plain is the general name of the Fenhe Basin and the Weihe Plain and its surrounding terraces in the Yellow River basin. The Fenwei Plain starts from Yangqu County in Shanxi Province in the north, reaches the Qinling Mountains in Shaanxi Province in the south, and reaches Baoji City in Shaanxi Province in the west. It is distributed in the northeast-southwest direction, with a length of about 760 km and a width of about 40 to 100 km. The Fenwei Plain includes Xi’an, Baoji, Xianyang, Weinan and Tongchuan in Shaanxi Province; Taiyuan, Jinzhong, Lvliang, Linfen and Yuncheng in Shanxi Province; and Luoyang and Sanmenxia in Henan Province, with a total land area of 70,000 square kilometers. It is the fourth largest plain in China and the largest alluvial plain in the middle reaches of the Yellow River, with a total population of 55.5445 million.
In recent years, with rapid economic development and population growth on the Fenwei Plain, the air quality in the area has deteriorated. According to data from the China Air Quality Monitoring Network, from 2015 to 2019 the average O3 concentration in the 11 cities of the Fenwei Plain showed an overall upward trend, with an average annual increase of 12.2 μg/m3, and it exceeded the secondary standard limit (160 μg/m3) from 2017 to 2019. To control the continued deterioration of air quality, in 2018 the Fenwei Plain was included in the State Council of China’s “three-year action plan to fight air pollution” (National Development [2018] No. 22), making it one of the three key areas for continuous air pollution prevention and control. This paper selects the Fenwei Plain as the research area, shown in Figure 4.

4.2. Research Data

After selecting the research area, we used crawler technology to collect hourly air pollution data from the Tencent weather interface (http://weather.gtimg.cn/aqi/ (accessed on 30 January 2020)) for 11 cities: Xi’an, Baoji, Xianyang, Weinan and Tongchuan in Shaanxi Province; Jinzhong, Lvliang, Linfen and Yuncheng in Shanxi Province; and Luoyang and Sanmenxia in Henan Province, from 01:00 on 2 January 2015 to 23:00 on 30 December 2019. The collected variables included the concentrations of O3, PM2.5, PM10, SO2, NO2 and CO, as well as the AQI. The initial number of records per city was 43,799. To avoid the impact of missing data on model training, records containing missing values were removed, leaving 42,312 records per city covering a total of 1763 days. We then divided the data set by days: 1733 days were assigned to the training set (41,592 records) and the remaining 30 days to the test set (720 records).
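A sketch of this cleaning and splitting procedure in pandas (the file layout and column names are assumptions; the crawler's actual output is not published):

import pandas as pd

def load_city(path):
    """Drop records with missing values, then hold out the last 30 days."""
    df = pd.read_csv(path, parse_dates=["time"]).set_index("time")
    df = df[["O3", "PM2.5", "PM10", "SO2", "NO2", "CO", "AQI"]].dropna()
    days = sorted({ts.date() for ts in df.index})
    train_days, test_days = set(days[:-30]), set(days[-30:])
    day_of = pd.Series(df.index.date, index=df.index)
    return df[day_of.isin(train_days)], df[day_of.isin(test_days)]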

4.3. Feature Construction

Air pollution spreads regionally; that is, pollutants at place A will spread to place B with the passage of time and airflow, and then affect the ozone concentration at place B. Therefore, when constructing the prediction feature set, we should consider not only the time-delay effect among local pollutants, but also the flow of pollutants between regions. For this reason, when constructing the ozone prediction feature set for a given area, the air pollution data of all 11 cities in the study area are analyzed by the feature construction algorithm proposed in this paper, mining the possible spatio-temporal effects, and a relatively complete prediction feature set is built for the prediction method. The key parameters of the construction algorithm were set to $\tau_{MAX} = 288$, i.e., at 24 h per day, delays of up to 12 days are examined, and $\bar{c} = 0.6$, meaning that when the correlation coefficient is at least 0.6, we consider the feature highly correlated with ozone and include it in the feature set. Table 2 shows the resulting feature sets for O3 hourly concentration prediction in each region: Luoyang has the largest number of features (328) and Lvliang the smallest (79).
Here we take the O3 prediction feature set of Baoji city as an example, analyze the selected features, and discuss the possible relationship between the selected features and the time delay. Figure 5a shows the number of selected features in this region at each time delay. Observing the scatter points and fitted curve in Figure 5a, we find three local maxima in the number of features, at time delays $\tau$ of 1, 22 and 44; the interval between the three points clearly has a period of about 21~22. The data of other cities, such as Luoyang (Figure 5b), Xi’an (Figure 5c) and Jinzhong (Figure 5d), show a similar pattern. In addition, as the time delay $\tau$ increases, the number of selected O3 prediction features does not decrease linearly; there is an obvious fluctuation when $\tau$ is close to 22, where the number of selected features exceeds that at $\tau = 1$. Therefore, directly taking the influential factors at $\tau = 1$ as the O3 prediction feature set is not a wise choice, as it would discard many valuable features, especially for O3, a prediction target with clear periodicity. A small tally of this per-delay feature count is sketched below.
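Using the construct_features sketch from Section 3.1, and assuming the 11 cities' hourly pollutant series are stacked into a matrix V with the target city's O3 in row 0, the per-delay counts plotted in Figure 5 could be reproduced as follows:

from collections import Counter

C = construct_features(V, tau_max=288, c_bar=0.6)   # (correlation, delay, series) triples
per_delay = Counter(i for _, i, _ in C)             # selected-feature count per delay
print(per_delay.most_common(5))                     # delays contributing the most features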

4.4. Result Analysis

In order to intuitively observe the prediction effect of the FC-LsOA-KELM method, we used a scatterplot to describe the relationship between the observed and the predicted O3 concentrations per hour in 11 cities. We used the actual value of O3 in each city as the abscissa, and the predicted value obtained using the FC-LsOA-KELM method as the ordinate. The drawn scatterplots are shown in Figure 6.
Each scatterplot shows a clear positive linear correlation between the actual and predicted values in each region. The linear regression coefficient of Baoji was the highest, at 0.9728, and even Lvliang, the relatively poorest of the regions, reached 0.8826. The FC-LsOA-KELM method thus performed well in O3 prediction across the regions, which is further confirmed by the fitted R2 in each scatterplot: the R2 of the prediction results ranged from 0.9093 to 0.9608. By definition, the closer R2 is to 1, the stronger the correlation between the actual and predicted values, and the more accurate the prediction method.
Although the previous analysis shows that the performance of FC-LsOA-KELM in O3 prediction across the 11 regions of the Fenwei Plain is generally satisfactory, it does not establish whether its prediction ability is competitive. To examine this, we used the 11 comparison methods listed in Section 3.5 to predict O3 in the study area. The prediction error analysis is shown in Table 3, Table 4 and Table 5; for ease of comparison, the minimum values are marked in bold. According to Table 3 (MAPE), FC-LsOA-KELM showed a clear advantage over the other 11 methods, ranking first in MAPE for every city. To give an overview of the study area, the last row of Table 3 lists the mean MAPE of each method over the cities. By these means, the methods after FC-LsOA-KELM ranked as follows: FC-LsOA-KELM-- (2), BPNN (3), DT (4), MLR (5), KRR (6), GPR (7), SGD (8), GCN (9), GAT (10) and SVR (11).
Then we turn to Table 4, which shows the RMSE values of the various prediction methods. According to Table 4, FC-LsOA-KELM ranked first in every region except Lvliang. This differs somewhat from its MAPE performance: FC-LsOA-KELM ranked only fifth in RMSE for Lvliang, yet first in MAPE for the same city. Further analysis of Lvliang revealed a peculiar phenomenon: MLR, which had the best RMSE, ranked only sixth in MAPE for Lvliang, with a MAPE of 72.50%, more than twice the MAPE of first-ranked FC-LsOA-KELM (31.71%). To analyze why the two indicators conflict, we calculated the percentage errors of FC-LsOA-KELM and MLR on the O3 predictions for Lvliang and plotted them together with the actual O3 values, as shown in Figure 8. Comparing the curves in Figure 8, it is clear that the percentage error of MLR was significantly larger than that of FC-LsOA-KELM, which confirms the conclusion from Table 3 that the MAPE of FC-LsOA-KELM is much smaller than that of MLR. This might create the impression that the RMSE of FC-LsOA-KELM should also be smaller than that of MLR; however, the data in Table 4 show that the RMSE of MLR (4.3497) was significantly smaller than that of FC-LsOA-KELM (6.3164). Why do the indicators contradict each other in this way? We considered the main reason to be that Lvliang is located at the northern end of the Fenwei Plain (shown as the red area in Figure 7), so its climatic characteristics differ considerably from those of the other cities, and the correlation between its ozone concentration and the pollutants of the other cities is low. This can be verified by the number of predicted features constructed at different thresholds. We used the feature construction algorithm to build the feature set of each region and counted the number of features under different thresholds; the results are shown in Table 6, whose second, third and fourth rows give the feature counts under thresholds of 0.5, 0.6 and 0.7, respectively. Under every threshold, Lvliang had the fewest features, with a large gap to the second-lowest, Tongchuan, which may be one of the main reasons for the poorer O3 prediction of FC-LsOA-KELM in Lvliang.
In addition, we also tried to explain, from the perspective of model design, why MLR was better than FC-LsOA-KELM in RMSE and $\bar{R}^2$ but worse in MAPE. The objective function of the optimization algorithm in this paper is based on MAPE, so MAPE is the primary optimization indicator; as a result, FC-LsOA-KELM was far superior to MLR in Lvliang’s MAPE, but performed worse on the other two indicators.
We continued to observe Figure 8 and found that when the percentage error of MLR was relatively high (i.e., when the blue curve is significantly higher than the orange curve), the O3 concentration value was often low, which means that although the prediction percentage error was very high at this time, its absolute error was small. This explains the large difference between the MAPE ranking and the RMSE ranking.
Subsequently, we examined the $\bar{R}^2$ data listed in Table 5 and found that the performance of FC-LsOA-KELM on $\bar{R}^2$ was consistent with its performance on RMSE: it ranked first in all regions except Lvliang. For Lvliang, the same issue appeared; the $\bar{R}^2$ value of FC-LsOA-KELM also ranked fifth, quite different from its MAPE ranking. The reason is the same as in the preceding analysis of RMSE.

4.5. Statistical Analysis

When comparing the performance of various prediction methods, it is often necessary to conduct statistical tests on the experimental results. To this end, we conducted a statistical analysis of the results of the methods listed in Section 3.5 in terms of MAPE, RMSE and $\bar{R}^2$.
Before performing the statistical analysis, the Friedman test was used to calculate the average ranking of the prediction methods. The results are shown in Table 7. According to the Friedman mean ranks over the 11 cities, FC-LsOA-KELM performed best, ranking first on all three indicators with mean ranks of 1.00 (MAPE), 1.36 (RMSE) and 1.36 ($\bar{R}^2$). These results again illustrate the superior performance of FC-LsOA-KELM.
We then used statistical tests for the comparison of multiple methods. First, the non-parametric Friedman test was used to determine whether the performance of all methods differed significantly under each indicator. To avoid biased conclusions from a single indicator or region, we tested the 11 cities on three different indicators: the first group used MAPE, the second RMSE, and the third $\bar{R}^2$. The Friedman test first requires the average rankings to be calculated; the critical value at the significance level (α = 0.05, 0.1) is then compared with Friedman’s statistic to determine whether there is evidence against the null hypothesis. For all three indicators, the results rejected the null hypothesis, indicating that the performance of the methods in each group differs significantly.
After determining that the various prediction methods differed in performance, we used the Bonferroni–Dunn test [46] to analyze statistical differences between the performance of these methods. This test compared the proposed method, FC-LsOA-KELM, with the other 11 methods; that is, the average ranking difference of each method was compared with the critical difference (CD). If the difference was greater than the critical difference, the method with a good average ranking was statistically superior to the method with a bad average ranking; otherwise, there was no statistical difference between the two. The calculation formula of the critical difference is as follows:
$CD = q_{\alpha} \sqrt{\frac{k(k+1)}{6N}} \qquad (32)$

where $k$ is the number of methods compared, $N$ is the number of data sets, and the commonly used values of $q_{\alpha}$ can be obtained by looking up the table in Appendix A.
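For reference, the sketch below computes the Friedman statistic with SciPy and the critical difference of Equation (32); here $q_{\alpha}$ is approximated from the studentized range distribution rather than read from the Appendix A table, an assumption that reproduces a CD of about 5.02 for k = 12 methods over N = 11 cities at α = 0.05, consistent with the thresholds in Figure 9.

import numpy as np
from scipy.stats import friedmanchisquare, studentized_range

def critical_difference(k, N, alpha=0.05):
    # Equation (32); a large df approximates the asymptotic q_alpha
    q_alpha = studentized_range.ppf(1 - alpha, k, 1e6) / np.sqrt(2)
    return q_alpha * np.sqrt(k * (k + 1) / (6 * N))

# scores: an (N cities x k methods) array of, e.g., MAPE values
# stat, p = friedmanchisquare(*scores.T)   # reject H0 if p < alpha
print(critical_difference(k=12, N=11))     # approx. 5.02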
The results of the Bonferroni–Dunn test are shown in Figure 9. The bar chart shows the average ranking of the 12 methods on the three indicators, and each horizontal line corresponds to a threshold, that is, the mean rank of the comparison method plus the CD. Two thresholds were defined, at significance levels of 0.05 and 0.1. Each group is identified by a different color (the first group blue, the second red, the third green), and the thresholds are likewise distinguished by color. FC-LsOA-KELM obtained the same Friedman mean ranks on RMSE and $\bar{R}^2$, and the CDs at significance levels of 0.05 and 0.1 were also the same for these two groups, which causes some of the threshold lines in Figure 9 to overlap; in the end, only three lines appear. The first, pink line at 6.38 is the threshold for the second and third groups at a significance level of 0.05. The second, green line at 6.02 is the threshold for the first group at a significance level of 0.05 as well as for the second and third groups at 0.1. The last, blue line at 5.66 is the threshold for the first group at a significance level of 0.1.
Figure 9 shows that the prediction performance of FC-LsOA-KELM significantly exceeds that of any method whose mean rank lies above the threshold line (i.e., whose bar is taller than the corresponding line). In Group 1, FC-LsOA-KELM significantly outperforms GPR, SVR, KRR, SGD, GCN and GAT at both the 0.05 and 0.1 significance levels; it outperforms LsOA-KELM only at the 0.1 level. In Group 2, FC-LsOA-KELM is significantly superior to GPR, SVR, DT, SGD, GCN and GAT at both significance levels. In Group 3, FC-LsOA-KELM is significantly better than the remaining six methods, the exceptions being MLR, BPNN, KRR, LsOA-KELM and FC-LsOA-KELM--, at the significance levels of 0.05 and 0.1.
In addition to the Bonferroni–Dunn test, this study also considered Holm’s method [47]. The steps are: calculate the p values, sort them, and compare each p value with α/i, where α is the significance level and i is the rank of the method in the sorted order; if p < α/i, reject the null hypothesis, i.e., the difference is significant. As with the Friedman and Bonferroni–Dunn tests above, we carried out the tests for the three groups based on MAPE, RMSE and $\bar{R}^2$; the results are shown in Table 8, Table 9 and Table 10, respectively. The data in these tables show that, on all three indicators, FC-LsOA-KELM was significantly better than the other methods at the significance levels of 0.05 and 0.1. These test results once again verify the superiority of the FC-LsOA-KELM model.

5. Conclusions

To optimize the accuracy of O3 concentration prediction, this paper proposed a combined prediction model of O3 hourly concentration, FC-LsOA-KELM, which integrates multiple machine learning methods. The prediction performance of this model was tested using the historical air pollution data for several cities in the Fenwei Plain of China. By analyzing the experimental results, we can draw the following conclusions:
(1)
The selection and use of prediction features have a significant impact on the performance of the prediction model. When we used LsOA-KELM to build a predictive model from raw air pollution data that had undergone neither feature selection nor reconstruction, the prediction results were not ideal: in terms of MAPE, RMSE and $\bar{R}^2$, LsOA-KELM was worse than BPNN, MLR and other methods. However, when LsOA-KELM was given the air pollution data reconstructed by the FC, its prediction performance improved significantly.
(2)
The prediction feature set constructed by the feature construction method (FC) not only mines the potential relationships between air pollutants, but also captures the influence of historical pollutant concentrations on future ones. This enriches the sources of O3 prediction features and thereby helps the prediction model improve the accuracy of its O3 predictions (a code sketch of this step follows this list).
(3)
Using historical data to revise the prediction results can reduce the outliers caused by insufficient training of the prediction model, thereby further improving prediction accuracy.
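As an illustration of conclusion (2), the following sketch reconstructs the core of the FC step under simplifying assumptions: each pollutant series is lagged by τ = 1, …, τ_MAX, the Pearson correlation c between the lagged series and the O3 series is computed, and every lag with |c| ≥ c̄ is kept as a candidate feature. Here c̄ = 0.6 (the threshold used in our experiments, cf. Table 6), while τ_MAX = 24 and the function name are illustrative choices, not our exact implementation:

```python
import numpy as np

def construct_features(target, pollutants, tau_max=24, c_bar=0.6):
    """FC sketch: keep every (pollutant, lag) pair whose lagged series has a
    Pearson correlation of at least c_bar in magnitude with the O3 target.

    target: 1-D array of hourly O3 concentrations.
    pollutants: dict mapping pollutant name -> 1-D array aligned with target.
    Returns a list of (name, tau) pairs selected as prediction features.
    """
    selected = []
    for name, series in pollutants.items():
        for tau in range(1, tau_max + 1):
            # pollutant value tau hours earlier vs. the current O3 value
            c = np.corrcoef(series[:-tau], target[tau:])[0, 1]
            if abs(c) >= c_bar:
                selected.append((name, tau))
    return selected
```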
In future work, we will apply FC-LsOA-KELM to other domains, such as financial market forecasting and renewable energy generation forecasting, to determine whether it also performs well outside air quality prediction.

Author Contributions

Conceptualization, D.L. and X.R.; methodology, D.L.; software, D.L. and X.R.; validation, D.L. and X.R.; formal analysis, D.L.; resources, D.L.; data curation, D.L.; writing—original draft preparation, D.L. and X.R.; writing—review and editing, D.L.; visualization, D.L. and X.R.; supervision, D.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Scientific Research Program Funded by Shaanxi Provincial Education Department (Program No.20JG031), and the Postgraduate Innovation Fund Project of Xi’an University of Posts and Telecommunications (CXJJWY2020001).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Nomenclature

FC
C(·): records candidate feature information
c̄: threshold for the correlation coefficient
c: correlation coefficient
m: number of subsequences of the time series
n: number of historical time points
V: multivariate time series
V_i: the i-th subsequence
V_i: candidate features of the i-th subsequence
F: feature set
τ: a time delay
τ_MAX: maximum time delay

KELM
b_j: bias of the j-th hidden node
C: regularization coefficient
g(·): activation function
H: output matrix of the hidden layer
H⁺: Moore–Penrose generalized inverse of the matrix H
I: identity matrix
K(·): kernel function
Q: total number of samples
w_j: input weight vector
x_i: input vector
y_i: expected output vector
Y: output matrix of the output layer
β: output weight matrix
β_j: output weight vector
η: inverse of the regularization coefficient
Ω_KELM: kernel matrix

LsOA
A: coefficient vector
a: coefficient vector
C: coefficient vector
CF: adaptive parameter
D_i: distance between lioness i and the prey
E_Team: the hunting team's "center circle"
Elite_lioness: elite matrix
Iter: current iteration number
M: constant
Max_iter: maximum iteration number
Prey: position vector of the prey
P: constant
r_i: random vectors in [0, 1]
RB: random number vector of Brownian motion
R: uniform random vector in [0, 1]
RL: random number vector of Lévy flight
t: current iteration number
Top_lioness_pos: position vector of the elite lioness
X_i: position vector of lioness i
X_A: position vector of lioness A with the best fitness
X_B: position vector of lioness B with the second-highest fitness
X_C: position vector of lioness C with the third-highest fitness
X_D: position vector of lioness D with the fourth-highest fitness
X_a: position vector adjusted by lioness A
X_b: position vector adjusted by lioness B
X_c: position vector adjusted by lioness C
X_d: position vector adjusted by lioness D
X_ave: mean of X_a, X_b, X_c and X_d
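To make the KELM notation above concrete, the sketch below implements the standard KELM solution β = (I/C + Ω_KELM)⁻¹Y of Huang et al. [33]; the RBF kernel and the parameter values are illustrative choices only:

```python
import numpy as np

def rbf_kernel(X1, X2, gamma=0.1):
    """Kernel function K(., .): an RBF kernel used here for illustration."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)

class KELM:
    """Kernel extreme learning machine: beta = (I / C + Omega)^(-1) Y."""

    def __init__(self, C=1.0, gamma=0.1):
        self.C, self.gamma = C, gamma

    def fit(self, X, Y):
        self.X_train = X
        omega = rbf_kernel(X, X, self.gamma)      # kernel matrix Omega_KELM
        q = X.shape[0]                            # total number of samples Q
        self.beta = np.linalg.solve(np.eye(q) / self.C + omega, Y)
        return self

    def predict(self, X_new):
        # f(x) = [K(x, x_1), ..., K(x, x_Q)] @ beta
        return rbf_kernel(X_new, self.X_train, self.gamma) @ self.beta
```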

Appendix A

Table A1. Critical values for the two-tailed Bonferroni–Dunn test. "Number of methods" is the number of methods used for comparison.

| Number of methods | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| q0.05 | 1.960 | 2.241 | 2.394 | 2.498 | 2.576 | 2.638 | 2.690 | 2.724 | 2.774 | 3.219 | 3.268 | 3.313 |
| q0.10 | 1.645 | 1.960 | 2.128 | 2.241 | 2.326 | 2.394 | 2.450 | 2.498 | 2.539 | 2.978 | 3.030 | 3.077 |

References

1. Hemming, B.L.; Harris, A.; Davidson, C.; U.S. EPA. Air Quality Criteria for Lead (2006) Final Report; U.S. Environmental Protection Agency: Washington, DC, USA, 2006; EPA/600/R-05/144aF-bF.
2. Khatibi, R.; Naghipour, L.; Ghorbani, M.A.; Smith, M.S.; Karimi, V.; Farhoudi, R.; Delafrouz, H.; Arvanaghi, H. Developing a predictive tropospheric ozone model for Tabriz. Atmos. Environ. 2013, 68, 286–294.
3. Ordieres-Merè, J.; Ouarzazi, J.; Johra, B.E.; Gong, B. Predicting ground level ozone in Marrakesh by machine-learning techniques. J. Environ. Inform. 2020, 36, 93–106.
4. Yang, L.; Xie, D.; Yuan, Z.; Huang, Z.; Wu, H.; Han, J.; Liu, L. Quantification of regional ozone pollution characteristics and its temporal evolution: Insights from the identification of the impacts of meteorological conditions and emissions. Atmosphere 2021, 12, 279.
5. Bell, M.L.; Peng, R.D.; Dominici, F. The Exposure–Response Curve for Ozone and Risk of Mortality and the Adequacy of Current Ozone Regulations. Environ. Health Perspect. 2006, 114, 532–536.
6. Mills, G.; Buse, A.; Gimeno, B.; Bermejo, V.; Holland, M.; Emberson, L.; Pleijel, H. A synthesis of AOT40-based response functions and critical levels of ozone for agricultural and horticultural crops. Atmos. Environ. 2007, 41, 2630–2643.
7. Riga, M.; Stocker, M.; Ronkko, M.; Karatzas, K.; Kolehmainen, M. Atmospheric Environment and Quality of Life Information Extraction from Twitter with the Use of Self-Organizing Maps. J. Environ. Inform. 2015, 26, 27–40.
8. Duenas, C.; Fernandez, M.C.; Canete, S.; Carretero, J.; Liger, E. Stochastic model to forecast ground-level ozone concentration at urban and rural areas. Chemosphere 2005, 61, 1379–1389.
9. Kumar, K.; Yadav, A.K.; Singh, M.P.; Hassan, H.; Jain, V.K. Forecasting Daily Maximum Surface Ozone Concentrations in Brunei Darussalam—An ARIMA Modeling Approach. J. Air Waste Manag. Assoc. 2004, 54, 809–814.
10. Hubbard, M.C.; Cobourn, W.G. Development of a regression model to forecast ground-level ozone concentration in Louisville, KY. Atmos. Environ. 1998, 32, 2637–2647.
11. Kovač-Andrić, E.; Sheta, A.; Faris, H.; Gajdošik, M.Š. Forecasting ozone concentrations in the east of Croatia using nonparametric neural network models. J. Earth Syst. Sci. 2016, 125, 997–1006.
12. Allu, S.K.; Srinivasan, S.; Maddala, R.K.; Reddy, A.; Anupoju, G.R. Seasonal ground level ozone prediction using multiple linear regression (MLR) model. Model. Earth Syst. Environ. 2020, 6, 1981–1989.
13. Iglesias-Gonzalez, S.; Huertas-Bolanos, M.E.; Hernandez-Paniagua, I.Y.; Mendoza, A. Explicit Modeling of Meteorological Explanatory Variables in Short-Term Forecasting of Maximum Ozone Concentrations via a Multiple Regression Time Series Framework. Atmosphere 2020, 11, 1304.
14. Oufdou, H.; Bellanger, L.; Bergam, A.; Khomsi, K. Forecasting daily of surface ozone concentration in the Grand Casablanca region using parametric and nonparametric statistical models. Atmosphere 2021, 12, 666.
15. Pawlak, I.; Jarosawski, J. Forecasting of Surface Ozone Concentration by Using Artificial Neural Networks in Rural and Urban Areas in Central Poland. Atmosphere 2019, 10, 52.
16. Kumar, P.; Lai, S.H.; Wong, J.K.; Mohd, N.S.; Kamal, M.R.; Afan, H.A.; Ahmed, A.N.; Sherif, M.; Sefelnasr, A.; El-Shafie, A. Review of Nitrogen Compounds Prediction in Water Bodies Using Artificial Neural Networks and Other Models. Sustainability 2020, 12, 4359.
17. Spellman, G. An application of artificial neural networks to the prediction of surface ozone concentrations in the United Kingdom. Appl. Geogr. 1999, 19, 123–136.
18. Chaloulakou, A.; Saisana, M.; Spyrellis, N. Comparative assessment of neural networks and regression models for forecasting summertime ozone in Athens. Sci. Total Environ. 2003, 313, 1–13.
19. Sousa, S.; Martins, F.G.; Alvim-Ferraz, M.; Pereira, M.C. Multiple linear regression and artificial neural networks based on principal components to predict ozone concentrations. Environ. Model. Softw. 2007, 22, 97–103.
20. AlOmar, M.K.; Hameed, M.M.; AlSaadi, M.A. Multi hours ahead prediction of surface ozone gas concentration: Robust artificial intelligence approach. Atmos. Pollut. Res. 2020, 11, 1572–1587.
21. Faris, S.; Alivernini, A.; Conte, A.; Maggi, F. Ozone and particle fluxes in a Mediterranean forest predicted by the AIRTREE model. Sci. Total Environ. 2019, 682, 494–504.
22. Luna, A.S.; Paredes, M.; Oliveira, G.; Corrêa, S.M. Prediction of ozone concentration in tropospheric levels using artificial neural networks and support vector machine at Rio de Janeiro, Brazil. Atmos. Environ. 2014, 98, 98–104.
23. Quej, V.H.; Almorox, J.; Arnaldo, J.A.; Saito, L. ANFIS, SVM and ANN soft-computing techniques to estimate daily global solar radiation in a warm sub-humid environment. J. Atmos. Sol.-Terr. Phys. 2017, 155, 62–70.
24. Faleh, R.; Bedoui, S.; Kachouri, A. Ozone monitoring using support vector machine and K-nearest neighbors methods. J. Electr. Electron. Eng. 2017, 10, 49–52.
25. Su, X.; An, J.; Zhang, Y.; Zhu, P.; Zhu, B. Prediction of ozone hourly concentrations by support vector machine and kernel extreme learning machine using wavelet transformation and partial least squares methods. Atmos. Pollut. Res. 2020, 11, 51–60.
26. Lu, W.Z.; Wang, D. Learning machines: Rationale and application in ground-level ozone prediction. Appl. Soft Comput. J. 2014, 24, 135–141.
27. Domanska, D.; Wojtylak, M. Application of fuzzy time series models for forecasting pollution concentrations. Expert Syst. Appl. 2012, 39, 7673–7679.
28. Yafouz, A.; Najah, A.; Zaini, A.; El-Shafie, A. Ozone Concentration Forecasting Based on Artificial Intelligence Techniques: A Systematic Review. Water Air Soil Pollut. 2021, 232, 79.
29. Vautard, R.; Beekmann, M.; Roux, J.; Gombert, D. Validation of a hybrid forecasting system for the ozone concentrations over the Paris area. Atmos. Environ. 2001, 35, 2449–2461.
30. Huang, G.B.; Zhu, Q.Y.; Siew, C.K. Extreme learning machine: Theory and applications. Neurocomputing 2006, 70, 489–501.
31. Huang, G.B.; Wang, D.H.; Lan, Y. Extreme Learning Machines: A Survey. Int. J. Mach. Learn. Cybern. 2011, 2, 107–122.
32. Huang, G.B.; Zhu, Q.Y.; Siew, C.K. Extreme learning machine: A new learning scheme of feedforward neural networks. In Proceedings of the IEEE International Joint Conference on Neural Networks, Budapest, Hungary, 25–29 July 2004.
33. Huang, G.B.; Zhou, H.; Ding, X.; Zhang, R. Extreme Learning Machine for Regression and Multiclass Classification. IEEE Trans. Syst. Man Cybern. Part B (Cybern.) 2012, 42, 513–529.
34. Holland, J.H. Genetic algorithms. Sci. Am. 1992, 267, 66–72.
35. Eberhart, R.; Kennedy, J. A new optimizer using particle swarm theory. In Proceedings of the Sixth International Symposium on Micro Machine and Human Science, MHS'95, Nagoya, Japan, 4–6 October 1995; pp. 39–43.
36. Faramarzi, A.; Heidarinejad, M.; Mirjalili, S.; Gandomi, A.H. Marine Predators Algorithm: A Nature-inspired Metaheuristic. Expert Syst. Appl. 2020, 152, 113377.
37. Yuchi, W.; Gombojav, E.; Boldbaatar, B.; Galsuren, J.; Enkhmaa, S.; Beejin, B.; Naidan, G.; Ochir, C.; Legtseg, B.; Byambaa, T.; et al. Evaluation of random forest regression and multiple linear regression for predicting indoor fine particulate matter concentrations in a highly polluted city. Environ. Pollut. 2019, 245, 746–753.
38. Cao, Q.D.; Miles, S.B.; Choe, Y. Infrastructure recovery curve estimation using Gaussian process regression on expert elicited data. Reliab. Eng. Syst. Saf. 2022, 217, 108054.
39. Wang, L.; Zeng, Y.; Chen, T. Back propagation neural network with adaptive differential evolution algorithm for time series forecasting. Expert Syst. Appl. 2015, 42, 855–863.
40. Brereton, R.G.; Lloyd, G.R. Support vector machines for classification and regression. Analyst 2010, 135, 230–267.
41. Cawley, G.C.; Talbot, N.; Foxall, R.J.; Dorling, S.R.; Mandic, D.P. Heteroscedastic kernel ridge regression. Neurocomputing 2004, 57, 105–124.
42. Banerjee, K.S.; Carr, R.N. Ridge regression-Biased estimation for non-orthogonal problems. Technometrics 1971, 12, 55–67.
43. Ighalo, J.O.; Adeniyi, A.G.; Marques, G. Application of linear regression algorithm and stochastic gradient descent in a machine-learning environment for predicting biomass higher heating value. Biofuels Bioprod. Biorefining 2020, 14, 1286–1295.
44. Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. arXiv 2016, arXiv:1609.02907.
45. Velickovic, P.; Cucurull, G.; Casanova, A.; Romero, A.; Lio, P.; Bengio, Y. Graph attention networks. arXiv 2017, arXiv:1710.10903.
46. Zar, J.H. Biostatistical Analysis. Q. Rev. Biol. 2010, 18, 797–799.
47. Holm, S. A simple sequentially rejective multiple test procedure. Scand. J. Stat. 1979, 6, 65–70.
Figure 1. The construction process of the "central circle".
Figure 2. The construction process of the elite matrix.
Figure 3. A flow chart of the prediction model.
Figure 4. A schematic diagram of the research area.
Figure 5. Statistical results of prediction features in various regions.
Figure 6. Scatter plots of observed and predicted O3 concentrations in the study area.
Figure 7. Location of Lvliang.
Figure 8. Comparison between the predicted percentage error and the actual value of MLR and FC-LsOA-KELM.
Figure 9. Bonferroni–Dunn test of different methods at significance levels (α = 0.05 and α = 0.1).
Table 1. Parameters and differences of comparison methods. For each method, the key parameters, the parameter documentation (accessed on 1 May 2022, where applicable), and the main advantages and disadvantages are listed.

MLR
Key parameters: fit_intercept=True, normalize='False'
Documentation: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html#sklearn.linear_model.LinearRegression
Advantages: simple modeling; easy to explain; fast running speed.
Disadvantages: does not fit nonlinear data very well.

SVR
Key parameters: kernel='poly', C=1.1, gamma='auto', degree=3, epsilon=0.1, coef0=1.0
Documentation: https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVR.html#sklearn.svm.SVR
Advantages: robust to outliers; solves high-dimensional problems; excellent generalization ability.
Disadvantages: not suitable for large-scale data; sensitive to missing data.

BPNN
Key parameters: hidden layer nodes=30
Advantages: self-learning and adaptive ability; high-speed optimization; parallel processing capability.
Disadvantages: a large number of parameters; difficult to explain; risk of falling into a local optimum.

GPR
Key parameters: kernel=DotProduct() + WhiteKernel(), random_state=0
Documentation: https://scikit-learn.org/stable/modules/generated/sklearn.gaussian_process.GaussianProcessRegressor.html#sklearn.gaussian_process.GaussianProcessRegressor
Advantages: fits nonlinear data; predicted values are probabilistic; interpretable.
Disadvantages: requires choosing a covariance function; nonparametric model; high complexity when the amount of data is large.

KRR
Key parameters: alpha=1, kernel='linear', gamma=None, degree=3, coef0=1
Documentation: https://scikit-learn.org/stable/modules/generated/sklearn.kernel_ridge.KernelRidge.html
Advantages: the kernel function makes it more flexible; fits nonlinear relationships well; data can be mapped to a high-dimensional space.
Disadvantages: high computational cost and a large amount of computation.

DT
Key parameters: criterion='squared_error'
Documentation: https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeRegressor.html#sklearn.tree.DecisionTreeRegressor
Advantages: easy to understand and explain; easy to implement; insensitive to missing values.
Disadvantages: prone to overfitting.

SGD
Key parameters: loss='squared_error', penalty='l2', alpha=0.0001, max_iter=1000, tol=0.001
Documentation: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDRegressor.html#sklearn.linear_model.SGDRegressor
Advantages: fast running speed.
Disadvantages: poor convergence behavior; may settle in a local minimum; accuracy is not high.

GCN
Key parameters: hidden layer nodes=6
Advantages: suitable for nodes and graphs of any topology.
Disadvantages: all neighbor nodes are assigned the same weight; completely dependent on the graph structure.

GAT
Key parameters: hidden layer nodes=6
Advantages: the attention mechanism assigns different weights to different neighbor nodes; not completely dependent on the graph structure.
Disadvantages: when neighborhoods overlap heavily, many redundant computations are involved.
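For reproducibility, the scikit-learn baselines in Table 1 can be instantiated directly from the listed parameters. Two examples are shown below (the training and test arrays X_train, y_train and X_test are placeholders, not shown here):

```python
from sklearn.svm import SVR
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import DotProduct, WhiteKernel

# SVR and GPR configured exactly as in Table 1
svr = SVR(kernel='poly', C=1.1, gamma='auto', degree=3, epsilon=0.1, coef0=1.0)
gpr = GaussianProcessRegressor(kernel=DotProduct() + WhiteKernel(), random_state=0)

# Usage (placeholders): svr.fit(X_train, y_train); y_hat = svr.predict(X_test)
```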
Table 2. Statistical table of the construction results of prediction feature sets in various regions.

| Region | Baoji | Jinzhong | Linfen | Luoyang | Lvliang | Sanmenxia | Tongchuan | Weinan | Xi’an | Xianyang | Yuncheng |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Number of features | 126 | 215 | 195 | 328 | 79 | 232 | 111 | 218 | 278 | 202 | 309 |
Table 3. Values of MAPE with different methods for different cities. The number in parentheses in the Average row is the method's overall rank.

| City | MLR | GPR | BPNN | SVR | KRR | DT | SGD | GCN | GAT | LsOA-KELM | FC-LsOA-KELM-- | FC-LsOA-KELM |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Baoji | 18.43% | 20.17% | 15.01% | 36.08% | 18.08% | 21.42% | 27.51% | 46.33% | 35.05% | 18.52% | 10.99% | 10.96% |
| Jinzhong | 29.28% | 46.11% | 23.71% | 36.53% | 36.53% | 30.27% | 34.71% | 50.20% | 48.77% | 29.78% | 20.48% | 20.09% |
| Linfen | 38.40% | 73.25% | 40.18% | 140.49% | 40.37% | 34.52% | 59.60% | 52.49% | 79.49% | 96.91% | 23.79% | 22.00% |
| Luoyang | 31.09% | 40.56% | 31.79% | 95.34% | 36.67% | 29.74% | 44.18% | 69.16% | 82.85% | 32.58% | 20.91% | 20.83% |
| Lvliang | 72.50% | 97.01% | 61.00% | 297.32% | 76.58% | 38.91% | 85.12% | 57.07% | 75.03% | 76.66% | 34.53% | 31.71% |
| Sanmenxia | 32.28% | 44.91% | 25.61% | 103.66% | 35.37% | 36.45% | 39.43% | 36.86% | 50.31% | 31.79% | 20.79% | 20.66% |
| Tongchuan | 24.97% | 28.82% | 19.37% | 42.41% | 27.23% | 29.80% | 30.19% | 46.30% | 38.13% | 24.93% | 17.12% | 16.77% |
| Weinan | 45.83% | 49.57% | 33.18% | 175.69% | 50.94% | 41.61% | 61.73% | 60.27% | 88.39% | 42.19% | 22.00% | 21.63% |
| Xi’an | 32.58% | 41.42% | 21.21% | 58.56% | 32.33% | 25.75% | 57.70% | 54.91% | 83.77% | 20.82% | 17.04% | 16.33% |
| Xianyang | 39.36% | 48.00% | 38.23% | 121.86% | 49.11% | 30.35% | 66.77% | 52.84% | 66.73% | 37.68% | 20.46% | 20.38% |
| Yuncheng | 21.75% | 31.47% | 19.27% | 55.65% | 25.23% | 21.25% | 27.95% | 36.46% | 33.42% | 23.09% | 14.69% | 14.56% |
| Average | 35.13% (5) | 47.39% (8) | 29.87% (3) | 105.78% (12) | 38.95% (6) | 30.92% (4) | 48.63% (9) | 51.17% (10) | 61.99% (11) | 39.54% (7) | 20.25% (2) | 19.63% (1) |
Table 4. Values of RMSE with different methods for different cities. The number in parentheses in the Average row is the method's overall rank.

| City | MLR | GPR | BPNN | SVR | KRR | DT | SGD | GCN | GAT | LsOA-KELM | FC-LsOA-KELM-- | FC-LsOA-KELM |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Baoji | 6.0387 | 6.2340 | 4.9951 | 7.3118 | 5.9949 | 7.6410 | 7.1130 | 13.7306 | 11.0259 | 6.0634 | 3.8903 | 3.8899 |
| Jinzhong | 6.7307 | 7.9415 | 6.4126 | 7.1433 | 7.1433 | 9.5298 | 7.5211 | 11.4633 | 10.4303 | 6.6280 | 5.4699 | 5.4377 |
| Linfen | 7.6286 | 10.5295 | 7.4877 | 13.7167 | 7.9837 | 9.3066 | 8.8934 | 10.6429 | 12.3162 | 12.5136 | 5.8552 | 5.7987 |
| Luoyang | 5.7315 | 6.2975 | 5.2057 | 9.2403 | 5.8818 | 7.6821 | 6.7448 | 11.8725 | 13.6997 | 5.8018 | 4.0487 | 4.0454 |
| Lvliang | 4.3497 | 6.5708 | 5.8295 | 14.0666 | 5.6420 | 10.6836 | 4.6932 | 9.9392 | 8.6261 | 6.9900 | 6.3899 | 6.3164 |
| Sanmenxia | 6.5610 | 7.6068 | 6.3408 | 10.4989 | 6.8107 | 9.3511 | 7.3729 | 10.3181 | 9.3824 | 6.4893 | 5.2770 | 5.2595 |
| Tongchuan | 6.7897 | 7.1010 | 6.5860 | 8.2193 | 6.8189 | 9.8223 | 7.3982 | 14.8928 | 11.2797 | 6.7131 | 5.2050 | 5.1898 |
| Weinan | 6.8271 | 7.3700 | 6.0154 | 12.9803 | 7.1069 | 8.1725 | 7.6357 | 9.6211 | 11.6963 | 6.4087 | 4.8395 | 4.8250 |
| Xi’an | 5.0065 | 5.4275 | 4.1472 | 6.8488 | 4.9635 | 6.1154 | 6.6992 | 10.0322 | 11.8799 | 3.8726 | 2.9850 | 2.9415 |
| Xianyang | 5.8522 | 6.2894 | 5.6366 | 11.0173 | 6.1698 | 7.3127 | 7.8243 | 9.1366 | 8.9809 | 5.8797 | 3.9272 | 3.9186 |
| Yuncheng | 7.3365 | 8.4255 | 6.9595 | 10.1973 | 7.4415 | 8.4045 | 8.3762 | 14.1279 | 12.7037 | 7.3362 | 5.2392 | 5.2369 |
| Average | 6.2593 (4) | 7.2540 (7) | 5.9651 (3) | 10.1128 (10) | 6.5415 (5) | 8.5474 (9) | 7.2975 (8) | 11.4343 (12) | 11.0928 (11) | 6.7906 (6) | 4.8297 (2) | 4.8054 (1) |
Table 5. Values of R² with different methods for different cities. The number in parentheses in the Average row is the method's overall rank.

| City | MLR | GPR | BPNN | SVR | KRR | DT | SGD | GCN | GAT | LsOA-KELM | FC-LsOA-KELM-- | FC-LsOA-KELM |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Baoji | 0.1191 | 0.1269 | 0.0815 | 0.1746 | 0.1174 | 0.1907 | 0.1653 | 0.6158 | 0.3971 | 0.1201 | 0.0494 | 0.0494 |
| Jinzhong | 0.1123 | 0.1563 | 0.1019 | 0.1265 | 0.1265 | 0.2251 | 0.1402 | 0.3258 | 0.2697 | 0.1089 | 0.0742 | 0.0733 |
| Linfen | 0.1332 | 0.2538 | 0.1284 | 0.4308 | 0.1459 | 0.1983 | 0.1811 | 0.2593 | 0.3473 | 0.3585 | 0.0785 | 0.0770 |
| Luoyang | 0.0814 | 0.0983 | 0.0672 | 0.2116 | 0.0857 | 0.1462 | 0.1127 | 0.3493 | 0.4651 | 0.0834 | 0.0406 | 0.0406 |
| Lvliang | 0.0434 | 0.0991 | 0.0780 | 0.4543 | 0.0731 | 0.2621 | 0.0506 | 0.2268 | 0.1708 | 0.1122 | 0.0937 | 0.0916 |
| Sanmenxia | 0.1292 | 0.1737 | 0.1207 | 0.3309 | 0.1392 | 0.2625 | 0.1632 | 0.3196 | 0.2642 | 0.1264 | 0.0836 | 0.0830 |
| Tongchuan | 0.1152 | 0.1260 | 0.1084 | 0.1688 | 0.1162 | 0.2410 | 0.1367 | 0.5541 | 0.3178 | 0.1126 | 0.0677 | 0.0673 |
| Weinan | 0.1380 | 0.1608 | 0.1071 | 0.4989 | 0.1496 | 0.1978 | 0.1726 | 0.2741 | 0.4051 | 0.1216 | 0.0693 | 0.0689 |
| Xi’an | 0.1197 | 0.1407 | 0.0821 | 0.2240 | 0.1177 | 0.1786 | 0.2143 | 0.4807 | 0.6740 | 0.0716 | 0.0426 | 0.0413 |
| Xianyang | 0.1173 | 0.1355 | 0.1089 | 0.4159 | 0.1304 | 0.1832 | 0.2098 | 0.2860 | 0.2764 | 0.1184 | 0.0528 | 0.0526 |
| Yuncheng | 0.1076 | 0.1419 | 0.0968 | 0.2079 | 0.1107 | 0.1412 | 0.1403 | 0.3991 | 0.3227 | 0.1076 | 0.0549 | 0.0548 |
| Average | 0.1106 (4) | 0.1467 (7) | 0.0983 (3) | 0.2949 (10) | 0.1193 (5) | 0.2024 (9) | 0.1533 (8) | 0.3719 (12) | 0.3555 (11) | 0.1310 (6) | 0.0643 (2) | 0.0636 (1) |
Table 6. The number of features under different thresholds in each city.

| Threshold | Baoji | Jinzhong | Linfen | Luoyang | Lvliang | Sanmenxia | Tongchuan | Weinan | Xi’an | Xianyang | Yuncheng |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0.5 | 617 | 2066 | 1553 | 3043 | 365 | 1354 | 589 | 972 | 1356 | 901 | 2737 |
| 0.6 | 126 | 215 | 195 | 328 | 79 | 232 | 111 | 218 | 278 | 202 | 309 |
| 0.7 | 18 | 23 | 29 | 38 | 6 | 27 | 18 | 31 | 57 | 28 | 34 |
Table 7. Friedman test ranking results (Friedman mean rank of each method on each indicator).

| No. | Method | MAPE | RMSE | R² |
| --- | --- | --- | --- | --- |
| 1 | Multiple Linear Regression (MLR) | 5.18 | 4.45 | 4.45 |
| 2 | Gaussian Process Regression (GPR) | 8.45 | 7.55 | 7.55 |
| 3 | Back Propagation Neural Network (BPNN) | 3.82 | 3.18 | 3.18 |
| 4 | Support Vector Regression (SVR) | 11.41 | 10.41 | 10.41 |
| 5 | Kernel Ridge Regression (KRR) | 6.77 | 5.41 | 5.41 |
| 6 | Decision Tree (DT) | 4.91 | 9.00 | 9.00 |
| 7 | Stochastic Gradient Descent (SGD) | 9.09 | 7.27 | 7.27 |
| 8 | Graph Convolutional Network (GCN) | 9.36 | 11.00 | 11.00 |
| 9 | Graph Attention Network (GAT) | 10.27 | 10.73 | 10.73 |
| 10 | LsOA-KELM | 5.73 | 5.27 | 5.27 |
| 11 | FC-LsOA-KELM-- | 2.00 | 2.36 | 2.36 |
| 12 | FC-LsOA-KELM | 1.00 | 1.36 | 1.36 |
Table 8. Holm's method test results of the first group (MAPE).

| FC-LsOA-KELM vs. | Rank | z-Value | p-Value | α/i (0.05) | α/i (0.1) |
| --- | --- | --- | --- | --- | --- |
| MLR | 5.18 | −2.934 | 0.00335 | 0.00455 | 0.00909 |
| GPR | 8.45 | −2.934 | 0.00335 | 0.005 | 0.01 |
| BPNN | 3.82 | −2.934 | 0.00335 | 0.00556 | 0.01111 |
| SVR | 11.41 | −2.934 | 0.00335 | 0.00625 | 0.0125 |
| KRR | 6.77 | −2.934 | 0.00335 | 0.00714 | 0.01429 |
| DT | 4.91 | −2.934 | 0.00335 | 0.00833 | 0.01667 |
| SGD | 9.09 | −2.934 | 0.00335 | 0.01 | 0.02 |
| GCN | 9.36 | −2.934 | 0.00335 | 0.0125 | 0.025 |
| GAT | 10.27 | −2.934 | 0.00335 | 0.01667 | 0.03333 |
| LsOA-KELM | 5.73 | −2.934 | 0.00335 | 0.025 | 0.05 |
| FC-LsOA-KELM-- | 2.00 | −2.934 | 0.00335 | 0.05 | 0.1 |
Table 9. Holm's method test results of the second group (RMSE).

| FC-LsOA-KELM vs. | Rank | z-Value | p-Value | α/i (0.05) | α/i (0.1) |
| --- | --- | --- | --- | --- | --- |
| GPR | 7.55 | −2.934 | 0.00335 | 0.00455 | 0.00909 |
| SVR | 10.41 | −2.934 | 0.00335 | 0.005 | 0.01 |
| DT | 9.00 | −2.934 | 0.00335 | 0.00556 | 0.01111 |
| GCN | 11.00 | −2.934 | 0.00335 | 0.00625 | 0.0125 |
| GAT | 10.73 | −2.934 | 0.00335 | 0.00714 | 0.01429 |
| LsOA-KELM | 5.27 | −2.934 | 0.00335 | 0.00833 | 0.01667 |
| FC-LsOA-KELM-- | 2.36 | −2.934 | 0.00335 | 0.01 | 0.02 |
| BPNN | 3.18 | −2.845 | 0.00444 | 0.0125 | 0.025 |
| KRR | 5.41 | −2.845 | 0.00444 | 0.01667 | 0.03333 |
| SGD | 7.27 | −2.845 | 0.00444 | 0.025 | 0.05 |
| MLR | 4.45 | −2.934 | 0.02080 | 0.05 | 0.1 |
Table 10. Holm's method test results of the third group (R²).

| FC-LsOA-KELM vs. | Rank | z-Value | p-Value | α/i (0.05) | α/i (0.1) |
| --- | --- | --- | --- | --- | --- |
| GPR | 7.55 | −2.934 | 0.00335 | 0.00455 | 0.00909 |
| SVR | 10.41 | −2.934 | 0.00335 | 0.005 | 0.01 |
| DT | 9.00 | −2.934 | 0.00335 | 0.00556 | 0.01111 |
| GCN | 11.00 | −2.934 | 0.00335 | 0.00625 | 0.0125 |
| GAT | 10.73 | −2.934 | 0.00335 | 0.00714 | 0.01429 |
| LsOA-KELM | 5.27 | −2.934 | 0.00335 | 0.00833 | 0.01667 |
| FC-LsOA-KELM-- | 2.36 | −2.934 | 0.00335 | 0.01 | 0.02 |
| BPNN | 3.18 | −2.845 | 0.00444 | 0.0125 | 0.025 |
| KRR | 5.41 | −2.845 | 0.00444 | 0.01667 | 0.03333 |
| SGD | 7.27 | −2.845 | 0.00444 | 0.025 | 0.05 |
| MLR | 4.45 | −2.490 | 0.01279 | 0.05 | 0.1 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
