1. Introduction
In recent years, rapid economic development has brought about rapid population growth and an increase in per capita vehicle ownership, which impose a heavy burden on transportation infrastructure, for example through insufficient parking spaces. According to the 2020 National Economic and Social Development Statistical Bulletin (http://www.gov.cn/xinwen/2021-02/28/content_5589283.htm, accessed on 22 October 2022), the total number of civilian vehicles in the country increased by 7.41% year-over-year and exceeded 280 million. The continuous increase in the number of motor vehicles has brought many problems to society, such as traffic congestion, wasted resources, economic losses, excessive commuting times, and frequent traffic accidents. In addition, the pollution caused by the large number of cars may threaten human health [1]. Since traffic flow reflects the number of vehicles that pass a point in a given period of time [2], accurate traffic flow forecasting is of great significance to management departments and individuals: it can guide the design and operation of transportation systems to improve traffic efficiency and safety. Thus, traffic flow prediction over a short period of time, i.e., short-term traffic flow prediction, has attracted much scholarly attention due to the randomness and dynamics of traffic conditions [3]. Many machine-learning methods [4,5] have been applied to short-term traffic flow prediction; they can be divided into parametric models, non-parametric models, and hybrid models.
Parametric models quantitatively describe the relationship between inputs and outputs through an explicit model and estimate the parameters of that model. Common parametric models include Historical Average (HA) models and Auto-Regressive Integrated Moving Average (ARIMA) models. HA relies on the cyclical nature of traffic flow and uses only the average of past traffic volumes to predict future traffic flow [6]. Therefore, HA is simple to compute and easy to apply in practice. Stephanedes et al. [7] employed HA to forecast future traffic volume and applied the model to urban traffic control systems. Kaysi et al. [8] used HA for traveler information systems. However, HA cannot respond to dynamic changes in traffic systems, especially traffic accidents. To overcome this shortcoming, later models, such as the time-series method ARIMA, introduce real-time data into the prediction process. ARIMA interprets the past behavior of a time series through a mathematical model and applies that model to predict future traffic flow [9,10]. Van Der Voort et al. [11] introduced KARIMA, a short-term traffic forecasting method that combines Kohonen maps with ARIMA time-series models to address ARIMA's inability to handle nonlinear traffic data. Balancing increased complexity against increased forecast accuracy, Williams et al. [12] proposed an ARIMAX model that combines ARIMA with explanatory variables to improve forecasting performance. To avoid relying on a large historical traffic flow database, Kumar et al. [13] proposed a Seasonal ARIMA (SARIMA) model for short-term traffic flow prediction that uses only the previous three days of flow observations to predict the next day's flow values. However, the explicit models presupposed by the above methods struggle to fit real traffic flow data, and they do not reflect emergent traffic conditions well. Consequently, non-parametric models have attracted wide attention.
Non-parametric models are a class of data-driven methods that learn implicit relationships between inputs and predictions from large amounts of data without providing explicit functions. Common non-parametric models include K-Nearest Neighbor (KNN), Support Vector Regression (SVR), and Artificial Neural Networks (ANN). KNN does not need prior knowledge, and it generally achieves better predictive performance than linear-model algorithms. Zhang et al. [14] established a KNN-based short-term urban expressway flow prediction system in terms of the historical database, the search mechanism and algorithm parameters, and the prediction plan. Hou et al. [15] used a two-tier K-nearest neighbor algorithm to forecast short-term traffic flow, addressing calculation speed and parameter flexibility. Cai et al. [16] presented a sample-rebalanced and outlier-rejected k-nearest neighbor regression model for short-term traffic forecasting to handle imbalance and noise. SVR is also widely used for nonlinear regression and time-series problems. To improve traffic prediction accuracy, Lin et al. [17] put forward a method for screening spatial time-delayed traffic series based on the maximal information coefficient, which combines support vector regression and k-nearest neighbors for traffic flow prediction. Hong et al. [18] put forward an SVR traffic flow forecasting model that employs a hybrid genetic algorithm–simulated annealing algorithm to determine a suitable parameter combination. Hong [19] applied SVR to seasonal-trend time-series data and proposed SSVRCIA, a traffic flow forecasting model that combines a seasonal support vector regression model with a chaotic immune algorithm. Furthermore, since ANNs have strong self-learning and self-adaptation abilities, many scholars have proposed short-term traffic flow prediction models based on ANNs [20]. Tang et al. [21] proposed the Neighbor Subset Deep Neural Network (NSDNN) to forecast spatio-temporal data, which extracts useful inputs from nearby roads by combining a deep neural network with a subset selection method. Considering the spatial correlation of traffic flow, Ref. [22] proposed a method to predict the spatio-temporal characteristics of short-term traffic flow by combining the k-nearest neighbor algorithm with a bidirectional long short-term memory network. However, it is difficult for a single short-term traffic flow prediction model to handle the variety of situations encountered in real life. Therefore, to improve prediction ability and accuracy, hybrid prediction models, which take full advantage of the strengths of different models, have received extensive attention.
Several hybrid methods that combine multiple techniques have been proposed for forecasting short-term traffic flow [23,24,25]. Since forecasting performance is seriously degraded by non-Gaussian noise in the traffic flow sequence, Fang et al. [26] presented an error-distribution-free deep learning method for short-term traffic flow forecasting. Liu et al. [27] put forward a hybrid short-term traffic flow forecasting method combining neural networks and KNN. To improve the accuracy of short-term traffic flow forecasting and provide precise and reliable traffic information for traffic management units and travelers, Liu et al. [28] proposed a hybrid forecasting model based on KNN and SVR. Luo et al. [29] proposed a spatiotemporal traffic flow prediction method combining KNN and long short-term memory networks (LSTM), called KNN–LSTM. However, the above-mentioned methods ignore the uncertainty in traffic flow data, which affects the accuracy and robustness of traffic flow prediction models. This uncertainty involves ambiguity and randomness, which often appear at the same time [30]. Notably, fuzzy systems describe ambiguity well. Therefore, researchers often combine fuzzy systems with ANNs, yielding what are called fuzzy neural networks or neuro-fuzzy models. Zhou et al. [31] proposed a novel deep-learning model for short-term traffic flow prediction that considers the inherent features of traffic data. In addition, a novel approach to estimating uncertainty has been proposed based on the notion of an intuitionistic fuzzy set (an extension of Lotfi Zadeh's fuzzy set) and an intuitionistic fuzzy traffic characterization [32].
Since a Fuzzy Inference System (FIS) can autonomously imitate the reasoning of the human brain, Jang developed the Adaptive Neuro-Fuzzy Inference System (ANFIS) [33]. The system combines the learning mechanism of neural networks with the reasoning ability of FIS. By exploiting the autonomous learning of the neural network, ANFIS can adaptively extract inference rules from data samples. It has distinctive characteristics and has been successfully applied in many fields. Keskin et al. [34] used a synthetic sequence generated by an ARIMA model as the training set of ANFIS and developed a flow prediction method based on the combination of ANFIS and a stochastic hydrological model. Ahmadianfar et al. [35] integrated an adaptive hybrid of differential evolution and particle swarm optimization with an adaptive neuro-fuzzy inference system model for EC prediction. Acakpovi et al. [36] used ANFIS to predict the reliability of power demand. Mohiyunddin et al. [37] introduced a novel ANFIS for data protection to improve and determine the degree of security. Chen et al. [38] proposed a short-term traffic flow prediction method based on ANFIS. Ghenai et al. [39] developed an accurate short-term energy consumption forecast for an educational building, aiming to balance the supply from renewable power systems with the building's electrical load demand. Although ANFIS can describe the ambiguity in traffic flow data, it cannot reflect the randomness of the data. The cloud model proposed by Li et al. [40] can capture multiple uncertainties simultaneously, especially randomness. To describe the ambiguity and randomness of traffic flow data simultaneously and improve the prediction performance of the model, we combine cloud models and FIS within a neural network framework to solve traffic flow forecasting problems. In summary, the main contributions of our work are as follows:
(1) The cloud model and the fuzzy inference system are combined to describe the ambiguity and randomness of traffic flow. The cloud model is embedded in the network of the fuzzy inference system and trained, rather than being used only in inference rules that perform a simple mapping between two cloud models.
(2) A weighted multi-dimensional cloud model is generated by calculating weights for the historical time series of traffic flow.
(3) Based on the weighted multi-dimensional cloud model, an improved fuzzy prediction system is constructed for short-term traffic flow prediction; the system describes both the randomness and the ambiguity of the data. It overcomes the inability of fuzzy inference systems to capture the temporal characteristics of long sequence data well.
The remainder of this paper is organized into four sections.
Section 2 introduces the background knowledge related to the paper.
Section 3 describes the improved fuzzy inference system and explains the input layer, the cloudification fuzzy layer, the cloudification rule layer, the standardization layer, the inverse cloudification layer, and the output layer in detail.
Section 4 presents experiments that verify the effectiveness of the proposed model.
Section 5 concludes the paper.
2. Preliminaries
2.1. Fuzzy Inference Systems
A Fuzzy Inference System (FIS) is a system that handles fuzzy data on the basis of fuzzy set theory and fuzzy logic; it simulates the fuzzy reasoning process of human beings by applying fuzzy sets and fuzzy rules to input data to generate fuzzy outputs. Next, we give the definition of fuzzy rules.
Definition 1 [41]. Suppose the input–output data records $(\mathbf{x}_p, y_p)$ of fuzzy rules are given, where $\mathbf{x}_p=(x_{p1},x_{p2},\dots,x_{pn})\in\mathbb{R}^n$ is the input, $y_p\in\mathbb{R}$ is the output, and $p$ denotes the $p$th sample. Then, a single fuzzy IF–THEN rule takes the form: IF $\mathbf{x}_p$ is $A$, THEN $y_p$ is $B$, where $A$ and $B$ are fuzzy sets defined on the input and output spaces, respectively. Fuzzy systems mainly consist of a fuzzification layer, a fuzzy inference engine, a fuzzy rule base, and a defuzzification layer [42]. The fuzzification layer maps the crisp values entering the fuzzy system to fuzzy sets over a given universe of discourse. Fuzzification methods include the singleton method, the triangular membership function method, and the Gaussian membership function method. Since the Gaussian membership function has good anti-interference ability and its fuzzification results are closer to human cognition, it is the most widely used in research.
The fuzzy rule base, the core of the fuzzy inference system, consists of all the fuzzy rules in the system. It takes two forms: one-dimensional fuzzy rules and multi-dimensional fuzzy rules. The fuzzy inference engine is mainly responsible for calculating the firing strength of the rules in the rule base.
The defuzzification layer determines the crisp value that best represents the output fuzzy set. The defuzzification method is not unique; the main methods include the maximum membership method, the center-of-gravity method, and the center-average method.
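As a minimal illustration of the fuzzification, rule-firing, and defuzzification steps described above, the following Python sketch implements a single-input system with Gaussian membership functions and the center-average defuzzifier. The rule parameters and the names `gaussian_mf` and `center_average_fis` are hypothetical, and representing each consequent set only by its center is an assumption of this sketch.

```python
import math

def gaussian_mf(x, center, sigma):
    """Gaussian membership degree of a crisp value x in the fuzzy set (center, sigma)."""
    return math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))

def center_average_fis(x, rules):
    """Single-input fuzzy system: Gaussian fuzzification, rule firing, center-average defuzzification.

    Each rule is (antecedent_center, antecedent_sigma, consequent_center) and reads
    'IF x is A_k THEN y is B_k', with B_k represented only by its center (an assumption).
    """
    strengths = [gaussian_mf(x, c, s) for c, s, _ in rules]  # firing strength of each rule
    total = sum(strengths)
    if total == 0.0:
        raise ValueError("no rule fires for this input")
    # Center-average defuzzification: strength-weighted mean of the consequent centers
    return sum(w * yc for w, (_, _, yc) in zip(strengths, rules)) / total

# Hypothetical rules for low / medium / high flow (vehicles per interval)
rules = [(100.0, 40.0, 120.0), (300.0, 60.0, 320.0), (500.0, 40.0, 480.0)]
estimate = center_average_fis(300.0, rules)
```

An input at the center of the "medium" rule yields an estimate very close to that rule's consequent center, because the other two rules barely fire there.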
2.2. Cloud Model
Inspired by probability theory and fuzzy set theory, Li et al. [40] created the cloud model, a method for representing uncertainty and an important means of realizing two-way cognitive conversion between qualitative semantics and quantitative values. The cloud model allows a certain degree of deviation between a random phenomenon and the normal distribution and measures that deviation. At the same time, cloud models can describe the inherent correlation between randomness and fuzziness in uncertainty. Next, the definition of the cloud model is given.
Definition 2 [42]. Let $U$ be a universe of discourse expressed by exact numerical values, and let $T$ be a qualitative concept (linguistic term) on $U$. If $x\in U$ is a random instance of $T$, and the certainty degree $\mu_T(x)$ of $x$ for $T$ is a random number with a stable tendency within the interval $[0,1]$, then the distribution of $x$ on $U$ is called a cloud, and each random instance $x$ is called a cloud drop on $U$. The cloud model is generally described by three characteristic values: $Ex$, $En$, and $He$. $Ex$ is the expectation, i.e., the point in $U$ with a membership degree of 1, reflecting the center of the sample; $En$ is the entropy and $He$ is the hyperentropy, both of which are jointly determined by the correlation between randomness and fuzziness within $T$. The entropy measures the degree of randomness in the sample and manifests as the width of the cloud (the distribution range of cloud drops on the horizontal axis of the universe). The hyperentropy reflects the degree of dispersion of $En$ and manifests as the thickness of the cloud, i.e., how tightly the cloud drops condense within the universe. A cloud model can be denoted as $C(Ex, En, He)$. The Gaussian cloud model, based on the Gaussian distribution function and the Gaussian membership function, is the most important cloud model and is defined as follows.
Definition 3 [42]. Let $U$ be the universe of discourse and $T$ be a linguistic term set in $U$. If $x\in U$ is a random instantiation of concept $T$ that satisfies $x\sim N(Ex, En'^2)$ and $En'\sim N(En, He^2)$, then the certainty degree of $x$ belonging to $T$ satisfies
$$y=\exp\left(-\frac{(x-Ex)^2}{2En'^2}\right)$$
where $y\in[0,1]$.
The distribution of $x$ in the universe $U$ is called a one-dimensional normal cloud, and a cloud drop can be written as $(x, y)$. The cloud effectively describes both the fuzziness and the randomness of a concept by three quantitative variables: the expectation $Ex$, the entropy $En$, and the hyperentropy $He$.
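A minimal Python sketch of a forward normal cloud generator that draws cloud drops $(x, y)$ in the way Definition 3 describes: $En'\sim N(En, He^2)$, $x\sim N(Ex, En'^2)$, and $y=\exp(-(x-Ex)^2/(2En'^2))$. The function name `forward_normal_cloud` and the parameter values are illustrative assumptions.

```python
import math
import random

def forward_normal_cloud(ex, en, he, n_drops, rng=None):
    """Forward normal cloud generator: returns a list of (x, y) cloud drops."""
    rng = rng or random.Random()
    drops = []
    for _ in range(n_drops):
        en_prime = rng.gauss(en, he)            # random entropy En' ~ N(En, He^2)
        if en_prime == 0.0:
            continue                            # degenerate drop: skip it
        x = rng.gauss(ex, abs(en_prime))        # random instance x ~ N(Ex, En'^2)
        y = math.exp(-((x - ex) ** 2) / (2.0 * en_prime ** 2))  # certainty degree in [0, 1]
        drops.append((x, y))
    return drops

# Illustrative concept "about 300 vehicles per interval"
drops = forward_normal_cloud(ex=300.0, en=50.0, he=5.0, n_drops=2000, rng=random.Random(0))
```

With $He=0$ every drop lies on the ordinary Gaussian membership curve; increasing $He$ scatters the drops around that curve, which is the "thickness" attributed to the hyperentropy in Definition 2.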
The one-dimensional cloud was originally applied to decision-making evaluation problems. When the number of evaluation factors increases, however, the evaluation results deviate significantly from the actual situation. A multi-dimensional cloud model was therefore proposed to overcome this problem. The multi-dimensional cloud model extends the one-dimensional cloud model by applying the one-dimensional cloud method to each attribute of the multi-dimensional cloud [43]. The definition of the multi-dimensional cloud model is given below.
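Definition 4 [43]. Let $X$ be a set of samples, where each sample $\mathbf{x}=(x_1,x_2,\dots,x_m)\in U$, and let $T$ be a qualitative concept on the domain $U$. For every $\mathbf{x}\in U$, there is a membership degree $\mu(\mathbf{x})\in[0,1]$ of $\mathbf{x}$ with respect to $T$; that is, $\mu: U\to[0,1]$, $\mathbf{x}\mapsto\mu(\mathbf{x})$.
Definition 5. Assuming that the dimensions of the universe of discourse are mutually independent, the $m$-dimensional cloud has $3m$ numerical characteristics: $(Ex_1,\dots,Ex_m;\ En_1,\dots,En_m;\ He_1,\dots,He_m)$, where $Ex_i$ is the expectation, $En_i$ is the entropy, and $He_i$ is the hyperentropy of the $i$th dimension of the multi-dimensional normal cloud. A multi-dimensional cloud model can be expressed by the following formula, called the MEHS (Mathematical Expected Hyper Surface):
$$\mathrm{MEHS}(x_1,\dots,x_m)=\exp\left(-\frac{1}{2}\sum_{i=1}^{m}\frac{(x_i-Ex_i)^2}{En_i^2}\right)$$
2.3. Cloud Inference Algorithm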
2.3.1. Front Part Cloud and Back Part Cloud
The foundation of uncertainty reasoning is uncertainty knowledge, and the uncertainty information contained in such knowledge is often extracted with IF–THEN fuzzy rules, which include one-dimensional and multi-dimensional fuzzy rules. A one-dimensional fuzzy rule has the form “If $x$ is $A$, then $y$ is $B$” and is called an uncertainty inference machine. The condition $A$ corresponds to a linguistic term set of the universe $U$ and is called the one-dimensional front part; the conclusion $B$ corresponds to a linguistic term set of the universe $V$ and is called the one-dimensional back part. In cloud reasoning algorithms, $A$ is called a one-dimensional front-part cloud; it determines the membership degree of $x$ in the linguistic term set of $U$, generally using the X-conditional cloud generator. $B$ is called a one-dimensional back-part cloud, and a Y-conditional cloud generator is used to produce, for a given membership degree, the corresponding value $y$ in the linguistic term set of $V$.
The one-dimensional front-part cloud generator, shown in Figure 1, converts the input data into cloud drops and captures the distribution range and pattern of the data, establishing the mapping between the input data and the membership degree. Normal random numbers based on the expectation and variance are used to generate the membership degree, so the fuzziness and randomness of the data are considered throughout the calculation. The detailed algorithm is as follows:
Input: A cloud model $C(Ex, En, He)$ and a quantitative value $x$.
Output: The membership degree $\mu$ of the quantitative value $x$.
(1) Generate a normal random entropy $En'\sim N(En, He^2)$ from the entropy $En$ and the hyperentropy $He$.
(2) Calculate the cloud drop $(x, \mu)$ at the specified value $x$, where $\mu=\exp\left(-\frac{(x-Ex)^2}{2En'^2}\right)$.
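The two steps of the front-part (X-conditional) cloud generator can be sketched in Python as follows, assuming the standard formulas $En'\sim N(En, He^2)$ and $\mu=\exp(-(x-Ex)^2/(2En'^2))$. The name `x_conditional_cloud` is hypothetical, and the optional `rng` argument is added only to make runs reproducible.

```python
import math
import random

def x_conditional_cloud(ex, en, he, x, rng=None):
    """X-conditional (front-part) cloud generator for a crisp input x.

    (1) draw a random entropy En' ~ N(En, He^2);
    (2) return the certainty degree mu = exp(-(x - Ex)^2 / (2 En'^2)).
    """
    rng = rng or random.Random()
    en_prime = rng.gauss(en, he)
    if en_prime == 0.0:                 # zero-width cloud: only x == Ex belongs to it
        return 1.0 if x == ex else 0.0
    return math.exp(-((x - ex) ** 2) / (2.0 * en_prime ** 2))
```

Because $En'$ is redrawn on every call, repeated calls with the same $x$ return slightly different membership degrees, which is how the generator injects randomness; with $He=0$ it reduces to an ordinary Gaussian membership function.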
The accuracy of the one-dimensional back-part cloud generator depends on the amount of data in the model. When the number of cloud drops is large enough, its three characteristic values can be calculated from the statistical characteristics of the drops; the more cloud drops there are, the better the statistical estimate.
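Estimating the three characteristic values from drop statistics is commonly done with a backward cloud generator. The sketch below assumes the common variant that uses drop positions only: $Ex$ is the sample mean, $En=\sqrt{\pi/2}\cdot\frac{1}{N}\sum|x_i-Ex|$, and $He=\sqrt{S^2-En^2}$ with $S^2$ the sample variance. The function name and the demo cloud $C(300, 50, 5)$ are illustrative.

```python
import math
import random

def backward_cloud(xs):
    """Backward cloud generator without certainty degrees: estimate (Ex, En, He) from drops xs."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two cloud drops")
    ex = sum(xs) / n
    en = math.sqrt(math.pi / 2.0) * sum(abs(x - ex) for x in xs) / n   # first absolute central moment
    s2 = sum((x - ex) ** 2 for x in xs) / (n - 1)                      # unbiased sample variance
    he = math.sqrt(max(s2 - en ** 2, 0.0))  # clipped at 0: small samples can give S^2 < En^2
    return ex, en, he

# Demo: draw drops from a known cloud C(300, 50, 5), then recover its parameters
rng = random.Random(42)
xs = [rng.gauss(300.0, abs(rng.gauss(50.0, 5.0))) for _ in range(20000)]
ex_hat, en_hat, he_hat = backward_cloud(xs)
```

$Ex$ and $En$ are recovered closely from 20,000 drops, whereas the $He$ estimate, being the square root of a small difference of two large quantities, stays noisy unless the number of drops is much larger, consistent with the remark above.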
2.3.2. Cloud Model Inference Rule Generator
By connecting an antecedent cloud generator to a consequent cloud generator, a single rule generator is constructed. Connecting the two conditional cloud generators in sequence preserves and transmits the uncertainty of the data and completes the uncertainty inference. The algorithm executes as follows:
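Let the antecedent cloud be $C(Ex_A, En_A, He_A)$ and the consequent cloud be $C(Ex_B, En_B, He_B)$.
Step 1: Generate a normal random number $En'_A$ with $En_A$ as the expected value and $He_A$ as the standard deviation.
Step 2: Calculate the membership degree: $\mu=\exp\left(-\frac{(x-Ex_A)^2}{2En_A'^2}\right)$.
Step 3: Generate a normal random number $En'_B$ with $En_B$ as the expected value and $He_B$ as the standard deviation.
Step 4: When the quantization value $x\le Ex_A$, the antecedent cloud is activated on its rising edge, and the consequent cloud is likewise activated on its rising edge: $y=Ex_B-En'_B\sqrt{-2\ln\mu}$.
Step 5: When the quantization value $x>Ex_A$, the antecedent cloud is activated on its falling edge, and the consequent cloud is likewise activated on its falling edge: $y=Ex_B+En'_B\sqrt{-2\ln\mu}$.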
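A minimal Python sketch of the single-rule generator, assuming antecedent and consequent clouds are given as $(Ex, En, He)$ tuples and that the consequent value is $y=Ex_B\mp En'_B\sqrt{-2\ln\mu}$ on the rising or falling edge, respectively. The name `single_rule_generator` is hypothetical.

```python
import math
import random

def single_rule_generator(x, antecedent, consequent, rng=None):
    """Cloud rule 'IF x is A THEN y is B': X-conditional generator on A, then Y-conditional on B.

    antecedent, consequent: (Ex, En, He) tuples. Returns (mu, y).
    """
    rng = rng or random.Random()
    ex_a, en_a, he_a = antecedent
    ex_b, en_b, he_b = consequent
    en_a_prime = rng.gauss(en_a, he_a)                               # Step 1
    mu = math.exp(-((x - ex_a) ** 2) / (2.0 * en_a_prime ** 2))     # Step 2
    en_b_prime = rng.gauss(en_b, he_b)                               # Step 3
    # Distance from Ex_B on the consequent cloud; abs() keeps the edge choice below meaningful
    spread = abs(en_b_prime) * math.sqrt(-2.0 * math.log(mu)) if mu > 0.0 else math.inf
    if x <= ex_a:     # Step 4: rising edge of A maps to rising edge of B
        y = ex_b - spread
    else:             # Step 5: falling edge of A maps to falling edge of B
        y = ex_b + spread
    return mu, y

mu, y = single_rule_generator(350.0, (300.0, 50.0, 0.0), (320.0, 40.0, 0.0))
```

With zero hyperentropy the rule is deterministic: an input one entropy above $Ex_A$ maps to one entropy above $Ex_B$ (here 360), and the input $Ex_A$ maps exactly to $Ex_B$.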
In practice, the multi-rule inference algorithm shown in Figure 2 is generally used, and uncertainty reasoning over multiple rules is achieved through logical operations. In actual operation, the number of conditions and rules is determined by the specific characteristics of each dataset. For combining inference rules, the logical operators are mainly the “soft AND” and “soft OR” operators. When the reasoning result must satisfy all conditional attributes, a logical “AND” operation is performed, called the “soft AND” algorithm. When the inference result needs to satisfy only one or more of the conditions, a logical “OR” operation is performed, called the “soft OR” algorithm. To simplify the calculation, the number of conditions and rules involved in the inference process should be kept as small as possible: as the number of conditions grows, the number of rules increases rapidly, and the computational difficulty rises significantly. Therefore, inference rules require a certain degree of “dimensionality reduction”, in which complex rules that are difficult to calculate are split into simpler ones, reducing the computational workload and complexity of the model. In the literature, the “max” function is generally used to take the maximum value and the “prod” function to compute the cumulative product, and these are combined to obtain the comprehensive membership degree.
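A small sketch of how rule activations could be combined, assuming (as is common for product t-norms and max t-conorms) that “prod” implements the soft “AND” over the conditions of one rule and “max” implements the soft “OR” across rules. The membership values are made up for illustration.

```python
import math

def soft_and(memberships):
    """Soft AND over the conditions of one rule: product of membership degrees."""
    return math.prod(memberships)

def soft_or(rule_strengths):
    """Soft OR across rules: the largest activation."""
    return max(rule_strengths)

# Hypothetical two-condition rules: membership of each condition for one input
rule_conditions = [[0.9, 0.8], [0.4, 0.95], [0.1, 0.3]]
activations = [soft_and(c) for c in rule_conditions]   # one activation per rule
overall = soft_or(activations)                         # comprehensive membership degree
```

The product shrinks quickly as conditions are added, which is one practical reason to keep the number of conditions per rule small, as argued above.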
3. Improved Fuzzy Inference System
Fuzzy inference systems usually use membership functions in the fuzzification layer to map crisp input values to fuzzy sets. Common membership functions include the triangular, trapezoidal, generalized bell-shaped, Gaussian, and joint-Gaussian membership functions, among which the Gaussian membership function is the most widely used. However, because drivers have different driving habits, traffic flow data contain a certain degree of randomness, which the above functions cannot describe well. This paper therefore introduces the cloud model as the membership function.
For ease of understanding, Figure 3 shows the improved fuzzy inference system. It consists of six network layers, namely the input layer, the cloudification fuzzy layer, the cloudification rule layer, the standardization layer, the inverse cloudification layer, and the output layer, where $\mathbf{x}=(x_1,x_2,\dots,x_n)$ denotes a time sequence of the observed traffic flow data and the output represents the predicted traffic flow at time $t+1$.
The function of each layer of the improved fuzzy inference system is as follows (for notational convenience, the input of neuron $i$ in network layer $k$ is denoted as $I_i^{(k)}$ and its output as $O_i^{(k)}$):
(1) Input layer: Each node of the input layer is directly connected to the cloudification fuzzy layer and is primarily used to receive traffic flow data within a time window.
Traffic flow shows a strong cyclical correlation with the same day of each week and also a cyclical similarity from day to day. Considering only the daily variation reflects the overall trend of traffic flow over a 24-h period; considering only the weekly cyclical variation reflects the overall trend over a 24-h period on the same day of each week. Neither alone fully reflects the traffic flow pattern, so both must be considered jointly. To model the daily and weekly periodicity of traffic flow, the periodic input matrix for time $t$ is given as follows:
$$\mathbf{X}_t=\left[\mathbf{x}_d,\ \mathbf{x}_w\right]$$
where $\mathbf{x}$ represents the time series, and $\mathbf{x}_d$ and $\mathbf{x}_w$ represent the traffic flow data for the previous $d$ days and $w$ weeks, respectively. Therefore, the input layer of the fuzzy system passes each component of the traffic flow history data $\mathbf{X}_t$ to the cloudification fuzzy layer.
Input: $I_i^{(1)}=x_i$; Output: $O_i^{(1)}=I_i^{(1)}$, $i=1,\dots,n$, where $n=d+w$ is the total number of nodes in the first layer of the network.
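To make the periodic input concrete, the sketch below builds the $n=d+w$ input values for predicting time $t+1$. It assumes a regularly sampled series, that the daily and weekly lags are taken at the same time of day as the target, and a hypothetical `samples_per_day` parameter (for example 288 for 5-min data); none of these details is fixed by the text above, and `periodic_inputs` is an illustrative name.

```python
def periodic_inputs(series, t, d, w, samples_per_day):
    """Return the d daily and w weekly lagged values used to predict series[t + 1].

    Daily part: the value at the same time of day on each of the previous d days.
    Weekly part: the value at the same time on the same weekday in each of the previous w weeks.
    """
    target = t + 1
    oldest = target - max(d * samples_per_day, w * 7 * samples_per_day)
    if oldest < 0:  # check first: negative list indices would silently wrap around
        raise IndexError("not enough history for the requested d and w")
    daily = [series[target - k * samples_per_day] for k in range(1, d + 1)]
    weekly = [series[target - k * 7 * samples_per_day] for k in range(1, w + 1)]
    return daily + weekly

# Hourly toy series whose value equals its index, so lags are easy to read off
series = list(range(1000))
inputs = periodic_inputs(series, t=341, d=3, w=2, samples_per_day=24)
```

Using a series whose values equal their indices makes it easy to check that each input comes from exactly one day or one week before the target.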
(2) Cloudification fuzzy layer: This layer performs uncertainty processing on the data and maps the exact traffic flow values to the uncertainty space. Each node of the cloudification fuzzy layer represents a sub-membership cloud model generated from the time value $t$ by the X-conditional cloud generator, which calculates the certainty degree of each input temporal component. In this layer, the input time series data are first clustered with a fuzzy clustering algorithm. The number of membership functions in the system equals the number of clusters, and the initial parameters of the membership functions are determined by the clustering results.
Input: $I_i^{(2)}=O_i^{(1)}$;
Output: $O_{ij}^{(2)}=\mu_{ij}\left(I_i^{(2)}\right)$, where $i=1,\dots,n$; $j=1,\dots,m_i$; $\mu_{ij}$ denotes the $j$th cloud membership function corresponding to the $i$th input time series; $m_i$ denotes the number of sub-clouds into which $x_i$ is divided.
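The text does not name the fuzzy clustering algorithm, so the following sketch assumes fuzzy c-means (FCM) on a single input series and initializes each sub-cloud from one cluster: $Ex$ as the cluster center, $En$ as the membership-weighted standard deviation, and $He$ as a fixed fraction of $En$. These initialization rules, the `he_ratio` parameter, and the function names are hypothetical.

```python
import math

def fcm_1d(values, c, m=2.0, iters=200, tol=1e-8):
    """Fuzzy c-means on scalar data. Returns (centers, u), u[j][p] = membership of sample p in cluster j."""
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * (j + 0.5) / c for j in range(c)]  # evenly spread start
    expo = 2.0 / (m - 1.0)
    u = []
    for _ in range(iters):
        u = [[0.0] * len(values) for _ in range(c)]
        for p, x in enumerate(values):
            dists = [abs(x - cj) for cj in centers]
            if 0.0 in dists:                     # sample sits on a center: crisp membership
                u[dists.index(0.0)][p] = 1.0
                continue
            for j in range(c):
                u[j][p] = 1.0 / sum((dists[j] / dk) ** expo for dk in dists)
        new_centers = []
        for j in range(c):
            w = [uj ** m for uj in u[j]]
            total = sum(w)
            new_centers.append(sum(wi * x for wi, x in zip(w, values)) / total if total else centers[j])
        shift = max(abs(a - b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, u

def init_sub_clouds(values, c, he_ratio=0.1, m=2.0):
    """Hypothetical initialization of c sub-clouds (Ex, En, He) from FCM clusters."""
    centers, u = fcm_1d(values, c, m)
    clouds = []
    for j, ex in enumerate(centers):
        w = [uj ** m for uj in u[j]]
        total = sum(w) or 1.0
        var = sum(wi * (x - ex) ** 2 for wi, x in zip(w, values)) / total
        en = max(math.sqrt(var), 1e-6)           # avoid a zero-width cloud
        clouds.append((ex, en, he_ratio * en))
    return sorted(clouds)

clouds = init_sub_clouds([9.0, 10.0, 11.0, 99.0, 100.0, 101.0], c=2)
```

The number of clusters `c` plays the role of $m_i$ above: one sub-cloud, and hence one membership function, per cluster.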
(3) Cloudification rule layer: The rule layer is mainly responsible for cloud rule matching, and each fuzzy rule has a corresponding node in this layer. The t-norm (“AND”) and t-conorm (“OR”) are the most commonly used operators for combining fuzzy sets, and the soft “AND” operator is used in the rule layer. The activation of each cloud rule is determined by the soft “AND” operator, $\alpha_k=\prod_{i=1}^{n}\mu_{ij_i}(x_i)$, where $j_i$ is the index of the sub-cloud of input $x_i$ used in rule $k$. The soft “AND” calculation uses the multi-dimensional normal cloud generator introduced in Definition 5 to compute the membership degree of the multi-dimensional normal cloud.
This paper argues that the closer a historical traffic flow series is to the prediction time point, the more similar it is to the prediction period. Thus, higher weights are given to the high-impact time series in the historical traffic flow sequence to compensate for the limited learning ability of the fuzzy inference system. Assuming that the input sequence is $\mathbf{X}=(\mathbf{x}_1,\mathbf{x}_2,\dots,\mathbf{x}_n)$, we assume that the $i$th time series $\mathbf{x}_i$ of $\mathbf{X}$ has a high impact on the prediction result [44]. Therefore, we calculate a weight for each time series to improve the prediction accuracy. We perform multiple linear regression on the multiple time series:
$$\hat{y}=\sum_{i=1}^{n}\omega_i\mathbf{x}_i+b$$
where $\omega_i$ is the corresponding weight and $b$ is the bias. The weight and bias parameters in the cloud rule are obtained by minimizing $\sum\left(y-\hat{y}\right)^2$, where $\hat{y}$ is the predicted value. Finally, the weights are obtained as:
$$\lambda_i=\frac{\exp(\omega_i)}{\sum_{j=1}^{n}\exp(\omega_j)}$$
where $\lambda_i$ is the weight of the $i$th day before the prediction time point. The softmax function ensures that all weights sum to 1.
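The weighting step can be sketched as ordinary least squares followed by a softmax, assuming the linear model $\hat{y}=\sum_i\omega_i x_i+b$ is fitted by minimizing the squared error and the softmax is applied to the fitted coefficients $\omega_i$. The normal-equation solver and the names `fit_linear_weights` and `softmax` are illustrative choices, not the authors' implementation.

```python
import math

def fit_linear_weights(rows, targets):
    """Least-squares fit of y_hat = sum_i omega_i * x_i + b via the normal equations."""
    design = [list(r) + [1.0] for r in rows]      # extra column of ones for the bias b
    p = len(design[0])
    ata = [[sum(a[i] * a[j] for a in design) for j in range(p)] for i in range(p)]
    aty = [sum(a[i] * y for a, y in zip(design, targets)) for i in range(p)]
    aug = [ata[i] + [aty[i]] for i in range(p)]   # augmented matrix [A^T A | A^T y]
    for col in range(p):                          # Gauss-Jordan elimination with partial pivoting
        piv = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-12:
            raise ValueError("singular system: inputs are collinear")
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(p):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [v - f * pv for v, pv in zip(aug[r], aug[col])]
    beta = [aug[i][p] / aug[i][i] for i in range(p)]
    return beta[:-1], beta[-1]                    # (weights omega_i, bias b)

def softmax(values):
    """Map the fitted weights to positive values that sum to 1."""
    top = max(values)
    exps = [math.exp(v - top) for v in values]    # shift by max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

# Toy data: two lagged series, target generated exactly as 2*x1 + 1*x2 + 0.5
rows = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0], [5.0, 8.0]]
targets = [2.0 * a + b + 0.5 for a, b in rows]
omega, bias = fit_linear_weights(rows, targets)
lam = softmax(omega)
```

Note that the softmax makes every weight positive even if a fitted coefficient is negative, and it depends only on differences between coefficients, so the scale of the inputs affects the resulting weights.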
The multi-dimensional normal cloud, processed by the fuzzy-rule enhancement mechanism, is calculated as follows.
The fuzzy inference rule for this layer is: if the sub-membership cloud functions have membership degrees $\mu_{1j_1}(x_1),\dots,\mu_{nj_n}(x_n)$, then the combined membership of the rule is $\alpha_k$.
Input: $I_{ij}^{(3)}=O_{ij}^{(2)}$; Output: $O_k^{(3)}=\alpha_k$.
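(4) Standardization layer: This layer is mainly responsible for normalizing values. The normalized activation intensity $\bar{\alpha}_k$ corresponding to the activation degree $\alpha_k$ passed into this layer is calculated as
$$\bar{\alpha}_k=\frac{\alpha_k}{\sum_{l=1}^{K}\alpha_l}$$
where $K$ is the number of cloud rules.
Input: $I_k^{(4)}=O_k^{(3)}$; Output: $O_k^{(4)}=\bar{\alpha}_k$.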
(5) Inverse cloudification layer: This layer quantitatively transforms the fuzzy membership degree, generates a consequent cloud with the Y-conditional cloud generator, and outputs the inference result together with its corresponding traffic flow value $q_k$.
Input: $I_k^{(5)}=O_k^{(4)}$; Output: $O_k^{(5)}=\bar{\alpha}_k q_k$.
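(6) Output layer: The results of the inverse cloudification layer are combined by a weighted average, and the final result is output.
Input: $I_k^{(6)}=O_k^{(5)}$; Output: $O^{(6)}=\sum_{k=1}^{K}O_k^{(5)}=\sum_{k=1}^{K}\bar{\alpha}_k q_k$.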
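Putting the six layers together, the following sketch runs one forward pass. It assumes product-based soft “AND” in layer (3), normalization by the sum of activations in layer (4), and the weighted sum $\sum_k\bar{\alpha}_k q_k$ in layer (6). The rule that picks the rising or falling edge of each consequent cloud in layer (5) (the sign of the summed standardized input deviation) is purely hypothetical, the input weighting of layer (3) is omitted, and all function names and parameters are illustrative.

```python
import math
import random

def x_cond(x, ex, en, he, rng):
    """X-conditional cloud generator: certainty degree of crisp x in cloud C(Ex, En, He)."""
    en_p = rng.gauss(en, he)                     # random entropy En' ~ N(En, He^2)
    return math.exp(-((x - ex) ** 2) / (2.0 * en_p ** 2))

def y_cond(mu, ex, en, he, sign, rng):
    """Y-conditional cloud generator: value on one edge of C(Ex, En, He) with certainty mu."""
    en_p = abs(rng.gauss(en, he))
    return ex + sign * en_p * math.sqrt(-2.0 * math.log(mu))

def forward(x, input_clouds, rules, rng=None):
    """One forward pass through layers (1)-(6).

    x            : list of n inputs (layer 1 passes them through unchanged)
    input_clouds : input_clouds[i][j] = (Ex, En, He) of the j-th sub-cloud of input i
    rules        : list of (antecedent_indices, consequent_cloud); antecedent_indices[i] = j
    """
    rng = rng or random.Random()
    # Layer 2: certainty degree of every input in each of its sub-clouds
    mu = [[x_cond(xi, *cloud, rng) for cloud in clouds] for xi, clouds in zip(x, input_clouds)]
    # Layer 3: soft AND (product) over the antecedents of each rule
    alpha = [math.prod(mu[i][j] for i, j in enumerate(idx)) for idx, _ in rules]
    total = sum(alpha)
    if total == 0.0:
        raise ValueError("no rule is activated by this input")
    # Layer 4: normalized activation strengths
    alpha_bar = [a / total for a in alpha]
    # Layer 5: traffic flow value q_k from the Y-conditional generator of each consequent cloud
    q = []
    for (idx, (ex_b, en_b, he_b)), a in zip(rules, alpha):
        # Hypothetical edge choice: sign of the summed standardized deviation of the inputs
        dev = sum((x[i] - input_clouds[i][j][0]) / input_clouds[i][j][1] for i, j in enumerate(idx))
        sign = 1.0 if dev > 0 else -1.0
        q.append(y_cond(max(a, 1e-300), ex_b, en_b, he_b, sign, rng))  # floor avoids log(0)
    # Layer 6: weighted average of the rule outputs
    return sum(ab * qk for ab, qk in zip(alpha_bar, q))

# Toy system: one input with two sub-clouds and one rule per sub-cloud
clouds = [[(100.0, 30.0, 0.0), (300.0, 30.0, 0.0)]]
rules = [((0,), (110.0, 20.0, 0.0)), ((1,), (310.0, 20.0, 0.0))]
prediction = forward([300.0], clouds, rules, rng=random.Random(0))
```

With zero hyperentropy and an input at the center of the second sub-cloud, the second rule dominates and the prediction is essentially its consequent center; with $He>0$ repeated passes give slightly different predictions, reflecting the randomness the cloud model is meant to capture.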