3.3.1. Data Preprocessing
Data preprocessing includes three parts: data elimination, data set partition, and data normalization.
To guarantee the validity of the model, abnormal data must be eliminated according to a statistical criterion. Specifically, we assume that a set of test data contains random error only, then calculate the standard deviation and determine the corresponding probability interval. Data falling outside this interval, usually the result of gross error, are treated as abnormal and eliminated. By collecting and observing a large volume of field operation data, we find that the input variables conform to the normal distribution when the sample size is large enough. It is therefore valid to calculate the normal characteristics of the sample and eliminate abnormal data on that basis: any data point outside the tolerance range is removed according to the criterion.
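The elimination rule can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes the tolerance range is the mean plus or minus k standard deviations with k = 3 (a common choice for normally distributed data, not confirmed by the text), and the function name and readings are hypothetical.

```python
import statistics

def remove_outliers(values, k=3.0):
    """Drop values farther than k sample standard deviations from the mean.

    k = 3 is an assumption; the text does not state the interval width.
    """
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # sample standard deviation
    low, high = mean - k * std, mean + k * std
    return [v for v in values if low <= v <= high]

# Hypothetical readings: 20 plausible values plus one gross error.
readings = [50.0 + 0.1 * i for i in range(20)] + [500.0]
cleaned = remove_outliers(readings)
print(len(readings), len(cleaned))
```

With very small samples a single extreme value inflates the standard deviation enough to hide itself, which is one reason the rule is applied only once the sample is large.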
After that, the retained sample data are divided into a training set and a test set. The training set is the key to training the model: its quality and size determine the training effect, while the test set is used to check the validity of the model. In this paper, we adopt a fixed partition method, which makes it easy to compare and analyze the validity of different models subsequently. The collected data are partitioned according to the ratio of M (training set) to N (test set). As the proportion of M increases, the model has more data to learn from, so its training performance, generalization, and accuracy generally improve.
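A fixed partition by the ratio M : N might look like the sketch below. It assumes that "fixed" means a deterministic split of the samples in their stored order, so that every model is compared on the same test set; the function name and the 4 : 1 ratio are hypothetical.

```python
def fixed_partition(samples, m, n):
    """Split samples in their given order by the ratio m (training) : n (test)."""
    cut = len(samples) * m // (m + n)
    return samples[:cut], samples[cut:]

samples = list(range(10))  # hypothetical sample indices
train, test = fixed_partition(samples, m=4, n=1)
print(train, test)
```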
After partitioning the data set, the data need to be normalized, that is, each sample is mapped to the interval [0, 1]. This balances the contribution of each input variable and accelerates model convergence.
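A minimal sketch of min-max normalization for one input variable is given below. Taking the minimum and maximum from the training set and reusing them for the test set is an assumption (the text only says that samples are mapped to [0, 1]); the function names and temperature values are hypothetical.

```python
def fit_min_max(column):
    """Return the (min, max) pair of a training column."""
    return min(column), max(column)

def normalize(column, lo, hi):
    """Map values to [0, 1] using a previously fitted (lo, hi) pair."""
    span = hi - lo
    # A constant column has zero span; map it to 0.0 to avoid division by zero.
    return [(v - lo) / span if span else 0.0 for v in column]

train_temperature = [10.0, 20.0, 30.0]  # hypothetical values
lo, hi = fit_min_max(train_temperature)
scaled = normalize(train_temperature, lo, hi)
print(scaled)
```

Test-set values outside the training range fall slightly outside [0, 1] under this scheme, which is usually acceptable for kernel methods.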
3.3.2. Determination of Parameter Initial Value for GWO-SVR Model
Determining the initial parameter values for the GWO-SVR model involves two parts: the parameters of the SVR model and those of the GWO model.
(1) Determination of parameter initial value for SVR model
The penalty coefficient C, the kernel function type of the SVR, and the kernel's internal parameter must be determined during modeling. These parameters control the accuracy and generalization ability of the SVR model, so parameter determination is an important modeling step. In Equation (4), the size of C directly affects the noise tolerance and, in turn, the model accuracy.
The choice of kernel function mainly depends on the number of features and the size of the sample data. When the number of features far exceeds the sample size, the feature dimension is high enough that the data are usually linearly separable and a linear kernel can be considered; when the sample size is moderate and the feature dimension is not high, the radial basis function (RBF) is generally chosen. The energy consumption estimation model in this paper has only four features and a large amount of sample data, so we take the RBF as the kernel function of the proposed SVR model.
In Equation (15), the RBF contains only one unknown parameter (controlling the noise tolerance of the model), and different samples call for different values of this parameter.
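For illustration, a one-parameter RBF kernel can be written as below. The form exp(-g * ||x - y||^2) is the common textbook definition and is assumed here; it may differ in notation from Equation (15), and the name g for the kernel parameter is hypothetical.

```python
import math

def rbf_kernel(x, y, g):
    """Assumed RBF form: exp(-g * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-g * sq_dist)

same = rbf_kernel([0.2, 0.4], [0.2, 0.4], g=0.5)   # identical points
near = rbf_kernel([0.0, 0.0], [1.0, 0.0], g=0.1)   # small g: wide kernel
far = rbf_kernel([0.0, 0.0], [1.0, 0.0], g=10.0)   # large g: narrow kernel
print(same, near, far)
```

A larger g makes the kernel value fall off faster with distance, so each training sample influences a smaller neighbourhood.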
Because the variables SOC, trip travel time, average environment temperature, and air-conditioner operation time are difficult to measure visually, energy consumption has high-dimensional spatial characteristics, and applicable parameter values are not directly available, it is necessary to use GWO to select the optimal parameters.
(2) Determination of parameter initial value for GWO model
The SVR model has two uncertain parameters that need to be optimized by GWO. During optimization, the primary settings are the number of grey wolves in the population, the maximum number of iterations, the position information, the hunting zone, and the step length.
The population size is the number of grey wolves taking part in the search. As this number grows, the hunting zone in which the pack seeks the optimal solution gradually expands. In general, the larger the population, the wider the hunting zone and the faster the convergence, but the computational burden also increases greatly. For the energy consumption model in this paper, the population size should be between 10 and 20.
The maximum number of iterations is the number of times the grey wolves search for prey. It should be determined according to the sample size: as the sample size grows, the number of iterations also increases, adding to the computational burden. For our energy consumption estimation model, we therefore set the maximum number of iterations between 20 and 50.
At the same time, setting the range of the search parameters improves the accuracy of the optimal parameters and the efficiency of the search, and also reduces the time needed to select the optimal parameters.
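The settings discussed above can be collected in a few lines. In this sketch only the two ranges quoted in the text (10 to 20 wolves, 20 to 50 iterations) come from the source; the specific values, the search bounds, and the clipping helper are hypothetical.

```python
# Hypothetical settings; only the ranges in the comments come from the text.
N_WOLVES = 15   # text: between 10 and 20
MAX_ITER = 30   # text: between 20 and 50
BOUNDS = [(0.01, 100.0), (0.001, 10.0)]  # assumed (low, high) per SVR parameter

def clip_to_bounds(position, bounds):
    """Pull each coordinate of a wolf's position back inside its search range."""
    return [min(max(value, lo), hi) for value, (lo, hi) in zip(position, bounds)]

clipped = clip_to_bounds([250.0, -3.0], BOUNDS)
print(clipped)
```

Restricting the range this way keeps every candidate inside the region where the SVR parameters are meaningful, so no iterations are spent on invalid positions.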
3.3.3. Steps for GWO-SVR Model
Step 1. Set the position information of the wolves and the initial value of fitness.
After setting the internal parameters of the grey wolves, we preset the position information and the initial fitness value. In the GWO algorithm, the initial population has a large impact on the global search speed and the quality of the solution. GWO commonly sets the position information to random values and the fitness to positive infinity to start the iteration. In this paper, given the large sample size of the energy consumption data, a randomized initialization would slow convergence. Therefore, the position information of wolves α, β, and δ is given by the method of dimension estimation.
Generally, the hunting zone of penalty parameter C is given as follows.
where dim is the dimension of the input. SOC, trip travel time, average environment temperature, and air-conditioner operation time are the input parameters of our model, so dim equals 4. The kernel parameter can have the same hunting zone as C, and its initial population value can be obtained from Equation (30).
The upper quartile, lower quartile, median, and recommended value are taken as initial values, which yields three sets of C and the kernel parameter. Using these three sets, the root-mean-square error (RMSE), which serves as the fitness function, is calculated through the SVR model; the lower the RMSE, the better the model performs. Next, the position information is updated according to the order of the first, second, and third best RMSE values.
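Ordering the candidates by RMSE can be sketched as below. The candidate pairs and RMSE values are made-up placeholders rather than results from the paper; in the actual procedure each RMSE would come from training and testing an SVR model. The names alpha, beta, and delta follow the usual GWO convention for the three best wolves.

```python
def rank_leaders(candidates, rmse_values):
    """Return the three candidates with the lowest RMSE, best first."""
    order = sorted(range(len(candidates)), key=lambda i: rmse_values[i])
    return [candidates[i] for i in order[:3]]

# Placeholder (C, kernel parameter) pairs and placeholder RMSE values.
candidates = [(1.0, 0.1), (10.0, 0.5), (100.0, 1.0), (50.0, 0.2)]
rmse_values = [0.42, 0.17, 0.35, 0.29]
alpha, beta, delta = rank_leaders(candidates, rmse_values)
print(alpha, beta, delta)
```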
Step 2. Determination of the initial value of collaborative vector.
To calculate the initial value of the collaborative vector, three quantities must be preset: a control parameter and two random vectors. Both random vectors take values in the range [0, 1]. The control parameter is defined in terms of the maximum number of iterations.
According to Equation (20), the coefficient vectors A and B of GWO can be obtained.
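The sketch below shows how these quantities are computed in the standard GWO formulation. It is an assumption that the paper follows the standard form: a control parameter a that decreases linearly from 2 to 0, A = 2 * a * r1 - a, and a second vector equal to 2 * r2 (called B here to match the text). The function names are hypothetical.

```python
import random

def control_parameter(t, max_iter):
    """Assumed linear decrease from 2 (first iteration) to 0 (last)."""
    return 2.0 * (1.0 - t / max_iter)

def coefficient_vectors(a, dim, rng):
    """Standard-GWO coefficients built from two random vectors in [0, 1)."""
    r1 = [rng.random() for _ in range(dim)]
    r2 = [rng.random() for _ in range(dim)]
    A = [2.0 * a * r - a for r in r1]  # each component lies in [-a, a)
    B = [2.0 * r for r in r2]          # each component lies in [0, 2)
    return A, B

rng = random.Random(0)
a = control_parameter(t=0, max_iter=30)
A, B = coefficient_vectors(a, dim=2, rng=rng)
print(a, A, B)
```

Large |A| early in the run pushes wolves away from the leaders (exploration); as a shrinks, |A| shrinks with it and the pack closes in (exploitation).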
Step 3. Update the position information and fitness of wolves α, β, and δ.
The position information and fitness are recalculated based on Equations (22) and (23); the values are then compared and the smaller fitness is selected, so that the update proceeds step by step.
For example, to calculate the fitness of a grey wolf, the first step is to set the parameters of the SVR model from its position information. The position information consists of C and the kernel parameter, whose initial values were obtained through the dimension calculation in Step 1. These initial values are taken as the position information of the grey wolf, and the position information is in turn used as the parameters of the SVR model. Next, the SVR model is trained with the input variables of the training set as inputs and the energy consumption as the output. The input variables of the test set are then used to estimate energy consumption. Finally, the error between the estimated and actual values in the test set is analyzed, and the RMSE is taken as the fitness of this position information.
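The fitness evaluation of one wolf can be sketched as a function that takes a position, trains a model with it, and returns the test-set RMSE. To keep the example self-contained, the trainer below is a stand-in (a Gaussian-weighted average of training targets), not an SVR; it uses only the kernel parameter and ignores C. All names and data are hypothetical.

```python
import math

def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def wolf_fitness(position, train_fn, train_x, train_y, test_x, test_y):
    """Train with the wolf's (C, kernel parameter) position; return test RMSE."""
    c, g = position
    model = train_fn(train_x, train_y, c, g)
    return rmse(test_y, [model(x) for x in test_x])

def kernel_average_trainer(xs, ys, c, g):
    """Stand-in for SVR training (NOT an SVR); c is accepted but unused."""
    def predict(x):
        weights = [math.exp(-g * sum((a - b) ** 2 for a, b in zip(x, xi))) for xi in xs]
        return sum(w * y for w, y in zip(weights, ys)) / sum(weights)
    return predict

# Hypothetical normalized data with a single input variable.
train_x, train_y = [[0.0], [0.5], [1.0]], [0.0, 0.5, 1.0]
test_x, test_y = [[0.25], [0.75]], [0.25, 0.75]
fitness = wolf_fitness((10.0, 5.0), kernel_average_trainer,
                       train_x, train_y, test_x, test_y)
print(fitness)
```

Passing the trainer as an argument means a real SVR implementation can replace the stand-in without changing the fitness function.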
Step 4. Update iteration.
Referring to Equation (18), the position information and fitness of each grey wolf are recalculated. RMSE is an important index for evaluating the validity of the model: the smaller the RMSE, the better the model. If the RMSE of a grey wolf is less than that of one of the leading wolves α, β, and δ, a replacement is carried out. For example, if the RMSE of a grey wolf is smaller than that of a leading wolf, the position information and fitness of the leading wolf are replaced by those of the better wolf. At the same time, the two wolves exchange their position information and fitness with each other.
This step aims to achieve the optimal replacement of position information. As the parameters and the RMSE are continually updated over the iterations, the search moves toward values of C and the kernel parameter that are closer to the optimum.
Then, according to Equation (20), compute the coefficient vectors A and B for GWO, taking advantage of the randomness of the two random vectors. Repeat Step 4 to Step 5 to carry on the iteration.
Step 5. Result of GWO algorithm.
As the position information and fitness of grey wolves α, β, and δ in the population are continually updated, the GWO algorithm ends when the number of iterations reaches the maximum. The position information of grey wolf α is then selected as the final model parameters, C and the kernel parameter.
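The overall loop can be sketched as follows, assuming the standard GWO position update (each wolf moves to the average of three positions, each pulled toward one leader). It uses random initialization instead of the dimension-estimation method of Step 1, and a toy quadratic objective in place of the SVR test RMSE; the bounds, seed, and function names are hypothetical.

```python
import random

def gwo(fitness, bounds, n_wolves=15, max_iter=30, seed=0):
    """Minimal standard-GWO sketch; returns (best position, best fitness)."""
    rng = random.Random(seed)
    dim = len(bounds)
    wolves = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_wolves)]
    leaders = []  # up to three (fitness, position) pairs: alpha, beta, delta
    for t in range(max_iter):
        scored = [(fitness(w), list(w)) for w in wolves]
        # Keep the three best positions seen so far (leader replacement).
        leaders = sorted(leaders + scored, key=lambda s: s[0])[:3]
        a = 2.0 * (1.0 - t / max_iter)  # assumed linear decrease from 2 to 0
        for w in wolves:
            for d in range(dim):
                pulls = []
                for _, leader in leaders:
                    A = 2.0 * a * rng.random() - a
                    B = 2.0 * rng.random()
                    dist = abs(B * leader[d] - w[d])
                    pulls.append(leader[d] - A * dist)
                lo, hi = bounds[d]
                w[d] = min(max(sum(pulls) / len(pulls), lo), hi)
    final = [(fitness(w), list(w)) for w in wolves]
    best_fit, best_pos = min(leaders + final, key=lambda s: s[0])
    return best_pos, best_fit

def toy_fitness(position):
    """Toy stand-in for the SVR test RMSE, minimized at (10, 0.5)."""
    c, g = position
    return (c - 10.0) ** 2 + (g - 0.5) ** 2

bounds = [(0.01, 100.0), (0.001, 10.0)]  # assumed search ranges
best_pos, best_fit = gwo(toy_fitness, bounds)
print(best_pos, best_fit)
```

Because the leaders are retained across iterations, the best fitness never gets worse from one iteration to the next, and the α position at the last iteration is the returned solution.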
Step 6. The final training of SVR model.
C and the kernel parameter obtained above are selected as the final parameters of the SVR model. As in Step 3, the input variables of the training set are the inputs and the energy consumption is the output. The trained SVR model is thus obtained.
Step 7. The energy consumption estimation and evaluation based on SVR model.
Taking SOC, trip travel time, average environment temperature, and air-conditioner operation time as input variables, the estimated energy consumption is calculated and then denormalized. By comparing the actual values with the estimated values, we obtain the evaluation indices of the model and conduct further analysis.
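The final step can be sketched as below: predictions in [0, 1] are mapped back to physical units and compared with the measured values. RMSE is the index named in the text; the mean absolute error is added here only as an illustrative second index. The scaling limits and all numbers are hypothetical.

```python
import math

def denormalize(values, lo, hi):
    """Invert min-max normalization using the (lo, hi) of the training target."""
    return [v * (hi - lo) + lo for v in values]

def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

predicted_scaled = [0.0, 0.5, 1.0]                 # hypothetical model outputs
predicted = denormalize(predicted_scaled, lo=20.0, hi=60.0)
actual = [22.0, 40.0, 58.0]                        # hypothetical measurements
print(predicted, rmse(actual, predicted), mae(actual, predicted))
```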