2. Feature Selection Algorithm Based on Relief
Suppose D is a complete feature sample data set of the original features, where F = {f_1, f_2, ..., f_n} is the original feature set and S is the selected feature subset, with k denoting the number of features in S. The selected feature subset therefore constructs a k-dimensional feature subspace. Suppose m is the number of labeled sample data in the database; because each feature is a random variable, the dimensions of a labeled sample correspond to the values of the corresponding features. x_ij is used to represent the specific value taken by feature random variable f_j in the i-th labeled sample of the database. Therefore, the labeled sample data in the database can be written as vectors x_i in the space constructed by the complete original feature set F. Similarly, x_i^S denotes the data sample point of the i-th labeled sample in the k-dimensional feature subspace constructed by the selected feature subset S. In addition, the letter y denotes the category information of a sample, and y_i denotes the category of sample x_i.
The improved Relief algorithm proposed in this paper first redefines the distance between two points in the k-dimensional feature subspace constructed by the selected feature subset S. To make the obtained results more regular, a normalization method is adopted, as shown in Formula (1), where dist(·,·) denotes the Manhattan distance between the two input vectors, and the normalization term appearing in Formula (1) is defined in Formula (2).
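For concreteness, a minimal sketch of such a normalized subspace distance is given below, assuming the normalization of Formulas (1) and (2) is the per-feature range normalization used by classic Relief; the exact definition in the paper may differ, and the helper name dist_subspace and the array layout are illustrative only.

```python
import numpy as np

def dist_subspace(x_i, x_j, subset, feat_min, feat_max):
    """Normalized Manhattan distance between two samples restricted to the
    selected feature subset S (a sketch in the spirit of Formulas (1)-(2)).

    x_i, x_j          : 1-D arrays with the full feature vectors of two samples
    subset            : indices of the k selected features spanning the subspace
    feat_min, feat_max: per-feature minima/maxima over the data set, used to
                        normalize every coordinate difference into [0, 1]
    """
    s = np.asarray(subset, dtype=int)
    span = (feat_max[s] - feat_min[s]).astype(float)
    span[span == 0] = 1.0                       # guard against constant features
    return float(np.sum(np.abs(x_i[s] - x_j[s]) / span))
```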
According to Formulas (1) and (2), we define the closest data point that belongs to the same category as x_i as the near-hit H_i, and the closest data point that belongs to a different category from x_i as the near-miss M_i. Therefore, in the k-dimensional feature subspace constructed by the selected feature subset, the difference diff_H(x_i) between x_i and H_i is shown as Formula (3), where the letter y represents the category information of a sample and y_i represents the category to which sample x_i belongs. The difference diff_M(x_i) between x_i and M_i is shown in Formula (4).
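A sketch of the near-hit / near-miss search in the selected subspace follows; it reuses the dist_subspace helper defined above, and treating the differences of Formulas (3) and (4) as the subspace distances to the near-hit and near-miss is an assumption made for illustration.

```python
def near_hit_miss(idx, X, y, subset, feat_min, feat_max):
    """Return ((near-hit index, diff to near-hit), (near-miss index, diff to
    near-miss)) for sample `idx` in the subspace spanned by `subset`."""
    d_hit = d_miss = np.inf
    hit = miss = None
    for j in range(len(X)):
        if j == idx:
            continue
        d = dist_subspace(X[idx], X[j], subset, feat_min, feat_max)
        if y[j] == y[idx] and d < d_hit:        # same class: candidate near-hit
            hit, d_hit = j, d
        elif y[j] != y[idx] and d < d_miss:     # different class: candidate near-miss
            miss, d_miss = j, d
    return (hit, d_hit), (miss, d_miss)
```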
Through Formulas (3) and (4), we can obtain the weight that the current feature subset S receives from a single sample x_i in the k-dimensional feature subspace constructed by the selected feature subset, as shown in Formula (5). Finally, through Formula (5) we can obtain the feature weight W_S of the currently selected feature subset S, as given in Formula (6).
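For concreteness, one plausible instantiation of Formulas (5) and (6), written in the spirit of the classic Relief update rule (the exact forms used in the paper may differ), is

$$\delta_S(x_i) \;=\; d_S(x_i, M_i) - d_S(x_i, H_i), \qquad W_S \;=\; \frac{1}{T}\sum_{t=1}^{T}\delta_S\big(x_{(t)}\big),$$

where $d_S(\cdot,\cdot)$ is the normalized subspace distance of Formula (1), $T$ is the number of sampled instances, and $x_{(t)}$ is the $t$-th sampled instance: a subset scores highly when same-category neighbours are close and different-category neighbours are far.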
The feature selection algorithm based on the improved Relief weight improves the feature evaluation weight of classic Relief, and thus it gains the ability to evaluate feature subsets. To evaluate a feature subset with Feature Selection based on Improved Relief Weight (FSIRW), we randomly select labeled sample data from the sample data set in the database and, for each sampled instance, search for the closest sample data point of the same category (the near-hit) in the feature subspace of this subset. At the same time, we look for the closest sample data point of a different category (the near-miss) in the same subspace. We then calculate the differences between the sampled instance and its near-hit and near-miss through Formulas (3) and (4), respectively, plug these differences into Formula (5) to obtain the weight contribution of the current sample, and finally accumulate the contributions of all the sampled instances through Formula (6) to obtain the feature weight value of this feature subset in this feature subspace.
It is not enough to only have the evaluation function in a feature selection method; a feature search strategy is also needed to construct an integrated feature selection method [14]. The FSIRW method, which is improved on the basis of the classic Relief filtering feature selection algorithm, also needs a corresponding feature search strategy to make the algorithm complete. However, in a high-dimensional feature space it is computationally intractable to search for the minimal optimal feature subset by exhaustion, because an original feature space with n features contains 2^n - 1 non-empty feature subsets. Therefore, exhaustive search cannot be used to carry out the feature search for the FSIRW method, and it is necessary to use a local search strategy instead. To select an optimal feature subset that meets our needs, with a strong ability to distinguish categories in a low-dimensional feature subspace, the FSIRW algorithm adopts the sequential forward search strategy to carry out the feature search.
Algorithm 1 evaluates the selected feature subset through the calculation of the renewed feature subset weight W_S. Combining it with the corresponding search strategy, we can obtain the complete FSIRW algorithm, which is given as Algorithm 2.
Algorithm 1: Evaluate, the FSIRW feature subset evaluation algorithm.
Input: sample instance data set D (restricted to the selected feature subset S) and the parameter T of sampling recursion times for the renewal of W_S.
Output: the feature subset weight W_S.
Flows:
1. Randomly select a sample x_i from the sample data set D;
2. Look for sample x_i's near-hit H_i and near-miss M_i, respectively;
3. Calculate the values of diff_H(x_i) and diff_M(x_i);
4. Renew the single-sample weight value according to Formulas (3)–(7);
5. Renew W_S's value according to Formulas (3)–(8);
6. Return to Step 1 if the number of sampling iterations is less than T.
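Putting the pieces above together, a runnable sketch of the Evaluate procedure might look as follows; the helper names, the input layout, and the averaging over T samples are assumptions carried over from the earlier sketches rather than the paper's exact pseudocode.

```python
def evaluate_fsirw(X, y, subset, T, rng=None):
    """Sketch of Algorithm 1 (Evaluate): estimate the FSIRW weight of a candidate
    feature subset by sampling T instances and accumulating the near-miss /
    near-hit distance differences (in the spirit of Formulas (3)-(6))."""
    if not subset:                                         # empty subset carries no information
        return -np.inf
    rng = np.random.default_rng(0) if rng is None else rng
    feat_min, feat_max = X.min(axis=0), X.max(axis=0)
    w = 0.0
    for _ in range(T):
        i = int(rng.integers(len(X)))                      # Step 1: random sample
        (_, d_hit), (_, d_miss) = near_hit_miss(           # Step 2: near-hit / near-miss
            i, X, y, subset, feat_min, feat_max)
        if np.isinf(d_hit) or np.isinf(d_miss):            # class with a single sample
            continue
        w += (d_miss - d_hit) / T                          # Steps 3-5: accumulate the weight
    return w
```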
Algorithm 2: FSIRW feature selection algorithm.
Input: sample instance data set D and sampling times T. Output: final feature subset S*.
Flows:
1. Initialize the optimal feature subset S* to the empty set and the optimal feature subset weight W* to the minimum integer value;
2. For each remaining candidate feature, evaluate the subset formed by adding it to S* with Algorithm 1 (Evaluate);
3. If the largest resulting subset weight exceeds W*, add the corresponding feature to S*, renew W*, and return to Step 2; otherwise end the calculation and output S*.
From the calculation process of Algorithm 2, we can see that the initialization makes the optimal feature subset empty at the beginning of the calculation. Through the sequential forward search method combined with the FSIRW feature subset evaluation algorithm, we repeatedly look for the feature that maximizes the current feature subset weight among the remaining features and add it to the optimal feature subset; the algorithm does not stop until the feature subset corresponding to the maximum weight W_S is found, and it then outputs that feature subset as the desired result.
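Under the same assumptions, a sequential-forward-search wrapper corresponding to Algorithm 2 could be sketched as below; calling fsirw_select(X, y, T=50) on a numeric feature matrix X and label vector y returns the greedily chosen feature indices and their subset weight.

```python
def fsirw_select(X, y, T):
    """Sketch of Algorithm 2 (FSIRW): greedy sequential forward search that adds,
    at each round, the feature maximizing the FSIRW subset weight, stopping when
    no remaining feature improves the best weight found so far."""
    remaining = list(range(X.shape[1]))
    selected, best_w = [], -np.inf
    while remaining:
        scores = {f: evaluate_fsirw(X, y, selected + [f], T) for f in remaining}
        f_best = max(scores, key=scores.get)
        if scores[f_best] <= best_w:             # no further gain: stop searching
            break
        best_w = scores[f_best]
        selected.append(f_best)
        remaining.remove(f_best)
    return selected, best_w
```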
3. Feature Evaluation Function Based on Mutual Information
Within the framework of Shannon information entropy, the calculation of the mutual information between features must first compute the probability distribution and the joint probability distribution of the corresponding features, and may even require the probability density function of each sample feature. However, the calculation of the probability density function and the joint probability density is a complex process with a heavy computation load and low computational efficiency; therefore, computing mutual information within the framework of Shannon information entropy is complex and inefficient. Basing the mutual information used in feature selection on another well-known information entropy, the Renyi entropy, solves the problems that arise in Shannon-entropy-based mutual information calculation, especially the high computational complexity. The Renyi-entropy-based mutual information is calculated by Formula (7):

I_R(X; Y) = H_2(X) + H_2(Y) - H_2(X, Y),  (7)

where I_R(X; Y) denotes the Renyi-entropy-based mutual information, H_2(·) denotes the quadratic Renyi information entropy, and H_2(X, Y) denotes the quadratic-Renyi-entropy-based joint information entropy.
As a broader definition obtained through expansion, when the Renyi entropy's order α tends to 1 in the limit, the Renyi entropy is equivalent to the Shannon information entropy that is widely used and known to more scholars. Because the definition of Renyi entropy is broader than that of Shannon information entropy, for different conditions we can obtain a Renyi entropy better suited to the corresponding condition by choosing different values of the order. The quadratic mutual information based on Renyi entropy has correspondingly effective applications. The calculation Formula (8) of Renyi entropy is as follows; from the formula we can see that Renyi entropy is obtained from Shannon information entropy by expansion with an extra parameter α:

H_α(X) = (1 / (1 - α)) log Σ_x p(x)^α.  (8)
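As a quick numeric illustration of this limiting behaviour, the sketch below evaluates the standard discrete Renyi entropy; the base-2 logarithm and the example distribution are assumptions chosen for the demonstration, not values taken from the paper.

```python
import numpy as np

def renyi_entropy(p, alpha):
    """Discrete Renyi entropy of order alpha (alpha > 0, alpha != 1); the
    alpha -> 1 limit recovers the Shannon entropy."""
    p = np.asarray(p, dtype=float)
    return float(np.log2(np.sum(p ** alpha)) / (1.0 - alpha))

p = [0.5, 0.3, 0.2]
shannon = -sum(pi * np.log2(pi) for pi in p)
print(round(shannon, 4))                   # 1.4855  (Shannon entropy)
print(round(renyi_entropy(p, 1.0001), 4))  # about 1.4855 (Renyi entropy near alpha = 1)
print(round(renyi_entropy(p, 2.0), 4))     # 1.3959  (quadratic Renyi entropy, alpha = 2)
```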
This paper uses the quadratic Renyi information entropy to calculate the mutual information between features; the quadratic case corresponds to setting the parameter α = 2 in the Renyi entropy Formula (8). At the same time, when calculating the probability density, sampling directly from the original data set can replace the process of calculating the density function, so the time consumption is reduced; that is, the information potential can be obtained through the calculation of Formula (9), where the function G(·,·) in Formula (9) denotes the Gaussian kernel function.
Therefore, by using the data samples in place of the complex probability density function to evaluate the numerical integration, the quadratic Renyi entropy can be written in the form of Formula (10). Similarly, the joint information entropy based on Renyi entropy can be written in the form of Formula (11). Hence, through further derivation of Formula (11), we can obtain the form of Formula (12). From this formula we can see that, in the process of calculating mutual information with Renyi entropy, we do not need to first solve for the probability density or probability distribution function of each feature; rather, we can directly estimate the value of the mutual information between two sample features from a summation over the sample data, and therefore we overcome the heavy computation load and low computation speed brought by calculating each feature's probability density.
Through this derivation and analysis, we can see that, when calculating the quadratic mutual information of the Renyi information entropy, a summation over the data samples can replace the use of a complex probability density function to compute numerical integration values, which decreases the amount and difficulty of the computation and overcomes the disadvantage that Shannon-information-entropy-based methods must calculate each feature's probability density function. Therefore, this paper adopts the Renyi quadratic mutual information method to calculate the mutual information between features.
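The sketch below illustrates this sample-based estimation using the Parzen-window "information potential" form that is commonly paired with quadratic Renyi entropy; the kernel width sigma, the function names, and the exact estimator are illustrative assumptions and may differ from the paper's Formulas (9)-(12).

```python
import numpy as np

def information_potential(samples, sigma=1.0):
    """Sample-based information potential: the average Gaussian kernel value over
    all sample pairs, used here in place of an explicit density integral; the
    quadratic Renyi entropy is then H2 = -log(information potential)."""
    Xs = np.asarray(samples, dtype=float).reshape(len(samples), -1)
    diffs = Xs[:, None, :] - Xs[None, :, :]                  # all pairwise differences
    sq = np.sum(diffs ** 2, axis=-1)
    return float(np.mean(np.exp(-sq / (2.0 * sigma ** 2))))  # Gaussian kernel G

def quadratic_renyi_mi(x, y, sigma=1.0):
    """Quadratic-Renyi-style mutual information estimate, following the structure
    of Formula (7): I = H2(X) + H2(Y) - H2(X, Y), with each entropy estimated
    directly from the samples rather than from a density function."""
    def h2(s):
        return -np.log(information_potential(s, sigma))
    xy = np.column_stack([np.asarray(x, dtype=float).reshape(len(x), -1),
                          np.asarray(y, dtype=float).reshape(len(y), -1)])
    return h2(x) + h2(y) - h2(xy)
```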
The Quadratic Joint Mutual Information (QJMI) feature selection evaluation criterion proposed in this paper is based on the Renyi-entropy-based quadratic mutual information. Using the quadratic mutual information of Renyi entropy, we can obtain the value of the mutual information between features directly from the data in the data set, avoiding the calculation of each feature's probability distribution or probability density function that Shannon-information-entropy-based mutual information requires. When we evaluate feature redundancy and correlation through mutual information, if adding a new feature to the selected features makes the new feature subset have a larger mutual information value with the final output while the selected features and the newly added feature have low redundancy, then that feature is an ideal feature that should be added to the selected feature set. This paper therefore proposes the QJMI evaluation function based on this idea. The evaluation function takes all the features of the candidate feature set into consideration and examines the relationship between each candidate feature and the selected feature subset. The evaluation criterion puts the candidate feature with the largest value in the candidate feature set into the selected feature subset. The QJMI evaluation function is shown as Formula (13).
When combined with a feature selection method, this function evaluates each possible candidate feature subset and chooses the feature subset with the largest quadratic mutual information. However, it is unrealistic to consider every possible feature subset when applying the evaluation criterion, because of the time-consuming nature of the operation and the heavy computation load. Most feature selection algorithms therefore add candidate features to the selected feature subset one by one, judging the relationship between each candidate feature and the selected feature subset and adding the most suitable candidate feature according to the evaluation criterion. On the one hand, this considers the mutual relationship between a candidate feature and the selected feature subset and avoids redundancy; on the other hand, it allows us to select features that have important value when used jointly with the selected feature subset, even if they are individually far from it, and thus avoids omitting them. Therefore, the application of the QJMI evaluation criterion should take the above into consideration: the algorithm iteratively selects features and does not stop until the stopping criterion is reached.
In addition, at the beginning of the QJMI-based algorithm, the selected feature subset is made empty, so the QJMI evaluation criterion only needs to consider the relationship between the features in the candidate feature set and the output, rather than the interaction between candidate features and the selected feature subset. In the subsequent calculation, the QJMI evaluation criterion is composed of two parts. The first part weights the correlation between each candidate feature and the output under the premise that the candidate feature joins the selected feature subset. The second part evaluates the correlation between the candidate feature and the selected feature subset. The value of the first part minus the value of the second part constitutes the whole evaluation criterion. On the one hand, the QJMI criterion ensures that candidate features are highly correlated with the output results, and the combination of the two parts is not a simple summation of their respective information values, so more correlation information can be obtained. On the other hand, the criterion avoids adding features that are redundant with respect to the already selected features, further guaranteeing low redundancy among the selected features. At the same time, because the QJMI evaluation function uses the quadratic-Renyi-entropy-based mutual information calculation of Formula (12) to compute the mutual information between samples, the function has the advantage of high computation speed.
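A sketch of this two-part structure is given below; Formula (13) itself is not reproduced, so the relevance-minus-redundancy form and the weighting parameter beta are assumptions used only to illustrate the criterion described above. Note that when the selected subset is empty, the redundancy term vanishes, matching the initialization behaviour just described.

```python
def qjmi_score(X, y, candidate, selected, sigma=1.0, beta=1.0):
    """Two-part score in the spirit of the QJMI criterion: correlation of the
    candidate feature with the output minus its correlation with the already
    selected features, both measured with the quadratic-Renyi MI estimate."""
    relevance = quadratic_renyi_mi(X[:, candidate], y, sigma)
    redundancy = sum(quadratic_renyi_mi(X[:, candidate], X[:, s], sigma)
                     for s in selected)
    return relevance - beta * redundancy
```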
Building on the above steps, this paper holds that when a single evaluation criterion is used for feature evaluation, the evaluation of the feature subset can never be fully accurate because of the single nature of the criterion. Therefore, in the algorithm flow, the improved Relief weight is combined with the QJMI evaluation function based on quadratic Renyi entropy to calculate the mutual information. The algorithm uses the sequential forward search strategy to search for candidate features and adds the feature that has the largest composite correlation value and that produces a gain in the overall weight value after being added to the selected features. The calculation stops when none of the remaining features contributes gain information to the selected features.
From the calculation process of Algorithm 3, we can see that the feature selection algorithm based on the improved Relief weight and the mutual-information-based evaluation function are used progressively and in combination. Therefore, by combining the improved Relief weight with the QJMI evaluation criterion proposed in this paper, we obtain the feature selection algorithm based on Relief and mutual information. The algorithm evaluates each feature in the candidate feature set through two evaluation criteria, distance and the improved joint mutual information function, and adds the features that provide a gain to the overall feature subset weight into the selected feature subset until the stopping condition is reached. Because this algorithm uses a composite evaluation criterion to comprehensively consider the features in the feature set, it has a stronger ability to evaluate feature subsets.
Algorithm 3: FSIRQ (feature selection algorithm based on Improved Relief and Joint mutual information).
Input: sample instance data set D and sampling times T.
Output: final feature subset S*.
Initialization:
1. Optimal feature subset S* is set to the empty set;
2. Optimal feature subset weight W* is set to the minimum integer value;
3. The i-th feature weight value w_i is set to zero for every feature i.
Flows:
1. For each candidate feature f in the remaining feature set, evaluate the subset formed by adding f to S* with the improved Relief weight (Algorithm 1) and the QJMI criterion (Formula (13)), and compute its composite correlation value;
2. If the largest composite value produces a gain over W*, add the corresponding feature to S*, renew W*, and return to Step 1;
3. Otherwise end the calculation and output S*.
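Finally, combining the earlier sketches, the overall FSIRQ flow could be prototyped as follows; the equal-weight sum of the FSIRW weight and the QJMI score is an assumption standing in for the paper's composite correlation value.

```python
def fsirq_select(X, y, T, sigma=1.0, beta=1.0):
    """Sketch of Algorithm 3 (FSIRQ): sequential forward search scored by a
    combination of the FSIRW subset weight and the QJMI criterion; a feature is
    kept only if it increases the best composite score found so far."""
    remaining = list(range(X.shape[1]))
    selected, best_score = [], -np.inf
    while remaining:
        scores = {f: evaluate_fsirw(X, y, selected + [f], T)
                     + qjmi_score(X, y, f, selected, sigma, beta)
                  for f in remaining}
        f_best = max(scores, key=scores.get)
        if scores[f_best] <= best_score:          # no remaining feature adds gain
            break
        best_score = scores[f_best]
        selected.append(f_best)
        remaining.remove(f_best)
    return selected
```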