In the feature-selection problem, on the one hand, it is necessary to identify features that carry as much information as possible about the full feature set in order to achieve better accuracy in later classification. On the other hand, it is important to reduce redundant features as much as possible. When evaluating redundancy, a high correlation between two feature points indicates redundant information. Reducing such redundancy effectively lowers the cost and computational complexity of subsequent algorithms and improves efficiency.
3.1. The Concept of Dominating Set
From the construction of the feature network in the previous section, it can be seen that an edge between two nodes indicates a correlation between them, i.e., that they contain each other’s information; the higher the weight of the edge, the higher the correlation between the nodes. To find the minimum set of features that contains all the information, the selected set of feature nodes must have edges to all the remaining nodes, which is consistent with the concept of a dominating set in graph theory. The concept of the dominating set is introduced below.
Definition 1. In an undirected graph G, if S ⊆ V (S ≠ ∅) and, for ∀x ∈ V − S, x is directly connected to at least one node in S, then S is a dominating set of G.
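As a concrete illustration of Definition 1, the following minimal Python sketch checks whether a candidate set S dominates a small undirected graph; the graph and the candidate sets are illustrative only, not taken from the paper.

```python
# Minimal illustration of Definition 1: S is a dominating set of G if every
# node outside S has at least one neighbor inside S.
def is_dominating_set(adjacency, S):
    """adjacency: dict mapping each node to the set of its neighbors."""
    S = set(S)
    return all(adjacency[v] & S for v in adjacency if v not in S)

# Small illustrative graph (not from the paper): a path 0-1-2-3-4.
G = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(is_dominating_set(G, {1, 3}))  # True: nodes 0, 2, and 4 each touch {1, 3}
print(is_dominating_set(G, {0, 4}))  # False: node 2 has no neighbor in {0, 4}
```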
According to the analysis in previous sections, the feature subset that contains all the information with the least correlation is selected to be the dominating set in the feature network, which can also be denoted as the optimal feature set.
Figure 2 illustrates the concept with two typical dominating sets of a graph, represented by the blue nodes. It can be observed that a graph may have multiple dominating sets. Both networks (a) and (b) have 11 nodes, but their dominating sets differ: the dominating set in (a) has four nodes, and the dominating set in (b) has three nodes. It is evident that, for a symmetric complex network, there is no unique dominating set. Consequently, it remains unclear which obtained dominating set represents the optimal feature set. In the selection process of the dominating set, when only the connected edges (i.e., whether there is a nonlinear relationship between features) are considered but their weights (i.e., the correlation coefficients between the features) are not, the correlations between the features cannot be reflected. Feature selection hopes that the identified nodes contain as much classification information as possible [24]. Therefore, we construct the connected edges based on the correlations between the features. If the total weight of the connected edges incident to a node is larger, the node is connected to more nodes and is more strongly correlated with them, indicating that it contains more information. The node strength of a feature node is defined as follows.
Definition 2. The node strength of node $v_i$ is the sum of the weights $w_{ij}$ of the connected edges incident to the node, denoted as S(i). Equation (5) is used to calculate the node strength:
$$S(i) = \sum_{j \in N(i)} w_{ij} \quad (5)$$
where $N(i)$ is the set of nodes that form a connected edge with node $v_i$. The node strength reflects the total amount of influence that neighboring nodes exert on the node. The higher the node strength, the higher the correlation between the feature node and the other features, indicating that it contains more information. Therefore, when selecting the dominating set, it is necessary to find as many nodes with high node strength as possible as the final features.
3.2. Optimal Dominating Set Based on BPSO Algorithm
The optimal dominating set is defined as a dominating set with the minimum number of nodes, i.e., the smallest set of nodes such that every node in the graph either belongs to the set or is directly connected to a node in it. The objective of identifying the optimal dominating set is to locate a set of nodes that dominates the entire graph while utilizing minimal resources and incurring minimal cost: the set must be composed of the least possible number of nodes while still ensuring complete coverage.
Taking the above considerations into account, when the feature network is analyzed to find the optimal set of features, the selected set of features needs to meet the following requirements.
1. The selection of features should be minimized to improve the classification efficiency, which is beneficial for later stages;
2. The node strengths of the selected features should be as large as possible so that the selected features can contain more classification information.
Meanwhile, the selected set of feature nodes must be a dominating set of the whole network.
The essence of this problem is to find a subset of features from the N feature nodes that meets these requirements. Thus, the solution space of the problem can be expressed as the selection of an N-dimensional vector x, whose N dimensions represent the N features. The value of each dimension indicates whether the corresponding node in the feature network is selected, with one indicating that the node is selected and zero indicating that it is not. In this way, the problem is transformed into a 0–1 programming problem. To solve it, the binary particle-swarm optimization (BPSO) algorithm is adopted here.
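As a minimal illustration of this encoding (the feature names below are hypothetical placeholders, not the features used in the paper), a candidate solution can be represented as a 0–1 vector:

```python
import numpy as np

# Hypothetical example with N = 6 features; a 0-1 vector encodes one candidate solution.
feature_names = ["f1", "f2", "f3", "f4", "f5", "f6"]
x = np.array([1, 0, 0, 1, 0, 1])   # 1 = feature selected, 0 = feature not selected

selected = [feature_names[i] for i in np.flatnonzero(x)]
print(selected)                     # ['f1', 'f4', 'f6']
print("number of selected features:", int(x.sum()))
```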
In this paper, we outline the steps to solve for the optimal dominating set based on the BPSO algorithm; they are as follows:
Step 1: Construct the adjacency matrix.
Use the MIC to create the adjacency matrix A for the feature network (see the sketch after this list of steps). The element $a_{ij}$ of the matrix indicates whether node i and node j have a connecting edge: if a connecting edge exists between i and j, the value of $a_{ij}$ is 1; otherwise, the value of $a_{ij}$ is 0;
Step 2: Generate Random Binary Strings.
Create multiple random binary strings of length n, where n denotes the number of nodes. Each string represents a candidate dominating set: for the i-th node, the i-th bit is one if the node is selected as a dominating point and zero if it is not;
Step 3: Encode the objective function.
Record the optimal set of dominating points found so far as the optimization target;
Step 4: Update string velocity and position.
Update the velocity and position of each binary string, using the Sigmoid function for transcoding, to determine the individual optimal position of each particle and the global optimal position, while maintaining the current set of optimal dominating points;
Step 5: Iterative updating.
Iterate until the maximum number of iterations is reached or the optimal solution is found. After each iteration, the current set of dominating points should be compared to the previous iteration’s set, and the optimal set of dominating points should be updated if the current set is better;
Step 6: Return the optimal set of features.
Return the global optimal set of dominating points, i.e., the dominating set that covers all nodes in the graph; the corresponding nodes constitute the optimal set of features.
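As noted in Step 1, the adjacency matrix can be built from the MIC values. The following sketch illustrates Steps 1 and 2 under stated assumptions: the MIC matrix is a random placeholder, and the edge-existence threshold is an illustrative value rather than the one used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1 (sketch): build the 0-1 adjacency matrix A from a precomputed MIC matrix.
# `mic` is a placeholder n x n symmetric matrix of MIC values; `threshold` is illustrative.
def build_adjacency(mic, threshold=0.3):
    A = (mic >= threshold).astype(int)
    np.fill_diagonal(A, 0)              # no self-loops
    return A

# Step 2 (sketch): initialize a population of random binary strings of length n.
def init_population(pop_size, n):
    return rng.integers(0, 2, size=(pop_size, n))

n = 8                                   # illustrative number of feature nodes
mic = rng.uniform(0, 1, size=(n, n))
mic = (mic + mic.T) / 2                 # symmetrize the placeholder MIC matrix
A = build_adjacency(mic)
particles = init_population(pop_size=20, n=n)
```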
The BPSO algorithm is based on the basic particle-swarm algorithm, but it restricts the values that the particles can take in the state space to zero and one and transcodes the velocity with the Sigmoid function. Each dimension $v_{id}$ of the velocity represents the probability of the corresponding bit $x_{id}$ of the position taking the value one. Therefore, the velocity update formula of the continuous particle swarm remains the same, but the individual extremum $p_{id}$ and the global optimal solution $p_{gd}$ consist of zero and one only. Equations (6) and (7) are used to calculate the position updates:
$$s(v_{id}) = \frac{1}{1 + e^{-v_{id}}} \quad (6)$$
$$x_{id} = \begin{cases} 1, & r < s(v_{id}) \\ 0, & \text{otherwise} \end{cases} \quad (7)$$
In Equation (7), r is a random number generated from a uniform distribution on [0, 1].
Equation (8) is used to calculate the velocity updates:
$$v_{id} = \omega v_{id} + c_1\,\mathrm{rand}()\,(p_{id} - x_{id}) + c_2\,\mathrm{rand}()\,(p_{gd} - x_{gd}) \quad (8)$$
where $p_{id}$, $x_{id}$, $p_{gd}$, and $x_{gd}$, respectively, represent the individual optimal position, the individual current position, the global optimal position, and the current position of the population; $\omega$ is the inertia weight, and $c_1$ and $c_2$ are the acceleration coefficients. rand() is a random number drawn from the same distribution as r in Equation (7).
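A minimal sketch of these update rules follows; the inertia weight w, acceleration coefficients c1 and c2, and the velocity clamp v_max are illustrative values, not parameters reported in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(v):
    # Equation (6): transcode the velocity into the probability of a bit being one.
    return 1.0 / (1.0 + np.exp(-v))

def update_particle(v, x, p_best, g_best, w=0.8, c1=2.0, c2=2.0, v_max=6.0):
    """One BPSO step for a single particle (binary position x, real-valued velocity v)."""
    # Velocity update in the spirit of Equation (8).
    v = (w * v
         + c1 * rng.random(v.shape) * (p_best - x)
         + c2 * rng.random(v.shape) * (g_best - x))
    v = np.clip(v, -v_max, v_max)              # keep velocities in a bounded range
    # Position update, Equation (7): each bit becomes one with probability sigmoid(v).
    r = rng.random(v.shape)
    x = (r < sigmoid(v)).astype(int)
    return v, x

# Illustrative usage with 8-dimensional binary particles.
n = 8
v = np.zeros(n)
x = rng.integers(0, 2, size=n)
p_best, g_best = x.copy(), x.copy()
v, x = update_particle(v, x, p_best, g_best)
```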
Based on the analysis in Section 3, the optimization objective is to find a dominating set from all the points in the network, with as few points as possible, while maximizing their total node strength. As mentioned earlier, the solution x is a binary string of length 41, and the value of each bit corresponds to whether the corresponding feature is selected or not. Therefore, the number of selected features is $\sum_{i} x_i$. The node strength of the feature nodes selected by x is the sum of the corresponding rows of the weighting matrix B, denoted as $S(x)$.
Equation (9) is used to calculate the optimization objective, where k is a coefficient that regulates the trade-off between the number of selected features and the amount of information they contain.
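Since the exact form of Equation (9) is not reproduced here, the following sketch assumes one plausible form of the objective, namely the number of selected features minus k times their total node strength (to be minimized); both the functional form and the value of k are assumptions made for illustration only.

```python
import numpy as np

def objective(x, B, k=0.5):
    """Illustrative objective in the spirit of Equation (9); smaller is better.

    x: 0-1 vector of selected features; B: weighted (MIC) adjacency matrix.
    The functional form and the value of k are assumptions, not the paper's formula.
    """
    num_selected = x.sum()                     # number of selected features
    node_strength = B[x.astype(bool)].sum()    # S(x): sum of the rows of B for selected nodes
    return num_selected - k * node_strength

# Illustrative usage with a small random symmetric weight matrix.
rng = np.random.default_rng(2)
B = rng.uniform(0, 1, size=(8, 8))
B = (B + B.T) / 2
np.fill_diagonal(B, 0)
x = rng.integers(0, 2, size=8)
print(objective(x, B))
```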
The constraint is that the selected set of points must be a dominating set. According to the definition of a dominating set, all points in the network must have a direct connection with at least one point in the selected set S. The rows of the adjacency matrix A corresponding to the points in set S are denoted as $A_i$ ($i \in S$). Equation (10) is the merging operation for these rows, which combines them element-wise by logical OR into a single row vector Q:
$$Q = \bigvee_{i \in S} A_i \quad (10)$$
Let t be the number of selected features and Q be a 1 × n-dimensional 0–1 row vector. According to the definition of a dominating set, every position of Q should be one, except for the positions of elements belonging to the dominating set S, whose corresponding values may be zero.
Therefore, we can obtain the final optimization objective, which is calculated using Equation (11), where S denotes the set of connected dominating nodes for a given graph G, and $Q_0$ denotes the set of nodes whose corresponding element in the row vector Q is 0.
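The dominating-set constraint can be checked directly from the adjacency matrix; the sketch below is a vectorized restatement of Definition 1 using the merged row vector Q of Equation (10), with an illustrative adjacency matrix.

```python
import numpy as np

def is_dominating(x, A):
    """Check the dominating-set constraint for a 0-1 solution vector x.

    A is the 0-1 adjacency matrix. Q is the element-wise OR of the rows of A
    selected by x (Equation (10)); every non-selected node must have Q equal to one.
    """
    selected = x.astype(bool)
    if not selected.any():
        return False
    Q = A[selected].any(axis=0)           # merged row vector Q
    return bool(np.all(Q[~selected]))     # all non-selected nodes are dominated

# Illustrative usage with the adjacency matrix of a 5-node path graph.
A = np.array([[0, 1, 0, 0, 0],
              [1, 0, 1, 0, 0],
              [0, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]])
print(is_dominating(np.array([0, 1, 0, 1, 0]), A))  # True
print(is_dominating(np.array([1, 0, 0, 0, 1]), A))  # False: node 2 is not dominated
```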
3.3. Algorithm Flow
According to the analysis above, the flow of the proposed algorithm is as follows.
Step 1: Information Collection and Processing.
Relevant data is collected through existing network-management systems, and specific network features are obtained through data processing such as data cleaning and normalization;
Step 2: Construct the feature network.
Perform a correlation analysis on the processed set of features, calculate the MIC between the features, and determine whether two nodes (features) are connected based on their MIC values. This process enables the construction of a feature network;
Step 3: Finding the Optimal Dominating Set.
In the constructed feature network, the optimal set of dominating nodes is searched for using the BPSO algorithm, with the optimization objective defined by Equation (9). In this paper, if a candidate solution does not correspond to a dominating set of the network, its objective value is set to a very large number (e.g., 10,000), as sketched in the example after this list of steps;
Step 4: Solve for the optimal feature set.
Map the nodes found in the dominating set back to features to obtain the optimal feature set.
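Putting Steps 3 and 4 together, the penalty handling and the mapping from the best dominating set back to feature names can be sketched as follows; the penalty value follows the text above, while the objective form, the coefficient k, and the feature names are illustrative assumptions.

```python
import numpy as np

PENALTY = 10_000  # objective value assigned to non-dominating candidates, as in Step 3

def penalized_objective(x, A, B, k=0.5):
    """Objective evaluated inside the BPSO search (illustrative form of Equation (9))."""
    selected = x.astype(bool)
    # Dominating-set constraint: every non-selected node must neighbor a selected one.
    if not selected.any() or not np.all(A[selected].any(axis=0)[~selected]):
        return PENALTY
    return selected.sum() - k * B[selected].sum()  # fewer features, higher node strength

def decode_features(x, feature_names):
    # Step 4: map the nodes of the best dominating set back to feature names.
    return [feature_names[i] for i in np.flatnonzero(x)]

# Illustrative usage with placeholder names; x_best stands for the best solution from BPSO.
feature_names = [f"feature_{i}" for i in range(5)]
A = np.array([[0, 1, 0, 0, 0],
              [1, 0, 1, 0, 0],
              [0, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]])
B = A * 0.6                                    # placeholder MIC weights on existing edges
x_best = np.array([0, 1, 0, 1, 0])
print(penalized_objective(x_best, A, B))       # 2 - 0.5 * 2.4 = 0.8
print(decode_features(x_best, feature_names))  # ['feature_1', 'feature_3']
```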
The pseudo-code for Algorithm 1 is shown below.