### *4.2. Selection Principle*

The principle of "minimum redundancy-maximum correlation", which is similar to that of the well-known supervised feature selection method [15], is adopted. The *m*th feature is selected according to:

$$l\_m = \arg\max\_{f\_i \in \mathcal{U}\_m} \left\{ \text{Rel}(f\_i) - \frac{1}{m - 1} \sum\_{f\_t \in \mathcal{S}\_{m-1}} \text{Red}(f\_i, f\_t) \right\} \tag{1}$$

where *U<sub>m</sub>* represents the set of unselected features in the current step;

*fi* represents a feature in the unselected feature set in the current step;

*Rel*(*fi*) represents the relevance of feature *fi*, defined as the average mutual information between *fi* and every feature in the whole feature set, where *n* is the number of features; the term for *fi* itself equals its entropy *H*(*fi*). *Rel*(*fi*) can be calculated with Formula (2).

$$Rel(f\_i) = \frac{1}{n} \sum\_{t=1}^{n} I(f\_i; f\_t) = \frac{1}{n} (H(f\_i) + \sum\_{1 \le t \le n, t \ne i} I(f\_i; f\_t)) \tag{2}$$

*S<sub>m−1</sub>* is the set of features already selected at the current step;

*Red*(*fi*, *ft*) is the redundancy of feature *fi* relative to the selected feature *ft*. *Red*(*fi*, *ft*) can be calculated with Formula (3).

$$\text{Red}(f\_i, f\_t) = \text{Rel}(f\_t) - \text{Rel}(f\_t|f\_i) \tag{3}$$

*Rel*(*ft*|*fi*) is the conditional relevance of *ft* given *fi*, which can be calculated with Formula (4).

$$\text{Rel}(f\_t|f\_i) = \frac{H(f\_t|f\_i)}{H(f\_t)} \times \text{Rel}(f\_t) \tag{4}$$
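The selection rule in Formulas (1)-(4) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes every feature is discrete (a list of hashable values), uses base-2 logarithms, and the names `entropy`, `mutual_information`, and `rank_features` are hypothetical. Two further assumptions fill cases the formulas leave open: the first feature (*m* = 1, where 1/(*m* − 1) is undefined) is chosen by relevance alone, and a constant feature with zero entropy is treated as having zero redundancy.

```python
from collections import Counter
from math import log2


def entropy(xs):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(xs)
    return -sum((c / n) * log2(c / n) for c in Counter(xs).values())


def mutual_information(xs, ys):
    """I(x; y) = H(x) + H(y) - H(x, y) for two discrete sequences."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def rank_features(features, k):
    """Greedily pick k feature indices following Formula (1).

    `features` is a list of columns, one list of discrete values per feature.
    """
    n = len(features)
    H = [entropy(f) for f in features]
    I = [[mutual_information(fi, ft) for ft in features] for fi in features]
    # Formula (2): average mutual information with every feature;
    # the diagonal term I(f_i; f_i) equals H(f_i).
    rel = [sum(row) / n for row in I]

    def red(i, t):
        # Formulas (3) and (4): Rel(f_t) - (H(f_t|f_i) / H(f_t)) * Rel(f_t)
        if H[t] == 0:
            return 0.0  # assumption: a constant feature adds no redundancy
        h_cond = H[t] - I[i][t]  # H(f_t | f_i) = H(f_t) - I(f_i; f_t)
        return rel[t] - (h_cond / H[t]) * rel[t]

    def score(i, selected):
        if not selected:
            return rel[i]  # assumption: first pick uses relevance only
        return rel[i] - sum(red(i, t) for t in selected) / len(selected)

    selected, unselected = [], list(range(n))
    while unselected and len(selected) < k:
        best = max(unselected, key=lambda i: score(i, selected))
        selected.append(best)
        unselected.remove(best)
    return selected
```

As a usage example, with `features = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 0, 1]]` the second column duplicates the first, so after one of the two is selected the independent third column is ranked ahead of the remaining duplicate.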

### *4.3. Relationship with Supervised Algorithms*

When the data are labeled (the supervised case), the class labels can represent the information of the whole feature set.

Then the relevance of feature *fi* can be defined as

$$\text{Rel}(f\_i) = I(f\_i; c) \tag{5}$$

where *c* in Formula (5) is the class label [16].

The redundancy between feature *fi* and the selected feature *ft* is defined as

$$\text{Red}(f\_i, f\_t) = I(f\_i; f\_t)\tag{6}$$
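For comparison, the supervised definitions in Formulas (5) and (6) can be plugged into the same greedy rule. The sketch below is illustrative only: it assumes discrete features and labels, and the function names are hypothetical. As an assumption, the first feature is chosen by relevance alone.

```python
from collections import Counter
from math import log2


def entropy(xs):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(xs)
    return -sum((c / n) * log2(c / n) for c in Counter(xs).values())


def mutual_information(xs, ys):
    """I(x; y) = H(x) + H(y) - H(x, y) for two discrete sequences."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def rank_features_supervised(features, labels, k):
    """Greedy ranking with Rel(f_i) = I(f_i; c) and Red(f_i, f_t) = I(f_i; f_t)."""
    rel = [mutual_information(f, labels) for f in features]  # Formula (5)

    def score(i, selected):
        if not selected:
            return rel[i]  # assumption: first pick uses relevance only
        # Formula (6), averaged over the already selected features
        red = sum(mutual_information(features[i], features[t]) for t in selected)
        return rel[i] - red / len(selected)

    selected, unselected = [], list(range(len(features)))
    while unselected and len(selected) < k:
        best = max(unselected, key=lambda i: score(i, selected))
        selected.append(best)
        unselected.remove(best)
    return selected
```

The only differences from the unsupervised rule are the two definitions: relevance is measured against the class label *c* instead of the whole feature set, and redundancy is the plain mutual information between two features.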

According to the mathematical analysis in [17], the relevance in the unsupervised algorithm is a lower bound of the relevance in the supervised algorithm, and the redundancy in the unsupervised algorithm is proportional to the redundancy in the supervised algorithm. When the information carried by the initial feature set is approximately equal to that carried by the class labels, the feature sequence obtained by the unsupervised algorithm is highly correlated with the feature sequence obtained by the supervised algorithm.
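The proportionality of the two redundancy measures can be seen by substituting the conditional relevance of Formula (4) into the unsupervised redundancy of Formula (3) and using the identity *I*(*fi*; *ft*) = *H*(*ft*) − *H*(*ft*|*fi*). This short derivation is a sketch that assumes *H*(*ft*) > 0:

$$\text{Red}(f\_i, f\_t) = \text{Rel}(f\_t) - \frac{H(f\_t|f\_i)}{H(f\_t)} \text{Rel}(f\_t) = \frac{H(f\_t) - H(f\_t|f\_i)}{H(f\_t)} \text{Rel}(f\_t) = \frac{\text{Rel}(f\_t)}{H(f\_t)} I(f\_i; f\_t)$$

The unsupervised redundancy therefore equals the supervised redundancy *I*(*fi*; *ft*) of Formula (6) multiplied by a factor that depends only on the selected feature *ft*.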

#### **5. Experiment and Validation**
