**3. A Mixed-Integer Linear Programming Model for Rough Set-Based Classification with Flexible Attribute Selection**

Decision-making models based on the rough set inherit the inherent weaknesses of the rough set, so extending the rough set into a variable-precision rough set is necessary when building such a model. Nevertheless, adding variable precision only broadens the range of the upper and lower approximations, and the choice of precision is often subjective and lacks a scientific basis. Variable precision can therefore serve only as an auxiliary method for reducing the adverse effect of noisy data on the accuracy of the rough set model.

In this study, we build a mixed-integer linear programming model for rough set-based classification with flexible attribute selection, which has a strong ability to overcome the noise sensitivity of the rough set model. This study is also the first to express the rough set model, which is often considered to be nonlinear, as a linear model, and in this sense it extends the rough set.

#### *3.1. Rough Set Model Based on Mixed Integer Linear Programming*

Applying mixed integer linear programming to optimize the rough set model essentially means expressing the definitions of the rough set through linear programming. Expressed as a linear model, the rough set maximizes the accuracy with which the equivalence classes are divided, so that the decision-making system based on the rough set can correctly determine the correlation between the conditional attribute set and the decisive attribute set.

This model focuses on the rough set based on the similarity relation, and compares the similarity of elements on each attribute in the attribute set. The elements that satisfy the similarity threshold on each attribute are then selected as the elements to be divided into an approximate equivalence class.

This model can also screen the attributes in the attribute set and retain in the final attribute set only those attributes that have a considerable impact on dividing the universe, thereby reducing the dimension of the attributes.

We use the following notations:

*I*: Universe of elements.

*kc*: A set of approximate equivalence classes obtained by partitioning the conditional attribute set in the universe.

*kd*: A set of approximate equivalence classes obtained by partitioning the decisive attribute set in the universe.

*C*: Conditional attribute set.

*D*: Decisive attribute set.

*N*: Minimum support number of the conditional attribute set.

*β*: Variable precision.

*αc*: Similarity threshold of the conditional attribute set.

*αd*: Similarity threshold of the decisive attribute set.

*M*: A large number.

*Xci*: Value of element *i* on conditional attribute *c*.

*Xdi*: Value of element *i* on decisive attribute *d*.

*ω*\_*cij*: For any two elements *i* and *j* in universe *I*, if *ω*\_*cij* = 1, *i* and *j* are in the same approximate equivalence class divided by the conditional attribute set; otherwise, *ω*\_*cij* = 0.

*slc*: *slc* = 1 if attribute *c* is selected into the new attribute set used to divide the universe; otherwise, *slc* = 0.

*qik*: For any element *i* in universe *I* and any approximate equivalence class *k* in the set of approximate equivalence classes divided by the conditional attribute set, *qik* = 1 if *i* belongs to *k*; otherwise, *qik* = 0.

*ssijc*: For any two elements *i* and *j* in universe *I* and any attribute *c* in the conditional attribute set, *ssijc* = 1 if the values of *i* and *j* on attribute *c* satisfy the corresponding similarity threshold *αc*; otherwise, *ssijc* = 0.

*Qk*: Number of elements in the approximate equivalence class *k*, which is obtained from the partition of the universe by the conditional attribute set.

*ω*\_*dij*: *ω*\_*dij* = 1 if the two elements *i* and *j* belong to the same approximate equivalence class divided by the decisive attribute set; otherwise, *ω*\_*dij* = 0.

*sl′d*: *sl′d* = 1 if attribute *d* in the decisive attribute set is selected into the new attribute set used to divide the universe; otherwise, *sl′d* = 0, and *d* is eliminated.

*q′ik′*: *q′ik′* = 1 if element *i* in the universe belongs to the approximate equivalence class *k′* divided by the decisive attribute set; otherwise, *q′ik′* = 0.

*ss′ijd*: *ss′ijd* = 1 if the values of the two elements *i* and *j* on attribute *d* satisfy the corresponding similarity threshold *αd*; otherwise, *ss′ijd* = 0.

*Q′k′*: Number of elements in the approximate equivalence class *k′*, which is obtained from the partition of the universe by the decisive attribute set.

*eikk′*: *eikk′* = 1 if element *i* belongs both to the approximate equivalence class *k* of the conditional attribute set and to the approximate equivalence class *k′* of the decisive attribute set; otherwise, *eikk′* = 0.

*Ekk′*: Number of elements that belong both to the approximate equivalence class *k* of the conditional attribute set and to the approximate equivalence class *k′* of the decisive attribute set.

*fk*: *fk* = 1 if the number of elements in the approximate equivalence class *k* of the conditional attribute set satisfies the minimum support threshold, so that the approximate equivalence class *k* can be a lower approximation set; otherwise, *fk* = 0.

*Lkk′*: *Lkk′* = 1 if the approximate equivalence class *k* in *kc* is the lower approximation set of the approximate equivalence class *k′* in *kd*; otherwise, *Lkk′* = 0.

*Yk*: If the approximate equivalence class *k* in *kc* is the lower approximation set, *Yk* is the number of elements of lower approximation set *k*.

The objective function and constraints of the model are as follows.

Objective function: *Maximize* ∑<sub>*k*=1</sub><sup>*kc*</sup> *Yk*

Subject to:

1) *M* ∗ *ssijc* ≥ *αc* − |*Xci* − *Xcj*|, *i* ∈ *I*, *j* ∈ *I*, *c* ∈ *C*;

2) *M* ∗ (1 − *ssijc*) ≥ |*Xci* − *Xcj*| − *αc*, *i* ∈ *I*, *j* ∈ *I*, *c* ∈ *C*;

3) *ω*\_*cij* ≤ *ssijc* + (1 − *slc*), *i* ∈ *I*, *j* ∈ *I*, *c* ∈ *C*;

4) *ssijc* ≥ 1 − *slc*, *i* ∈ *I*, *j* ∈ *I*, *c* ∈ *C*;

5) *ω*\_*cij* ≥ 1 − ∑<sub>*c*∈*C*</sub> (1 − *ssijc*), *i* ∈ *I*, *j* ∈ *I*, *c* ∈ *C*;

6) *M* ∗ *ss′ijd* ≥ *αd* − |*Xdi* − *Xdj*|, *i* ∈ *I*, *j* ∈ *I*, *d* ∈ *D*;

7) *M* ∗ (1 − *ss′ijd*) ≥ |*Xdi* − *Xdj*| − *αd*, *i* ∈ *I*, *j* ∈ *I*, *d* ∈ *D*;

8) *ω*\_*dij* ≤ *ss′ijd* + (1 − *sl′d*), *i* ∈ *I*, *j* ∈ *I*, *d* ∈ *D*;

9) *ss′ijd* ≥ 1 − *sl′d*, *i* ∈ *I*, *j* ∈ *I*, *d* ∈ *D*;

10) *ω*\_*dij* ≥ 1 − ∑<sub>*d*∈*D*</sub> (1 − *ss′ijd*), *i* ∈ *I*, *j* ∈ *I*, *d* ∈ *D*;

11) *q*<sub>11</sub> = 1;

12) ∑<sub>*k*∈*kc*</sub> *qik* = 1, *i* ∈ *I*;

13) *qik* + *qjk* ≤ 1 + *ω*\_*cij*, *i* ∈ *I*, *j* ∈ *I*, *k* ∈ *kc*;

14) *Qk* = ∑<sub>*i*∈*I*</sub> *qik*, *k* ∈ *kc*;

15) *q′*<sub>11</sub> = 1;

16) ∑<sub>*k′*∈*kd*</sub> *q′ik′* = 1, *i* ∈ *I*, *k′* ∈ *kd*;

17) *q′ik′* + *q′jk′* ≤ 1 + *ω*\_*dij*, *i* ∈ *I*, *j* ∈ *I*, *k′* ∈ *kd*;

18) *Q′k′* = ∑<sub>*i*∈*I*</sub> *q′ik′*, *i* ∈ *I*, *k′* ∈ *kd*;

19) 2 ∗ *eikk′* ≤ *qik* + *q′ik′*, *i* ∈ *I*, *k* ∈ *kc*, *k′* ∈ *kd*;

20) *Ekk′* = ∑<sub>*i*∈*I*</sub> *eikk′*, *k* ∈ *kc*, *k′* ∈ *kd*;

21) *N* ∗ *fk* ≤ *N* + (*Qk* − *N*);

22) *card*(*I*) ∗ *Lkk′* ≤ *card*(*I*) + (*Ekk′* − *Qk* ∗ *β*), *k* ∈ *kc*, *k′* ∈ *kd*;

23) *Lkk′* ≤ *fk*, *k* ∈ *kc*, *k′* ∈ *kd*;

24) *Yk* ≤ *Qk*, *k* ∈ *kc*;

25) *Yk* ≤ *M* ∗ ∑<sub>*k′*∈*kd*</sub> *Lkk′*, *k* ∈ *kc*.

Items 1)–25) of this list are restated, in the same order, as constraints (9)–(33) in the explanation that follows.

The objective function and the constraints are the core of MILP-FRST. Together they encode the concepts of the rough set and complete the related theory.

The objective function in the model counts the elements of the lower approximation sets, that is, the elements whose approximate equivalence class under the conditional attribute set is consistent with an approximate equivalence class under the decisive attribute set. Combined with the definition of precision in the rough set, maximizing accuracy in MILP-FRST amounts to maximizing the number of elements in this region. The goal of constructing this objective function is to find the division that yields the most accurate correlation between the conditional attribute set and the decisive attribute set.

The constraints both describe the concepts related to the rough set and complete the related theory. They cover filtering out attributes from the conditional attribute set and the decisive attribute set, dividing the universe by the decisive attribute set, dividing the universe by the conditional attribute set, calculating the lower approximation set, calculating the number of elements, and limiting the coverage of the lower approximation set. Each constraint is explained below.

The process of choosing the attributes and dividing the universe is completed within the model. *ssijc* = 1 if the distance between two elements on attribute *c* is smaller than the corresponding similarity threshold *αc*; otherwise, *ssijc* = 0. The constraints are established as follows:

$$M \* ss\_{ijc} \ge \alpha\_c - |Xc\_i - Xc\_j|, \ i \in I, j \in I, c \in C \tag{9}$$

$$M \* (1 - ss\_{ijc}) \ge |Xc\_i - Xc\_j| - \alpha\_c, \ i \in I, j \in I, c \in C \tag{10}$$

where *i* and *j* index two elements compared on the same conditional attribute *c*, and both *i* and *j* are natural numbers.
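To see how the two big-M constraints pin down *ssijc*, the following minimal Python sketch enumerates which binary values satisfy constraints (9) and (10) for a given pair of attribute values. The function name `feasible_ss`, the default value of `big_m`, and the sample numbers are illustrative assumptions and are not part of the model specification.

```python
def feasible_ss(x_i, x_j, alpha, big_m=1e6):
    """Return the binary values of ss that satisfy constraints (9) and (10)."""
    gap = abs(x_i - x_j)
    allowed = []
    for ss in (0, 1):
        c9 = big_m * ss >= alpha - gap          # constraint (9)
        c10 = big_m * (1 - ss) >= gap - alpha   # constraint (10)
        if c9 and c10:
            allowed.append(ss)
    return allowed


print(feasible_ss(1.0, 1.25, 0.5))  # distance below the threshold
print(feasible_ss(1.0, 3.0, 0.5))   # distance above the threshold
print(feasible_ss(1.0, 1.5, 0.5))   # distance equal to the threshold
```

In this sketch, a distance below the threshold leaves only *ssijc* = 1 feasible and a distance above it leaves only *ssijc* = 0. When the distance equals the threshold exactly, both values satisfy the two constraints, so the boundary case is left to the solver.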

If attribute *c* is selected to divide the universe, then *slc* = 1 and constraint (11) takes effect. Otherwise, *slc* = 0, as shown in constraint (12), and attribute *c* has no influence on dividing the universe; that is, the two elements have an indiscernibility relation on attribute *c*. Constraint (11) is a necessary condition: *ω*\_*cij* can equal 1 only if the two elements satisfy the similarity threshold on every selected attribute, but this constraint alone is not enough to make *ω*\_*cij* = 1. Because *ω*\_*cij* = 1 means that all of the attributes in the attribute set meet the corresponding similarity threshold, constraint (13) is established to force *ω*\_*cij* = 1 in that case. Elements *i* and *j* have an indiscernibility relation under the condition that all *ssijc* in attribute set *C* are 1:

$$\omega\_{c\_{ij}} \le ss\_{ijc} + (1 - sl\_c), \ i \in I, j \in I, c \in C \tag{11}$$

$$ss\_{ijc} \ge 1 - sl\_c, \ i \in I, j \in I, c \in C \tag{12}$$

$$\omega\_{c\_{ij}} \ge 1 - \sum\_{c}^{C} (1 - ss\_{ijc}), \ i \in I, j \in I, c \in C \tag{13}$$

Constraints (9)–(13) initially divide the universe by the conditional attribute set, and select attributes from the conditional attribute set. Attribute sets divide the universe in accordance with the similarity between the elements of the attribute.
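The interplay of constraints (11)–(13) can be checked for a single pair of elements with a short Python sketch. It assumes the similarity indicators and selection flags are already known, which is a simplification, because in the model they are decision variables. The function name `omega_values` and the attribute names `"a"` and `"b"` are hypothetical.

```python
def omega_values(ss, sl):
    """Return the values of omega in {0, 1} allowed by constraints (11)-(13).

    ss and sl map each attribute name to a 0/1 value for one pair (i, j).
    """
    # Constraint (12): an unselected attribute (sl = 0) must have ss = 1.
    if any(ss[c] < 1 - sl[c] for c in ss):
        raise ValueError("ss violates constraint (12)")
    allowed = []
    for omega in (0, 1):
        c11 = all(omega <= ss[c] + (1 - sl[c]) for c in ss)  # constraint (11)
        c13 = omega >= 1 - sum(1 - ss[c] for c in ss)        # constraint (13)
        if c11 and c13:
            allowed.append(omega)
    return allowed


# Both attributes selected and both similar: omega is forced to 1.
print(omega_values({"a": 1, "b": 1}, {"a": 1, "b": 1}))
# Attribute "b" selected but dissimilar: omega is forced to 0.
print(omega_values({"a": 1, "b": 0}, {"a": 1, "b": 1}))
```

Under these assumptions the pair is merged exactly when every attribute reports similarity, which matches the reading of (11) as the necessary condition and (13) as the sufficient one.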

The processes of dividing the universe and filtering out attributes are almost the same for the conditional attribute set and decisive attribute set. Therefore, we establish constraints (14)–(18) to divide the universe by the decisive attribute set and filter out attributes from the decisive attribute set:

$$M \* ss'\_{ijd} \ge \alpha\_d - |Xd\_i - Xd\_j|, \ i \in I, j \in I, d \in D \tag{14}$$

$$M \* (1 - ss'\_{ijd}) \ge |Xd\_i - Xd\_j| - \alpha\_d, \ i \in I, j \in I, d \in D \tag{15}$$

$$\omega\_{d\_{ij}} \le ss'\_{ijd} + (1 - sl'\_d), \ i \in I, j \in I, d \in D \tag{16}$$

$$ss'\_{ijd} \ge 1 - sl'\_d, \ i \in I, j \in I, d \in D \tag{17}$$

$$\omega\_{d\_{ij}} \ge 1 - \sum\_{d}^{D} (1 - ss'\_{ijd}), \ i \in I, j \in I, d \in D \tag{18}$$

We can obtain *ω*\_*c* and *ω*\_*d* through constraints (9)–(18), but more is needed to complete the process of dividing the universe. Each element in the universe must still be allocated to an approximate equivalence class in *kc* and to one in *kd*.

To complete model building, we need to specify the initial element and the initial equivalence class, and assign the initial element to the initial equivalence class. As the initial element and the initial equivalence class are only index numbers with no specific meaning, this assignment does not affect the results of the model calculation. According to the definition of *qik*, *i* = 1 is the index of the element, *k* = 1 is the index of the approximate equivalence class, and *q*<sup>11</sup> = 1 means assigning this element to this approximate equivalence class. We can establish constraint (19):

$$q\_{11} = 1\tag{19}$$

Each element belongs to only one approximate equivalence class. However, not every predetermined approximate equivalence class has its own elements. When the number of approximate equivalence classes is unknown, the number of approximate equivalence classes in the set of approximate equivalence classes may be redundant. If the number of the provided approximate equivalence classes is less than the number of actual approximate equivalence classes, the model will not be solvable, so we establish constraint (20):

$$\sum\_{k}^{k\_c} q\_{ik} = 1, i \in I \tag{20}$$

Elements *i* and *j* can be assigned to the same approximate equivalence class only when it has been confirmed that they are similar. That is, *qik* and *qjk* can both be 1 only when *ω*\_*cij* = 1. We establish constraint (21):

$$q\_{ik} + q\_{jk} \le 1 + \omega\_{c\_{ij}}, \ i \in I, j \in I, k \in k\_c \tag{21}$$

Variable *Qk* counts the number of elements allotted into each approximate equivalence class divided by the conditional attribute set. We establish constraint (22):

$$Q\_k = \sum\_{i}^{I} q\_{ik}, \ k \in k\_c \tag{22}$$

Similarly, constraints (23)–(26) implement the process of allotting the elements under the decisive attribute set:

$$q\_{11}' = 1\tag{23}$$

$$\sum\_{k'}^{k\_d} q'\_{ik'} = 1, i \in I, k' \in k\_d \tag{24}$$

$$q'\_{ik'} + q'\_{jk'} \le 1 + \omega\_{d\_{ij}}, \ i \in I, j \in I, k' \in k\_d \tag{25}$$

$$Q'\_{k'} = \sum\_{i}^{I} q'\_{ik'}, i \in I, k' \in k\_d \tag{26}$$

The above constraints complete the process of selecting attributes and dividing the universe.
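As a rough illustration of the allotment constraints, the sketch below checks a candidate assignment against constraints (19)–(21) and then computes the class sizes of constraint (22). The same check applies to constraints (23)–(26) when the decisive similarity matrix is supplied instead. The encoding `assign[i] = k`, the zero-based indices (so that element 1 and class 1 become index 0), and the function name `check_partition` are assumptions made for this example.

```python
from collections import Counter


def check_partition(assign, omega):
    """Check constraints (19)-(21) and return (feasible, class sizes).

    assign[i] = k encodes q_ik = 1; storing one class per element makes
    constraint (20) hold by construction.
    """
    if assign[0] != 0:                      # constraint (19): q_11 = 1
        return False, {}
    n = len(assign)
    for i in range(n):
        for j in range(n):
            # constraint (21): sharing a class requires omega_ij = 1
            if assign[i] == assign[j] and omega[i][j] != 1:
                return False, {}
    return True, dict(Counter(assign))      # constraint (22): Q_k


# Elements 0 and 1 are similar to each other; element 2 only to itself.
omega = [[1, 1, 0],
         [1, 1, 0],
         [0, 0, 1]]
print(check_partition([0, 0, 1], omega))
print(check_partition([0, 1, 1], omega))
```

Placing every element in its own class also passes this check, because constraint (21) only forbids grouping dissimilar elements. In this sketch nothing forces similar elements into the same class.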

Constraints (27)–(31) implement the process of defining the lower approximation set and setting the minimum support threshold.

If one element belongs to both the approximate equivalence class *k* of the conditional attribute set and the approximate equivalence class *k′* of the decisive attribute set, then, on the basis of the definition of the lower approximation set, this element is selected, so we establish constraint (27):

$$2\*e\_{ikk'} \le q\_{ik} + q'\_{ik'}, \ i \in I, k \in k\_c, k' \in k\_d \tag{27}$$

The number of elements obtained by constraint (27) should be counted, so we establish constraint (28):

$$E\_{kk'} = \sum\_{i}^{I} e\_{ikk'}, \ k \in k\_c, k' \in k\_d \tag{28}$$
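The overlap counts can be illustrated with a small Python sketch. Constraint (27) only bounds *e* from above, so the sketch reports the largest value of *E* that constraints (27) and (28) allow. The assumption that the solver pushes *e* to this upper bound, the encoding of the two partitions as lists, and the function name `overlap_counts` are all made for the sake of the example.

```python
from collections import Counter


def overlap_counts(assign_c, assign_d):
    """Count elements shared by conditional class k and decisive class k'.

    assign_c[i] = k and assign_d[i] = k' encode q_ik = 1 and q'_ik' = 1.
    Constraint (27) lets e_ikk' equal 1 only for that single pair (k, k'),
    so the largest E_kk' allowed by constraint (28) is the pair count.
    """
    return dict(Counter(zip(assign_c, assign_d)))


# Four elements: conditional classes 0,0,0,1 and decisive classes 0,0,1,1.
print(overlap_counts([0, 0, 0, 1], [0, 0, 1, 1]))
```

Pairs (*k*, *k′*) that share no element do not appear in the result, which corresponds to a count of 0.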

The minimum support threshold requires that the lower approximation set meet the minimum support number. Constraints (29) and (31) impose this limit on the lower approximation set. In constraints (29) and (31), *fk* shows whether the number of elements in the corresponding approximate equivalence class satisfies the minimum support number; if *Qk* < *N*, then *fk* must be 0. MILP-FRST introduces variable precision as an auxiliary method of improving resistance to noisy data. Constraint (30) realizes the process of defining the lower approximation set: *Lkk′* can equal 1 only if *Ekk′* ≥ *β* ∗ *Qk*, that is, only if at least a fraction *β* of the elements in class *k* also belong to class *k′*:

$$N \* f\_k \le N + (Q\_k - N) \tag{29}$$

$$\operatorname{card}(I) \ast L\_{kk'} \le \operatorname{card}(I) + (E\_{kk'} - Q\_k \ast \beta), \ k \in k\_c, k' \in k\_d \tag{30}$$

$$L\_{kk'} \le f\_k, \ k \in k\_c, k' \in k\_d \tag{31}$$
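A minimal Python sketch of constraints (29)–(31) follows. It takes the class sizes and overlap counts as given and returns the largest values of *fk* and *Lkk′* that the three constraints permit. The function name `lower_approx_flags`, the dictionary encoding, and the sample numbers are hypothetical.

```python
def lower_approx_flags(Q, E, n_min, beta, card_i):
    """Return the largest f_k and L_kk' allowed by constraints (29)-(31)."""
    # Constraint (29): N * f_k <= N + (Q_k - N), so f_k = 1 needs Q_k >= N.
    f = {k: int(n_min <= n_min + (q - n_min)) for k, q in Q.items()}
    L = {}
    for (k, k2), e in E.items():
        # Constraint (30): card(I) * L <= card(I) + (E_kk' - Q_k * beta).
        precise = card_i <= card_i + (e - Q[k] * beta)
        # Constraint (31): L_kk' <= f_k.
        L[(k, k2)] = int(precise and f[k] == 1)
    return f, L


Q = {0: 3, 1: 1}                         # sizes of the conditional classes
E = {(0, 0): 2, (0, 1): 1, (1, 1): 1}    # overlap counts
f, L = lower_approx_flags(Q, E, n_min=2, beta=0.6, card_i=4)
print(f)
print(L)
```

With these sample numbers, class 0 passes both tests against decisive class 0, since two of its three elements (at least 60%) fall there. Class 1 lies entirely inside decisive class 1 but is rejected because it has fewer than *N* = 2 elements.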

Finally, the number of elements in the lower approximation set is counted. If the approximate equivalence class obtained from the conditional attribute set is not a lower approximation of any approximate equivalence class obtained from the decisive attribute set, it is deemed to be an uncertain region, so the number of elements it contributes to the certain region is 0. Otherwise, this approximate equivalence class is a certain region, so the number of elements in the region equals the number of elements in this approximate equivalence class. Accordingly, we establish constraints (32) and (33):

$$Y\_k \le Q\_k, k \in k\_c \tag{32}$$

$$Y\_k \le M \ast \sum\_{k'}^{k\_d} L\_{kk'}, \ k \in k\_c \tag{33}$$
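Constraints (32) and (33) together with the maximizing objective can be sketched in a few lines of Python. The sketch assumes that *Yk* is non-negative and that `big_m` is at least as large as every *Qk*; the function name `objective_value` and the sample data are illustrative only.

```python
def objective_value(Q, L, big_m=10**6):
    """Return (Y, sum of Y_k) at the largest Y_k allowed by (32) and (33)."""
    Y = {}
    for k, q in Q.items():
        # Number of decisive classes k' with L_kk' = 1 for this class k.
        hits = sum(flag for (kc, _), flag in L.items() if kc == k)
        # Constraint (32): Y_k <= Q_k; constraint (33): Y_k <= M * sum L_kk'.
        Y[k] = min(q, big_m * hits)
    return Y, sum(Y.values())


Q = {0: 3, 1: 1}
L = {(0, 0): 1, (0, 1): 0, (1, 1): 0}
Y, total = objective_value(Q, L)
print(Y, total)
```

Because the objective maximizes the sum of *Yk*, each *Yk* settles at *Qk* when class *k* is a lower approximation of some decisive class and at 0 otherwise, which is the behaviour described in the paragraph above.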
