**1. Introduction**

The diesel engine is the power core of a ship. During manufacturing, assembly quality affects the performance indexes of the diesel engine and is an important factor in measuring the quality of the whole engine. Previous studies on the relationship between assembly clearance and machine quality have mainly focused on mechanical principles, whereas data mining methods have rarely been used to mine this relationship. With the development of data mining technology and the accumulation of large amounts of raw data, it has become possible to apply data mining methods to this problem.

With the development of computers and the internet, the amount of data is increasing. Data always contain noise; therefore, noisy data must be handled properly to obtain accurate results. Researchers have put forward several methods to address this issue, of which the two classic ones are fuzzy set theory [1] and evidence theory [2]. However, these methods sometimes require additional information or prior knowledge, such as fuzzy membership functions, basic probability assignment functions, and statistical probability distributions, which are not always easy to obtain.

Rough set theory provides a new way to address vagueness and uncertainty [3]. Its core concept is to reason about imprecise data, or to find correlations between different data, by representing a given finite set through its upper and lower approximation sets.

Despite the advantages of rough set theory, some challenges need to be overcome in practical applications. These problems can be classified into two categories: (1) There are certain limitations of a rough set in practical applications. Many extension and correction theories to the classical rough set have been developed. For example, a model integrating distance and partition distance with a rough set on the basis of a rough set based on a similarity relation was proposed [4]. This model also provided a new understanding of the classification criteria of rough set equivalence classes. Φ-Rough set, another extension of the rough set theory based on the similarity relation, was proposed [5]. Φ-Rough set replaces the indiscernibility relation of the crisp rough set theory by the notion of the Φ-approximate equivalence relation. Use of dynamic probabilistic rough sets with incomplete data addressed the challenge of processing such dynamic and incomplete data [6]. Based on a three-way classification of attributes into the pair-wise disjoint sets of core, marginal, and non-useful attributes, the relationships between the corresponding classes of classification-based and class-specific attributes were examined [7]. (2) A rough set is sensitive to noisy data. The accuracy of a decision-making model based on a rough set is low when applied to the analysis of datasets containing noisy data [8]. To strengthen its ability to resist noisy data, the variable precision rough set (in short, VPRS) was proposed [9]. VPRS has been applied in several fields, such as data mining [10], decision systems [11], and expert systems [12], and provides satisfactory results. Similarly, knowledge reduction is an important research direction of VPRS. However, the related methods and theories are not mature. The most popular study based on VPRS is attribute reduction. In addition, variable precision threshold beta is usually determined by experts. 
Hence, some researchers have proposed methods for selecting beta, which reduce the difficulty of determining beta when prior knowledge is lacking [13,14]. Zavareh M. and Maggioni V. proposed an approach to analyze water quality data that is based on rough set theory [15]. Bo C. studied multigranulation neutrosophic rough sets (MNRSs) and their applications in multi-attribute group decision-making [16]. Akram M., Ali G. and Alsheh N. O. introduced the notions of soft rough m-polar fuzzy sets and m-polar fuzzy soft rough sets as novel hybrid models for soft computing, and investigated some of their fundamental properties [17]. Jia X. et al. proposed an optimization representation of the decision-theoretic rough set model, in which an optimization problem is formulated by minimizing the decision cost [18]. Cao T. et al. discussed the use of parallel computation to obtain rough set approximations from large-scale information systems in which data are missing in both the condition and decision attributes [19].

The linear programming method is a classical mathematical method. Its principle is to find the extreme value (maximum or minimum) of a linear objective function subject to a set of linear equality or inequality constraints. Although its mathematical model is simple, it is widely used in various fields, such as location problems, route planning problems, manufacturing problems, marketing problems, and resource allocation problems. Applying the linear programming method can provide an effective and feasible decision-making basis for these problems.

The linear programming problem (LP) is the problem of finding the maximum or minimum value of a linear function under a set of linear equality and inequality constraints [20]. The general linear programming model is composed of the following elements: parameters and variables, objective functions, and constraint conditions.
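As an illustration, one common way to write such a model is given below. The notation is an assumption introduced here and is not taken from the cited work: *x* is the vector of decision variables, *c* holds the objective coefficients, and (*A*, *b*) and (*E*, *d*) are the parameters of the inequality and equality constraints, respectively. A minimization problem is obtained by replacing max with min.

$$\max \; c^{\top} x \quad \text{s.t.} \quad A x \le b, \quad E x = d, \quad x \ge 0$$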

There has been much research on optimization problems based on linear programming theory, including work on the theory of linear programming and the establishment of mathematical models [21], and work that combines linear programming with practical enterprise production to allocate production resources [22]. Linear programming methods have also been used to optimize input–output models and to establish a multi-objective linear programming model that maximizes economic benefits and minimizes resource utilization [23]. Location problems are one of the most important application fields of linear programming. Mixed-integer linear programming models have been built to select the location of renewable energy facilities [24] and to study a multi-stage facility location problem [25]. To solve the vehicle routing problem of a distribution center, a two-stage solution was proposed.

Regression is a common preprocessing method for noisy data, which suggests that a linear programming model can resist noisy data well, while a decision-making model based on the rough set can achieve the desired accuracy. Integrating the rough set with a linear programming model should therefore not only remedy the shortcomings of the rough set but also, in theory, allow the decision-making model to reach optimal accuracy. So far, there have been few studies on integrating the rough set with linear programming models. Zhang et al. proposed a multi-objective linear programming method based on the rough set to develop a classification for data mining. Based on their model, an improved model to predict hot spots of protein interactions was proposed [26]. However, in all of the above studies, the rough set was used only to reduce the attribute set. Because nonlinear models have been considered the only way to describe the rough set, no studies have applied linear programming methods to optimize decision-making models based on the rough set.

The biggest weakness of the decision model based on the rough set is its sensitivity to noisy data. VPRS only broadens the requirement of the upper and lower approximations in the definition, and the selection of precision often has strong subjectivity and lacks scientific evidence. VPRS can only be used as an auxiliary method to improve the resistance of the rough set model to noisy data, rather than the main method. Therefore, in this study, we extend the rough set theory via mixed-integer linear programming and we propose a model called the mixed-integer linear programming model for rough set-based classification with flexible attribute selection (in short, MILP-FRST). This model includes the advantages of MILP in resisting noisy data, and it has the ability to select attributes flexibly and automatically. MILP-FRST is able to divide the universe by attribute sets, calculate the lower approximation set under the condition of the presupposed variable precision and the minimum support number, and calculate the decision accuracy and screen out attributes. We set the maximum number of elements in the determination area as the objective function of the model. The processing of attribute selection, and partitioning of the attribute set for the universe are maximized by the objective function. During implementation, attributes that have a significant influence on the accuracy of the decision system will be selected, and the attribute set partition scheme is calculated to achieve the highest accuracy of the decision-making system. In addition, rough set models are often considered to be nonlinear. This paper first describes the related concepts and theories using linear models, which are an extension of rough set theory.

Next, we use the model to mine the correlation between the assembly clearance of diesel engines and the quality of the whole engine, based on a dataset that contains 28 assembly clearance attributes and the whole-machine quality of 29 diesel engines. Before applying the model, we preprocess the data and screen out 15 principal components, which cover the vast majority of the information in the assembly clearance parameters of all diesel engines. We then input these data into the model. The experimental results verify the effectiveness and advantages of MILP-FRST.

The rest of the paper is organized as follows. In Section 2, we introduce the concept of the rough set and functional dependence. In Section 3, we build a mixed-integer linear programming model for rough set-based classification with flexible attribute selection. In Section 4, we use the clearance parameter data of 29 diesel engines and the quality data of the whole engine to verify the validity and accuracy of the model. Finally, in Section 5, conclusions are presented.

#### **2. Rough Set Theory**

#### *2.1. Concepts and Definitions of Rough Sets*

Consider a rough set based on an information system [27]: *IS* = (*I*, *A*), where *I* is the universe; *A* is the attribute set. Both *I* and *A* are nonempty finite sets.

If the information system satisfies *A* = *C* ∪ *D* and *C* ∩ *D* = ∅, it is called a decision-making system *DS* = (*I*, *C* ∪ *D*), where *C* is the conditional attribute set and *D* is the decision attribute set.

**Definition 1.** *Indiscernibility relations [27]: In an information system IS* = (*I*, *A*)*, set B is a subset of the attribute set A, binary relation IND*(*B*) = {(*x*, *y*) ∈ *I* × *I* : ∀*a* ∈ *B*, *a*(*x*) = *a*(*y*)} *is the indiscernibility relation of IS, recorded as IND*(*B*)*, where x and y are elements of the universe; a is an attribute of the attribute set; and a*(*x*) *is the value of the element x in attribute a*.

**Definition 2.** *Equivalence class [27]: In an information system IS* = (*I*, *A*)*, set B is a subset of the attribute set A. The indiscernibility relation IND*(*B*) *divides the universe I into several equivalence classes, where I*/*IND*(*B*) *is the set of all equivalence classes, and* [*x*]<sub>*IND*(*B*)</sub> *is the equivalence class containing element x*.

**Definition 3.** *Upper and lower approximation [27]: In an information system IS* = (*I*, *A*)*, set B is a subset of the attribute set A, and set X is a subset of the universe I*:

$$\underline{B}X = \{ i \in I | [i]\_{IND(B)} \subseteq X \} \tag{1}$$

$$\overline{B}X = \{ i \in I | [i]\_{IND(B)} \cap X \neq \varnothing \}\tag{2}$$

*where* $\underline{B}X$ *is the lower approximation and* $\overline{B}X$ *is the upper approximation*.
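Definitions 1 to 3 can be illustrated with a minimal Python sketch. The toy table, the attribute names `a` and `b`, and the function names below are hypothetical and chosen only for illustration; they are not part of the dataset or model used in this paper.

```python
from collections import defaultdict


def equivalence_classes(table, attrs):
    """Partition the universe by IND(B): objects that agree on every
    attribute in `attrs` fall into the same equivalence class."""
    classes = defaultdict(set)
    for obj, row in table.items():
        classes[tuple(row[a] for a in attrs)].add(obj)
    return list(classes.values())


def lower_approximation(table, attrs, X):
    """Eq. (1): objects whose equivalence class is contained in X."""
    result = set()
    for cls in equivalence_classes(table, attrs):
        if cls <= X:
            result |= cls
    return result


def upper_approximation(table, attrs, X):
    """Eq. (2): objects whose equivalence class intersects X."""
    result = set()
    for cls in equivalence_classes(table, attrs):
        if cls & X:
            result |= cls
    return result


# Hypothetical information system with six objects and two attributes.
TABLE = {
    1: {"a": 0, "b": 0},
    2: {"a": 0, "b": 0},
    3: {"a": 0, "b": 1},
    4: {"a": 1, "b": 1},
    5: {"a": 1, "b": 1},
    6: {"a": 1, "b": 0},
}
B = ["a", "b"]
X = {1, 2, 3, 4}

print(lower_approximation(TABLE, B, X))  # {1, 2, 3}
print(upper_approximation(TABLE, B, X))  # {1, 2, 3, 4, 5}
```

In this toy example, objects 4 and 5 are indiscernible under *B* but only object 4 belongs to *X*, so their class appears in the upper approximation and not in the lower one.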

**Definition 4.** *The accuracy and membership grade [27] are:*

$$a\_B = \frac{|\underline{B}X|}{|\overline{B}X|}\tag{3}$$

$$
\rho\_B = 1 - a\_B \tag{4}
$$

*where a<sub>B</sub> is the accuracy of rough set X, and ρ<sub>B</sub> is the membership grade of rough set X*.

**Definition 5.** *The membership function [27] is:*

$$\mu(x, X) = \frac{| [x]\_B \cap X |}{| [x]\_B |} \tag{5}$$

*The membership function indicates the membership degree of element x to the rough set X*.
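Equations (3) to (5) can be sketched in the same way. The toy table and the function names are again hypothetical, and the sketch assumes that *X* is nonempty so that the denominator of Equation (3) is not zero. Exact fractions are used to avoid rounding.

```python
from collections import defaultdict
from fractions import Fraction


def equivalence_classes(table, attrs):
    """Partition the universe by the indiscernibility relation IND(B)."""
    classes = defaultdict(set)
    for obj, row in table.items():
        classes[tuple(row[a] for a in attrs)].add(obj)
    return list(classes.values())


def accuracy(table, attrs, X):
    """Eq. (3): size of the lower approximation over the upper one.
    Assumes X is nonempty, so the upper approximation is nonempty."""
    classes = equivalence_classes(table, attrs)
    lower = sum(len(c) for c in classes if c <= X)
    upper = sum(len(c) for c in classes if c & X)
    return Fraction(lower, upper)


def rho(table, attrs, X):
    """Eq. (4): rho_B = 1 - a_B."""
    return 1 - accuracy(table, attrs, X)


def membership(table, attrs, x, X):
    """Eq. (5): share of the equivalence class of x that lies in X."""
    cls = next(c for c in equivalence_classes(table, attrs) if x in c)
    return Fraction(len(cls & X), len(cls))


# Hypothetical information system with six objects and two attributes.
TABLE = {
    1: {"a": 0, "b": 0},
    2: {"a": 0, "b": 0},
    3: {"a": 0, "b": 1},
    4: {"a": 1, "b": 1},
    5: {"a": 1, "b": 1},
    6: {"a": 1, "b": 0},
}
B = ["a", "b"]
X = {1, 2, 3, 4}

print(accuracy(TABLE, B, X))       # 3/5
print(rho(TABLE, B, X))            # 2/5
print(membership(TABLE, B, 4, X))  # 1/2
```

Object 4 shares its class with object 5, which lies outside *X*, so its membership degree is 1/2, whereas objects in the lower approximation have a membership degree of 1.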

**Definition 6.** *The accuracy of the decision-making system [27] is:*

$$\lambda = \frac{|\sum\_{k=1}^{Kc} \underline{B}X\_k|}{|I|} \tag{6}$$
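A minimal sketch of Equation (6) follows. It rests on two assumptions that the definition does not state explicitly: the sets *X<sub>k</sub>* are taken to be the decision classes induced by the decision attributes, and the numerator is read as the number of objects that lie in the lower approximation of some decision class. The toy table and the names `a`, `b`, and `d` are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction


def partition(table, attrs):
    """Partition the universe by the indiscernibility relation on `attrs`."""
    classes = defaultdict(set)
    for obj, row in table.items():
        classes[tuple(row[a] for a in attrs)].add(obj)
    return list(classes.values())


def decision_accuracy(table, cond_attrs, dec_attrs):
    """Eq. (6) under the stated assumptions: the fraction of objects whose
    condition class is fully contained in a single decision class X_k."""
    cond_classes = partition(table, cond_attrs)
    dec_classes = partition(table, dec_attrs)  # assumed to be the X_k
    certain = sum(
        len(c) for c in cond_classes
        if any(c <= x_k for x_k in dec_classes)
    )
    return Fraction(certain, len(table))


# Hypothetical decision system: condition attributes a, b; decision d.
TABLE = {
    1: {"a": 0, "b": 0, "d": 0},
    2: {"a": 0, "b": 0, "d": 0},
    3: {"a": 0, "b": 1, "d": 0},
    4: {"a": 1, "b": 1, "d": 0},
    5: {"a": 1, "b": 1, "d": 1},
    6: {"a": 1, "b": 0, "d": 1},
}

print(decision_accuracy(TABLE, ["a", "b"], ["d"]))  # 2/3
```

Objects 4 and 5 agree on both condition attributes but differ in the decision, so they cannot be classified with certainty; the remaining four of the six objects can, giving 2/3.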

Given that the strict definitions of the upper and lower approximations make the rough set sensitive to noisy data, the rough set cannot adapt well to all situations in practical applications. VPRS decreases the influence of missing data, incorrect data, and noisy data. In VPRS, an approximation variable precision *β*, which ranges from 0.5 to 1, represents the tolerance degree of the rough set to noisy data and incorrect data. *β* can be defined as follows:
