*5.3. The Theory of Rough Sets to Support the Lean Maintenance Assessment*

Rough set theory is one of the fastest-growing branches of data exploration. It offers a formal approach to phenomena related to knowledge processing and is therefore used as a methodology for knowledge discovery from data. In particular, it can be used to examine imprecision and uncertainty in the data analysis process. It makes it possible to find relationships between explanatory variables (conditional attributes) and explained variables (decision attributes), which supports data-based decision-making. It is also used for dimensionality reduction, that is, for removing from the dataset those explanatory variables that do not significantly affect the explained variables. Knowledge derived from data with rough set theory is recorded in the form of decision rules [55]. A formal description of rough set theory can be found, among others, in [56]. Often, the purpose of a decision-making system based on rough sets is to search for hidden, and therefore implicit, rules that were not captured in the selection made by an expert (or experts) [55–57]. Rough sets are used to process imprecise data by means of intuitively understood inference rules. They can be used to search for hidden dependencies in input data and to support decisions in cases that can be described with discrete attributes.

In this paper, rough set theory was used to assess the degree of lean maintenance use. The same set of input data was used for the assessment as for the decision trees. Because the data were incomplete, the use of rough sets improved the accuracy of the solution. Several algorithms were used to infer the rules.

Because rough set theory can handle incomplete data, its use increased the number of analyzed enterprises from the 24 included in the decision trees to 34. The additional 10 enterprises were characterized by a set of variables in which at least one variable had no specific value (no answer). By using decision tables in rough set theory, it is possible to include more data when generating rules. This allows new dependencies between the variables to be identified. To make the assessment, the rules were validated. The decision rules were generated on the basis of rough set theory with the Rough Set Exploration System (RSES) software, which was developed at the Institute of Mathematics of the University of Warsaw.

RSES software can generate decision rules by means of four algorithms: the exhaustive algorithm, the genetic algorithm, the covering algorithm, and LEM2 (learning from examples module, version 2). They are described in [57,58]. Furthermore, the software offers a number of other options, e.g., computing reducts for a given information system. A reduct is a set of attributes R, where R ⊆ A, that makes it possible to discern the pairs of objects in the information system, while no proper subset of R has this property. Reducts are calculated with an exhaustive or a genetic algorithm. Decision rules can also be created on the basis of the calculated reducts.
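The definition of a reduct given above can be illustrated with a brute-force search over attribute subsets. The sketch below is not the RSES implementation; the function names, the attribute names, and the four toy objects are hypothetical and serve only to show the minimality condition. It treats the data as a plain information system (no decision attribute), which is an assumption made for brevity.

```python
from itertools import combinations

def discerns_like_full(rows, subset, all_attrs):
    """True if `subset` separates every pair of objects that `all_attrs` separates."""
    for a, b in combinations(rows, 2):
        differ_full = any(a[k] != b[k] for k in all_attrs)
        differ_sub = any(a[k] != b[k] for k in subset)
        if differ_full and not differ_sub:
            return False
    return True

def reducts(rows, attrs):
    """Enumerate all reducts by checking subsets in order of increasing size."""
    found = []
    for size in range(0, len(attrs) + 1):
        for subset in combinations(attrs, size):
            # A superset of an already found reduct cannot be minimal.
            if any(set(r) <= set(subset) for r in found):
                continue
            if discerns_like_full(rows, subset, attrs):
                found.append(subset)
    return found

# Hypothetical toy information system (not the survey data).
attrs = ["size", "production", "capital"]
rows = [
    {"size": "large", "production": "batch", "capital": "foreign"},
    {"size": "large", "production": "unit", "capital": "foreign"},
    {"size": "small", "production": "batch", "capital": "domestic"},
    {"size": "small", "production": "unit", "capital": "domestic"},
]

result = reducts(rows, attrs)
print(result)
```

In this toy table, "size" and "capital" carry the same information, so either of them can be dropped, whereas "production" is needed in every reduct. The exhaustive search grows exponentially with the number of attributes, which is one reason why a heuristic such as a genetic algorithm may be preferred for larger tables.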

For each of the decision classes, RSES software calculates three indicators of classification quality:


Accuracy and coverage are also calculated jointly for all decision classes (for the whole set of rules). Decision rules for the explained variable "an average OEE value" were generated by means of all four algorithms available in RSES. The scheme of the conducted study is shown in Figure 7. The OEE symbol designates a decision table that contains the 34 studied objects (enterprises). Each object is described by 17 explanatory variables, including enterprise size, production type, industry, ownership type, capital, actions undertaken to prevent unplanned downtimes (NPA number – MSI indicator), machine category, spare parts category, actions in the TPM implementation (NTPMA indicator), and mean time to repair. The explained variable "an average OEE value" served as the decision attribute in the study. The remaining symbols in the scheme are described in Table 11.

When the decision rules were formulated, the parameters of the genetic and covering algorithms were chosen so that the accuracy and coverage of the resulting rule set were equal to 1. Table 12 gives the number of rules in each of the four rule sets. For each rule, a rule match is calculated. It is equal to the number of objects in the training set that match the antecedent of the rule.
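The rule match defined above can be computed by counting the training objects that satisfy all conditions in the antecedent of a rule. The following minimal sketch assumes that a rule antecedent is a mapping from attribute names to required values and that a missing answer never satisfies a condition; both the representation and the toy objects are hypothetical.

```python
def matches(obj, conditions):
    """True if the object satisfies every (attribute, value) condition.

    A missing attribute yields None and therefore does not match (assumption).
    """
    return all(obj.get(attr) == value for attr, value in conditions.items())

def rule_match(conditions, training_set):
    """Number of training objects that match the antecedent of a rule."""
    return sum(matches(obj, conditions) for obj in training_set)

# Hypothetical training objects (not the survey data).
training_set = [
    {"size": "large", "production": "batch", "oee": "70-85%"},
    {"size": "large", "production": "unit", "oee": "70-85%"},
    {"size": "small", "production": "batch", "oee": "30-50%"},
    {"size": "large", "production": "batch", "oee": "30-50%"},
    {"size": "large", "oee": "70-85%"},  # no answer for "production"
]

match_specific = rule_match({"size": "large", "production": "batch"}, training_set)
match_general = rule_match({"size": "large"}, training_set)
print(match_specific, match_general)
```

Only the antecedent is checked here, so the count does not depend on the decision value of the matched objects. A shorter antecedent can never match fewer objects than a longer one that contains it.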

**Figure 7.** The scheme for the explained variable "an average OEE value".



**Table 12.** The number of rules in each of the created sets for the explained variable "an average OEE value".


Each of the four rule sets was used to classify the data from the OEE decision table. The classification was performed by the standard voting method, which works as follows: each of the generated rules determines a value of the explained variable for the considered object (an enterprise). The match value calculated for each rule is treated as its weight during the voting: the higher the match of a given rule, the more important its vote, so it influences the final result more than the vote of a rule with a lower match. Finally, the object is assigned the value of the explained variable that won the weighted voting.
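The weighted voting described above can be sketched as follows. The rule representation (keys "if", "then", and "match"), the toy rules, and the tie-breaking behavior are assumptions made for illustration and are not taken from RSES.

```python
from collections import defaultdict

def classify_by_voting(obj, rules):
    """Return the decision value with the highest sum of rule matches.

    Only rules whose antecedent is satisfied by the object take part in the vote.
    Returns None if no rule applies (the object stays unclassified).
    Ties are resolved in favor of the value that received a vote first (assumption).
    """
    votes = defaultdict(int)
    for rule in rules:
        if all(obj.get(attr) == value for attr, value in rule["if"].items()):
            votes[rule["then"]] += rule["match"]
    if not votes:
        return None
    return max(votes, key=votes.get)

# Hypothetical rules with precomputed match values.
rules = [
    {"if": {"size": "large"}, "then": "70-85%", "match": 5},
    {"if": {"production": "unit"}, "then": "30-50%", "match": 3},
    {"if": {"capital": "foreign"}, "then": "30-50%", "match": 3},
]

enterprise = {"size": "large", "production": "unit", "capital": "foreign"}
decision = classify_by_voting(enterprise, rules)
print(decision)
```

In this example, the single rule with the highest match (5) is outvoted by two weaker rules that agree with each other (3 + 3 = 6). An object that matches no rule is left unclassified, which lowers the coverage of the rule set rather than its accuracy.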

The result of the classification is presented in the form of a confusion matrix. The confusion matrix for the classification performed with the rule set generated by the exhaustive algorithm is presented in Figure 8. Matrix rows correspond to the real decision classes (the values of the explained variable), while matrix columns correspond to the classes assigned by the generated rules. All 34 objects in the decision table were classified correctly, which is reflected in nonzero values appearing only on the main diagonal of the confusion matrix. The last three columns in the figure give the number of objects belonging to a given decision class (no. of obj.), accuracy, and coverage. The last row of the table gives the true positive rate calculated for each class individually. The bottom part of the window presents the number of all studied objects and the accuracy and coverage calculated for all decision classes together.
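A confusion matrix of this kind, together with accuracy and coverage, can be built as in the sketch below. The definitions used here are assumptions that should be checked against the RSES documentation: accuracy is taken as the share of correctly classified objects among the classified ones, and coverage as the share of classified objects among all objects. The labels and the toy classification results are hypothetical.

```python
def confusion_summary(actual, predicted):
    """Build a confusion matrix (rows: actual, columns: predicted) and summary measures.

    `predicted` may contain None for objects that no rule recognized.
    """
    classes = sorted(set(actual) | {p for p in predicted if p is not None})
    matrix = {a: {p: 0 for p in classes} for a in classes}
    unrecognized = {a: 0 for a in classes}
    for a, p in zip(actual, predicted):
        if p is None:
            unrecognized[a] += 1
        else:
            matrix[a][p] += 1

    per_class = {}
    for a in classes:
        classified = sum(matrix[a].values())
        total = classified + unrecognized[a]
        per_class[a] = {
            "no_of_obj": total,
            "accuracy": matrix[a][a] / classified if classified else None,
            "coverage": classified / total if total else None,
        }

    classified_all = sum(sum(row.values()) for row in matrix.values())
    correct_all = sum(matrix[c][c] for c in classes)
    overall = {
        "accuracy": correct_all / classified_all if classified_all else None,
        "coverage": classified_all / len(actual) if actual else None,
    }
    return matrix, per_class, overall

# Hypothetical results for five objects; one object was not recognized by any rule.
actual = ["30-50%", "30-50%", "70-85%", "70-85%", "70-85%"]
predicted = ["30-50%", "70-85%", "70-85%", None, "70-85%"]

matrix, per_class, overall = confusion_summary(actual, predicted)
print(overall)
```

With these definitions, an unrecognized object reduces coverage but does not count as a classification error, so a rule set can have a coverage of 1 and a low accuracy, or the reverse.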


**Figure 8.** Confusion matrix for the rules generated by an exhaustive algorithm.

Confusion matrices were also created for the classification results based on the three remaining rule sets. Each of these matrices showed the same results (accuracy = 1) as the matrix presented in Figure 4, which indicates that the other rule sets also produced no classification errors.

The assessment of the developed decision rules (decision trees and rough set theory) was carried out in the following stages: generation of a decision table and confusion matrix; development of an expert system based on the generated decision rules; use of the obtained study results to test the overall classification capacity; use of the obtained study results to test the overall classification capacity of decision-making rules using the developed expert system; and qualitative assessment of the results obtained using classification quality measures.

The results of the surveys from 20 companies of the Podkarpackie Voivodship were reused to validate the decision-making rules. Among the companies analyzed, the largest group consisted of large companies (85%) and of companies from the aviation industry (50%). The majority of them were private companies (95%) with a majority of foreign capital (85%). Large-batch production (45%) dominated among the companies surveyed. On the basis of the results obtained, a decision table was created and introduced into the RSES system. This allowed us to create a confusion matrix for the explained variable "an average OEE value". The maximum coverage calculated for all decision classes together, equal to 1, and an accuracy of 0.30 for the explained variable "an average OEE value" were achieved with the rule set generated by the genetic algorithm (Figure 9).

**Figure 9.** Confusion matrix for the rules generated by a genetic algorithm.

In order to carry out the next stage of validation, an expert system was developed. The decision-making rules generated by all algorithms were implemented in the knowledge base that was created for the system needs. The knowledge base for the expert system was developed with PC-Shell software, which is part of the Aitech SPHINX integrated artificial intelligence suite and Aitech HybRex software.

In order to test the overall quality of the classifier for all algorithms, confusion matrices (Tables 13–16) were used. These matrices were developed by comparing the results obtained by the studied companies with the result generated by the developed expert system. The classification was carried out again according to the following voting method:



**Table 13.** Confusion matrix for the rules generated by an exhaustive algorithm.



**Table 15.** Confusion matrix for the rules generated by a covering algorithm.




When analyzing the particular confusion matrices, it should be noted that the best accuracy values for the most common classes (30–50% and 70–85%) were obtained for the rules generated by the exhaustive algorithm: 0.6 and 0.875, respectively. In order to accurately assess the quality of the classifiers on the basis of binary matrices, the values of twelve indicators were calculated for each of the matrices according to Table 9.
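A multi-class confusion matrix can be reduced to a binary one for a selected class by treating that class as positive and all other classes as negative. The sketch below shows this one-vs-rest reduction and three commonly used measures. Which twelve indicators Table 9 defines is not restated here, so the choice of sensitivity, specificity, and precision is an assumption, and the class labels and counts are hypothetical.

```python
def binarize(matrix, positive):
    """Collapse a multi-class confusion matrix (rows: actual, columns: predicted)
    into TP, FP, FN, TN counts for one class against the rest."""
    classes = list(matrix)
    tp = matrix[positive][positive]
    fn = sum(matrix[positive][c] for c in classes if c != positive)
    fp = sum(matrix[c][positive] for c in classes if c != positive)
    tn = sum(
        matrix[a][p]
        for a in classes
        for p in classes
        if a != positive and p != positive
    )
    return tp, fp, fn, tn

def binary_measures(tp, fp, fn, tn):
    """A few standard measures; None is returned where a denominator is zero."""
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else None,
        "specificity": tn / (tn + fp) if tn + fp else None,
        "precision": tp / (tp + fp) if tp + fp else None,
    }

# Hypothetical three-class confusion matrix with made-up counts.
example_matrix = {
    "A": {"A": 3, "B": 1, "C": 0},
    "B": {"A": 1, "B": 4, "C": 1},
    "C": {"A": 0, "B": 0, "C": 2},
}

counts_a = binarize(example_matrix, "A")
measures_a = binary_measures(*counts_a)
print(counts_a, measures_a)
```

Repeating the reduction for every class yields one binary matrix per class, and the same set of indicators can then be compared across the four rule sets.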
