**4. Results**

This section describes the experiments carried out with our proposal on three clinical datasets selected from the Machine Learning Repository [62]. The datasets deal with Heart, Hepatitis, and Dermatology diseases, which we call DS1, DS2, and DS3, respectively; their features are listed below. Note that *Number of patterns* refers to the number of instances (rows) of the dataset, while *Number of attributes* refers to the number of variables (columns) of the dataset.

**DS1 (Heart):**

- Number of patterns: 267;
- Number of attributes: 22 plus the class attribute;
- Number of classes: 2;
- Attribute type: binary;
- Missing attribute values: none.
**DS2 (Hepatitis):**

- Number of patterns: 155;
- Number of attributes: 19 plus the class attribute;
- Number of classes: 2;
- Attribute type: categorical, integer, and real;
- Missing attribute values: yes (a 10-nearest-neighbor technique was used in this research to impute missing values).
**DS3 (Dermatology):**

- Number of patterns: 366;
- Number of attributes: 34 plus the class attribute;
- Number of classes: 6;
- Attribute type: categorical and integer;
- Missing attribute values: yes (a 20-nearest-neighbor technique was used in this research to impute missing values).
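As a rough illustration of the nearest-neighbor imputation mentioned above (a sketch only; the paper does not specify which implementation was used), scikit-learn's `KNNImputer` fills each missing entry with the mean of that attribute over the *k* nearest complete neighbors:

```python
import numpy as np
from sklearn.impute import KNNImputer

# Toy matrix standing in for a clinical dataset with missing entries (np.nan).
X = np.array([
    [1.0, 2.0, np.nan],
    [3.0, 4.0, 3.0],
    [np.nan, 6.0, 5.0],
    [8.0, 8.0, 7.0],
])

# The text uses k = 10 for DS2 and k = 20 for DS3; here k = 2 because the
# toy matrix has only four rows.
imputer = KNNImputer(n_neighbors=2)
X_filled = imputer.fit_transform(X)

assert not np.isnan(X_filled).any()
```

The fitted matrix keeps its original shape; only the `nan` cells are replaced.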

#### *4.1. Exploring and Analyzing the Datasets*

This section shows different linked views of the distribution and structure of the datasets, which allow an initial assessment of their behavior and can also help explain some of the results obtained by the methods applied. Starting with the DS1 dataset shown in Figure 1, this dataset forms a diamond-shaped cloud of points. According to the point distribution in each class, Class-1 is much more compact and bigger than Class-0. Therefore, Class-0 could need more rules to classify its patterns than Class-1; moreover, since the points of Class-0 are more scattered in space and both classes are intertwined, it is harder to find rules for this class that do not classify patterns of Class-1 by mistake. On the other hand, at the bottom of the figure, the heatmap of the dataset shows both classes separated by boxes. Note that the values in this dataset are binary: in Class-0, values of 0 predominate, while in the other class values of 1 predominate. This means that, unlike Class-1, Class-0 is characterized by the absence of the property denoted by many of the attributes evaluated for the disease represented in the dataset.
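The 3D point clouds discussed in this section come from projecting each dataset onto its first three principal components, as described in the figure captions. A minimal sketch with scikit-learn and synthetic stand-in data:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Stand-in for a clinical dataset: 100 patients x 22 binary attributes
# (mirroring DS1's shape; the real data is not reproduced here).
X = rng.integers(0, 2, size=(100, 22)).astype(float)

# Reduce to three components, as done for the 3D-scatterplots.
coords = PCA(n_components=3).fit_transform(X)

assert coords.shape == (100, 3)
```

The three columns of `coords` give the x, y, z positions of each patient, which can then be colored by class label.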

Turning now to the DS2 dataset shown in Figure 2, this dataset forms a tree-shaped cloud of points at the top of the figure. As in the DS1 dataset, both classes are intertwined, and Class-1 is more compact and bigger than Class-0. Unlike the DS1 dataset, the points are more scattered in space, which can induce a smaller number of rules classifying each class. Note that it is harder to find visual differences separating the two classes in the heatmap for DS2 than in the one for DS1. This may imply that DS2 is a difficult dataset to classify, which tests any applied classifier (method).

As for the DS3 dataset shown in Figure 3, unlike the other datasets, this one has six classes, which may increase the classification error of the methods applied to it. The point cloud of this dataset, shown at the top of the figure, is T-shaped, with agglomerations of points at the ends and in the center of the 3D-scatterplot. Note that the same T-structure is maintained for the points in each class. In this case, each class may generate four rules, since there are four clusters in each class. However, the greatest difficulty would be to separate each class from the others, since the six classes are very interrelated. Finally, note that classes 0, 1, and 2 differ from classes 3, 4, and 5 in that, in the latter, the light green color predominates (values below the average value of the whole dataset), while, in the former, the representative color is brown (values above the average value of the whole dataset). This shows that classes 0, 1, and 2 share some type of similarity among the diseases they represent, which distinguishes them from the diseases represented in classes 3, 4, and 5; the same reasoning applies among classes 3, 4, and 5 of DS3.

**Figure 1.** DS1 dataset (Heart Dataset). A 3D-scatterplot is shown at the top of the figure, where each point represents a column (an individual) of the dataset shown as a heatmap at the bottom. The dimension of the dataset was reduced to three components by using principal component analysis. In addition, points belonging to each class are shown in different colors. The heatmap corresponding to the same dataset is shown at the bottom of the figure. Each class of the dataset is framed in a box. The color bar shown at the top represents the color scale used in the heatmap.

**Figure 2.** DS2 dataset (Hepatitis Dataset). A 3D-scatterplot is shown at the top of the figure, where each point represents a column (an individual) of the dataset shown as a heatmap at the bottom. The dimension of the dataset was reduced to three components by using principal component analysis. In addition, points belonging to each class are shown in different colors. The heatmap corresponding to the same dataset is shown at the bottom of the figure. Each class of the dataset is framed in a box. The color bar shown at the top represents the color scale used in the heatmap.

#### *4.2. Mutation Operator Evaluation*

This section deals with the evaluation of mutation operators M1, M2, and M3 in terms of their effectiveness and behavior on the different datasets. The goal of this test is to analyze the behavior of the mutation operators in order to select those that perform best on a given dataset. Consequently, we have run the evolutionary method (ESRBC) using only the mutation operators (without the crossover operator). ESRBC was run 20 times for each operator and each dataset (DS1, DS2, and DS3), each run using a different mutation probability value: starting from 0, the probability value was increased in steps of 0.05. Then, for each mutation probability value, the fitness of the fittest individual after 5000 generations was recorded to render the graphics given in Figures A1–A3 (Appendix B) for datasets DS1, DS2, and DS3, respectively. Thus, these graphics plot the mutation probability value on the *x*-axis versus the fitness value (*y*-axis) of the best individual yielded for each mutation probability value.
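The sweep protocol above can be sketched as follows; `run_esrbc` is a hypothetical stand-in for a full ESRBC run, since the actual implementation is not reproduced here:

```python
# Hypothetical stand-in for one ESRBC run: it would evolve a rule population
# for a fixed number of generations and return the best fitness found.
def run_esrbc(operator, p_mut, generations=5000):
    # Placeholder fitness; a real run performs the evolutionary search.
    return 3.0 + 0.4 * p_mut

# 20 probability values starting at 0 in steps of 0.05, as in the protocol.
p_values = [round(0.05 * i, 2) for i in range(20)]

# One fitness curve per mutation operator, one point per probability value.
results = {op: [run_esrbc(op, p) for p in p_values]
           for op in ("M1", "M2", "M3")}

assert p_values[0] == 0.0 and p_values[-1] == 0.95
```

Each list in `results` corresponds to one blue curve in Figures A1–A3; repeating the sweep four times per operator yields the mean (green) curve and the error bars.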

As shown in these figures, each row contains four graphics, which correspond to the same experiment repeated with the same mutation operator. Since ESRBC includes a stochastic process in the search, we repeated the experiment four times for each mutation operator. Therefore, each row in these figures corresponds to a mutation operator: the first row represents M1, and the second and third rows correspond to M2 and M3, respectively. Finally, each graphic in a row represents the fitness values reached by the best individuals for the current mutation operator. Such values are represented by a blue curve. The mean fitness values over the four experiments carried out (in each row) for each operator are represented by the green curve, whereas the standard error bars are drawn as pink lines.

**Figure 3.** DS3 dataset (Dermatology Dataset). A 3D-scatterplot is shown at the top of the figure, where each point represents a column (an individual) of the dataset shown as a heatmap at the bottom. The dimension of the dataset was reduced to three components by using principal component analysis. In addition, points belonging to each class are shown in different colors. The heatmap corresponding to the same dataset is shown at the bottom of the figure. Each class of the dataset is framed in a box. The color bar shown at the top represents the color scale used in the heatmap.

Analyzing the results in these three figures, for the DS1 dataset, Figure A1 (Appendix B) shows that, for operators M1 and M2, most of the reached fitness values (blue curve) are between 3 and 3.4. On the other hand, there is overlap between the error bars (pink bars), which indicates uniformity of the fitness values for M1 and M2 across different mutation probability values. However, the fitness values (blue curve) in the graphics for operators M1 and M2 present more oscillations than the curves given for the M3 operator. In addition, the standard error bars for the M3 operator are smaller, indicating that its plotted average is more reliable than those of M1 and M2. Moreover, for M3, most fitness values across mutation probability values are between 3.2 and 3.4. Thus, the M3 operator appears to be more significant for the DS1 dataset than operators M1 and M2. Hence, we can use only M3 as the mutation operator when running the evolutionary method to build a classifier on the DS1 dataset. In addition, keep in mind that, since the standard error bars overlap for the M3 operator, it is not necessary to assign a large mutation probability value when running ESRBC to build the rule-based classifier, which improves the runtime of the method.

Unlike Figure A1 (Appendix B), the graphics in Figure A2 present more oscillations in the curve representing fitness values against mutation probability values (blue curve). However, the error bars still overlap. In addition, note that the fitness values achieved for M1 and M3 are higher than those given in Figure A1: for M1, most fitness values are between 3.4 and 3.6, and for M3, between 3.5 and 4, whereas the fitness values for the M2 mutation operator are more unstable with respect to mutation probability values. Therefore, we can use operators M1 and M3 as the only ESRBC mutation operators when using the DS2 dataset.

The results obtained for the DS3 dataset, Figure A3 (Appendix B), are similar to those given in Figure A1. Hence, by applying the same reasoning as for Figure A1, the M3 mutation operator is the most stable and thus the operator that performs best on the DS3 dataset. Once the mutation operators performing well on each dataset have been chosen, we can proceed to compare the classifiers induced by our method with other machine learning methods under the accuracy measure.

#### *4.3. Accuracy and Comparison of the Evolutionary Method*

The accuracy of the rule-based classifier yielded by our approach has been computed and compared with other methods for each introduced dataset. A stratified 10-fold cross-validation was used to measure the accuracy of all methods. The evolutionary method (ESRBC) was run in two stages: in the first stage, ESRBC was run using the *f*1 fitness function, whereas *f*2 was used in the second stage. The settings of ESRBC for each dataset are listed in Table 1, and the methods used in the comparison process [55,56,63,64] are listed in Table 2. The results reached by ESRBC compared with the other methods are listed in Table 3, with the best accuracy value for each dataset underlined. ESRBC reached the best values for the DS1 and DS3 datasets, while its accuracy for DS2 was not far from that of the method reaching the best value. Since the number of patterns per class in the DS1 and DS2 datasets is unbalanced, Table 3 also shows the Youden index, which accounts for unbalanced classes in a dataset. This index is defined as *sensitivity* + *specificity* − 1.
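The Youden index follows directly from the confusion-matrix rates; a minimal sketch (the counts below are illustrative, not taken from the experiments):

```python
def youden_index(tp, fn, tn, fp):
    """Youden's J = sensitivity + specificity - 1, robust to class imbalance."""
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return sensitivity + specificity - 1

# Example: 80 of 100 positives and 90 of 100 negatives correctly classified.
j = youden_index(tp=80, fn=20, tn=90, fp=10)
assert abs(j - 0.7) < 1e-9
```

Unlike plain accuracy, J stays at 0 for a classifier that always predicts the majority class, which is why it is reported alongside accuracy for the unbalanced datasets.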

Note that the methods listed for the DS3 dataset are different from those used for DS1 and DS2. This is because the methods used for DS1 and DS2 are for binary classification, whereas the DS3 dataset has six classes; therefore, we need multiclass methods for DS3. On the other hand, the greatest accuracy reached for the DS1 and DS2 datasets was less than 90%, which tells us that they are difficult to classify (due to their compactness and the difference in the size of their classes), as explained in Section 4.1. However, the greatest accuracy reached for the DS3 dataset was greater than 90%, although DS3 has six classes. This may be due to the distribution of the dataset, where each class is represented by four groups of points separated from each other (which facilitates the classification), as explained in Section 4.1.

**Table 1.** Settings given to run the evolutionary method (ESRBC) to build rule-based classifiers for each dataset.


**Table 2.** Name and description of the methods used in the comparison of the approach proposed.





**Table 3.** Comparative table of mean accuracy for the evolutionary method (ESRBC) compared with the other machine learning methods.

#### **5. Discussion: Rule Analysis**

This section analyzes the rules discovered by the classifiers induced by the evolutionary method (ESRBC) for each dataset in Table 3. The aim of this analysis is to discover knowledge from those rules and to identify attributes and relations relevant to each disease. Such prior knowledge would act as a starting point for experts in the field.

Appendix A lists the rules given by the best classifier found by our proposal for each dataset. The analysis carried out in this section is based on the knowledge disclosed by such rules. Starting with the DS1 dataset, it contains diagnoses based on 22 features, built from Single Photon Emission Computed Tomography (SPECT) images, which aim to distinguish between heart disease and normal heart operation. For this case, ESRBC found six rules for one class and only one rule for the other. Of the 22 attributes, only five were not used by any rule (F1, F2, F9, F12, and F18), which implies that they are not important in the classification of the disease and may be discarded from the analysis. In contrast, attributes F5, F21, and F22 achieved the greatest frequency of occurrence in the class-0 rules (each occurring in 42.86% of the rules), so these attributes are representative of class-0. Meanwhile, class-1 used a single rule with only one attribute, F8. The F8 attribute has been used in both classes, so it is important not only for class-1 but also for the disease in question.
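The attribute frequencies reported here (e.g., an attribute occurring in 42.86%, i.e., 3/7, of the rules) can be computed by counting, for each attribute, the fraction of rules that test it. A sketch over a hypothetical rule list (the attribute sets below are invented for illustration):

```python
from collections import Counter

# Hypothetical rules: each rule is modeled as the set of attributes it tests.
rules = [
    {"F5", "F21"}, {"F5", "F22"}, {"F21", "F22"},
    {"F5", "F21", "F22"}, {"F3"}, {"F7"}, {"F10"},
]

# Count how many rules mention each attribute, then normalize.
counts = Counter(attr for rule in rules for attr in rule)
freq = {attr: counts[attr] / len(rules) for attr in counts}

# F5 is tested by 3 of the 7 rules -> 3/7, i.e., about 42.86%.
assert round(100 * freq["F5"], 2) == 42.86
```

Attributes with zero frequency are candidates for removal, matching the filtering argument made in the text.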

The DS2 dataset consists of 19 attributes and two classes, including clinical and biochemical variables. ESRBC found three rules for class-0 and two rules for class-1. Of the 19 attributes, 12 were used in the rules and seven were not used by any rule (STEROID, MALAISE, ANOREXIA, LIVERBIG, LIVERFIRM, VARICES, and HISTOLOGY); thus, the latter can be discarded from the classification process. In addition, the most frequent attributes in the class-0 rules were ALBUMIN with 100%, and PROTIME, AGE, and ALK PHOSPHATE with 66.67%, whereas class-1 used only the ALBUMIN and SEX attributes. Note that the ALBUMIN attribute is the only one used in both classes; therefore, this attribute is significant for the study of the disease. In this way, we can identify three groups of patients presenting different features in class-0: patients holding {*ALBUMIN* ≤ 3.99, *PROTIME* ≤ 50, *SEX* = 1}, patients holding {*ALBUMIN* = 3.80, *AGE* ≥ 37, 64.88 ≤ *ALK PHOSPHATE* < 95}, and patients holding {*ALBUMIN* ≥ 3.50, *PROTIME* ≥ 56.16, *ALK PHOSPHATE* > 104.77, *AGE* ≥ 30}. Patients in class-1 are governed by the attributes {*ALBUMIN* ≥ 2.9, *SEX* = 2}.

The DS3 dataset presents data from patients with six different erythemato-squamous diseases, namely *psoriasis, seboreic dermatitis, lichen planus, pityriasis rosea, cronic dermatitis*, and *pityriasis rubra pilaris*. The main interest of applying our proposal to this dataset is that these diseases are difficult to distinguish: they normally require a biopsy and share many common histologic characteristics. The classifier found for DS3 rendered 11 rules distributed over the six classes: one rule in each of classes 0, 2, 4, and 5; two rules in class-1; and five rules in class-3, which is broadly consistent with the cluster structure described for DS3 in Section 4.1. Of the 33 attributes in this dataset, 12 were selected by the rules of the classifier, whereas 21 were not. Note that a significant number of attributes was not chosen by the rules of the classifier: this means that the classifier was able to filter the most relevant features (12 features, see Appendix A.3) for the diseases represented by DS3, whereas the remaining features can be removed from the analysis, since they do not provide valuable information.

Analyzing the attributes in this dataset, the FIBROSIS, AGE, ITCHING, and SPONGIO attributes have the greatest frequency of occurrence. In particular, FIBROSIS appears in 50% of the classes of this dataset (psoriasis, seboreic dermatitis, and cronic dermatitis), whereas AGE appears in 80% of the rules in class-3 (pityriasis rosea), and ITCHING and SPONGIO each appear in 60% of the rules in that same class. In particular, patients in each class are governed by the following relationships:

**Class-0:** patients holding {*FIBROSIS* = 0, *SPONGIO* = 0, *ELONGATION* > 0};
**Class-1:** patients holding {*FIBROSIS* = 0, *AGE* = 20, *DBORDERS* ≤ 2};
**Class-2:** patients holding {*BANDLIKE* > 1, *THINNING* = 1};
**Class-3:** this class supports four age-related subgroups of patients, namely {*AGE* ≥ 18, *ITCHING* ≤ 1}, {*AGE* = 27, *ITCHING* < 2, *SPONGIO* > 0}, {*AGE* = 36, *ITCHING* ≤ 1, *SPONGIO* > 0}, and {*AGE* = 62, *SPONGIO* > 0};
**Class-4:** patients holding {*FIBROSIS* > 0, *POLYPAPULES* = 0};
**Class-5:** patients holding {*PERIFOLLI* > 0, *FOLLIPAPULES* > 0}.
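Each rule above can be read as a conjunction of attribute conditions. A minimal sketch encoding the Class-4 and Class-5 rules as predicates over a patient record (attribute names as given in the text; the patient values are invented):

```python
# Each rule is a conjunction of conditions on a patient's attribute values.
def matches_class4(p):
    # Class-4: FIBROSIS > 0 and POLYPAPULES = 0.
    return p["FIBROSIS"] > 0 and p["POLYPAPULES"] == 0

def matches_class5(p):
    # Class-5: PERIFOLLI > 0 and FOLLIPAPULES > 0.
    return p["PERIFOLLI"] > 0 and p["FOLLIPAPULES"] > 0

# A hypothetical patient record satisfying the Class-4 rule only.
patient = {"FIBROSIS": 2, "POLYPAPULES": 0, "PERIFOLLI": 0, "FOLLIPAPULES": 0}
assert matches_class4(patient) and not matches_class5(patient)
```

Classifying a patient amounts to evaluating each class's predicates against the record, which is what makes the rules directly inspectable by domain experts.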

Note that, unlike the AGE feature, a value of zero for the remaining features means that the feature is not present in the patient, whereas a value greater than zero means that the patient presents the feature to a degree associated with that value. Consequently, given the results above, we can say that studying these attributes can help gain more insight into the diseases involved in this dataset.
