1. Introduction
Cancer is a group of diseases that cause cells in the body to change and spread out of control [1]. Breast cancer is the second most common cancer among women in the United States (some kinds of skin cancer are the most common). According to [2], the signs and symptoms of breast cancer include a lump or swelling in the breast, upper chest, or armpit; changes in the size or shape of the breast; a change in skin texture and color; and rash, crusting, or modifications to the nipple. For these reasons, it is critical to create simulations that help in the decision-making process for initial detection and proper therapy [3] in order to achieve a rapid diagnosis. Fuzzy systems have been used for breast cancer classification [4,5], among other applications. Fuzzy set theory is known as the basis of all fuzzy logic methods [6]. It was proposed by Zadeh [7] as an extension of classical set theory to model sets whose elements have degrees of membership [8]. According to [7], a fuzzy set is a class of objects with a continuum of grades of membership. Such a set is characterized by a membership (characteristic) function, which assigns to each object a grade of membership ranging between zero and one. A degree of one means that an object is a member of the set, a value of zero means it is not a member, and a value in between indicates a partial degree of membership [8]; the mapping that assigns these degrees is the membership function. The notions of inclusion, union, intersection, complement, relation, convexity, etc., are extended to such sets, and various properties of these notions in the context of fuzzy sets are established [7].
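For illustration, one common choice of membership function is the triangular form; the following generic expression is an illustrative example only and is not tied to any particular variable in this study:

```latex
\mu_A(x) \;=\; \max\!\left(0,\; \min\!\left(\frac{x-a}{b-a},\; \frac{c-x}{c-b}\right)\right),
```

where a and c are the feet of the triangle (membership degree zero) and b is its peak (full membership, μ_A(b) = 1); values between a and c receive partial degrees of membership.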
Fuzzy set theory provides the tools to effectively represent linguistic concepts, variables, and rules, making it a natural model for representing human expert knowledge [9]. According to [8], a linguistic value is a label for describing experience whose meaning is determined by its membership function. One of the most fruitful developments of fuzzy set theory is Fuzzy Rule-Based Systems (FRBSs) [8]. Fuzzy Decision Support Systems (FDSSs) were developed to encode expert knowledge as fuzzy rules to improve decision making [6]. For the development of this kind of decision support system, the Mamdani-type fuzzy inference system (FIS) is widely used [10,11]. Fuzzy Decision Support Systems are used in the field of medicine [11,12,13,14,15].
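To make the notion of a Mamdani-type FIS concrete, the following minimal sketch uses the open-source scikit-fuzzy library; the variable names, membership functions, and rules are illustrative assumptions and do not reproduce any of the cited systems:

```python
# A minimal Mamdani-type FIS sketch with scikit-fuzzy (pip install scikit-fuzzy).
# Variables, membership functions, and rules are illustrative assumptions only.
import numpy as np
import skfuzzy as fuzz
from skfuzzy import control as ctrl

# WBCD-style features take integer values 1..10.
cell_size = ctrl.Antecedent(np.arange(1, 11, 1), 'uniformity_of_cell_size')
bare_nuclei = ctrl.Antecedent(np.arange(1, 11, 1), 'bare_nuclei')
diagnosis = ctrl.Consequent(np.arange(0, 1.01, 0.01), 'diagnosis')

# Hand-crafted triangular membership functions (linguistic values).
cell_size['low'] = fuzz.trimf(cell_size.universe, [1, 1, 5])
cell_size['high'] = fuzz.trimf(cell_size.universe, [5, 10, 10])
bare_nuclei['low'] = fuzz.trimf(bare_nuclei.universe, [1, 1, 5])
bare_nuclei['high'] = fuzz.trimf(bare_nuclei.universe, [5, 10, 10])
diagnosis['benign'] = fuzz.trimf(diagnosis.universe, [0, 0, 0.5])
diagnosis['malignant'] = fuzz.trimf(diagnosis.universe, [0.5, 1, 1])

# Linguistic IF-THEN rules, the core of a Mamdani FRBS.
rules = [
    ctrl.Rule(cell_size['low'] & bare_nuclei['low'], diagnosis['benign']),
    ctrl.Rule(cell_size['high'] | bare_nuclei['high'], diagnosis['malignant']),
]

sim = ctrl.ControlSystemSimulation(ctrl.ControlSystem(rules))
sim.input['uniformity_of_cell_size'] = 8
sim.input['bare_nuclei'] = 7
sim.compute()
print(sim.output['diagnosis'])  # crisp value after centroid defuzzification
```

After centroid defuzzification (the scikit-fuzzy default), the crisp output can be thresholded to yield a benign/malignant decision.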
For these reasons, the main goal of this research work was to create different intelligent fuzzy systems using clusters and dynamic tables for the classification of the Wisconsin Breast Cancer Dataset (WBCD). To validate the proposed models, the fuzzy inference systems (FISs) built to classify this dataset were compared with models based on other artificial intelligence techniques reported in the literature. The originality of this work lies in its generation of membership functions. Other authors use different approaches for this step, such as 2N + 1 regions, FCM, neural networks, and genetic algorithms (GAs). In our case, we proposed using clustering methods. The main difference at this stage is that no fixed or random membership functions were generated, such as those produced by works that used classical methods or were based on evolutionary algorithms, neural networks, or swarm intelligence techniques. Another difference between this study and related works using neural networks or evolutionary or swarm algorithms is that we did not use random numbers or any chromosome or particle scheme. Regarding the generation of the system's rule base, some authors also used the previously mentioned methods. The main difference in our work is that our approach uses pivot tables instead of other techniques. Some authors initialize each hidden neuron with random weights and biases (neural networks), adjusting them through optimization procedures such as gradient descent and non-linear activation functions. Other methods use random schemes to generate the fuzzy rules, using an objective function, e.g., the mean squared error (MSE), to adjust the membership functions and the rule base. Our study did not use any objective function posed as a minimization problem. In addition, it did not calculate any distances, attractiveness, or other parameters to generate the fuzzy rule base. The only component used for this task was pivot tables, which require no calculation method and no random or manual parameters (only sorting options). The main job of this technique is to eliminate redundant information.
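As a rough illustration of the cluster-driven idea (not the exact algorithm of this work), the sketch below derives Gaussian membership functions from k-means centroids using scikit-learn; the spread heuristic and all names are our own assumptions:

```python
# Sketch: cluster-driven membership function generation. One Gaussian MF
# is centred on each k-means centroid; the spread heuristic is ours.
import numpy as np
from sklearn.cluster import KMeans

def mfs_from_kmeans(values, n_mfs):
    """Return (centers, sigmas) of one Gaussian MF per cluster; n_mfs >= 2."""
    km = KMeans(n_clusters=n_mfs, n_init=10, random_state=0)  # seed only for a reproducible sketch
    km.fit(values.reshape(-1, 1))
    centers = np.sort(km.cluster_centers_.ravel())
    # Heuristic spread: half the distance to the nearest neighbouring centre.
    gaps = np.diff(centers)
    sigmas = np.minimum(np.append(gaps, gaps[-1]),
                        np.insert(gaps, 0, gaps[0])) / 2.0
    return centers, sigmas

def gauss_mf(x, c, s):
    """Gaussian membership degree of x for a fuzzy set centred at c."""
    return np.exp(-0.5 * ((x - c) / s) ** 2)

# Example: a synthetic feature column on the 1..10 WBCD scale, three linguistic values.
rng = np.random.default_rng(1)
feature = rng.integers(1, 11, size=200).astype(float)
centers, sigmas = mfs_from_kmeans(feature, n_mfs=3)
print(centers, sigmas)
print(gauss_mf(6.0, centers, sigmas))  # membership degree of x = 6 in each fuzzy set
```

The number of clusters plays the role of the number of linguistic values per variable, so the shape and location of the MFs follow the data rather than a fixed or random scheme.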
3. Results and Discussion
The outcomes obtained for the cited dataset were as follows. The confusion matrix for the mentioned data-driven fuzzy clinical decision support system (DDFCDSS) is shown in Table 1. The performance metrics obtained with our proposed framework are shown in Table 2. The best results for a set of five features were obtained via the Ward clustering method.
As can be seen, the DDFCDSS had a specificity value of 100%, suggesting an outstanding performance in predicting or classifying the true negative cases of the WBCD; that is, all malignant cases were classified correctly. According to the confusion matrix, only three positive cases were misclassified (false negatives), corresponding to a sensitivity value of 0.9877.
In the following paragraphs, we compare the results reported in the literature with our results. The results shown in the tables below correspond to the same characteristics noted by the researchers using the same dataset (WBCD); we used the same data partition methods and the same features.
According to the results, for the WBCD, the greatest performance belongs to Onan [20]. The author used a classification model based on the fuzzy-rough nearest neighbor algorithm, consistency-based feature selection, and fuzzy-rough instance selection for medical diagnosis, with 10-fold cross-validation as the data partition method. As can be seen in Table 3, the classification accuracy of his results was 99.72%, and the maximum classification accuracy of our results belongs to the k-means method with 10-fold cross-validation. The author's sensitivity value was 1.0; however, his specificity value was 0.9947. Our results show the opposite pattern: our specificity value was 1.0, and our sensitivity value was 0.9703. The performance metric sensitivity indicates the true positive (TP) rate, and specificity indicates the true negative (TN) rate [28]. According to [28], in breast cancer, TP signifies cases that are correctly categorized as benign tumors, and TN characterizes cases that are correctly categorized as malignant tumors. This result shows that our model predicts 100% of the true negative values; in other words, if a tumor is malignant, the fuzzy inference system will classify it as malignant with 100% accuracy.
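For reference, these standard metrics are computed from the confusion matrix counts as follows:

```latex
\text{Sensitivity} = \frac{TP}{TP + FN}, \qquad
\text{Specificity} = \frac{TN}{TN + FP}, \qquad
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}.
```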
Comparing the results of the three clustering methods, McNemar's test indicated that none of them performs significantly better than the others; that is, all the DDFCDSSs have the same classification error rates. The test results were Ward vs. k-means: χ² = 0.0455; k-means vs. FCM: χ² = 0.0; and Ward vs. FCM: χ² = 0.12903.
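These pairwise comparisons can be reproduced in spirit with the continuity-corrected McNemar χ² statistic (one degree of freedom); the following is a minimal sketch under that assumption, not a restatement of the exact variant used in this work:

```python
# McNemar's test for two classifiers evaluated on the same test set, using
# the continuity-corrected chi-squared statistic (1 degree of freedom).
# A value above 3.84 indicates a significant difference at alpha = 0.05.
import numpy as np

def mcnemar_chi2(y_true, pred_a, pred_b):
    a_ok = (pred_a == y_true)
    b_ok = (pred_b == y_true)
    n01 = np.sum(a_ok & ~b_ok)   # A correct, B wrong
    n10 = np.sum(~a_ok & b_ok)   # A wrong, B correct
    if n01 + n10 == 0:
        return 0.0               # identical error patterns
    return (abs(n01 - n10) - 1) ** 2 / (n01 + n10)

# Toy usage with made-up labels and predictions.
y = np.array([0, 1, 1, 0, 1, 0, 1, 1])
a = np.array([0, 1, 1, 0, 1, 0, 0, 1])
b = np.array([0, 1, 0, 0, 1, 1, 0, 1])
print(mcnemar_chi2(y, a, b))
```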
Ref. [29] proposed a Breast Cancer Computer-Aided Diagnosis (BC-CAD) based on joint variable selection and a constructive deep neural network, "ConstDeepNet". A feature variable selection method was applied to decrease the number of inputs used to train the deep neural network, and five-fold cross-validation was used as the data partition method. The classification accuracy for the set of features mentioned in Table 4 is 96.2%. Our results were higher than those obtained by these authors: our classification accuracy using cross-validation with k = 5 was 98.37%. For the comparison of the three clustering methods, McNemar's test results are as follows: k-means vs. Ward: χ² = 0.3636; k-means vs. FCM: χ² = 1.8947; and Ward vs. FCM: χ² = 0.5625, indicating no significant differences among them. For the second set of features used by the authors (Table 5), the classification accuracy obtained by the constructive deep neural network was 96.6%; our results for the same set of features were again higher. McNemar's test results for the three clustering methods likewise indicate no significant difference among them: k-means vs. Ward: χ² = 0.3636; k-means vs. FCM: χ² = 1.8947; and Ward vs. FCM: χ² = 0.5625.
Another work using the same dataset is [30]. The authors introduced an automated medical data classification method using wavelet transformation (WT) and an interval type-2 fuzzy logic system (IT2FLS), with five-fold cross-validation as the data partition method. The classification accuracy for this set of features was 97.88% (Table 6). The best performance among our three clustering methods was obtained by the Ward method, with 96.68%. Regarding McNemar's test, the values were k-means vs. Ward: χ² = 14.0192; k-means vs. FCM: χ² = 0.0294; and Ward vs. FCM: χ² = 12.5000. Values higher than 3.84 (the critical χ² value at the 0.05 significance level with one degree of freedom) indicate a significant difference; in those cases, we reject the null hypothesis and accept the alternative hypothesis that the algorithms do not have the same classification error rate. Here, only the DDFCDSSs using k-means and FCM have the same classification error rates.
Ref. [31] manually developed a Mamdani-type fuzzy inference system (FIS). The authors proposed a framework for the development of fuzzy inference systems using dynamic tables and clusters; however, their framework does not support a data-driven approach. Their classification accuracy was 98.58% (Table 7), with a sensitivity of 100%; however, their specificity is lower than ours. The best performance of our DDFCDSSs was obtained by the k-means method using random sampling as the data partition method. McNemar's test indicates a significant difference between k-means and FCM. The test values are as follows: k-means vs. Ward: χ² = 3.0625; k-means vs. FCM: χ² = 8.6538; and Ward vs. FCM: χ² = 2.2273.
The other authors who obtained better results than our DDFCDSSs were Abdel-Zaher and Eldeib [32], who proposed a breast cancer classification model based on a deep belief network (DBN) followed by a supervised backpropagation phase. The authors used all input variables and random sampling (70-30%) for data partition. Their classification accuracy was 99.68%, with a sensitivity of 100% and a specificity of 0.9947 (Table 8). Our best performance using the same data partition was the k-means DDFCDSS, with a classification accuracy of 98.86%. McNemar's test showed that the best-performing DBN model, which uses nine variables, was significantly better than our best-performing data-driven fuzzy CDSS. For the comparison among the three clustering methods, the test results suggest no significant difference among them: k-means vs. FCM: χ² = 2.400; k-means vs. Ward: χ² = 1.250; and Ward vs. FCM: χ² = 0.
Ref. [28] proposed a fully connected layer first CNN (FCLF-CNN), in which fully connected layers are embedded before the first convolutional layer. The authors used two data partition methods for their experiments. With a five-fold cross-validation approach, the obtained results are presented in Table 8. The authors also used two settings for random sampling (train: 50%, test: 50%; and train: 75%, test: 25%), obtaining 98.57% and 98.86%, respectively. As can be seen in Table 8, our proposed framework obtained a better performance with the cross-validation method: the Ward method reached a classification accuracy of 98.84%. Regarding the random sampling method, k-means obtained the best performance with a classification accuracy of 98.86%, which was similar to the results obtained by [28] for the same dataset and random sampling configuration.
The main differences and similarities between the mentioned related works and the proposed framework are as follows:
- (1)
Like all the mentioned works, including the related works using the same dataset, we identified all the input and output variables for the Wisconsin Breast Cancer Dataset classification problem.
- (2)
To generate the membership functions, the mentioned authors used different approaches, including logistic regression, support vector machines, random forests, fuzzy c-means, neural networks (MLP, DNN), k-nearest neighbors, genetic algorithms, etc. In our case, we proposed using clustering methods for this step: k-means, the Ward method, and FCM. The main difference at this stage is that no fixed or random membership functions were generated, such as those produced by works that used classical methods or were based on evolutionary algorithms (GAs, FA, BBO), neural networks, or swarm intelligence techniques (PSO, ACO). Instead, users can select the number of membership functions (number of clusters) they want to use for each input/output variable in each classification problem. Another difference between our framework and the related works using neural networks or evolutionary or swarm algorithms is that we did not use random numbers, chromosomes, or particle schemes. Instead, our membership functions were obtained using well-known and recognized clustering methods, which indicate the group to which each sample belongs, producing a vector with the group assigned to each value of the input/output variable. Thus, the assignment of groups is not random. In addition, we did not use any random populations, random particles, random weights, or biases.
- (3)
To generate the system's rule base, the main difference between our work and the mentioned works is that our approach uses pivot tables instead of other techniques (see the sketch after this list). As mentioned, every method for generating an intelligent system's rules or connections has its own characteristics. Some initialize each hidden neuron with random weights and biases (neural networks), adjusting them through optimization procedures such as gradient descent and non-linear activation functions. Other methods use random schemes to generate the fuzzy rules, using an objective function, e.g., the mean squared error (MSE), to adjust the membership functions and the rule base. Our proposed framework does not use any objective function posed as a minimization problem. Additionally, it does not calculate any distances, attractiveness, or other parameters to generate the fuzzy rule base. The only component used for this task was pivot tables, which require no calculation method and no random or manual parameters (only sorting options). The primary purpose of this technique is to eliminate redundant information.
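The following is a minimal sketch of this pivot-table-style rule extraction with pandas, assuming each sample has already been assigned one cluster (linguistic label) per variable; the column names are illustrative assumptions:

```python
# Sketch: pivot-table-style rule extraction. Each sample already carries a
# cluster (linguistic) label per variable; column names are illustrative.
import pandas as pd

labelled = pd.DataFrame({
    'cell_size':   ['low', 'low', 'high', 'high', 'high', 'low'],
    'bare_nuclei': ['low', 'low', 'high', 'high', 'low',  'low'],
    'diagnosis':   ['benign', 'benign', 'malignant', 'malignant', 'malignant', 'benign'],
})

# Collapsing identical rows (as a pivot table's grouping does) removes
# redundant information, leaving one row per distinct rule.
rules = (labelled
         .value_counts()                # group identical rows, count support
         .reset_index(name='support')
         .sort_values('support', ascending=False))
print(rules)
# Each row reads as: IF cell_size IS <...> AND bare_nuclei IS <...>
# THEN diagnosis IS <...>
```

Each surviving row corresponds to one candidate fuzzy rule, obtained with sorting and grouping alone, with no weights, distances, or random parameters involved.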
Our framework's main advantage is the simplicity of its algorithms, which use only primitive mathematical operators and clustering operations (see the Appendices in Reference [16]). Our framework's parameters are as follows: (a) selecting the input and output variables; (b) choosing the clustering algorithm (k-means, Ward, FCM); (c) selecting the desired number of membership functions (MFs), i.e., the number of clusters; (d) adopting the data partition method (random sampling or cross-validation); (e) selecting the number of features to use (feature selection); and (f) setting the parameters of the selected data partition method. For example, if users select random sampling, they must determine the percentages for the training, validation, and test datasets and the number of iterations; otherwise, they must choose the cross-validation partition method ('KFold', 'Holdout', 'LeaveOut', or 'Resubstitution'), the number of folds k, and the number of iterations.
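As a purely hypothetical illustration of these parameters (the keys below are our own naming, not the framework's actual interface):

```python
# Hypothetical configuration dictionary illustrating the framework's
# user-facing parameters (a)-(f); these names are assumptions only.
config = {
    'inputs': ['UCSi', 'MA', 'SECS', 'BN', 'NN'],   # (a) input variables
    'output': 'Class',                              # (a) output variable
    'clustering': 'kmeans',                         # (b) 'kmeans' | 'ward' | 'fcm'
    'n_membership_functions': 3,                    # (c) clusters per variable
    'partition': 'cross_validation',                # (d) or 'random_sampling'
    'n_features': 5,                                # (e) feature selection
    'partition_params': {                           # (f) method-specific settings
        'method': 'KFold',
        'k': 10,
        'iterations': 30,
    },
}
```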
As can be seen, none of these parameters involve lower or upper bounds, random numbers, inertia, momentum, distances, weights, biases, or population sizes to calculate or initialize. This means that the result of each iteration for every combination (Section 2.9.1, Combining different cluster datasets) is a complete fuzzy inference system, because it is not necessary to adjust or optimize weights, biases, or any objective or fitness function.
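For instance, enumerating such combinations could look like the following sketch, where the per-variable cluster counts are invented for illustration; each resulting combination directly defines one candidate FIS:

```python
# Enumerating every combination of cluster counts per variable with
# itertools.product; each combination yields one candidate FIS directly,
# with no weights or fitness function to optimize. Counts are illustrative.
from itertools import product

cluster_options = {
    'UCSi': [2, 3, 4],
    'BN':   [2, 3],
    'NN':   [3],
}
for combo in product(*cluster_options.values()):
    settings = dict(zip(cluster_options.keys(), combo))
    print(settings)   # e.g. {'UCSi': 2, 'BN': 2, 'NN': 3}
```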
It should be noted that the only parameters configured internally were those used for the clustering methods, as mentioned in Section 2.7, Cluster analysis (fuzzification process). These criteria have low computational requirements, offering precision, processing speed, and interpretability of the rules.
4. Conclusions
The main objective of this research work was to implement and validate different decision support systems based on Mamdani-type fuzzy inference systems built with clusters and dynamic tables. As demonstrated, in some cases the proposed fuzzy models showed the best performance indices reported for this dataset, surpassing the outcomes obtained with advanced deep learning techniques such as deep neural networks and convolutional neural networks. The obtained values of the performance metrics were close to one, indicating a strong agreement between the predicted and observed data. The area under the curve for this dataset ranged between 0.90 and 1.0, representing excellent classification performance [34]. The features selected for both data partition methods, shown in Table 2, were Uniformity of Cell Size (UCSi), Marginal Adhesion (MA), Single Epithelial Cell Size (SECS), Bare Nuclei (BN), and Normal Nucleoli (NN), indicating that it is not necessary to assess mitosis, which accelerates diagnosis and possible treatment [16,31]. According to McNemar's test results for the three clustering methods, the k-means method differs significantly from the FCM method at the 95% confidence level (χ² = 5.7857), indicating that these two clustering methods have different error rates; for the other two pairwise comparisons, the test showed no significant difference in performance.
We can conclude that the current framework provides a practical template for the development of data-driven Mamdani-type fuzzy decision support systems for classification problems. Another conclusion is that the computational performance of the algorithms is homogeneous when running on similar datasets.
Future work includes implementing the framework on other software development platforms, such as Python, Scilab, and Octave.