1. Introduction
Software quality is a measure of how well a software product meets the needs and expectations of its users. High-quality software is free from defects and errors that could affect its functionality or performance. It is more reliable and less likely to fail or cause errors, and easier to maintain and update, reducing the risk of bugs over time.
An important issue in software engineering is finding code smells in software before its delivery. In fact, defect prediction is a technique to control the schedule and cost of a software system. Detecting and fixing OOP defects early in the development cycle can save time and money by reducing the need for expensive rework or corrective action later in the project [1].
Code smells [2,3,4] refer to any symptom in an object-oriented program that may hinder software maintenance and evolution. The textual description of defects or code smells is very subjective and depends on the designer's/programmer's interpretation. As an example, consider the Feature Envy defect (i.e., a method defined in the wrong class, given the class attributes and the methods it invokes). Depending on a subjective interpretation, each designer could decide differently which methods are candidates for a Feature Envy defect. In fact, the selection of those methods is based on information such as "the number of communications with a given class". Depending on the context, the same value could be evaluated as high, medium, or even low.
One approach that has gained popularity for detecting design defects is the use of object-oriented metrics. In [5], the authors presented an overview of 3295 papers extracted from the most popular electronic databases related to the UML model-refactoring field. Seventeen percent of the studies used OO metrics and rules-based metrics to detect design defects. Unfortunately, a single metric may not be sufficient to capture all aspects of a system's quality and may lead to incomplete or inaccurate assessments of its overall quality; a combination of metrics, however, is a powerful heuristic that can identify and standardize the way code smells are defined and detected. The majority of works in the literature focus on combining metrics to generate a set of detection rules; they use standard object-oriented metrics as well as metrics defined in an ad hoc way [6,7,8]. The accuracy of rule-based metrics is directly affected by the selected metrics. Defining object-oriented rule-based metrics can be challenging. Not only does it require a thorough understanding of OOP concepts and their relationship with software quality attributes, but it also depends on selecting the rules with the best metrics as well as the best thresholds for each metric. The complexity of this combinatorial problem is significantly reduced when the number of metrics involved in a detection rule decreases.
In fact, there is no standard set of object-oriented rule-based metrics that everyone agrees upon. This can make it difficult to compare software systems assessed with different metrics. The selection of metrics is, par excellence, a Multi-Criteria Decision-Making (MCDM) problem if we consider the huge number of possible combinations of metrics and thresholds. Researchers thus face a twofold challenge: selecting the metrics and defining the respective thresholds, in order to find the most relevant set of rules-based metrics.
In this paper, we propose to use the fuzzy decision-making trial and evaluation laboratory (DEMATEL) method to determine the most influential criteria and to rank those criteria [9,10,11]. The goal of this work is to reduce the number of metrics combinations, leading to rules that are more accurate. This paper answers the question: "What is the best set of rules that can detect a specific defect?" Only rules with the most relevant metrics are considered. The results are validated against previously published work [12] and show an improvement in the detection of four design defects (i.e., the Blob, Feature Envy, Lazy Class, and Data Class defects) based on sixteen object-oriented metrics.
The paper is structured as follows. Section 2 defines the object-oriented metrics. Section 3 presents an overview of the Fuzzy DEMATEL method. Section 4 details the findings. Section 5 is dedicated to the validation, and Section 6 concludes the paper.
3. Fuzzy DEMATEL for Object-Oriented Metrics
The DEMATEL technique is an initiative of the Battelle Memorial Institute through the Geneva Research Centre [19]. It is a comprehensive method for illustrating the structure of complicated cause-and-effect relationships. It aims to find the critical attributes through a visual structural model [20].
We use the DEMATEL method to identify the most significant metrics, i.e., those that affect other metrics. The method converts the cause-and-effect relationships between metrics into a structural model and identifies the sets of cause metrics and effect metrics.
Using DEMATEL, we aim to decrease the number of metrics needed to measure the defects, which improves the effectiveness of the defect detection rules derived from the digraph map.
3.1. Research Methodology
This research is based on Fuzzy DEMATEL because of its suitability for evaluating expert answers. In fact, evaluating the importance of a metric is very subjective, depending on each expert's experience; it therefore cannot be assessed with crisp values. The concept of fuzzy sets [21] combined with the DEMATEL method handles the vagueness of expert answers well. To deal with this imprecise decision-making problem, we adopt the triangular fuzzy number identified by Akyuz and Celik [22]. This fuzzy representation is defined by a set of values (Lower, Medium, Upper). The influence of each metric is measured over a five-level fuzzy scale, as shown in Table 2.
After selecting the set of metrics and the experts in the object-oriented programming field, the experts evaluate the effect between metrics using pairwise comparisons, and we generate and normalize the Fuzzy Direct-Relation Matrix (FDRM) as an aggregation of all expert matrices. Then, we calculate the total-relation fuzzy matrix and, as the final step, obtain the classification of metrics based on their importance and influence. Finally, we validate our findings against our previous work [12] by refining the set of rules-based metrics identified in [12] according to the metrics' importance and influence found using Fuzzy DEMATEL. Figure 1 presents the main steps to apply Fuzzy DEMATEL.
3.2. Fuzzy Direct-Relation Matrix
Each expert generates a direct-relation matrix through pairwise comparisons, Me, where e represents the expert, M is an (n × n) non-negative matrix, and M(e, i, j) represents the direct impact of metric i on metric j for expert e. When i = j, the diagonal elements M(e, i, j) = 0. In Table 3, we present an example of the linguistic scores of one expert's evaluation for the Blob anti-pattern.
Based on Table 2, the fuzzy linguistic matrix is then converted into a fuzzy scaled direct-relation matrix M. Table 4 presents the aggregated fuzzy direct-relation matrix collected from the different experts' judgments.
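The aggregation step can be sketched as follows. This is a minimal illustration assuming the aggregate FDRM entry is the element-wise average of the experts' triangular fuzzy numbers; the exact aggregation operator is not reproduced in the text.

```python
def aggregate_fdrm(expert_matrices):
    """Aggregate expert matrices into a Fuzzy Direct-Relation Matrix.

    expert_matrices: list of n x n matrices of (lower, medium, upper) tuples,
    one matrix per expert, with zeros on the diagonal.
    """
    k = len(expert_matrices)
    n = len(expert_matrices[0])
    fdrm = [[(0.0, 0.0, 0.0)] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Element-wise average of each component of the fuzzy number.
            lo = sum(m[i][j][0] for m in expert_matrices) / k
            md = sum(m[i][j][1] for m in expert_matrices) / k
            up = sum(m[i][j][2] for m in expert_matrices) / k
            fdrm[i][j] = (lo, md, up)
    return fdrm
```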
3.3. Normalized Fuzzy Direct-Relation Matrix
The first step is the defuzzification of the fuzzy direct-relation matrix, based on the Best Non-fuzzy Performance (BNP) method [23]. It is a technique used to generate crisp values from fuzzy values. The BNP of a triangular fuzzy number N = (Lower, Medium, Upper) can be expressed as:
BNP = [(Upper − Lower) + (Medium − Lower)]/3 + Lower    (1)
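As a minimal sketch, the BNP defuzzification of a triangular fuzzy number reduces to its centroid:

```python
def bnp(lower, medium, upper):
    # Best Non-fuzzy Performance of a triangular fuzzy number (L, M, U):
    # BNP = [(U - L) + (M - L)] / 3 + L, which simplifies to (L + M + U) / 3.
    return ((upper - lower) + (medium - lower)) / 3 + lower
```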
Table 5 presents the crisp direct-relation matrix (CM). Using Formula (2), we transformed the CM into a normalized direct-relation matrix (NMR), as shown in Table 6. In the initial matrix (CM), aij denotes the degree to which criterion i affects criterion j.
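Formula (2) is not reproduced above; a common DEMATEL normalization divides every entry of the crisp matrix by its largest row sum, and the sketch below assumes that convention:

```python
def normalize(cm):
    # Normalize the crisp direct-relation matrix by its largest row sum,
    # so every entry of the result lies in [0, 1].
    s = max(sum(row) for row in cm)
    return [[a / s for a in row] for row in cm]
```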
3.4. Total-Relation Fuzzy Matrix
At this level, we generate the total-relation matrix (
TRM) as shown in
Table 7, using the Formula (3).
where
I is denoted as the identity matrix.
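The total-relation computation can be sketched as follows, using the standard DEMATEL closed form with a small Gauss-Jordan inverse (sufficient for matrices the size of a metric set):

```python
def mat_mul(a, b):
    # Plain matrix product.
    n, m, p = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]

def mat_inv(a):
    # Gauss-Jordan inversion with partial pivoting.
    n = len(a)
    aug = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def total_relation(nmr):
    # TRM = NMR (I - NMR)^-1 -- the standard DEMATEL total-relation matrix.
    n = len(nmr)
    i_minus_n = [[float(i == j) - nmr[i][j] for j in range(n)] for i in range(n)]
    return mat_mul(nmr, mat_inv(i_minus_n))
```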
3.5. Metrics of Cause and Effect Matrix
As presented in Table 8, the value of (D + R) represents the importance of a metric in rule detection: the higher the value of (D + R), the more important the metric, and the stronger the case for including it in the rule generation process. The value of (D − R) classifies the metrics into cause and effect metrics. D represents the sum of the rows of the total-relation matrix and R represents the sum of its columns. Using Formulae (4) and (5), with TRM = (TRMij), i, j ∈ {1, 2, ..., n}, Di = Σj TRMij (4) and Rj = Σi TRMij (5).
In Figure 2, we present the causal diagram. The horizontal axis represents (D + R) and the vertical axis represents (D − R). Considering the values of (D + R), it appears that some metrics are more important than others. We can split the set of metrics into three main groups, labeled Low Importance, Important, and High Importance.
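Formulae (4) and (5) amount to row and column sums of the total-relation matrix; a sketch of computing (D + R) and (D − R) for each metric:

```python
def cause_effect(trm):
    # D: row sums (influence given); R: column sums (influence received).
    # (D + R) measures a metric's importance; the sign of (D - R) places it
    # in the cause group (positive) or the effect group (negative).
    n = len(trm)
    d = [sum(trm[i][j] for j in range(n)) for i in range(n)]
    r = [sum(trm[i][j] for i in range(n)) for j in range(n)]
    return [(d[i] + r[i], d[i] - r[i]) for i in range(n)]
```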
4. Results of Fuzzy DEMATEL Method for Object-Oriented Metrics
Table 9 presents the groups identified in the metrics causal diagram. We classified the set of metrics into cause metrics and effect metrics. Within each group, metrics have importance levels (Low, Normal, and High). For example, in the cause group, the highest-ranked metrics are NIC and CM, and in the effect group, the highest-ranked metrics are ATFD, NOM, NOC, and NCC.
Cause metrics exert influence on the effect metrics. Considering the interdependence among the metrics, the detection rules should take into account both the cause metrics and their influence on the effect metrics [24]. Therefore, by selecting rules combining cause and effect metrics, we can improve the accuracy of design defect detection, giving priority to the metrics with the highest importance.
Based on the DEMATEL method, we classified the metrics along two orthogonal dimensions. The horizontal dimension represents the metrics' importance and the vertical dimension separates cause metrics from effect metrics. We can now restrict the detection rules; for example, we can exclude rules with low-importance metrics from the detection process. Based on Table 9, it becomes clear that detection rules should contain as many metrics as possible from the "Important" and "High Importance" groups of the horizontal dimension. These rules should also combine metrics from the vertical dimension (i.e., both cause and effect metrics). Based on this finding, an excellent detection rule combines metrics such as NIC and/or CM with metrics such as ATFD, NOM, NOC, and/or NCC.
The following section explores in depth the impact of the above finding on the accuracy of design defect detection.
5. Validation
The findings presented in the previous section are very important for the selection of metrics. In order to validate our results, we refer to our previous study [12], where we used the decision tree algorithm to generate defect rules. Applying our finding to that work by refining the set of rules-based metrics, and comparing the outcome with the results in [12], shows how Fuzzy DEMATEL improves the process of identifying the best set of rules-based metrics.
5.1. Reference Study
The first study we conducted [12] represents the reference study used to validate our findings. We experimented with four design defects: the Blob, Data Class (DC), Lazy Class (LC), and Feature Envy (FE) defects, using 15 object-oriented metrics:
The Blob anti-pattern, or God class [25], corresponds to a large controller class that depends on data stored in other classes. This is typically the case for large classes declaring many fields and methods, resulting in low cohesion.
A Data Class bad smell [24] corresponds to a class that stores data passively. This class contains data and no methods to operate on that data.
A Lazy Class bad smell corresponds to a class that is not doing enough to pay for itself. There is no need for additional classes that could increase the project complexity.
A Feature Envy bad smell corresponds to a method that uses another class excessively. It should belong to that class.
We tested five open-source projects: Xerces v2.7, ArgoUML 0.19.8, Lucene 1.4, Log4j 1.2.1, and GanttProject v1.10.2.
Table 10 summarizes the characteristics of the five projects.
5.2. Validation Methodology
In [12], the main objective of the study was to extract rules using the decision tree algorithm. The rules have the form, for a defect D: "IF metric1 is higher/lower than threshold1 AND metric2 is higher/lower than threshold2 … AND metricn is higher/lower than thresholdn THEN defect D is suspected". As an example, we present three rules, R1, R2, and R3, generated for the detection of the Data Class defect:
R1: IF ATFD <= 16.5 THEN Data class = Yes
R2: IF ATFD > 16.5 AND NOA > 11.25 THEN Data class = Yes
R3: IF ATFD > 16.5 AND NOA <= 11.25 AND NC > 251 THEN Data class = Yes.
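As an illustration, the three rules above can be encoded as predicates over a class's metric values; a class is flagged as a Data Class suspect when any rule fires. The dictionary-based encoding below is ours, not from [12]; the metric names and thresholds are those of R1-R3.

```python
# Rules R1-R3 for the Data Class defect, with the thresholds listed above.
RULES_DC = [
    lambda m: m["ATFD"] <= 16.5,                                         # R1
    lambda m: m["ATFD"] > 16.5 and m["NOA"] > 11.25,                     # R2
    lambda m: m["ATFD"] > 16.5 and m["NOA"] <= 11.25 and m["NC"] > 251,  # R3
]

def is_data_class_suspect(metrics):
    # metrics: mapping from metric name to its measured value for one class.
    return any(rule(metrics) for rule in RULES_DC)
```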
R1 and R2 combine few metrics, resulting in the generation of a huge number of suspect classes. Therefore, R3 is the most appropriate rule to consider as an illustrative example. In [12], we considered the number of metrics as the only criterion that matters for rule selection. In fact, the process of selecting rules was based on a parameter N fixed by the tester. This parameter is estimated through a series of tests on the base of examples; it represents the number of metrics in a detection rule. The value of N directly affects the accuracy of the detection. A small value generates a high number of false positives due to over-detection: we detect more defects than actually exist. Conversely, a high value of N generates a high number of false negatives: we detect very few defects compared to the existing ones.
Based on Fuzzy DEMATEL and the results shown in Table 9, we follow an alternative approach. For instance, R1 contains only one metric, from the effect group, so it cannot be considered an important rule. The decision not to consider rule R1 as important is based not on the number of metrics but on the fact that a rule should combine both cause and effect metrics. Rules R2 and R3, in contrast, combine important and highly important cause and effect metrics. As a matter of fact, R2 and R3 are important rules that DEMATEL suggests including in the evaluation process.
This experiment is based on the same set of rules generated by the decision tree algorithm proposed in [12]. In this paper, we select rules based on the metrics' importance presented in Table 9, instead of selecting rules based on the N parameter as in [12].
The rules selected for defect detection give priority to rules that contain both cause and effect metrics. We start by selecting rules that include the important and highly important metrics. If this yields too small a set of rules, we also include rules containing only cause or only effect metrics, still choosing rules with important and highly important metrics.
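A minimal sketch of this selection policy, assuming the group memberships of only the highest-ranked metrics from Table 9 (the full table is not reproduced here, and the `min_rules` cutoff is our illustrative parameter):

```python
# Highest-ranked metrics per group, taken from Section 4; the remaining
# group memberships from Table 9 are omitted for brevity.
CAUSE = {"NIC", "CM"}
EFFECT = {"ATFD", "NOM", "NOC", "NCC"}
IMPORTANT = CAUSE | EFFECT

def select_rules(rules, min_rules=2):
    # rules: mapping rule-id -> set of metric names used by the rule.
    # Priority 1: rules mixing cause and effect metrics.
    both = [rid for rid, ms in rules.items() if ms & CAUSE and ms & EFFECT]
    if len(both) >= min_rules:
        return both
    # Fallback: one-sided rules that still use important metrics.
    one_sided = [rid for rid, ms in rules.items()
                 if rid not in both and ms & IMPORTANT]
    return both + one_sided
```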
We use the precision and recall measures to validate our findings. To evaluate the correctness of the approach, we calculate the precision: the fraction of true design defects among the set of all detected defects (6). The recall is the fraction of correctly detected design defects over the set of expected defects (7).
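A sketch of Measures (6) and (7) over sets of class identifiers:

```python
def precision_recall(detected, expected):
    # Precision (6): true defects among all detected defects.
    # Recall (7): correctly detected defects among all expected defects.
    detected, expected = set(detected), set(expected)
    true_positives = detected & expected
    precision = len(true_positives) / len(detected) if detected else 0.0
    recall = len(true_positives) / len(expected) if expected else 0.0
    return precision, recall
```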
5.3. Fuzzy DEMATEL for Object-Oriented Metrics Results Discussion and Validation
The first step in the validation process of the Fuzzy DEMATEL approach is to select the set of rules for each defect. In fact, we use the same rules generated in the reference study [12], based on the projects PMD 5.4.3 (with 433 classes) and Nutch 1.12 (with 247 classes). However, we introduce two formulae to compare the rule-selection method used in the reference study (i.e., based on the parameter N) with the newly proposed one (i.e., based on Fuzzy DEMATEL).
The first, Formula (8), represents the ratio of the number of rules in the newly selected set to the number of rules in the original set.
The second, Formula (9), represents the similarity between the two sets of rules: the set of rules identified using the parameter N and the set of rules identified using the Fuzzy DEMATEL approach.
where:
R0 represents the original set of rules selected based on the parameter N;
R1 represents the set of rules selected based on metrics importance (Fuzzy DEMATEL).
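Since Formulae (8) and (9) are not reproduced above, the sketch below assumes ratio = |R1|/|R0| and similarity as the fraction of R1's rules also present in R0; the exact denominators may differ in the original formulae.

```python
def rules_ratio(r0, r1):
    # Formula (8) (assumed form): size of the new set over the original set.
    return len(r1) / len(r0)

def rules_similarity(r0, r1):
    # Formula (9) (assumed form): share of the newly selected rules that
    # already appeared in the original set.
    return len(set(r0) & set(r1)) / len(r1)
```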
As presented in Table 11, based on Fuzzy DEMATEL, we reduced the number of rules generated in [12] by about half. The degree of similarity varies depending on the defect. For example, the DC defect detection has a rule similarity of 0.462 with the rules used in [12], which means that more than 50% of the selected rules are new compared to [12]. For LC, the degree of similarity is 0.833, which means that we use almost the same rules as those selected based on the N parameter, but we reduced their number by eliminating the non-useful rules.
In Table 12, we present the detailed detection results by project and by defect. In fact, we significantly increased the precision and recall. The F1 score is higher when using rules selected based on the metrics' cause-and-effect relationships. Figure 3 clearly shows that the accuracy of the detection is improved when using rules selected based on metrics importance.
However, we notice that, for the LC defect only, the improvement in accuracy was due to the improvement in precision. In fact, in Figure 3, there is no big difference in the recall curves for the two selection methods. This is expected; as we can see in Table 11, the rule similarity is very high, which means that we use almost the same rules and therefore obtain a similar detection rate. However, the ratio defined by Formula (8) shows that we kept only approximately 60% of the rules selected based on the N parameter; the selection of rules based on their importance thus reduced the set of rules by about 40%. This has a direct impact on the precision: we use fewer rules, minimizing over-detection, and we decrease the number of false positive detections.
6. Conclusions
Object-oriented metrics offer quantitative measures of object-oriented software and can be a valuable tool for assessing its quality. By using these metrics, developers can identify areas for improvement and refactoring opportunities to optimize the software's performance, maintainability, and other important factors. However, developers need to carefully choose the right metrics, combine them in rules, and apply the rules consistently to identify code defects. This is a challenging task due to the number of metrics and the multiple possible thresholds for each one.
In this paper, we proposed applying a fuzzy multi-criteria decision-making approach, Fuzzy DEMATEL, to identify and select the most important object-oriented metrics for detecting design defects. Compared to the findings of our previous study [12], the results of the current work show that the new set of rules selected based on Fuzzy DEMATEL improves the defect detection accuracy. We are convinced that the metrics importance identified in this work is useful for the entire community of researchers. In fact, generating rules based only on important and highly important metrics reduces the number of metrics combinations, and consequently the number of rules, and improves the design defect detection process. This work represents the first step toward software refactoring. In future work, we will go a step further: after detecting the defects, we will correct them based on a set of refactoring rules.