1. Introduction
Formal concept analysis (FCA) was introduced in the 1980s by Rudolf Wille [1]. Since its inception, this branch of mathematics has established itself as a solid framework to manage, extract, and reason with knowledge in databases. In fact, FCA-based methods can be used instead of some state-of-the-art knowledge representation methods, such as building ontologies; the results are comparable to those of AI-based systems [2], but FCA provides a sound algebraic structure. Currently, research on FCA is two-fold: some authors are devoted to the efficiency of algorithms, since they are inherently exponential [3,4], while other groups of authors are devoted to the extension of FCA to broader frameworks, such as more versatile incidence relations, as in many-valued or fuzzy approaches [5], or the philosophy behind concept-forming operators, such as the necessity and possibility operators in FCA [6]. Given a formal context, there are two main knowledge structures that provide a hierarchical biclustering of the knowledge within a database, namely the concept lattice and the basis of valid implications. These two structures, while different, are semantically equivalent; that is, all the information of the formal context can be retrieved from either of them. In this paper, the focus will be on the latter. Implications in FCA are deeply related to association rules, which are extensively used in several research areas such as machine learning, data mining, and rough set theory, but they have a richer algebraic background. The knowledge in a basis of implications can be used to reveal hidden patterns, which is useful to solve problems in currently relevant areas such as e-learning assessment [7,8] or recommender systems [9,10].
According to the first works on FCA, the incidence relation between a set of objects and a set of attributes establishes the properties satisfied by each object; this is called positive information. Nevertheless, sometimes the fact that an object does not satisfy a property is also relevant. For instance, consider a table whose objects are sea animals. The object “dolphin” does not have the property “fish”. That is, for “dolphin”, not only is the positive information “large” and “fast” relevant, but the negative information “is not a fish” is also relevant. Negative attributes in association rules have already been studied in [11,12]. In FCA, implications are exact association rules; therefore, in this work, the main framework will be that of FCA. The first references to managing negative information in FCA appeared in the works by Missaoui et al. [13,14], in which the implications allowed for positive and negative information. In these first steps, the implications were obtained from a double context formed by the original formal context juxtaposed with its opposite. Even though the results obtained by these authors were promising, the methods were highly redundant; furthermore, the relationship between the positive and negative information was neglected. This approach was optimised by the algebraic proposal of Rodríguez-Jiménez et al. [15], which generalised FCA via a new set of concept-forming operators and a Galois connection. In the cited paper, several algorithms were proposed to obtain the concept lattice in this setting; furthermore, an axiomatic system based on the simplification paradigm was proposed to manage mixed implications [16]. Hereinafter, the FCA setting that allows for positive and negative information will be called the mixed-attributes framework.
In different research fields, such as AI, machine learning, database theory, and data mining, the huge degree of redundancy in the rules extracted from a dataset is a very common problem. Redundancy elimination in implications with mixed attributes was the focus of Pérez-Gámez et al. [17], and this work is a natural sequel of that reference. In this paper, the automated logic-based method provided therein is extended with new rules in order to remove even more redundancy from a set of mixed implications. In particular, new equivalence rules are proposed in the framework of FCA with mixed attributes, leading to more simplified mixed implicational systems and to equivalent axiomatic systems.
Moreover, the implementation of this automated logic-based method is fine-tuned. As will be shown in the Results Section, the set of implications obtained after applying the simplification algorithm presented in this paper is reduced in size more than those obtained after applying the two algorithms provided in [17]. Depending on the intended application, the cardinality of the set of implications may also be relevant; the results of this paper show that the cardinality was likewise reduced more with the set of rules provided herein than with the ones in the cited reference. Nevertheless, this more effective pruning has the drawback of being more time-consuming. In the applications of FCA, the retrieval of concise sets of implications is usually an offline procedure; hence, it is only performed once. Whenever the experiments are time-sensitive, the simplification rules can instead be applied for a fixed number of iterations. Even with only one or two iterations, this method has been shown to reduce the implications more than the sets of rules in [17]. The application of the simplification restricted to one and to two iterations was studied independently in order to compare it with the methods in the cited reference and with the total simplification procedure. As expected, one or two iterations of the simplification algorithm reduced the implications less than the whole procedure but were consistently better than the full simplification algorithm in the cited paper. Timewise, one or two iterations of the simplification took significantly less time than the complete simplification, and even less than the second version of the simplification shown in the cited reference.
This paper is structured as follows: Section 2 recalls the most relevant notions necessary to follow this work, namely the fundamentals of FCA and implicational systems with mixed attributes. In Section 3, a simplified mixed implicational system is presented; to this end, a set of logical equivalences between sets of implications is presented, and the simplification rules are implemented as simplification algorithms, which are proved to be sound. In Section 4, the experimental evaluation of the results is shown. As mentioned above, the experimentation aims to compare the performance of the proposed algorithm in terms of the size reduction, the cardinality reduction, and the time consumption for a given set of implications. In light of time-sensitive experiments, the algorithm might be restricted to only one or two iterations; the last part of the experimentation shows the results of applying the proposed algorithm under this restriction. The experimental evaluation is accompanied by a detailed explanation of the data and implementation and an overall discussion of the results obtained. Section 5 closes the paper with some conclusions and hints about future work. For the sake of readability, the equivalence rules obtained in [17] are collected in Appendix A.
2. Preliminaries
In this section, the results necessary to follow this work are presented, starting with FCA with mixed attributes, the axioms and inference rules of the underlying logic, and the latest results on equivalence rules. First of all, the main structure in this work is that of a formal context [1].
Definition 1. A formal context, $\mathbb{K}$, is a tuple $(G, M, I)$ where G is a set of objects, M is a set of attributes, and $I \subseteq G \times M$ is the so-called incidence relation.
There are two main schools in this setting: one understands that $(g, m) \notin I$, that is, the absence of the pair from the incidence relation, means the absence of information on whether the object g has the attribute m or not. The other reads $(g, m) \notin I$ as the object g not having the attribute m. The latter approach could be called that of Closed-World Environments, where all the information is known and each object either has an attribute or does not. This is the line considered in this work. Several authors have devoted their research to this situation; nevertheless, the approach followed in this paper will be that of Rodríguez-Jiménez et al. [18]. Therefore, apart from the set of attributes M, the opposite set will also be considered: $\overline{M} = \{\overline{m} : m \in M\}$, where $\overline{m}$ denotes the negation of the attribute m.
In this setting, the derivation operators relate the subsets of the objects G to the subsets of $M \cup \overline{M}$ and are two mappings, ⇑ and ⇓, which are defined, for $X \subseteq G$ and $Y \subseteq M \cup \overline{M}$, as follows:
$X^{\Uparrow} = \{m \in M : (g, m) \in I \text{ for all } g \in X\} \cup \{\overline{m} \in \overline{M} : (g, m) \notin I \text{ for all } g \in X\}$,
$Y^{\Downarrow} = \{g \in G : (g, m) \in I \text{ for all } m \in Y \cap M \text{ and } (g, m) \notin I \text{ for all } \overline{m} \in Y \cap \overline{M}\}$.
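To make the operators concrete, the following base-R sketch computes ⇑ and ⇓ on a toy context. It is a minimal illustration, not taken from any library; the binary-matrix encoding, the "-m" naming for negated attributes, and the function names `up` and `down` are our own assumptions.

```r
# Toy positive context: rows are objects, columns are attributes (1 = incidence).
I <- matrix(c(1, 0, 1,
              1, 1, 0),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("g1", "g2"), c("a", "b", "c")))

# X^up: attributes shared by all objects of X, plus negations ("-m") of the
# attributes that no object of X has (closed-world reading).
up <- function(X, I) {
  sub <- I[X, , drop = FALSE]
  pos <- colnames(I)[colSums(sub) == nrow(sub)]
  neg <- colnames(I)[colSums(sub) == 0]
  c(pos, paste0("-", neg))
}

# Y^down: objects having every positive attribute of Y and none of the negated ones.
down <- function(Y, I) {
  pos <- Y[!startsWith(Y, "-")]
  neg <- sub("^-", "", Y[startsWith(Y, "-")])
  keep <- apply(I, 1, function(row) all(row[pos] == 1) && all(row[neg] == 0))
  rownames(I)[keep]
}

up(c("g1"), I)   # "a" "c" "-b": g1 has a and c, and lacks b
```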
Throughout this text, following the convention in this area of research, the union of attributes and sets of attributes will be denoted by juxtaposition; thus, $A \cup B$ will be denoted by $AB$; for $a \in M \cup \overline{M}$, $A \cup \{a\}$ will be denoted by $Aa$; and any singleton $\{a\}$ will be denoted by $a$, thus omitting the brackets, the commas, and the union symbol in order to simplify the notation.
Indeed, as expected, the pair of mappings defined above forms a Galois connection [18], and formal concepts can be defined from it.
Definition 2. Given a formal context, $\mathbb{K}$, a pair, $(X, Y)$ with $X \subseteq G$ and $Y \subseteq M \cup \overline{M}$, is said to be a mixed formal concept if $X^{\Uparrow} = Y$ and $Y^{\Downarrow} = X$.
As usual, the set of mixed formal concepts can be endowed with an order relation that makes it a complete lattice, which is called the mixed concept lattice.
Definition 3. Given a mixed formal context, $\mathbb{K}$, and two subsets of attributes, $A, B \subseteq M \cup \overline{M}$, an implication is an expression of the form $A \to B$. The set of all the implications in a formal context is denoted by $\mathcal{L}_{M \cup \overline{M}}$. An implication, $A \to B$, is said to be valid in the mixed formal context if $A^{\Downarrow} \subseteq B^{\Downarrow}$.
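Under the toy encoding above, validity can be checked directly through ⇓, reusing the illustrative `down` helper (again, the function name is ours, not an established API): an implication holds exactly when every object satisfying the premise also satisfies the conclusion.

```r
# A -> B is valid in the mixed context iff down(A) is contained in down(B).
is_valid <- function(A, B, I) {
  all(down(A, I) %in% down(B, I))
}

is_valid(A = "b", B = "-c", I)   # TRUE in the toy context: the only object with b lacks c
```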
Mixed-attribute logic (MixAtL) was presented and proved to be sound and complete in [18]. This logic consists of two axioms and four inference rules, for all $A, B, C \subseteq M \cup \overline{M}$ and $a, b \in M \cup \overline{M}$.
- [Inc] Inclusion: $\vdash A \to B$ for all $B \subseteq A$.
- [Cont] Contradiction: $\vdash a\overline{a} \to b$.
- [Augm] Augmentation: $A \to B \vdash AC \to BC$.
- [Trans] Transitivity: $A \to B, B \to C \vdash A \to C$.
- [Rft] Reflection: $Aa \to b \vdash A\overline{b} \to \overline{a}$.
- [Tru] Trust: $Aa \to b, A\overline{a} \to b \vdash A \to b$.
Definition 4. An implication, $A \to B$, is said to be derived syntactically from a set of implications, Σ, which is denoted by $\Sigma \vdash A \to B$, if there exists a sequence of implications, $\sigma_1, \dots, \sigma_n$, with $\sigma_n = A \to B$. For all $i \in \{1, \dots, n\}$, $\sigma_i$ is either an axiom, an implication in Σ, or can be obtained by applying inference rules to the set of implications $\{\sigma_1, \dots, \sigma_{i-1}\}$. In such a case, the sequence is said to be a proof of $A \to B$.
In this paper, the focus will be on inference rules that allow for diminishing the size of sets of implications. In this sense, the approach followed is similar to that of simplification logic ($SL_{FD}$) [19]. This area of FCA was introduced with the aim of reducing the redundancy inherent in sets of valid implications in a formal context. The axiomatic system for $SL_{FD}$ in the classical setting consists of one axiom, namely Reflexivity, and three inference rules, Fragmentation, Composition, and Simplification. These rules are described as follows, for all $A, B, C, D \subseteq M$.
- [Ref] Reflexivity: $\vdash A \to A$.
- [Frag] Fragmentation: $A \to BC \vdash A \to B$.
- [Comp] Composition: $A \to B, C \to D \vdash AC \to BD$.
- [Simp] Simplification: $A \to B, C \to D \vdash (C \smallsetminus B) \to (D \smallsetminus B)$, where $A \subseteq C$.
In the above, $\smallsetminus$, defined by $A \smallsetminus B = \{x \in A : x \notin B\}$, is the set difference operator.
The notion of inference in $SL_{FD}$ is defined analogously to Definition 4, with the set of inference rules used indicated on the derivation symbol. Some rules from simplification logic which we will use in this paper are recalled below. Due to the definition of derivation, the proof is straightforward and therefore omitted. Given a set of attributes, M, the following inference rules hold for all $A, B, C \subseteq M$.
- [GenRef] Generalised reflexivity: $\vdash A \to B$ if $B \subseteq A$.
- [Augm] Augmentation: $A \to B \vdash AC \to BC$.
In simplification logic, some rules are, as a matter of fact, logical equivalences. The main equivalence rules in $SL_{FD}$ are the following [19]:
- [FragEq] $\{A \to B\} \equiv \{A \to B \smallsetminus A\}$.
- [UnEq] $\{A \to B, A \to C\} \equiv \{A \to BC\}$.
- [GenEq] $\{A \to B, C \to D\} \equiv \{A \to B\}$ when $A \subseteq C$ and $D \subseteq B$.
- [⌀-Eq] $\{A \to \emptyset\} \equiv \emptyset$.
- [SimpEq] $\{A \to B, C \to D\} \equiv \{A \to B, (C \smallsetminus B) \to (D \smallsetminus B)\}$ when $A \subseteq C$ and $A \cap B = \emptyset$.
- [RSimpEq] $\{A \to B, C \to D\} \equiv \{A \to B, C \to D \smallsetminus B\}$ if $A \subseteq CD$ and $A \cap B = \emptyset$.
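As an illustration, the following sketch applies [FragEq] and [SimpEq] to implications stored as `lhs`/`rhs` character vectors. The representation and the function names are ours, and the applicability conditions coded below are the ones stated above.

```r
# An implication A -> B as a pair of attribute sets (negations encoded as "-m").
imp <- function(lhs, rhs) list(lhs = unique(lhs), rhs = unique(rhs))

# [FragEq]: A -> B is equivalent to A -> (B \ A).
frag_eq <- function(i) imp(i$lhs, setdiff(i$rhs, i$lhs))

# [SimpEq]: given A -> B and C -> D with A a subset of C and A, B disjoint,
# C -> D can be replaced by (C \ B) -> (D \ B).
simp_eq <- function(i, j) {
  if (all(i$lhs %in% j$lhs) && length(intersect(i$lhs, i$rhs)) == 0) {
    j <- imp(setdiff(j$lhs, i$rhs), setdiff(j$rhs, i$rhs))
  }
  j
}

simp_eq(imp("a", "b"), imp(c("a", "b"), "c"))   # premise {a, b} shrinks to {a}
```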
The inference rules and equivalences in simplification logic given so far are the ones used in the classical approach of FCA. Even though these rules are still valid in the mixed-attributes case, there are some rules that are specific to the mixed-attributes case.
In [18,20], the axiomatic system of simplification logic was extended to the mixed-attributes setting. This axiomatic system contains one axiom schema and four inference rules, for all $A, B, C, D \subseteq M \cup \overline{M}$ and $a, b \in M \cup \overline{M}$.
- [Ref] Reflexivity: $\vdash A \to A$.
- [Simp] Simplification: $A \to B, C \to D \vdash (C \smallsetminus B) \to (D \smallsetminus B)$, where $A \subseteq C$.
- [Key] Key: $Aa \to b\overline{b} \vdash A \to \overline{a}$.
- [InKey] Inverse key: $A \to \overline{a} \vdash Aa \to b\overline{b}$.
- [Red] Reduction: $Aa \to B, A\overline{a} \to B \vdash A \to B$.
This axiomatic system is a proper extension of that of simplification logic; that is, all the rules in the classical $SL_{FD}$ can be derived from the axioms above [20]. The set of derived rules can be further extended by a version of the ex-contradictione-quodlibet and the contraposition rules, for $a, b \in M \cup \overline{M}$.
- [Cont] Contradiction: $\vdash a\overline{a} \to b$.
- [Rft] Reflection: $Aa \to b \vdash A\overline{b} \to \overline{a}$.
Notice that [Key] is in fact the converse of [InKey], and they provide an equivalence between the implications $Aa \to b\overline{b}$ and $A \to \overline{a}$, which reflects the fact that whenever the set $Aa$ is inconsistent, $A \to \overline{a}$ should then hold. Moreover, a version of the well-known cut rule arises as the equivalence between $A \to BC$ and the set $\{A \to B, AB \to C\}$. Making the applicability conditions explicit, the axiomatic system can be restated as follows.
- [Ref] Reflexivity: $\vdash A \to A$.
- [Simp] Simplification: $A \to B, C \to D \vdash (C \smallsetminus B) \to (D \smallsetminus B)$.
- [Key′] Key: $A \to b\overline{b} \vdash A \smallsetminus a \to \overline{a}$ if $a \in A$ and $b \in M \cup \overline{M}$.
- [InKey′] Inverse key: $A \to \overline{a} \vdash Aa \to b\overline{b}$ if there exists $b \in M \cup \overline{M}$ and $a \notin A$.
- [Red′] Reduction: $A \to B, C \to B \vdash (A \cap C) \to B$ if $A \neq C$ and there exists $a$ with $A \smallsetminus C = \{a\}$ and $C \smallsetminus A = \{\overline{a}\}$.
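For instance, the derived [Cont] rule above is what licenses the removal of implications whose premise is inconsistent, i.e., contains an attribute together with its negation. A minimal sketch with the "-m" encoding used in the previous snippets (the helper names are illustrative):

```r
# A set of attributes is inconsistent if it contains some m together with -m.
is_consistent <- function(A) {
  pos <- A[!startsWith(A, "-")]
  neg <- sub("^-", "", A[startsWith(A, "-")])
  length(intersect(pos, neg)) == 0
}

# By [Cont], an implication whose premise is inconsistent is derivable from the
# axioms alone, so it can be dropped from a system without losing information.
drop_inconsistent <- function(sigma) {
  Filter(function(i) is_consistent(i$lhs), sigma)
}
```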
Recently, authors have still been finding new inference rules and logical equivalences that help reduce the size of implications in the mixed-attributes framework [17]. The following result collects the most recent equivalence rules so far.

Theorem 1 ([17]). Let $\mathbb{K}$ be a formal context. Then, the logical equivalences in Appendix A hold.

The current proposal builds on these results to provide new mechanisms for the construction of more simplified implication systems. Let us recall that, in most previous works, such as [15,18,20], the negative paradigm for attributes is not exploited in this sense; a system of mixed implications has always been considered “as is”, i.e., there were no methods to operate on it and provide deeper insight into the implicit knowledge it represents. Note that in [17] the first idea about simplifying systems of mixed implications was presented, together with the computational implementation of the methods proposed therein. However, the optimal representation of mixed knowledge in the form of implication systems is an open problem, and the current work takes a further step in that direction.
3. Simplification of Mixed Implications
In this section, a collection of logical equivalences between sets of attribute implications is presented within the simplification logic framework. These rules allow us to represent the implicit knowledge in a formal context in a concise manner. A huge implicational system might enclose relevant information that cannot be interpreted by a reader or that would take a significant amount of time to process using a computer. Thus, simplifying its representation without a loss of information may help both human experts and automatic processing to extract the knowledge in a more efficient way. The following example will serve as a running illustration of the simplification rules throughout this section; it shows how, from a formal context, the sets of implications obtained with state-of-the-art methods can still be redundant.
Example 1. Let us consider the formal context of Table 1. In the table, only the positive context is presented. The mixed context can be easily built from it. The following system of implications is obtained by means of the NextClosure algorithm [21], and it is indeed a (sound and complete) basis of all the implications valid in the mixed formal context. However, we can notice a certain amount of redundancy in the list.
In the remainder of this section, this implicational system will be used to demonstrate the use of the equivalence rules.
3.1. Equivalence Rules for Implication Simplification
Without loss of generality, we require that all implications $A \to B$ are consistent (that is, $A \cap \overline{A} = \emptyset$) and reduced ($A \cap B = \emptyset$). The first results about equivalence rules in this section continue the philosophy of simplification logic, extending the equivalence rules of $SL_{FD}$ to the scenario with mixed attributes. In particular, we will focus on different strategies to extend [SimpEq], [RSimpEq], and [GenEq]. We begin by presenting a technical lemma:
Lemma 1. Let $\mathbb{K}$ be a formal context, and such that there exists , with . Then,

Proof. Let us consider under the assumptions of the lemma and suppose that . Then,
(1) (by [GenRef] on our premises);
(2) (by [GenRef]);
(3) (by [Augm] on (1) and (2));
(4) (by premise);
(5) (by [Augm] on (2) and (4));
(6) (by [Cont]);
(7) (by [Simp] on (5) and (6));
(8) (by [InKey] on (7));
(9) (by [Simp] on (3) and (8)).
□
The next result uses [SimpEq] as a basis to build an equivalence rule.
Theorem 2. Let $\mathbb{K}$ be a formal context and let .
- [MixSimpEq]
If there exists , with , then
Proof. Let us consider , as above. Note that, in particular, Lemma 1 can be applied; hence, from , we can infer that .
- ⇒
It suffices to show that from , we can infer that :
(1) (premise);
(2) (by Lemma 1);
(3) (by [GenRef]);
(4) (by [Augm] on (2) and (3));
(5) (by [Simp] on (4) and (1)).
- ⇐
Now, we show that can be deduced from .
(1) (premise);
(2) (by [GenRef]);
(3) (by [Simp] on (2) and (1)).
□
Example 2. Let us consider, in our running example, the following implications: We can apply the previous result, with and , since . Then, we can simplify as follows: and .

The following lemma is a technical result that will be useful to prove the rest of the theoretical results in this section.
Lemma 2. Given that and , : Proof. It suffices to apply
[UnEq] and
[RftEq] subsequently:
□
By virtue of Lemma 2, the following equivalence rules, which are extensions of [SimpEq] and [RSimpEq] to the case of mixed attributes, can be proven to hold.
Theorem 3. Let $\mathbb{K}$ be a formal context and .
- [MixSimpEq′]
If there exist , , and such that , then - [MixSimpEq″]
If there exist and such that , then - [MixRSimpEq]
If there exist and such that , then
Proof. [MixSimpEq′] Let us consider the following chain: where, in the second step, we use [SimpEq] for and to produce , which can be removed by the application of [⌀-Eq].
- [MixSimpEq″]
Let us use the following chain of equivalences to prove the statement:
Note that, since is consistent and reduced and , we are able to apply [SimpEq] correctly above.
- [MixRSimpEq]
The proof for this last part is completely analogous, just by noticing that, instead of [SimpEq], the conditions to apply [RSimpEq] are also met.
□
The use of these new rules can be exemplified with the running example of the planets dataset. This is performed in the example below.
Example 3. For [MixSimpEq′], consider the implications By selecting , , and , we can verify that Then, we can apply [MixSimpEq′] and simplify the right-hand side of as . In this case, ; therefore, by using [⌀-Eq], we can remove from the implicational system without affecting the associated closure system. Now, for [MixSimpEq″], consider By considering and , we obtain Therefore, we can apply [MixSimpEq″] and subtract from C and D, obtaining the new implication that substitutes the previous , and the implicational system is completely equivalent. Finally, consider with and . In this case, Since this is the condition in which to apply [MixRSimpEq], we substitute with to obtain an equivalent system. But, , and by [⌀-Eq] we can remove this implication. In conclusion, .

The next result presents a strategy to extend the [GenEq] rule to mixed attributes.
Theorem 4. Let $\mathbb{K}$ be a formal context and . If there exist and such that , then

Proof. It suffices to note that since [GenEq] can be applied above, as . □
Corollary 1. Let $\mathbb{K}$ be a formal context and .
- [MixGenEq]
Let and . If , then
Proof. Suppose that
and
. Let us denote
. Note that, by Theorem 4, for all
i, we have
by the application of
[⌀-Eq]. Then, it is easy to see that
. Therefore,
which finishes the proof. □
The following example illustrates the use of the last results in the running dataset of the planets.
Example 4. Consider and and . Since we can apply [MixGenEq] to obtain , where

Note that, in practice, for the same pair, , there might be several applicable equivalence rules, and they might give the same resulting implication set. However, as can be easily seen from the theoretical description, in general, they are not equivalent, and one should try to apply as many as possible in order to obtain the best simplification.
3.2. Simplification Algorithm
In this section, we present the algorithms (Algorithms 1–4) corresponding to the equivalence rules stated in the theoretical section above. Note that Algorithm 3 includes both [MixSimpEq″] and [MixRSimpEq], since their application conditions are similar and can be nested inside a single algorithm.
Our aim is to incorporate these algorithms into the main Simplify-Mixed algorithm described in [17]. The following technical lemma aids in proving that the algorithms converge.
Lemma 3. Let $\mathbb{K}$ be a formal context and let a pair of consistent and reduced implications be given. An application of any of Algorithms 1–4 to this pair either keeps the implications unchanged or reduces the number of attributes in them.
Algorithm 1: Simplify-[MixSimpEq]
Algorithm 2: Simplify-[MixSimpEq′]
Algorithm 3: Simplify-[MixRSimpEq]
Algorithm 4: Simplify-[MixGenEq]
Proof. For Algorithms 1–3, it is clear that if the implications are modified, this is achieved by removing attributes on the left-hand or the right-hand sides, as can be observed in the theoretical description of [MixSimpEq], [MixSimpEq′], [MixSimpEq″], and [MixRSimpEq].
The description of [MixGenEq] does not guarantee that the number of attributes is reduced when the equivalence rule is applied, since B is transformed into . For this reason, in Algorithm 4, we include a condition in lines 5 and 12 that forces the application of the rule only when (that is, when there is a reduction in the size, as the number of attributes removed from D is greater than 1) or when , since in this case, the implication is removed by [⌀-Eq]. In either case, the application of the algorithm keeps the original implications or reduces the number of attributes, thus finishing the proof. □
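Schematically, incorporating pairwise rules of this kind into a fixed-point procedure can be sketched as follows. This is an illustrative skeleton under our list representation, not the actual Simplify-Mixed implementation from [17]; `rules` stands for any list of two-argument rule functions such as `simp_eq` above.

```r
# Apply pairwise equivalence rules to a system of implications until a fixed
# point is reached (optionally capped at max_iter iterations).
simplify_system <- function(sigma, rules, max_iter = Inf) {
  iter <- 0
  repeat {
    changed <- FALSE
    for (i in seq_along(sigma)) {
      for (j in seq_along(sigma)) {
        if (i == j) next
        new_j <- Reduce(function(jj, rule) rule(sigma[[i]], jj),
                        rules, init = sigma[[j]])
        if (!identical(new_j, sigma[[j]])) {
          sigma[[j]] <- new_j
          changed <- TRUE
        }
      }
    }
    # [⌀-Eq]: implications whose conclusion became empty are removed.
    sigma <- Filter(function(k) length(k$rhs) > 0, sigma)
    iter <- iter + 1
    if (!changed || iter >= max_iter) break
  }
  sigma
}
```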
Theorem 5. Let us consider the incorporation of Algorithms 1–4 into the Simplify-Mixed algorithm described in [17]. For an implicational system, Σ, the new algorithm finishes after $\mathcal{O}(s \cdot n^2)$ simplification steps, where $s$ denotes the aggregate number of attributes in all the implications and $n$ is the number of implications in the system.

Proof. By Lemma 3, the algorithm removes at least one attribute whenever an equivalence rule is applied. Since the number of attributes and implications is finite, the algorithm will reach a fixed point and terminate. Note that, in any iteration of the algorithm, every pair of implications is checked to perform a simplification if they meet the necessary conditions. Therefore, there will be $n(n-1)$ checks in each iteration. There will be at most $s$ iterations, since the worst case is to remove just one attribute from the entire implicational system in every iteration. Thus, the worst-case scenario involves an aggregate of $\mathcal{O}(s \cdot n^2)$ checks. □
Note that this extension of the Simplify-Mixed algorithm from [17] is intended to perform more exhaustive simplifications of the implications while maintaining its ability to obtain sm-implicational systems.
Example 5. We go back to our running example. First, we show the sm-implicational system obtained by the application of the basic algorithms in [17], corresponding to the equivalence rules [ContEq], [ContEq′], and [RedEq], whose size is 36. Note the level of redundancy, such as by examining implications 3, 5, and 6. Then, by using the optimised rules in [17], that is, the full Simplify-Mixed algorithm, we can obtain a further simplified system whose size is 18 and where the number of implications is reduced to 6. Now, the application of the equivalence rules in this paper (together with the previous ones) leads us to an sm-implicational system with a size of 12 and where the number of implications is further reduced to 5. All three implicational systems are equivalent, but it is clear that the last one presents the implicit core knowledge in a more concise manner.

Example 6. Now, let us present a comparative example of our proposal with respect to the work in [20]. In particular, in that work (see Example 2 in the referenced paper), a formal context, taken from [21], representing 130 developing countries as objects and six supranational associations of countries (Group of 77; Non-Aligned; LDC, Least Developed Countries; MSAC, Most Seriously Affected Countries; OPEC, Organization of the Petroleum Exporting Countries; and ACP, African, Caribbean, and Pacific countries) as attributes, is used to demonstrate its ability to find a concise system of mixed implications representing the implicit knowledge in the dataset. An aggregate of 24 attributes is present in the implications found by their method. The application of the Simplify-Mixed algorithm from [17] leads to a set of simplified implications with an aggregate of 23 attributes. Thus, almost no reduction in size is achieved using the previous equivalence rules, and the cardinality remains the same. Finally, the application of our proposal provides a set of mixed implications with only 18 attributes in five implications. Therefore, these new equivalence rules are able to further refine the implicational representation of the implicit knowledge present in the dataset, outperforming previous approaches.

4. Experimental Evaluation
This section describes the experiments carried out to test the effectiveness of the proposed algorithm in relation to those previously published in [17].
4.1. Data and Implementation
For this experimental section, different sets of mixed implications were constructed, to which various redundancy elimination schemes were applied, including both classical [18] and more recent ones [17], in addition to the algorithms proposed in the previous section, Algorithms 1–4.
It is important to note that the generation of datasets with mixed implications requires a specific approach. At the time of writing, there is no algorithm to obtain a basis of implications, that is, a non-redundant, sound, and complete [18] set of implications, from a mixed formal context using the Galois connection and the simplification logic mentioned in Section 2.
Hence, the construction of datasets needs to be conducted using a different approach. One common strategy is to randomly introduce attributes with positive or negative values into the premises and conclusions. However, this approach often leads to the creation of contradictory implications, where the same attribute appears with both values. Therefore, it is essential to carefully consider this issue and define a coherent strategy for creating implications.
In order to more accurately reflect the potential outcomes that may arise in real-world scenarios, it has been proposed to create random formal contexts, denoted by $\mathbb{K}$, and to construct the juxtaposed context, denoted by $\mathbb{K} \mid \overline{\mathbb{K}}$. The context $\overline{\mathbb{K}}$ is defined as the context resulting from changing the sign of each value in the relation I from positive to negative or vice versa. From this combined context, a system of implications is calculated using the NextClosure algorithm, as described in [21]. The system of implications thus calculated is sound and complete in the sense that the implications are valid and that, furthermore, any valid implication in the context can be syntactically deduced from them. Therefore, this is the most promising approach to find such an implicational set, as needed for the experiments.
Note that this approach of artificially duplicating the number of attributes to find a basis of implications, as described in [14], does not use, either explicitly or implicitly, the axioms and inference rules of simplification logic for mixed attributes. In particular, it retains a certain amount of redundancy, since the complementation relation between an attribute, $m$, and its negation, $\overline{m}$, is not considered in this approach. The objective of this experimental investigation is to quantify the reduction in the size of these sets of implications by constructing equivalent sets using the proposed algorithms.
The experimental contexts were constructed with a fixed number of objects and a varying number of attributes. To simulate a greater number of scenarios, the density of the context, i.e., the proportion of non-zero entries in the binary table of the relation I, was also considered as a parameter for its construction. Three different values were considered for this variable, 0.1, 0.25, and 0.4, which modelled contexts of lower to higher densities. Finally, 10 formal contexts were constructed for each combination of parameters, in accordance with the specifications previously outlined.
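The construction just described can be sketched as follows. The concrete numbers of objects and attributes used in the experiments are documented in the repository [24]; the values below are placeholders.

```r
set.seed(1234)  # for reproducible random contexts

# Random formal context with a prescribed density of incidences.
random_context <- function(n_obj, n_attr, density) {
  matrix(rbinom(n_obj * n_attr, size = 1, prob = density),
         nrow = n_obj,
         dimnames = list(paste0("g", seq_len(n_obj)),
                         paste0("m", seq_len(n_attr))))
}

# Juxtaposed context: the original context next to its opposite, from which
# NextClosure extracts a sound and complete set of mixed implications.
juxtapose <- function(I) {
  Ic <- 1 - I
  colnames(Ic) <- paste0("-", colnames(I))
  cbind(I, Ic)
}

K <- juxtapose(random_context(n_obj = 50, n_attr = 10, density = 0.25))
```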
Implementation Details
All methods were implemented using the R programming language [22], leveraging the capabilities of the fcaR library [23], which specialises in handling formal contexts and extracting logical implications. To ensure transparency and reproducibility, the algorithmic code, experimental configurations, and generated datasets were meticulously documented and shared in a publicly accessible GitHub repository [24].
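For orientation, a pipeline of this kind looks roughly as follows with the public fcaR interface. This is a sketch based on the package's documented API, not the exact experimental script, which can be found in the repository [24].

```r
library(fcaR)

# Formal context from the juxtaposed binary matrix K built above.
fc <- FormalContext$new(K)

# Extract a sound and complete implication set with NextClosure.
fc$find_implications()

# Remove redundancy with the equivalence rules shipped with the package.
fc$implications$apply_rules(rules = c("composition", "generalization",
                                      "simplification"))

fc$implications$cardinality()   # number of implications after simplification
```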
4.2. Simplification of Implications
In [17], two algorithms were designed, named Simplify-Mixed (v1) and (v2), for redundancy elimination and the construction of simplified mixed implicational systems (sm-implicational systems for short). The purpose of the algorithms proposed in this work is to complement the more advanced version from that work, (v2), with more effective tools for redundancy reduction. Consequently, we will refer to the version of Simplify-Mixed from [17] into which the mechanisms of this work, Algorithms 1–4, are integrated as (v3).
The experimental comparison between the previous versions and (v3) was carried out considering three particular dimensions, which quantify the reduction achieved when operating on an initial system of implications, $\Sigma$, and obtaining an equivalent system, $\Sigma'$:
The cardinality of $\Sigma'$, denoted by $|\Sigma'|$, allowed us to determine the reduction achieved in the number of implications.
The size of the set of implications, denoted by $\|\Sigma'\|$, measured the total number of attributes appearing in the set.
The execution time of each algorithm was measured in order to allow for an empirical comparison of the computational cost of each algorithm.
It is important to note that the most interesting measures were the last two, as the size of a set of implications, rather than its cardinality, determined the magnitude of the cost of operating with those implications. This was because each operation with them (checking inclusions, performing unions or differences of sets, etc.) was carried out attribute by attribute. Consequently, in this analysis, greater importance is attached to achieving a greater reduction in the size of a system of implications and to shorter execution times.
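On the list representation used in our sketches, the three measures can be computed as follows; `simplify_system` and `rules` are the illustrative objects from Section 3.2, and `sigma` is assumed to hold an implicational system in that representation.

```r
# Cardinality: number of implications; size: aggregate number of attributes.
card <- function(sigma) length(sigma)
size <- function(sigma)
  sum(vapply(sigma, function(i) length(i$lhs) + length(i$rhs), numeric(1)))

# Execution time of one simplification run.
elapsed <- system.time(sigma2 <- simplify_system(sigma, rules))[["elapsed"]]
c(cardinality = card(sigma2), size = size(sigma2), seconds = elapsed)
```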
In the following, we present the experimental results obtained by running the three algorithms, (v1) to (v3), on the constructed dataset of mixed implications. In Figure 1, the size and cardinality of the final set, $\Sigma'$, are visualised as functions of the same parameters of the initial set of implications, $\Sigma$. A clear linear trend can be observed in all three cases. Furthermore, it is evident that, for both measures, the size and cardinality of the final set of implications obtained with version (v3) were the smallest, with a significant difference compared to the baseline versions, (v1) and (v2), the latter being the most efficient so far.
The linear trend allows for a linear regression analysis of the data (already shown in Figure 1), yielding the following results, where $R^2$ is the coefficient of determination, which in this case is the square of the correlation coefficient:
This analysis demonstrates that algorithm (v3) was the most effective in reducing redundancy, reducing the size of the system to only 31.7% of the original and its cardinality to 46.3% of the original. Algorithm (v2) was the next most effective, reducing the size to 57.4% and the cardinality to 69.2%. The simplest version of Simplify-Mixed, (v1), reduced the size to 74.7% and the cardinality to 99.6%. This indicates that (v1) barely eliminated redundant implications but did eliminate approximately 25% of the initial appearances of attributes, suggesting that these attributes may have been redundant. In contrast, (v3) reduced the cardinality to less than half, thus eliminating a significant number of redundant implications. This was achieved by identifying implications that could be syntactically deduced from the rest. A similar outcome was observed with the attributes, where the final size was less than one-third of the initial size. In comparison to (v2), it could be said that (v3) eliminated over 50% more attributes than (v2).
The other dimension of this experiment was the execution time of each algorithm. (v3) performed a greater number of checks in order to eliminate a large number of redundant elements while maintaining equivalence with the initial implicational system. This resulted in a longer execution time, as can be seen in Figure 2. The graph illustrates that (v3) was the slowest algorithm, with an increase in time of approximately 45% compared to (v2) and more than 200% compared to (v1). The average times (over all generated datasets) for each algorithm were as follows: (v1) s; (v2) s; and (v3) s. It can be concluded that (v3) was the most effective in eliminating redundant elements according to simplification logic, but it also had the longest execution time.
It is worth mentioning at this point that each of the algorithms being compared performed a series of iterations until a fixed point was reached. This is defined as a set of implications on which the associated equivalence rules have no effect. A descriptive statistic of the number of iterations required to complete the process is as follows.
(v1): Minimum number of iterations: 1; maximum, 3; median, 2.
(v2): Minimum number of iterations: 2; maximum, 5; median, 3.
(v3): Minimum number of iterations: 2; maximum, 7; median, 2.
This information can be employed as a potential solution to the elevated execution time cost of (v3), as the performance of the three algorithms was analysed in an iteration-by-iteration manner. That is, the size (number of attributes present) of the implication set after the first, second, and subsequent iterations of each algorithm was evaluated in comparison to the original size of the input set. Consequently, the execution time of (v3) could be reduced by limiting the maximum number of iterations that it was permitted to perform.
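In terms of the fixed-point sketch in Section 3.2, limiting the iterations amounts to setting the cap on the loop; illustrative calls, with `sigma` and `rules` as before:

```r
# (v3.1)- and (v3.2)-style runs: stop after one or two iterations, respectively.
sigma_v31 <- simplify_system(sigma, rules, max_iter = 1)
sigma_v32 <- simplify_system(sigma, rules, max_iter = 2)
```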
This analysis can be visualised in Figure 3. In order to normalise the results, the proportion of attributes that remained present after each iteration was measured relative to the initial size, averaged over all executions. Thus, at the outset (iteration 0), the normalised value was 1, while a value of 0.7 in iteration 1 signified that 70% of the attributes remained present, or 30% were eliminated as redundant. Consequently, the lower the value in the graph, the greater the reduction achieved in that iteration. As illustrated in the graph, on average, the initial iteration of (v3) achieved superior outcomes not only in comparison to (v1) and (v2) after their initial iterations but also with respect to the complete execution of those algorithms. It should be noted that, for clarity, only the first four iterations are shown in Figure 3, as the values obtained in the experiments began to stabilise from that iteration onwards and the reduction in size was less steep.
In light of these considerations, it was possible to propose simplified versions of (v3) where the execution was limited to one or two iterations. These versions are denoted, respectively, as (v3.1) and (v3.2). The behaviour of these two versions was analysed with the same dataset, and the results can be observed in Figure 4. In this figure, the outcomes of the algorithms (v1) and (v2) and of the previously analysed algorithm (v3) are presented with a higher degree of transparency, to facilitate their use as a reference without obstructing the visualisation; the results of (v3.1) and (v3.2) are also displayed. On the left side of the figure, it can be observed that both reduced versions were capable of achieving a greater reduction in the size of the implication set than the versions (v1) and (v2) from [17]. In fact, (v3.2) achieved results that were practically comparable to those of (v3). In contrast, with regard to execution time, on the right side of the figure, it can be observed that the time taken by both algorithms was less than that taken by (v2), which had obtained the best results in the study in [17].
A regression analysis analogous to the previous one was conducted, this time utilising the outcomes of (v3.1) and (v3.2):
4.3. Discussion
In the context of mixed-attribute frameworks and implicational systems, it is necessary to distinguish between two computational phases:
The construction of a correct, complete implicational system, $\Sigma$, which, as far as possible, is without superfluous or redundant implications.
The analysis of the syntactic derivation of an implication, $A \to B$, from the implication set, $\Sigma$, determined in the previous phase.
The objective of this study was to optimise the second phase of the process, transforming the system $\Sigma$ into an equivalent one, $\Sigma'$, with a smaller cardinality and size. From a computational and practical standpoint, reducing the size of the system is relevant because it implies that the syntactic derivation can be performed more efficiently, as the number of operations (on sets of attributes) directly depends on this parameter.
In the experimentation, the only currently available mechanism [14] was used to construct a correct and complete system of mixed implications, albeit with the presence of redundancy; this is because this approach uses classical logic, where contradictions or complementary attributes do not appear. The performance of the new versions, (v3), (v3.1), and (v3.2), was analysed in comparison to the already-known (v1) and (v2) published in [17].
Although it was demonstrated that the proposed model, (v3), yielded superior results compared to the previous models, (v1) and (v2), the extensive number of tests conducted resulted in a significant increase in execution time. Consequently, two new variants, (v3.1) and (v3.2), were introduced. Limiting the maximum number of iterations resulted in a significant reduction in computation time, although this was accompanied by a slight deterioration in performance in comparison to (v3). Nevertheless, although they did not achieve the optimal outcomes, (v3.1) and, in particular, (v3.2) yielded superior results compared to (v2), the preceding version that was most effective in eliminating redundancies.
5. Conclusions and Future Work
In this work, new simplification schemes for implications in the mixed-attribute paradigm are proposed. These schemes are based on the use of simplification logic, a complete and sound axiomatic system to deal with implications where attributes are marked with the sign + if the attribute is present or − otherwise. Note that in this paradigm, there are only two possibilities for an attribute, to be present or to be absent.
Classically, implicational systems with mixed attributes are constructed in such a way that they contain a lot of redundant items; that is, there are attributes and even implications that are not relevant and whose removal would produce a completely equivalent system of implications. For instance, every implication with a contradiction (of the kind $a\overline{a} \to b$, for instance) can be straightforwardly derived from the axioms of simplification logic; therefore, such implications can be safely removed from any system of implications.
Our proposal focuses on using this logic to obtain equivalence rules, that is, valid transformations of the system of implications that produce a new equivalent system but with a lower number of attributes, since this quantity measures the computational cost of operating with the system. Some equivalence rules and their corresponding algorithms were presented in [17]. In this work, it was shown that the proposed algorithm, (v3), based on the equivalence rules developed herein, was able to outperform the previous simplification schemes, (v1) and (v2) in [17], producing more reduced implicational systems. To tackle the problem of the high computation time of (v3), two versions, (v3.1) and (v3.2), which were only allowed to run for one and two iterations, respectively, were presented and analysed. Both of them improved on the results of (v2) with less computation time. Thus, these versions are suitable for large problems, since they were able to obtain competitive results in few iterations.
Among the lines of future research lies the development of algorithms to check whether a system of implications contains some kind of redundancy from the standpoint of simplification logic. In this sense, the objective is to obtain a minimal set of implications equivalent to the original one. On the other hand, there is a clear need to design a method to directly build a basis of implications from the formal context and the given Galois connection. To this end, the theory behind the mixed-attributes paradigm will need to be further extended, since some notions (such as pseudo-closed sets and pseudo-intents) have to be properly defined in this setting.
Related to this line of work, a closure algorithm for mixed implications needs to be defined. That is, since an implicational system induces a closure operator equivalent to that of the Galois connection, it becomes necessary to define an algorithm that computes the closure of a set, X, by means of a complete and correct system of implications, $\Sigma$, instead of using the concept-forming operators (⇑, ⇓). The main point would be to study the feasibility of such an algorithm, analysing simplified versions of this setting (such as considering only implications whose right-hand side has size one) to obtain and validate the first approaches.
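As a starting point for such a study, a naive forward-chaining closure over our list representation could look as follows. This is an illustrative baseline only; the mixed-attribute subtleties discussed above, such as the handling of contradictions, are deliberately ignored.

```r
# Naive closure of X under a system sigma: repeatedly fire every implication
# whose premise is already contained in the current closure.
closure <- function(X, sigma) {
  repeat {
    fired <- FALSE
    for (i in sigma) {
      if (all(i$lhs %in% X) && !all(i$rhs %in% X)) {
        X <- union(X, i$rhs)
        fired <- TRUE
      }
    }
    if (!fired) return(X)
  }
}
```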
In the long term, this work will be extended to currently relevant frameworks of FCA such as unknown information [17] or the multiadjoint approach [25].