1. Introduction
Research in Artificial Intelligence (AI) has followed two different but complementary directions: the Logical/Symbolic/Neat approach vs. the Analogical/Connectionist/Scruffy one [1,2]. The latter [3], based on numerical/statistical techniques, resulted in systems that are efficient and noise-tolerant, but that cannot capture the complex network of relationships existing in real-world situations, and whose behavior is not understandable by humans (the so-called 'black box' problem). The issue is well known and is tackled under the name of eXplainable Artificial Intelligence (XAI) [4,5], also by big players such as The Royal Society [6], IBM [7], and DARPA [8]. The former approach [9], based on symbolic/logic techniques, and specifically on the First-Order Logic (FOL) setting, can natively handle multi-relational representations and reproduce high-level human reasoning mechanisms. Both approaches are needed to handle the many different aspects involved in carrying out 'intelligent' behavior in the real world: the connectionist one is more suitable to reproduce perception and sub-conscious mechanisms, while the symbolic one is more appropriate to implement conscious reasoning [1].
With the progressively pervasive use of AI today, many tasks are being addressed that require human-compliant AI approaches and solutions, especially in critical domains, so as to enforce trustworthiness and support accountability of the automated systems. A fundamental component in this landscape is the ability to explain the system's decisions in human-understandable terms [10]. While many attempts are ongoing in XAI to superimpose explanations onto subsymbolic, black-box AI models, symbolic techniques can provide native transparency and explanation capabilities. Hence the first motivation for this paper: to focus on AI approaches based on logical inference.
The most classical, investigated, and in some respects obvious, kind of inference to be reproduced is deduction. Still, it is a very strict kind of inference, which can deduce absolute truths only from certain and complete knowledge. The complexity of the real world requires humans to use several other kinds of inference to tackle issues such as missing or wrong information, uncertainty, efficiency, etc. As a consequence, the AI literature also investigated many inference strategies, proposing automated procedures that can simulate the way in which they are carried out by humans. Single strategies have typically been studied separately, resulting in the definition and implementation of different inferential operators. Unfortunately, several studies demonstrated that single inference approaches suffer from significant limitations, which can be overcome only by combining many different inference strategies together, so as to leverage the advantages of each and compensate for each other's shortcomings [11], just like humans do. However, so far the mainstream literature on automated reasoning has focused only on combinations of very small sets (most often just pairs) of inference strategies and operators. This provides the second motivation for this paper.
So, the objectives of this paper are:
To propose a research direction aimed at combining as many strategies as possible in one integrated framework;
To identify the most promising setting in logic-based AI for hosting the widest possible combination of different inference strategies;
To identify the single approaches to automated inference strategies proposed in the literature for such a setting that are compliant with each other, and amenable to a smooth integration; and
To provide a practical example of the combined framework at work.
A systematic literature review was carried out, using Google and DuckDuckGo as search engines. We performed several searches using different strings, aimed at identifying relevant literature starting from the general topic and progressively focusing on more specific strategies and related approaches. First, we looked for "multistrategy reasoning", and found that it does not exist as a research branch of AI. Thus, we tried more explicit search strings, such as "reasoning frameworks in artificial intelligence" and "combination of inference strategies in artificial intelligence", but these returned only very general and non-technical pages. We then narrowed the focus to logic, using strings such as "logic-based approaches to automated reasoning" or "logic-based reasoning frameworks in artificial intelligence": here, accessing the most relevant pages in the results and following the links they contained, we found that Logic Programming was the setting in which most investigations of single strategies were carried out, and that Machine Learning was the branch in which most attempts at combining different strategies were made. Next, we performed searches on the single strategies in these settings, using strings such as "deduction, abduction, … techniques in artificial intelligence" and "deduction, abduction, … in logic programming". Here, we mostly found the literature we were already aware of, with the most recent contributions dating back a few years. Finally, to check whether any relevant works had been missed by these searches, we searched for the titles of the papers we already knew from our previous research, so as to find which new literature was citing them and extending their results. The most recent works we found are those cited in the next sections. After such a review we may conclude that, to the best of our knowledge, this is the first proposal and attempt to define and implement an overall solution combining in a single framework many approaches implementing different automated reasoning strategies.
Given the above landscape and motivations, this paper contributes from three perspectives:
- Position:
It posits the need for symbolic reasoning and for an overall framework combining several different inference strategies (in principle, as many as possible). Such a framework would mimic the variety and flexibility of human reasoning. Of course, an ultimate solution to this problem is beyond the scope of this paper and of the current state of the art in AI; still, this paper is an attempt to define the problem and provide a partial solution, as a starting point to be further expanded in future research to try and approximate more and more closely human reasoning.
- Survey:
Based on this position, its largest contribution is the identification of a suitable logic setting on which to base the framework, and a summarization of the literature on automated reasoning within the identified setting. Specifically, this paper proposes a two-fold survey of the current research landscape: first, it overviews the approaches developed for single inference strategies; then, it overviews the existing attempts at combining those strategies.
- Proposal:
The limitations of these attempts motivate the novel contribution, which consists in proposing a new research direction, named Multistrategy Reasoning (MSR), specifically aimed at investigating approaches that combine as many inference strategies as possible in a single, integrated framework. We also introduce the GEAR inference engine as our proposed implementation of MSR, and the first practical contribution to this endeavor. It combines many of the compatible approaches described in the survey, which were developed separately and have so far been combined only to a limited extent.
We selected Logic Programming (LP) as the richest and most promising setting in which this view can be achieved, for several reasons. First, LP is the fragment of logic traditionally connected to computer implementation: logic (Prolog) programs have the same power as standard algorithmic programs, but exploit logic deduction as the mechanism for executing programs described as sets of logic formulas. Second, many different automated inference strategies and operators investigated in the literature adopt this setting, which is a pre-requisite for their tight integration. Within the LP-related literature, we reviewed and selected the works that we considered most appropriate for a smooth integration, covering more than 25 years of research on this topic. A first selection criterion was that they all stem from, and extend, the basic LP setting, which provides deductive inference. Then, among the various proposals based on this setting, the second criterion was to select those that can be combined most smoothly and immediately, since their assumptions and mechanisms are compliant with each other.
The paper is organized as follows. After recalling the basics of inference strategies and of the formalism we will adopt, selected approaches proposed in the literature for the single strategies of interest are described. Then, the combinations of these strategies proposed in previous works are discussed from a new perspective that highlights how they can be merged in an overall framework, finally introducing the GEAR system that implements this cooperation, along with its formalism. The last section concludes the paper and outlines issues for future work.
3. Inference Strategies
Let us now provide state-of-the-art definitions and settings for a number of inference strategies that can serve several needs of an automated reasoning system.
3.1. Deduction
Deduction is the kind of inference aimed at making explicit knowledge that is only implicit in the available knowledge, but is a strict consequence thereof.
One way to carry out deductive inference in LP is based on (a specific version of) resolution. It is an inference rule by which, given two clauses C′ and C″ and a substitution θ such that l′θ = ¬l″θ for some literals l′ ∈ C′ and l″ ∈ C″, the clause (C′θ ∖ {l′θ}) ∪ (C″θ ∖ {l″θ}), obtained by "resolving C′ and C″ on literals l′ and l″", is a logical consequence of C′ and C″. Thanks to a Subsumption Theorem provided in [13], this allows us to define deduction based on repeated resolution steps (called a derivation and denoted by ⊢):
Definition 4 ([14]). Let Σ be a set of clauses, and C a clause. We say that C can be deduced from Σ if C is a tautology, or there exists a clause D s.t. Σ ⊢ D and D θ-subsumes C.
In traditional LP, a conjunction of literals (l1 ∧ … ∧ ln) can be proven by refutation: if, adding its negation (which is a goal :- l1, …, ln) to a program P, the empty clause (i.e., a contradiction) can be obtained by repeated application of resolution steps, then the conjunction is proven in P. Thus, this is a kind of backward (or goal-driven) strategy, in which deduction is focused on proving a given goal. In the opposite perspective, given a set of facts, a forward (or data-driven) strategy picks in turn all the available facts and resolves them with the clause bodies whenever possible, progressively deriving all of their possible consequences.
If the body of clauses may contain negated literals, the resulting programs are called general logic programs. In the backward approach, such negated literals are proved using the Negation as Failure (NAF) rule [20]: if an atom cannot be proven by refutation, then its negation is assumed to be true. The NAF rule stems from the Closed World Assumption: only what is reported in the program is true; whatever is not reported in the program is considered false.
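As a minimal illustration (our own toy example, not taken from the cited works), NAF is directly available in standard Prolog through the \+ operator:

% Toy program: the Closed World Assumption at work in standard Prolog.
bird(tweety).
bird(opus).
penguin(opus).
% flies(X) holds if X is a bird and it cannot be proven to be a penguin.
flies(X) :- bird(X), \+ penguin(X).
% ?- flies(tweety).   succeeds: penguin(tweety) is not in the program
% ?- flies(opus).     fails: penguin(opus) is provable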
3.2. Abstraction
Abstraction reduces the amount of information conveyed by a set of facts, called the reference set [11]. This reduces the computational load needed to process the set of facts, provided that the information that is relevant to the achievement of a goal is preserved.
According to [21], it is based on a reasoning context ⟨P, S, L⟩ relative to a world W. A world W contains various kinds of (atomic or compound) objects, each kind with specific properties. P, the world-perception, is a perception system consisting of a set of sensors, each dedicated to a specific signal and with a resolution that determines the minimum difference it can distinguish between two signals. S, the structure, stores the information sensed by P as a set of tables in a relational database, in which each object is associated with a unique identifier. To intensionally describe the world, a logic language L associates tables in S with predicate and function symbols, on which reasoning can be carried out. P can be formalized as a 4-tuple ⟨O, A, F, R⟩, where O is a set of perceived objects, A is a set of object attributes, F is a set of functions, and R is a set of relations. These are the same items that are considered when building the conceptualization of a world in order to formally describe it.
A set of values provided by the sensors in P is called a signal pattern or configuration; together, all the possible configurations that P can distinguish form its configuration set. Different perceptions of the same world may be obtained (e.g., due to the kind or resolution of the sensors, the focus or perspective, etc.). In ML, this corresponds to the phase of feature selection. A perception system is said to be simpler than another if it hides some information that is apparent in the other, thus offering less information to manipulate. More formally:
Definition 5 ([21]). Given a world W, and two perception systems P′ and P″ for W, with configuration sets C′ and C″ (resp.), P′ is simpler than P″ if there exists a mapping from C′ into C″ that is injective but not bijective (i.e., P′ distinguishes strictly fewer configurations than P″).
An abstraction switches from a perception system to a simpler one:
Definition 6 ([21]). Let W be a world, and R_g = ⟨P_g, S_g, L_g⟩, R_a = ⟨P_a, S_a, L_a⟩ two reasoning contexts, where P_a is simpler than P_g. An abstraction is a mapping from the configurations of P_g to those of P_a.
In this definition, subscript g stands for 'ground' and subscript a stands for 'abstract'. Note that an abstract configuration is uniquely determined from a ground one, but the opposite (an operation called concretion) is unfeasible unless additional information is provided. Thus, abstraction takes place at level P, that is the source of information, and then it propagates to the other levels in the reasoning context only as a side-effect, in the following order: the perceived configuration is recorded in S and then described by L. In this view, changes in the structure and language are mere consequences of different mappings in the world perception.
Abstraction happens by means of a set of operators. This set includes general operators that allow one to: group indistinguishable objects into equivalence classes; group a set of objects in the ground world to form a new compound object that replaces them in the abstract world; ignore terms in the abstract world, where they disappear; merge a subset of values that are considered indistinguishable; drop a subset of arguments, thus reducing the arity of a relation; eliminate all arguments in a function or relation, so that it moves from a predicate logic to a propositional logic setting at the language level (propositional abstraction). Domain-specific operators can also be included. Each level in the reasoning context has its own set of operators: one for P, one for S, and one for L. The operators in each of these sets correspond to the operators in the others, so that, given a ground reasoning context, from the abstraction of the perceived world one can obtain not only the abstract perception but also the corresponding abstract structure and language, by applying the corresponding operators at the S and L levels, respectively.
At the language level, abstraction is defined as a mapping between representations that are related to the same reference set but contain less detail, so as to preserve some properties and discard others. It aims at helping to solve a problem posed in the ground description by making the search for a solution more easily manageable [22].
Definition 7 (adapted from [22]). Given two clausal theories T and T′ built upon different languages L and L′ (and derivation rules), an abstraction is a tuple ⟨T, T′, f⟩, where f is a computable total mapping between clauses in L and those in L′. We will call T the ground theory and T′ the abstract theory.
Abstraction is truth-preserving, since it is based on deduction.
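To make the language-level view concrete, here is a minimal sketch in Prolog (our own illustration; predicate names are hypothetical, not those of the cited framework) of abstraction operators rendered as rewrite rules over ground atoms:

% Hypothetical operators: each maps a ground atom to a less detailed one.
abstract_atom(on(A, B, _Time), on(A, B)).          % drop an argument (arity reduction)
abstract_atom(color(O, crimson), color(O, red)).   % merge indistinguishable values
abstract_atom(color(O, scarlet), color(O, red)).
abstract_atom(touching(_, _), touching).           % propositional abstraction

% Abstracting a whole observation (a list of ground atoms); atoms with no
% applicable operator are kept unchanged.
abstract_observation([], []).
abstract_observation([G|Gs], [A|As]) :-
    ( abstract_atom(G, A) -> true ; A = G ),
    abstract_observation(Gs, As).
% ?- abstract_observation([color(box1, crimson), on(box1, table, 3)], Abs).
% Abs = [color(box1, red), on(box1, table)]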
3.3. Abduction
Abduction is devoted to coping with missing information, by guessing unknown facts when they are needed for a given purpose.
The Abductive Logic Programming (ALP) [23,24,25] framework extends LP by allowing one to guess some 'abducible' facts (abductive hypotheses) that are not stated in the available knowledge but are needed to solve a given problem. The problems can be observations to be explained (as in classical abduction) or goals to be achieved (as in standard LP). Of course, there may be many plausible explanations for a given observation, and thus abductive explanations are not conclusive, requiring strategies to filter and rank them.
Definition 8 ([25]). An abductive logic program (or abductive theory) consists of a triple ⟨P, A, I⟩, where:
- P is a general logic program;
- A (Abducible predicates) is a set of predicates;
- I (Integrity Constraints) is a set of formulas that must be satisfied by the abductive hypotheses.
In principle, the specific atoms on which abductions can be made should be the 'abducibles' [23]. By extension, using an abducible predicate is a quick way to say that any atom built on that predicate can be abduced. In other words, an abducible predicate is a kind of claim that may be abduced, while an abducible literal is a specific claim that may be abduced.
Definition 9 (Abductive explanation [25]). Given an abductive theory ⟨P, A, I⟩ and a formula G, an abductive explanation Δ for G is a set of ground atoms of predicates in A s.t. P ∪ Δ ⊨ G (Δ explains G) and P ∪ Δ satisfies I (Δ is consistent).
Among the various procedures proposed in the literature to obtain abductive explanations for abductive logic programs, also when negated literals are used in the body [26], we adopt the one proposed in [27]. It interleaves two phases: abductive and consistency derivations. An abductive derivation is the standard LP derivation, extended in order to consider abducibles. When an unknown abducible literal must be proved, it is added to the current set of abductive hypotheses, starting a consistency derivation to check that no integrity constraint involving it is violated. In doing this, the abductive derivation is used to solve each goal, which might further extend the set of abductive hypotheses. The procedure returns a minimal abductive explanation (according to set inclusion) if any exists; otherwise it fails.
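The following minimal Prolog sketch (our own simplification, with hypothetical predicate names, ignoring negation and the full consistency-derivation machinery) conveys the flavor of such a procedure: proving a goal may extend the current set Δ of hypotheses, and each new hypothesis is checked against nand-style ICs:

% Object-level program: clause_db(Head, Body), with Body = true for facts.
clause_db(light_off, broken(bulb)).
clause_db(light_off, broken(switch)).
abducible(broken/1).                      % atoms of broken/1 may be abduced
ic([broken(bulb), broken(switch)]).       % nand: not both at once

% prove(Goal, DeltaIn, DeltaOut): prove Goal, accumulating abduced atoms.
prove(true, D, D) :- !.
prove((A, B), D0, D) :- !, prove(A, D0, D1), prove(B, D1, D).
prove(A, D, D) :- clause_db(A, true).
prove(A, D0, D) :- clause_db(A, B), B \== true, prove(B, D0, D).
prove(A, D, D) :- member(A, D).           % hypothesis already made
prove(A, D, [A|D]) :-                     % abduce a new hypothesis
    functor(A, F, N), abducible(F/N),
    \+ member(A, D),
    consistent([A|D]).

% Simplified consistency check: no IC may have all its literals holding.
consistent(D) :- \+ (ic(Ls), forall(member(L, Ls), lit_holds(L, D))).
lit_holds(L, D) :- member(L, D), !.
lit_holds(L, _) :- clause_db(L, true).
% ?- prove(light_off, [], Delta).   % Delta = [broken(bulb)] ; [broken(switch)]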
Traditional ALP considered Integrity Constraints (ICs) in the form of LP goals (:- l1, …, ln), i.e., negations of conjunctions of literals (the nand logical operator). Reference [28] extended traditional ALP into the Expressive ALP (EALP) framework, allowing additional types of ICs based on other operators, and specifically:
- and
all listed literals must be true;
- nor
all listed literals must be false;
- xor
exactly one of the listed literals must be true;
- iff
the and constraints on the first and second list of literals must either both succeed or both fail;
- if
based on the equivalence p → q ≡ ¬p ∨ q: the nand constraint on the list of literals in the premise, or the and constraint on the list of literals in the consequence, must be true;
- or
at least one of the listed literals must be true;
- nand
at least one of the listed literals must be false.
This set of typed ICs strictly extends the expressiveness of the framework, because they cannot be simulated by the traditional ICs used in ALP. To handle them, some changes are required in the abductive procedure, and specifically in the consistency derivation [28].
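As a sketch of what handling such typed constraints involves (our own illustration, assuming constraints are Prolog terms and the literals assumed true are collected in a list; under the CWA, anything not listed is false), each IC type can be checked as follows:

% check_ic(IC, World): IC is satisfied by the set (list) World of literals.
true_in(L, World) :- memberchk(L, World).

check_ic(and(Ls),  W) :- forall(member(L, Ls), true_in(L, W)).
check_ic(nor(Ls),  W) :- forall(member(L, Ls), \+ true_in(L, W)).
check_ic(or(Ls),   W) :- member(L, Ls), true_in(L, W), !.
check_ic(nand(Ls), W) :- member(L, Ls), \+ true_in(L, W), !.
check_ic(xor(Ls),  W) :-                       % exactly one listed literal true
    findall(L, (member(L, Ls), true_in(L, W)), [_]).
check_ic(if(P, C), W) :-                       % nand(P) or and(C) must hold
    ( check_ic(nand(P), W) ; check_ic(and(C), W) ), !.
check_ic(iff(P, C), W) :-                      % and(P) and and(C) agree
    ( check_ic(and(P), W) -> check_ic(and(C), W) ; \+ check_ic(and(C), W) ).
% ?- check_ic(xor([rainy, sunny]), [sunny, warm]).   % succeeds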
3.4. Uncertain Reasoning
Much research has investigated how to combine logical and statistical inference, so that the former supports high-level reasoning strategies, and the latter improves flexibility and robustness. From an LP perspective, these efforts resulted in the Probabilistic Logic Programming (PLP) setting [29].
Very relevant in PLP is the distribution semantics [30], by which a probabilistic logic program defines a probability distribution over a set of normal logic programs (called worlds). Examples of languages based on the distribution semantics are ProbLog [31], PRISM [32], LPADs [33], and CP-Logic [34]. They differ in the way they define the distribution. Both ProbLog and PRISM allow probabilities to be set only on facts; the former allows two alternatives only (true or false). LPADs and CP-Logic offer a more general syntax than ProbLog and PRISM. Since in CP-Logic some programs, to which a causal meaning cannot be attached, are not valid, in the following we will consider LPADs.
A Logic Program with Annotated Disjunctions (LPAD) consists of a finite set of disjunctive clauses in which each atom in the head is annotated with a probability, of the form:
h1 : p1 ; … ; hn : pn :- b1, …, bm
where the semicolon is the logical disjunction operator, the hi's are atoms, the bj's are literals, the pi's are probabilities such that p1 + … + pn ≤ 1, and the null atom (representing the case that none of the hi's is true) does not appear in the body of any clause. Note that, if n = 1 and p1 = 1, C is a non-disjunctive clause. Without loss of expressive power [29], probabilistic facts are considered as independent.
Concerning the distribution semantics for Datalog programs, an atomic choice is the selection of one head atom from a grounding Cθ of a probabilistic clause C, where θ is a grounding substitution. It represents an equation of the form X = k, where X is a random variable associated with Cθ and k indexes the selected head atom. A composite choice is a consistent set of atomic choices for the clauses in a program, meaning that the choices do not select different heads for a given ground clause. The probability of a composite choice is the product of the probabilities of the selected head atoms. A total composite choice, or selection, includes one atomic choice for every grounding of each probabilistic clause. It identifies a world (a logic program) w, whose probability P(w) is the probability of the corresponding selection. Since we work in Datalog (i.e., the program does not contain function symbols), the set of worlds is finite and P is a probability distribution over worlds: Σ_w P(w) = 1.
The conditional probability of a query (ground atom) q given a world w is defined as: P(q|w) = 1 if q is true in w (w ⊨ q), or 0 otherwise. By marginalization, the probability that query q is true is: P(q) = Σ_w P(q, w) = Σ_w P(q|w) P(w) = Σ_{w ⊨ q} P(w).
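As a toy illustration of the distribution semantics (our own example), consider a program with two independent probabilistic facts, 0.3 :: a and 0.4 :: b, and the rules q :- a and q :- b. There are four worlds, obtained by independently including or excluding each probabilistic fact, and q holds in the three worlds containing a or b, so:
P(q) = Σ_{w ⊨ q} P(w) = 0.3·0.4 + 0.3·0.6 + 0.7·0.4 = 0.58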
It is worth concluding this section by mentioning a completely different approach to uncertainty, not based on a formal definition of probabilities. In fact, the mathematical theory of probability requires complex computations that are not actually carried out by humans when dealing with uncertainty in everyday reasoning. Rather, they use informal but quick ways of estimating the certainty of the information they handle. The most relevant approach aimed at simulating this behavior is perhaps the one implemented in MYCIN, one of the first successful expert systems available in the literature [35]. It defines a certainty function c and, inspired by fuzzy logic [36], computes the composition of certainties for the basic logical operators as follows. Given two facts f′ and f″:
- c(f′ ∧ f″) = min(c(f′), c(f″)) (since both pieces of information are required, the less certain one affects the overall certainty);
- c(f′ ∨ f″) = max(c(f′), c(f″)) (since the two pieces of information are interchangeable, the certainty of the more certain one can be assumed);
- c(¬f′) = 1 − c(f′) (the certainty of a negation is the complement of the original certainty).
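A minimal Prolog rendering of this combination scheme (our own sketch; the representation is assumed, not MYCIN's actual one):

% certainty(Expr, C): C is the certainty of a composed expression.
fact_certainty(fever, 0.8).
fact_certainty(rash, 0.4).

certainty(f(F), C) :- fact_certainty(F, C).
certainty(and(A, B), C) :-
    certainty(A, CA), certainty(B, CB), C is min(CA, CB).
certainty(or(A, B), C) :-
    certainty(A, CA), certainty(B, CB), C is max(CA, CB).
certainty(not(A), C) :-
    certainty(A, CA), C is 1 - CA.
% ?- certainty(and(f(fever), not(f(rash))), C).   % C = 0.6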
3.5. Argumentation
Argumentation is the inferential strategy aimed at dealing with inconsistent knowledge, in order to distinguish which of several contrasting positions in a dispute are justified. In a dispute, the participants make claims (the arguments) to support their own position, to attack the arguments for competing positions of the other participants, and to defend their position from the attacks of the others. Abstract argumentation, in particular, stemmed from ALP and focuses only on the inter-relationships among the arguments, neglecting their internal structure or interpretation.
Abstract Argumentation Frameworks (AFs) can be graphically represented as graphs, whose nodes are the arguments and whose arcs represent the relationships between pairs of arguments. As originally defined [37], they can express only attacks among arguments. Several lines of research introduced additional features in the AFs, usually studied in isolation. The Generalized Argumentation Framework (GAF) [38] extends traditional AFs with bipolarity (the possibility of expressing both attacks and supports between pairs of arguments) and weights on both the arguments and the attack/support relationships (denoting their strength). It provides a much more powerful model to carry out abstract argumentation, and is compatible with the most prominent extensions proposed in the literature.
Definition 10 ([38]). A Generalized Argumentation Framework (GAF) is a tuple comprising:
- a finite set of arguments;
- a (possibly empty) system providing external information on the arguments;
- a function that assigns a weight to each argument, to be considered as its intrinsic strength, also based on the external information; and
- a function that assigns a weight to each pair of arguments.
Quite intuitively, negative weights denote attacks (attacking an argument subtracts to its credibility) and positive weights denote supports (supporting an argument adds to its credibility). Weight 0 can be interpreted as the absence of any (attack or support) relationship (0-valued arcs are usually not drawn in the graph). Combinations of attacks and supports can also be considered:
Attacking the attacker of an argument amounts to defending (i.e., somehow supporting) that argument (known as reinstatement);
Attacking the supporter of an argument amounts to attacking that argument;
Supporting the attacker of an argument amounts to attacking that argument;
Supporting the supporter of an argument amounts to supporting that argument;
These combinations are easily handled in GAFs using the mathematical rule of signs for products (− · − = +, + · − = −, + · + = +; see Figure 1 for a graphical representation). This rule also holds for sequences of mixed attacks and supports: a sequence including an even number of attacks amounts to a defense (and indeed a product involving an even number of negative factors yields a positive result); vice versa, a sequence including an odd number of attacks still amounts to an attack (and indeed a product involving an odd number of negative factors yields a negative result). 0-weight links in a sequence would bring the product to 0, meaning that the initial and final arguments do not affect each other along that sequence.
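For instance, the combined effect of a chain of weighted links can be sketched in Prolog as a simple product (our own illustration, with weights assumed in [−1, 1]):

% path_effect(Weights, E): E is the product of the weights along a chain of
% attack/support links (negative = attack, positive = support, 0 = none).
path_effect([], 1).
path_effect([W|Ws], E) :- path_effect(Ws, E1), E is W * E1.
% ?- path_effect([-0.8, -0.5], E).   % E = 0.4: attacking an attacker defends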
Having a bounded weight range, with fixed minimum and maximum values, can help intuition: a weight of −1 means that the attacking argument 'fully' defeats the attacked one; a weight of +1 means that the supporting argument 'fully' supports the supported one. The [−1, 1] range for attacks and supports is also very intuitive.
In real-world situations, the context of the arguments often affects their reliability. Contextual factors include the community in which the argumentation takes place, and the topic about which the claims are made. To take this into account, the T-GAFs specialization of the GAF model introduces community and topics as components of the external information system:
the finite set of members of the community, possibly including the entities who put forward the arguments, and
a finite set of topics that may be involved in an argumentation (including a dummy topic ⊤ used to express the general authority and trust of a user, independent of specific topics).
In turn, these items allow one to define some components to be used in the argument weighting function, such as:
- 1.
the subjective confidence that the members of the community (including the entity which posits the argument) have in an argument, where 1 means certainty, according to the classical probabilistic interpretation;
- 2.
the recognized degree of authority, on the topic of the argument, of the entity putting it forward, where 1 means maximum authority of the user in the topic, and 0 absolutely no authority;
- 3.
the trust that the community members have in the entity putting forward an argument, relative to the topic of the argument (indeed, not just the quality of evidence, but also the credibility of the entity positing it is important), where −1 means total distrust, 0 means no opinion, and 1 means full trust.
(2) and (1) express the degree of expertise of an entity about a topic and its degree of confidence about a specific claim, respectively, and (3) expresses the degree of confidence with which a user's opinions about a topic are taken into consideration by other users. The assessment of the 'intrinsic' reliability of an argument in the GAF may include these components in an overall formula, once specific definitions for the confidence, authority and trust functions (for the various topics) are given.
3.6. Induction
The term induction refers to the inference of general rules or theories starting from specific instances. Observations are descriptions of objects or situations as 'perceived' from the world. Examples are labels assigned to observations to explicitly specify what the concepts of interest (to be learned) in the observations are. Examples can be positive (representing instances of the concepts) or negative (representing instances that do not belong to a concept). Based on these definitions, Inductive Learning aims, given a set of examples concerning some concept, at extracting a model (i.e., a characterization) of that concept, on whose grounds one may try to foresee whether newly available observations correspond to the concept or not. Specifically, examples are used in supervised learning, while unsupervised learning is based on observations only.
Inductive Logic Programming (ILP) is the branch of Machine Learning concerned with the induction of models from positive and negative examples, exploiting the clausal fragment of FOL as a representation language for both the background knowledge and the induced theories. This setting has two main advantages: first, the induced models (and the handled information in general) have an intuitive meaning for humans, which makes them highly suitable for application domains and systems in which human validation of the machine processing is required; second, it can tackle hard real-world domains requiring the representation of relations among objects.
In ILP, the observations are expressed as sets of facts, and do not consist of a fixed number of attributes, but may be longer or shorter, depending on what information they need to express. Some approaches to concept learning require that each example is explicitly associated with all the facts that are relevant to its underlying concept (direct relevance [39]). Others do not provide such an explicit connection, and let the inductive system extract what it considers as relevant from an overall set of facts (indirect relevance [40]).
Definition 11 (Inductive Learning paradigm [13]). A theory T is a set of hypotheses. A hypothesis H is a set of program clauses with the same head, i.e., defining the same concept. An example E is a ground (Horn) clause. It is called positive for a hypothesis H if its head has the same predicate and sign as H; it is called negative for H if its head has the same predicate as H but opposite sign.
Given:
A set of examples E = E+ ∪ E−, where E+ is a set of positive examples, and E− a set of negative ones;
A (possibly empty) background knowledge (or BK) B.
Find a theory (logic program, model) T s.t. B ∪ T ⊨ E+ (completeness) and B ∪ T ⊭ E− (consistency), and moreover the following properties are fulfilled:
B ⊭ E+ ((prior) necessity)
B ⊭ E− (prior consistency)
B ∪ T ⊭ □ (weak consistency)
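A classic toy instance of this setting (our own illustrative example) is learning a family relation:

% Background knowledge B:
parent(tom, bob).  parent(bob, ann).  parent(bob, pat).  parent(eve, bob).
male(tom).  male(bob).
% Positive examples E+: grandfather(tom, ann), grandfather(tom, pat).
% Negative example E-:  grandfather(eve, ann).
% A complete and consistent theory T (with B, it covers all of E+ and no
% negative example):
grandfather(X, Y) :- male(X), parent(X, Z), parent(Z, Y).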
While in deduction the fundamental operation is unification [12], which allows one to find a common instance of two clauses (if any), in induction its dual, generalization [13], allows one to find (if any) a clause of which two given clauses are both instances. Just as in deduction one is interested in the most general unifier, so as not to skip any step in the deductive process, in the inductive process one looks for the least general generalization (lgg). It depends on the generality order adopted on clauses. The lgg of any two clauses under θ-subsumption, if any, is unique. Still, some efficiency and intuition issues led to the definition of a specialization of it, θOI-subsumption, based on the Object Identity assumption (see Section 2.2).
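For ground atoms, Plotkin's lgg (anti-unification) can be sketched in a few lines of Prolog (our own illustration; the mapping S pairs corresponding ground subterms with the variable that replaces them):

lgg(T1, T2, T1, S, S) :- T1 == T2, !.
lgg(T1, T2, G, S0, S) :-
    compound(T1), compound(T2),
    T1 =.. [F|As1], T2 =.. [F|As2],
    same_length(As1, As2), !,
    lgg_args(As1, As2, Gs, S0, S),
    G =.. [F|Gs].
lgg(T1, T2, V, S, S) :- memberchk((T1, T2)-V, S), !.   % reuse variable
lgg(T1, T2, V, S, [(T1, T2)-V|S]).                     % introduce variable

lgg_args([], [], [], S, S).
lgg_args([A|As], [B|Bs], [G|Gs], S0, S) :-
    lgg(A, B, G, S0, S1),
    lgg_args(As, Bs, Gs, S1, S).
% ?- lgg(p(a, f(a)), p(b, f(b)), G, [], _).   % G = p(X, f(X))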
Definition 12 ([14]). Given a theory T and an example E:
T makes a commission error iff ∃ C ∈ T s.t. C is inconsistent wrt E;
T makes an omission error iff ∀ H ∈ T: H is incomplete wrt E.
An incomplete or inconsistent theory is incorrect.
An inconsistent (or too strong) theory is too general and needs to be specialized; an incomplete (or too weak) theory is too specific and needs to be generalized.
Some inductive systems work in a batch way: they start from an empty theory and stop the inductive process when the current set of hypotheses is able to explain all the available examples. When new evidence contradicts the learned theory, the whole process must be restarted from scratch, taking no advantage of the previously learned hypotheses. Other systems can revise and refine a theory in an incremental way: they try to change the previously generated hypotheses in such a way that the changed hypotheses explain both the old and the new examples. The former approach generally builds definitions top-down (i.e., from more general to more specific ones); it yields more compact and elegant theories, but is computationally expensive. The latter can just revise an existing hypothesis on the ground of new evidence, thus working bottom-up, but the resulting theory is not as elegant as those produced by batch systems. A cooperation between the two might run batch learning from time to time to obtain a better structured theory after many incremental revisions.
In the incremental approach, when the theory is incorrect it can/should be revised, searching for either a specialization (downward refinement) or a generalization (upward refinement) of the incorrect part of the theory. The refinement should be minimal [41]. Commission errors can be solved by properly exploiting a downward refinement operator, while, dually, upward refinement operators can cope with omission errors:
Definition 13 (Refinement operators [13]). Let ⟨G, ⪰⟩ be a quasi-ordered set of clauses. A downward refinement operator ρ is a mapping from G to 2^G s.t. ∀ C ∈ G: ρ(C) ⊆ {D ∈ G | C ⪰ D}, i.e., it computes a subset of all specializations of C. An upward refinement operator δ is a mapping from G to 2^G s.t. ∀ C ∈ G: δ(C) ⊆ {D ∈ G | D ⪰ C}, i.e., it computes a subset of all generalizations of C.
3.7. Ontological Reasoning
Born as a philosophical discipline that deals with the nature and structure of reality, ontologies were transposed to the computational domain and given more technical definitions depending on different perspectives. Here, we will adopt the perspective of "a formal, explicit specification of a shared conceptualization" [42], where a conceptualization is "an abstract, simplified view of the world that we wish to represent for some purpose" [9] that underlies all developments in Computer Science, and especially those of Knowledge Bases. Thus, an ontology describes the kinds of entities that are of interest in a domain, their properties and relationships. Ontologies are sometimes intended as including the specific instances of those classes and relationships. In such a case, the definitional part is called T-box (for 'terminological'), the factual part about the instances is called A-box (for 'assertional'), and their union is called a Knowledge Base (KB).
Ontologies are important because they provide meaning and context to the symbols used in the KB. Typical ontology-based reasoning tasks of interest are satisfiability (checking if the described world may exist), instance checking (checking whether an instance belongs to a certain concept), concept satisfiability (checking if a concept may exist in the described world), subsumption (checking if a concept is a subclass of another concept), equivalence (checking if two classes are the same), retrieval (of the set of instances that belong to a certain concept), extraction of super-/sub-classes, relationships and properties of a given concept. Particularly relevant is the relationship of generalization/specialization, on which inheritance can be applied.
The research on ontologies evolved separately from LP, and relied on the Description Logics (DLs) [43] fragment of FOL. Different description logics can be defined, depending on the available operators. Adding more and different operators extends the expressive power of a DL, but may lead to undecidability. Unfortunately, DLs overlap only partially with LP, and the non-overlapping parts are incompatible, mainly due to the different fundamental assumptions they make on unknown information (Open World Assumption in DLs vs. Closed World Assumption in LP). Ontologies can be translated to default logic [44], one of the most famous formalisms for non-monotonic reasoning.
3.8. Similarity
Similarity computation between FOL descriptions is complex due to indeterminacy (the possibility of mapping various portions of one description in many ways onto another description). For this reason, very few works in the literature tackled this problem. Here, we will describe the approach proposed in [45] for linked Horn clauses, which was successfully used in many tasks. It considers a set of parameters and defines a similarity function based on them, plus a set of criteria to assess the similarity of different clause components. The parameters it uses for comparing two objects are widely accepted in the literature [46]:
- n,
the number of features owned by the first object but not by the second (residual of the first wrt the second);
- l,
the number of features owned by both objects;
- m,
the number of features owned by the second object but not by the first (residual of the second wrt the first).
Indeed, other classical and state-of-the-art similarity and distance measures are based on these parameters: e.g., the one developed by Tversky [47], Dice's coefficient (2l/(n + m + 2l)), and Jaccard's index (l/(n + l + m)) and the corresponding distance (1 − l/(n + l + m)). Instead, the similarity function adopted here is novel, aimed at overcoming the shortcomings of these functions: it returns larger values for more similar objects, and a weighting parameter allows one to balance the relative importance of the residuals n and m in the similarity computation.
The similarity criteria deal with increasingly complex clause components: terms, atoms, groups of atoms, clauses. The similarity of more complex components is based on the similarity of simpler components.
In FOL formulæ, terms represent specific objects, while predicates express their properties and relationships. Accordingly, two levels of similarity can be defined for pairs of FOL descriptions: the object level, concerning similarities between the terms referred to in the descriptions, and the structure one, referring to how the nets of relationships in the descriptions overlap.
In the case of Horn clauses, since the head consists of a single literal (and hence can be uniquely matched), it can be used as a starting point for the comparison. The proposed approach considers as comparable only clauses having atoms of the same arity n in the head, and the comparison outcome will be interpreted as the degree of similarity between the two n-tuples of terms in the heads.
Similarity between two clauses is computed level-wise, based on the similarity of their constituents. While the details of the computation procedure are not relevant here, we describe the principles underlying these comparisons.
- Terms
Given two terms, their object similarity is computed based on a combination of two partial similarities: a characteristic similarity based on the properties they own (i.e., the unary predicates of which they are arguments) and a role similarity based on the roles they play in relation to other terms (i.e., the positions they occupy among the arguments of n-ary predicates).
- Atoms
Given two atoms built on n-ary predicates, their star similarity is computed based on a combination of: (a) the predicates of the atoms that are directly connected to them via some shared argument, and (b) the similarities of the terms appearing as their arguments.
- Atom Sequences
Given two sequences of atoms built on n-ary predicates, their path similarity is computed based on the initial parts of the sequences that can be mapped onto each other, combined with the star similarities of the mapped atoms and with the object similarity of the terms they involve.
- Clauses
Their similarity is based on the amounts of common and different atoms and terms in the two clauses, considering the least general generalization to determine an overall atom sequence for the clauses (see below).
At each level, parameters n, l and m for the objects under comparison can be extracted to apply the similarity function.
All levels except the first one are based on the structure of a clause, defined by the way in which atoms built on n-ary predicates relate the various terms in the clause. Intuitively, the star of an atom depicts ‘in breadth’ how it relates to the rest of the formula, while a path in a clause depicts ‘in depth’ a given portion of the relations it describes. Multisets are needed since a predicate can have multiple instantiations among the considered atoms.
Clauses are represented by their associated graph, in which atoms are the nodes and arcs connect two nodes when their atoms share at least one argument. The sequences are obtained by building the associated graph as a Directed Acyclic Graph (DAG), stratified (i.e., with the set of nodes partitioned) in levels (elements of the partition) as follows. The head is the only node at level 0 (first element of the partition). It represents the unique starting point for building the associated graph, which also gives a unique access point for traversing it according to precise directions represented by the directed edges. Then, each successive level includes new nodes (not present in previous levels) that have at least one term in common with nodes in the previous level. In particular, each node (atom) in the new level has an incoming edge from each node (atom) in the previous level, having some argument (term) in common with it.
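The stratification just described can be sketched in Prolog as follows (our own illustration; the clause's atoms are given as a list, and lowercase constants stand for the clause's terms):

% shares_term(A1, A2): the two atoms have at least one argument in common.
shares_term(A1, A2) :-
    A1 =.. [_|Ts1], A2 =.. [_|Ts2],
    member(T, Ts1), memberchk(T, Ts2), !.

% levels(Head, BodyAtoms, Levels): partition the atoms into the DAG levels.
levels(Head, Body, [[Head]|Ls]) :- next_levels([Head], Body, Ls).

next_levels(_, [], []) :- !.
next_levels(Prev, Rest, [New|Ls]) :-
    split(Rest, Prev, New, Rest1),
    New \== [],
    next_levels(New, Rest1, Ls).

% split(Atoms, Prev, Linked, NotLinked): Linked atoms share a term with Prev.
split([], _, [], []).
split([A|As], Prev, [A|Ls], Ns) :-
    member(P, Prev), shares_term(A, P), !,
    split(As, Prev, Ls, Ns).
split([A|As], Prev, Ls, [A|Ns]) :-
    split(As, Prev, Ls, Ns).
% ?- levels(p(x), [q(x,y), r(y,z), s(x)], L).
% L = [[p(x)], [q(x,y), s(x)], [r(y,z)]]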
As said, the overall similarity between two clauses is computed based on their least general generalization:
Definition 14 ([45]). Given two clauses C′ and C″, let G be their least general generalization, and consider the substitutions θ′ and θ″ such that Gθ′ ⊆ C′ and Gθ″ ⊆ C″. The formula for assessing the overall similarity between C′ and C″, called formulæ similitudo and denoted fs, applies the similarity function to the numbers of atoms and terms that are common to, and residual in, the two clauses, as determined by G, θ′ and θ″, combined through a function of the star similarities of the atoms aligned by G (e.g., the average).
3.9. Analogy
Analogy is the cognitive process of matching the characterizing features of two items (subjects, objects, situations, etc.). It allows one to reuse knowledge from a known item or domain (called the base) on an unknown one (called the target), without having to learn from scratch. After finding an analogy on some roles, the association can be extended to further missing features. Analogical reasoning is essential for producing new conclusions that can help in solving a problem [48]. While similarity is a syntactic task that looks for exactly the same features in two items, analogy maps 'roles', which has to do with semantics (i.e., the meaning). The mapping is bi-directional, while in metaphors it only holds in one direction. The analogy may depend on the context, goal or perspective, and may lose its original meaning and usefulness if they change. Furthermore, the outcome of reasoning by analogy might be inconsistent with previous knowledge. "If the inferences mandated by an analogy contradict a fundamental belief, especially one that has accrued many consequent implications, then resolving this contradiction might well involve the shock and amazement of transformational creativity" [49]. However, a set of analogical mappings can be used to identify a recurring meta-pattern (i.e., a common network of roles).
Analogical reasoning consists of 5 steps [48]:
Retrieval finds the best base domain that may help to solve the problem in the target domain;
Mapping looks for a mapping between the base and target domains;
Evaluation provides criteria to evaluate candidate mappings;
Abstraction shifts the representation of both domains to their roles' schema, converging to the same analogical pattern;
Re-representation adapts one or more pieces of the representation to improve the matching.
For instance, the following criteria have been proposed for Evaluation [48]:
structural soundness holds if the alignment and the projected inference are structurally consistent;
factual validity is needed to check whether the projected inference preserves truth (i.e., it does not violate any constraint), which is not ensured because analogy is not a deductive mechanism (it just produces hypotheses);
relevance for the current goal holds if and only if the produced inference moves the knowledge towards the goal.
Several works on analogy used non-relational representations, but they are out of scope for this paper, which focuses on symbolic relational representations. The main contributions adopting formal (Propositional or First-Order) Logic representations include the Structure Mapping Engine (SME) [50,51,52] and its offspring [53,54], the Analogical Constraint Mapping Engine (ACME) [55], Learning and Inference with Schemas and Analogies (LISA) [56], Discovery Of Relations by Analogy (DORA) [57], and Heuristic-driven Theory Projection (HDTP) [58]. Most of them require that the same feature names are used for analogous roles, thus having more to do with similarity than with analogy. Others can map objects and relations having different names in the two domains, but use a quite complex knowledge representation formalism (e.g., SME and HDTP use dedicated sections of knowledge to formalize types of objects, or functions). HDTP also allows features having the same name to play different roles in the two domains.
A setting specifically based on LP formalisms for analogy was provided in [59]. Let us start from the formal definition of a role. Informally, a role represents a kind of interaction. These interactions are associated with the properties and relationships expressed by the predicates in a description. The set of roles of a term expresses the allowed interactions of that term.
Definition 15 (Roles [59]). A role is a predicate p or one of its argument positions, the r-th of which is denoted by p.r; from this we define:
the set of roles of a literal l built on predicate p as the set containing p and its argument positions;
the set of roles of a clause C as the union of the sets of roles of its literals.
Concerning a specific term t, we have:
given a literal l built on predicate p, the literal roles of t include p.i whenever t is the i-th argument of l;
given a clause C, the clause roles of t are the union of its literal roles over the literals of C;
a role of a term t in C is any element of its clause roles.
So, any n-ary predicate p defines n + 1 roles: p and p.r (r = 1, …, n).
Being a mapping of roles across two domains, an analogy can be formalized as a mapping of predicates and terms that play the same role in two descriptions:
Definition 16 (Analogy [59]). Given a pair of descriptions and their associated sets of roles, an analogy is a one-to-one mapping f between (a subset of) the roles of the former and (a subset of) the roles of the latter. If such an f exists, the two descriptions are analogous.
After finding a mapping, the source domain/description might contain knowledge that is not present in the target one, and vice versa. In such a case, the missing knowledge in each domain might be 'borrowed' from the other. This would allow one to better understand each domain and, in some cases, to accomplish tasks (e.g., making comparisons, solving problems, etc.) that would be impossible without that additional knowledge. This opportunity is formalized as follows:
Definition 17 (Analogical perspective [59]). Given a pair of descriptions representing respectively the base and target domains, and an analogy f between them, consider the knowledge that solves a given task in the base and the analogous task in the target. f satisfies an analogical perspective if the knowledge that solves the task in the target can be obtained from the knowledge aligned by f, plus the non-aligned but transposable knowledge from the base domain, transposed to the target based on f.
A procedure that, applied to two descriptions, returns possible analogies between them without requiring any additional knowledge is called an analogy operator. Reference [59] defined an analogy operator named Roles Mapper, which includes all of the above elements. It is based on a syntactic logical approach that, given two clauses, looks for analogical mappings in the body. The head labels the situation that is described in the body, and may be used to provide a preferred perspective for which the analogical mapping is sought. If the arity of the predicates in the heads is the same, the system is bound to establish an analogy between corresponding arguments. Using 0-ary predicates in the heads disables this feature.
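Under the notation just introduced (and with our own illustrative choice of writing, in code, the i-th argument position of an n-ary predicate F as F/N-I), the roles played by a term in a clause body can be extracted in Prolog as:

% term_roles(T, BodyAtoms, Roles): all argument positions filled by T.
term_roles(T, Body, Roles) :-
    setof(F/N-I,
          A^Args^( member(A, Body),
                   A =.. [F|Args],
                   length(Args, N),
                   nth1(I, Args, T) ),
          Roles).
% ?- term_roles(john, [father(john, bob), tall(john)], R).
% R = [father/2-1, tall/1-1]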
4. Multistrategy Reasoning
There are many interconnections among the inference strategies presented above. Most of them may help the others in accomplishing their tasks. Their reference to the same framework, Logic Programming, enables these interactions. In the following, we will first describe existing attempts to combine partial groups of inference strategies, selected so that they may be combined all together in the spirit of true MultiStrategy Reasoning. We will also hypothesize new possible combinations, and quickly introduce the GEAR system that implements our vision.
4.1. Relevant Approaches to Combine Different Strategies
We will now report some examples proposed in the literature of fruitful cooperation among different strategies.
4.1.1. Ontologies and Logic Programming
While, as said, the realms of Logic Programming and Description Logics, used to specify ontologies, are partly incompatible, attempts to merge them have been made in the literature. Here, we mention DL+log [60], as the most powerful decidable combination of Description Logics and disjunctive Datalog rules (i.e., Datalog rules whose head may consist of a disjunction of atoms). This language distinguishes 'Datalog predicates', coming from the LP perspective, from 'DL predicates', coming from the ontological perspective. These predicates can be mixed in a clause, but DL predicates cannot be negated. Two safety conditions are set:
(Datalog safeness) every variable occurring in the rule must appear in at least one of the positive literals in the body;
(Weak safeness) every variable of the rule head must appear in at least one of the positive Datalog literals in the body.
Compared to other approaches, weak safeness allows one to express conjunctive queries through weakly-safe rules, thus increasing the expressive power while still guaranteeing decidability for many DLs.
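As an illustration (our own example in the style of [60]), consider a hybrid rule where student/1 and popular/1 are Datalog predicates and famous/1 and admires/2 are DL predicates:

% Weakly-safe hybrid rule: the head variable X appears in the positive
% Datalog literal student(X); Y ranges only over the ontology.
popular(X) :- student(X), famous(Y), admires(Y, X).

Weak safeness only requires the head variable X to be grounded in the Datalog part, so Y may occur exclusively in DL atoms of the body.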
4.1.2. Abduction and Deduction
Abduction has been naturally defined in [23,24] in tight integration with deduction, to allow reaching conclusions or making predictions when the available information is insufficient. Thus, we will not delve further into this specific combination.
4.1.3. Similarity for Deduction
Reference [45] shows how the similarity assessment approach can be used to help a subsumption procedure converge quickly towards the correct associations, and how it can be used to weaken subsumption and obtain a flexible matching procedure that returns a degree of matching instead of a boolean decision.
4.1.4. Abstraction, Deduction, Similarity, Abduction and Argumentation for Induction
A very interesting case of the use of several strategies in support of induction is provided by the incremental ILP system InTheLEx [61,62].
Abstraction is carried out as a pre-processing step that removes useless information according to the framework in [21]. This is obtained by expressing abstraction operators as clauses, such that whenever the body is recognized in an observation, the involved facts are replaced by those in the head, which is suitably defined to hide the useless information. In fact, there are several investigations in the literature on how abstraction can be used to shift to a higher-level language when concept descriptions that can explain the examples cannot be found using the current language [40,63,64,65,66].
Deduction is used in a saturation step that makes explicit the facts that are implicit in the available description of the observations and that may be useful to correctly grasp the concept that is being learnt. To do this, deduction exploits the rules in the KB. Whenever the body of a clause in the theory is recognized in an observation, the head of the clause is added to the observation itself.
Abduction is used to check if an unexplained example/observation can be explained by assuming additional unseen information that is not present in the observations. In such a case, the guessed information is added to the example description. This prevents the refinement operators from being applied, and the theory from being changed.
Similarity is used to guide the generalization operator, by taking the paths univocally determined according to the technique proposed in [45], and using a greedy technique that adds the generalizations of these paths by decreasing similarity, as long as they are compatible. Further generalizations can then be obtained through backtracking [67].
Recently, argumentation has been integrated to identify consistent portions of inconsistent observations and to choose which one to rely on, exploiting the same integrity constraints defined for abduction to identify attacks and supports (an example of a further integration of different strategies) [68].
4.1.5. Induction for Abduction, Abstraction and Deduction
Abstraction operators, integrity constraints for abduction (and argumentation), and rules for saturation can be inductively learned from observations, as shown in [69,70]. Combinations of facts that never occur generate integrity constraints for abduction (which can also be used for generating abstract argumentation frameworks [68]). Combinations that always occur generate abstraction operators. Concept definitions learned by an ILP system can be used to identify known concepts in observations when learning other concepts.
4.1.6. Argumentation and Induction for Analogy
Reference [59] defined the Roles Argumentation-based Mapper analogy operator, which leverages argumentation to overcome the constraint that using the same descriptors in the two domains means that they necessarily denote the same roles. All possible analogical mappings between descriptors are considered, and mappings are inconsistent if they map one feature in one domain onto many features in the other. These inconsistencies are expressed as attacks in an argumentation framework, and abstract argumentation strategies are used to select only consistent associations.
In the same paper, the use of an inductive (generalization) operator to obtain more general knowledge structures that can be mapped onto several domains is also proposed.
4.1.7. Abduction and Probabilistic Reasoning
While, from a logical standpoint, all consistent abductive explanations are equally good, in a probabilistic setting different explanations of a goal are associated to different possible worlds, and their validity depends on the validity of the rules, facts and integrity constraints used to obtain those worlds. PEALP takes into account all these items [
28,
71]:
Definition 18 ([28]). A Probabilistic Expressive Abductive Logic Program (PEALP) consists of a 4-tuple ⟨P, A, I, p⟩, where P, A and I are as in an EALP and p is a probability function.
Definition 19 ([28]). Given a goal G and a PEALP T, a (probabilistic) abductive explanation of (or possible world for) G in T is a triple whose components are: the set of facts in P, the abductive explanation Δ (i.e., the set of ground literals abduced), and the set of instances of (probabilistic) integrity constraints in I involved in an abductive proof of G in T.
A world that violates a probabilistic integrity constraint is not impossible, just differently probable. Thus, all different (minimal) abductive explanations must be obtained to identify the most likely one, and whenever the abductive procedure has a choice, it must explore the worlds associated with all the different options. Given a PEALP T, the most likely explanation for a goal G can then be selected as the one, among all possible consistent worlds associated with G in T, whose probability is maximum.
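A minimal sketch of this final selection step (our own illustration, assuming each candidate explanation comes with the probabilities of the independent items it involves):

% world_probability(Ps, P): probability of a world as the product of the
% probabilities of the independent items used to build it.
world_probability([], 1).
world_probability([P0|Ps], P) :- world_probability(Ps, P1), P is P0 * P1.

% most_likely(+Candidates, -Best): Candidates is a list of Probs-Delta pairs.
most_likely(Candidates, P-Delta) :-
    findall(P1-D, (member(Ps-D, Candidates), world_probability(Ps, P1)),
            Scored),
    max_member(P-Delta, Scored).
% ?- most_likely([[0.9, 0.2]-[a], [0.5, 0.6]-[b]], Best).   % Best = 0.3-[b]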
4.1.8. Cues for Further Cooperations
Interesting directions for combining different inference strategies are still to be thoroughly investigated. Here, we will envisage and propose some.
Analogy is still an open research direction, and thus its combination with other strategies is largely yet to be explored. As observed, it is strictly connected to abstraction and deduction, since these two strategies are needed to go from a specific domain to its abstract structure, and then from the latter to a new specific domain. It also has strict connections to abduction, which can be used to guess information in the target domain that is not observed but is analogous to information available in the source domain.
Uncertain reasoning is perhaps the most combinable strategy, allowing flexibility to be added to all the others. Especially promising to investigate are ways of combining it with argumentation (to determine how reliable each consistent setting is, and to rank different settings) and with induction (to assign a degree of reliability to the learned knowledge).
4.2. The GEAR MultiStrategy Reasoning Engine
GEAR (acronym for 'General Engine for Automated Reasoning') is an inference engine written in the Prolog language, aimed at implementing the vision of LP-based MultiStrategy Reasoning envisioned and proposed in this paper. The current prototype brings into cooperation most of the strategies described in Section 4.1. Knowledge bases handled by GEAR may include various kinds of knowledge items, including Facts, Rules, Integrity Constraints, Abstraction Operators, and Argumentative relationships. Uncertainty is handled using values in [0, 1], inspired by mathematical probability theory. Still, GEAR may adopt the more intuitive handling of uncertainty proposed for MYCIN. In the following, we will briefly describe the main features of the formalism it uses.
The main components of a KB are facts and rules. Both are formalized as Prolog terms carrying the following components: I, the unique identifier of the fact or rule; C, its certainty value (1 meaning 'absolutely' true and 0 meaning 'absolutely' false); F, an atom (for facts); H and B, the rule's head and body, respectively; and P, the rule's priority (a number used to determine which rule should be executed first in case of conflicts). B is a logical expression built on the following operators:
- and([C1,…,Cn])
representing the conjunction (AND) of the Ci's;
- or([C1,…,Cn])
representing the disjunction (OR) of the Ci's;
- no(C)
representing a 'probabilistic' negation (NOT) of C;
- not_exists(C)
representing an 'existential' negation of C;
where the Ci's are atoms or nested operator applications, to express complex conditions. H is one of the following:
- C
an atom;
- and([C1,…,Cn])
representing the conjunction (AND) of the atoms Ci;
- or([C1,…,Cn])
representing the disjunction (OR) of the atoms Ci;
- no(C)
representing a 'probabilistic' negation (NOT) of the atom C.
Abducibles are formalized by specifying the predicate name P and its arity N, while integrity constraints for abduction and argumentation are formalized by a unique identifier I, a certainty value C, and a constraint O, where O is one of the following:
- nand([l1,…,ln])
at least one among literals l1,…,ln must be false (the classical ICs considered in ALP);
- xor([l1,…,ln])
exactly one among literals l1,…,ln must be true;
- or([l1,…,ln])
at least one among literals l1,…,ln must be true;
- if([l1,…,ln], [m1,…,mk])
if all literals l1,…,ln are true, then all literals m1,…,mk must also be true (modus ponens); alternatively, if all literals m1,…,mk are false, then all literals l1,…,ln must also be false (modus tollens);
- iff([l1,…,ln], [m1,…,mk])
either all literals l1,…,ln and m1,…,mk are true, or all literals l1,…,ln and m1,…,mk are false;
- and([l1,…,ln])
all literals l1,…,ln must be true;
- nor([l1,…,ln])
all literals l1,…,ln must be false.
Abstraction operators are formalized by a unique identifier I, an abstracted set of atoms A, and a ground set of atoms G: A replaces G whenever the latter is found in an observation.
Identifiers of knowledge items are in either of the following forms:
- I
a general unique identifier for the item in the overall KB;
- [M,I]
where I is the unique identifier of the item within knowledge module M.
Finally, argumentation works on a dedicated predicate whose arguments include: the identifiers of the facts acting as arguments, a value expressing the strength of the argument, and a value expressing the type (attack or support, based on the sign) and strength of the argumentative relationship.
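To fix ideas, the following toy knowledge base uses a hypothetical concrete syntax consistent with the description above (the predicate names, argument orders and identifiers are our own guesses for illustration, not necessarily GEAR's actual ones):

% fact(I, C, F): identifier, certainty, atom.
fact(f1, 0.9, bird(tweety)).
fact(f2, 0.6, injured(tweety)).
% rule(I, C, H, B, P): identifier, certainty, head, body, priority.
rule(r1, 0.95, flies(X), and([bird(X), no(injured(X))]), 1).
% Abducible predicate (name, arity) and a typed integrity constraint.
abducible(injured, 1).
ic(c1, 1.0, nand([flies(X), penguin(X)])).
% Abstraction operator: the abstract atoms replace the ground ones.
abstraction(a1, [small_bird(X)], [bird(X), small(X)]).
% Argumentative relationship between two facts (negative weight: attack).
argument(g1, f2, f1, 0.6, -0.3).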
Other predicates can be used to specify system settings (e.g., gear_flag allows one to set flags that direct the system's behavior), information related to user interaction (e.g., askable specifies information that can be asked of the user if missing in the KB), calls to pre-defined procedures (e.g., call may call Prolog to carry out some computations), and others, but they are beyond the scope of this paper.
For deduction, GEAR may work either forward (applying all the rules in the KB in order to derive all possible consequences of the initial set of facts) or backward (starting from a goal and focusing only on the deductive steps that are relevant to prove that goal). For abstraction, it works only in forward mode; for abduction, only in backward mode. For induction, GEAR exploits the aforementioned ILP system InTheLEx: a supervised incremental learning system using Datalog under Object Identity as its representation language. It can distinguish the background knowledge, which is immutable, from the portion of the theory to be learned and refined. The typical information flow in InTheLEx is as follows. Every incoming example immediately undergoes abstraction, which eliminates uninteresting details according to the available operators. Then, the example is (deductively or abductively) checked against the current theory and the background knowledge. If the theory is incorrect, it must be refined: the example is (abductively or deductively) saturated, and the generalization or specialization operator is started to refine the theory, possibly using the abductive or deductive derivation whenever needed.
5. Conclusions
The symbolic/logic approach to AI, based on the First-Order Logic (FOL) setting, can handle relational representations of the data and reproduce high-level, conscious, human reasoning mechanisms. This allows AI systems based on this approach to explain their behavior and decisions in human-understandable terms, which is fundamental when using AI in critical real-world tasks, in order to enforce trustworthiness and support accountability. Much research has been carried out in the last decades on the definition and implementation of frameworks and operators that may simulate the different inference strategies used by humans (deduction, abduction, abstraction, induction, etc.). Still, they were investigated separately, or at most in small combinations. This paper claimed the need for an overall approach that merges all the single strategies, which we named MultiStrategy Reasoning. It identified Logic Programming as the most suitable setting to support this view, selected the most relevant and promising approaches for the single strategies developed in this setting, and described several kinds of combinations that can be merged into an overall approach, which is being implemented in the GEAR inference engine. The current version of GEAR is being used in various projects to provide explainable decision support in complex real-world tasks.
A systematic literature review was carried out and, to the best of our knowledge, this is the first proposal and attempt to define and implement an overall MultiStrategy Reasoning solution. While the set of strategies currently combined is wide and relevant, further work may expand it, improve their combination, and/or obtain more effective and efficient implementations thereof.