Article

Numerical Markov Logic Network: A Scalable Probabilistic Framework for Hybrid Knowledge Inference

School of Computer Science, Northwestern Polytechnical University, 127 West Youyi Road, Xi’an 710129, China
* Author to whom correspondence should be addressed.
Information 2021, 12(3), 124; https://doi.org/10.3390/info12030124
Submission received: 26 January 2021 / Revised: 6 March 2021 / Accepted: 11 March 2021 / Published: 15 March 2021
(This article belongs to the Special Issue What Is Information? (2020))

Abstract
In recent years, the Markov Logic Network (MLN) has emerged as a powerful tool for knowledge-based inference due to its ability to combine first-order logic inference and probabilistic reasoning. Unfortunately, current MLN solutions cannot efficiently support knowledge inference involving arithmetic expressions, which is required to model the interaction between logic relations and numerical values in many real applications. In this paper, we propose a probabilistic inference framework, called the Numerical Markov Logic Network (NMLN), to enable efficient inference of hybrid knowledge involving both logic and arithmetic expressions. We first introduce the hybrid knowledge rules, then define an inference model, and finally present a technique based on convex optimization for efficient inference. Built on a decomposable exp-loss function, the proposed inference model can process hybrid knowledge rules more effectively and efficiently than the existing MLN approaches. We also empirically evaluate the performance of the proposed approach on real data. Our experiments show that, compared to the state-of-the-art MLN solution, it can achieve better prediction accuracy while significantly reducing inference time.

1. Introduction

In recent years, the Markov Logic Network (MLN) [1] has emerged as a powerful tool for knowledge-based inference due to its ability to combine first-order logic inference and probabilistic reasoning. It has been applied in a wide variety of applications, e.g., knowledge base construction [2,3,4,5] and entity resolution [6]. The state-of-the-art probabilistic knowledge-based systems (e.g., Tuffy [7], ProKB [8], and Deepdive [9]) tackle the problem of MLN inference in two steps, grounding and inference. The step of grounding constructs a Markov network by knowledge rules; it is followed by the step of inference, which searches for the Maximum A Posteriori (MAP) probability or marginal probability of the variables.
In many real scenarios, for instance, the inference on phone performance shown in Table 1, knowledge rules may involve both first-order logic and arithmetic expressions. However, the existing MLN inference techniques cannot effectively support these hybrid rules due to the following two new challenges:
  • Modeling the integration of logic formulas and arithmetic expressions. We note that the latest approach of Probabilistic Soft Logic (PSL) [10] enables MAP inference on continuous variables over a set of arithmetic rules such as "r_2: Performance(p) ≥ 0.2" by considering it as a constraint on the prior probability. However, an arithmetic expression (e.g., FastCPU(c) ≥ 0.9) is not a predefined continuous logic variable; thus, it cannot be easily integrated into the objective function defined by PSL. Specifically, even though an arithmetic inequality like "FastCPU(c) ≥ 0.9" in r_3 can be regarded as a Boolean variable by PSL, computing the truth value of r_3 by the max function used in PSL would render its corresponding objective function non-convex. Since the inference of PSL is built on convex optimization, applying PSL inference on r_3 would lead to inaccurate results and convergence failure. Therefore, the existing MLN solutions cannot effectively support the integration of logic formulas and arithmetic expressions.
  • Scalability. Arithmetic expressions usually involve pair-wise numerical comparison. The existing MLN solutions generate the combination of all the predicate variables in the grounding process. This results in an undesirable quadratic or even cubic explosion of grounded clauses, which can easily render the inference process unscalable. For instance, consider the rule r_4 in Table 1: the existing inference solutions would generate O(n²) clauses for n variables, as illustrated by the sketch after this list. It is worth pointing out that clause explosion results not only in inference inefficiency, but also in meaningless inference results. In the circumstance of clause explosion, techniques based on Gibbs sampling [11,12] may fail because the sampler becomes trapped in a local state. As shown in our experimental study, the predictions of PSL may become inaccurate because it fails to converge.
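To make the scalability challenge concrete, the following minimal sketch (our own illustration with hypothetical constants, not code from any MLN system) counts the clauses produced by naively grounding a pair-wise comparison rule over n constants.

```python
# Naive grounding of a pair-wise rule such as
# "FastCPU(c1) >= FastCPU(c2) -> Performance(p1) >= Performance(p2)"
# enumerates every ordered pair of constants, so the number of grounded
# clauses grows quadratically with the number of constants.
from itertools import permutations

def naive_grounding(constants):
    """Return every ordered pair (c1, c2) the pair-wise rule is grounded over."""
    return list(permutations(constants, 2))

phones = [f"p{i}" for i in range(1000)]
print(len(naive_grounding(phones)))  # 999000 grounded clauses for only 1000 constants
```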
To address the aforementioned challenges, we propose a novel inference framework called the Numerical Markov Logic Network (NMLN). The framework defines the optimization objective of inference as a novel exp-loss function, which can seamlessly integrate logic and arithmetic expressions. We also present an inference approach of exp-loss function decomposition based on convex optimization and use the technique of ADMM (Alternating Direction Method of Multipliers) to parallelize the inference process for improved efficiency. The major contributions of this paper can be summarized as follows:
  • We propose a novel probabilistic framework for hybrid knowledge inference. We define the hybrid knowledge rules and present the optimization model.
  • We propose a scalable inference approach for the proposed framework based on the decomposition of the exp-loss function.
  • We present a parallel solution for hybrid knowledge inference based on convex optimization.
  • We empirically evaluate the performance of the proposed framework on real data. Our extensive experiments show that compared to the existing MLN techniques, the proposed approach can achieve better prediction accuracy while significantly reducing inference time.

2. Related Work

Probabilistic Programming Languages (PPLs) [13] seek to separate model specification from inference and learning algorithms, thus making it easy for end users to construct probabilistic models in a simple style. Recent PPL platforms, including PyMC3 [14], Edward [15], and Pyro [16], require that the user define the model structure, such as a probabilistic graphical model (i.e., a representation of the joint probability distribution for the problem at hand).
The Markov Logic Network (MLN) [1] was originally proposed for combining first-order logic inference and probabilistic reasoning. Based on the original model, several variants and significant improvements have been proposed. For example, Tuffy [7] was the first system that implemented MLN inference on an RDBMS. ProKB [8] proposed a probabilistic knowledge base system that allows uncertain first-order relations and can dramatically reduce the grounding time cost of Tuffy. Deepdive [9] was also an improvement over Tuffy, which has been widely applied to different applications. It provides a powerful knowledge base construction tool and optimizes MLN inference by a combination of statistical inference and machine learning. Our previous work, POOLSIDE [17], proposed a ranking system for commercial products according to their attributes and user comments. Implemented using Deepdive, POOLSIDE provides a naive predefined function to specify the relations between attribute values. The recently proposed variant Quantified Markov Logic Networks (QMLNs) [18] extends the classical MLN with statistical quantifiers, which provide quantification describing, for example, most, few, or at-least-k thresholds. More recently, Flash [19] exploited the MLN to express the Spatial Probabilistic Graphical Model (SPGM), which can perform SPGM predictions efficiently. The MLN has been widely applied to various areas, including activity recognition in smart homes [20], root cause analysis in IT infrastructure [21], and natural language understanding [22], to name a few. Note that these systems were all designed for inference on first-order logic rules; they cannot effectively support inference on hybrid knowledge rules.
The latest research mainly focuses on applications. MLNClean [23] was proposed for data cleaning and is able to clean both schema-level and instance-level errors. The authors of SMLN [24] proposed a framework with native support for spatial data. The authors of [25] proposed R-KG, an intelligent robot service that reasons about knowledge based on a Markov Logic Network.
On the issue of probabilistic reasoning, research on the MLN mainly focuses on two aspects: inference optimization and model learning. Traditional MLN-based inference techniques suffer from the issue of scalability due to their dependence on the generative model, which embeds all the data and targets in a single model. The lifted inference technique [26] was proposed to simplify the MLN network by exploiting symmetry in the model. The authors of [27] proposed a technique that enables large-scale parallel inference by making Gibbs sampling work on divided networks. The authors of [28] also proposed a query-driven technique that can leverage the local network for query prediction. Moreover, in our previous work POOLSIDE [17], we also proposed an improved query-driven inference algorithm, which exploits the information in the known neighbors to predict the query node. Ground Network Sampling (GNS) [29], proposed in 2016, offers a new instantiation perspective that grounds from a set of sampled paths at inference time; thus, GNS offers better scalability than the MLN. Model learning for the MLN includes parameter learning and structure learning. Parameter learning aims to find the optimal weights for a set of rules, which is usually achieved by optimizing different metrics of the objective function [30,31,32]. Structure learning instead aims to learn both the logic formulas and their weights, using top-down [33] or bottom-up [34] search strategies to find formulas. The authors of [35] proposed a functional-gradient boosting algorithm that learns parameters and structure simultaneously. Since feature representation using neural networks has received much attention from researchers in various domains, neural Markov logic networks [36] were also proposed to learn implicit representations of rules using neural networks instead of explicit rules specified by humans.
To represent fuzzy logic, MLN models have been extended from the binary field to the continuous field. The hybrid MLN [37] defines and reasons about soft equality and inequality constraints on first-order relations. Probabilistic Soft Logic (PSL) [10] extends the binary variables in the MLN to the continuous range [0, 1]. PSL uses Lukasiewicz logic [38] to compute the truth values of logic clauses. Moreover, PSL allows users to define arithmetic rules, which can be interpreted as constraints on the variables, and transforms MAP inference into a convex optimization problem. With the help of ADMM [39], the inference can be effectively parallelized and scales well with the data size.
However, PSL cannot effectively support inference on hybrid knowledge rules, and its inference technique cannot address the clause explosion issue.

3. Hybrid Knowledge Rules

A first-order relation consists of a predicate and several predicate variables, e.g., "relation(y_1, y_2)", where "relation" is called a predicate, which represents the relationship between variables, while y_1 and y_2 are called predicate variables. If we replace the predicate variables of a relation with instance data, the relation is considered grounded. In our inference system, each grounded relation is regarded as an inference variable or a piece of evidence, which has a truth value in the interval [0, 1] indicating whether the relation holds (equal to one) or not (equal to zero).
A hybrid knowledge rule involves both arithmetic and logic expressions. Formally, we define a hybrid knowledge rule by extending the definition of the knowledge rule [10] as follows:
Definition 1.
Suppose that x denotes the set of first-order relation variables and ℓ(x) denotes a linear function consisting of variables in x. A hybrid knowledge rule, r, can be represented in the disjunctive form:
t_1 ∨ t_2 ∨ ⋯ ∨ t_n,
where t_i denotes a term, which should be one of the following three types:
  • (1) t_i is a first-order relation x or its negation ¬x, where x ∈ x;
  • (2) t_i is a logic expression, and x_i denotes its variables, where x_i ⊆ x;
  • (3) t_i is a linear inequality of the form ℓ(x_i) ≥ 0 or ℓ(x_i) ≤ 0, where x_i ⊆ x.
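As a concrete illustration of Definition 1, the following sketch shows one possible in-memory representation of a hybrid rule as a weighted disjunction of relation terms and linear-inequality terms. The class and field names are our own and are not part of the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union

@dataclass
class Relation:
    """A first-order relation term, e.g. FastCPU(c1), possibly negated."""
    predicate: str
    arguments: tuple
    negated: bool = False

@dataclass
class LinearInequality:
    """An arithmetic term in the canonical form sum_i beta_i * x_i + b <= 0."""
    coefficients: Dict[str, float]   # variable name -> beta_i
    constant: float = 0.0

@dataclass
class HybridRule:
    """A disjunction t_1 v ... v t_n of relation and inequality terms."""
    terms: List[Union[Relation, LinearInequality]] = field(default_factory=list)
    weight: float = 1.0

# FastCPU(c1) >= FastCPU(c2) rewritten as FastCPU(c2) - FastCPU(c1) <= 0.
rule = HybridRule(
    terms=[Relation("HighFrequency", ("c1",), negated=True),
           LinearInequality({"FastCPU(c2)": 1.0, "FastCPU(c1)": -1.0})],
    weight=2.0)
print(rule)
```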

4. Inference Framework

To introduce our inference framework, we first define the knowledge inference problem as follows:
Definition 2.
Suppose that r denotes the set of knowledge rules, x denotes a set of variables (including the set of inference variables V and the set of evidence Λ), and Φ_j denotes a function defined over the variables x, which represents the constraint based on the rule r_j ∈ r. The knowledge inference problem is to find a solution V* for the variables, such that:
V* = argmin_{V ∈ [0,1]^n} Σ_{r_j ∈ r} Φ_j(x).
In order to define Φ_j, we use Lukasiewicz logic [38], which extends binary variables to the continuous interval [0, 1], to represent the logic formula. Lukasiewicz logic transforms the logic operators in the following manner:
x_1 ∧ x_2 → max(x_1 + x_2 − 1, 0),
x_1 ∨ x_2 → min(x_1 + x_2, 1),
¬x → 1 − x.
Note that the latest approach of PSL can handle clauses containing only logic formulas. Based on Lukasiewicz logic, PSL transforms a logic formula into a linear inequality ℓ(x) ≤ 0, where:
ℓ(x) = Σ_{x_i ∈ x} β_i x_i + b
is a linear function, which defines the distance of a constraint from being satisfied. Given a logic formula (rule) r in disjunctive form, let I⁻(x) and I⁺(x) denote the sets of variables with and without the negation prefix "¬", respectively. Formally, the linear function ℓ(x) can be represented by:
ℓ(x) = 1 − Σ_{x_i ∈ I⁺} x_i − Σ_{x_i ∈ I⁻} (1 − x_i).
Based on this transformation, PSL defines a Hinge-Loss Markov Random Field (HL-MRF), which extends the MLN to the continuous field. The loss function for each clause can be formally represented by:
φ(x) = max(ℓ(x), 0)^p,
where x denotes the vector of variables, p denotes a user-defined parameter, and ℓ(x) denotes the linear function shown in Equation (7).
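For illustration, the following small sketch (our own code, not part of PSL) evaluates the Lukasiewicz operators above and the hinge-loss of Equation (8) for a disjunctive clause; variable values are assumed to lie in [0, 1].

```python
def luk_and(x1, x2):
    """Lukasiewicz conjunction."""
    return max(x1 + x2 - 1.0, 0.0)

def luk_or(x1, x2):
    """Lukasiewicz disjunction."""
    return min(x1 + x2, 1.0)

def luk_not(x):
    """Lukasiewicz negation."""
    return 1.0 - x

def hinge_loss(positive, negative, p=1):
    """PSL hinge-loss of a disjunctive clause: max(l(x), 0)^p, where
    l(x) = 1 - sum(non-negated literals) - sum(1 - negated literals)."""
    l = 1.0 - sum(positive) - sum(1.0 - x for x in negative)
    return max(l, 0.0) ** p

# Clause A v ~B with A = 0.3 and B = 0.9 is far from being satisfied:
print(luk_or(0.3, luk_not(0.9)))                   # truth value 0.4
print(hinge_loss(positive=[0.3], negative=[0.9]))  # distance to satisfaction 0.6
```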
Unfortunately, the loss function as defined in Equation (8) cannot handle a hybrid knowledge rule involving both a logic formula and arithmetic inequalities. It can be observed that directly modeling the inference of hybrid knowledge rules by Equation (8) would render its corresponding objective function non-convex.
To integrate all terms of a hybrid rule into one function, we consider the truth value of each arithmetic expression (inequality) as a continuous logic variable in the interval [0, 1], which is consistent with its semantics and logic propositions. Formally, we define the truth value of a linear inequality, ℓ ≤ 0, as follows:
T_{ℓ≤0} = min(1 − ℓ/sup(ℓ), 1),
sup(ℓ) = Σ_{x_i ∈ I⁺} β_{x_i} + c,
where sup(ℓ) denotes the sum of all positive variables' coefficients β_{x_i} and the constant c. Note that a linear inequality of the form ℓ ≥ 0 can be equivalently transformed into −ℓ ≤ 0. Figure 1a demonstrates the functional relation between a linear function value and its truth value. As shown in the figure, with a linear inequality normalized by its supremum, its truth value is equal to the maximal value of one when the inequality is satisfied, and it decreases to zero as the violation reaches the maximum level.
It is noteworthy that the truth value as defined in Equation (9) is consistent with the PSL transformation with regard to Equations (3) and (4). For the negation operator, we define:
T_{¬(ℓ≤0)} = T_{−ℓ≤0}.
Our inference framework then defines a linear function for a hybrid rule as follows:
ℓ(x) = 1 − Σ_{x_i ∈ I⁺} x_i − Σ_{x_i ∈ I⁻} (1 − x_i) − Σ_{ℓ_i ∈ L} (1 − ℓ_i/sup(ℓ_i)),
where L denotes the set of linear inequalities in the rule.
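A short sketch of how Equation (12) could be evaluated: logic literals contribute as in PSL, and each inequality term written as ℓ_i ≤ 0 contributes its supremum-normalized value 1 − ℓ_i/sup(ℓ_i). The function names are ours, and the unclipped form of the inequality contribution is our reading of the reconstructed equation.

```python
def hybrid_distance(positive, negative, inequalities):
    """Distance to satisfaction l(x) of a hybrid disjunctive rule.

    positive, negative: truth values of non-negated / negated logic literals.
    inequalities: list of (l_value, sup_l) pairs for terms written as l <= 0.
    """
    d = 1.0 - sum(positive) - sum(1.0 - x for x in negative)
    d -= sum(1.0 - l / sup for l, sup in inequalities)
    return d

# One non-negated logic literal (0.4) and one inequality with l = 0.5, sup(l) = 1:
print(hybrid_distance([0.4], [], [(0.5, 1.0)]))  # 0.1
```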
Note that the hybrid rules can be directly converted to the PSL loss function formulated in Equation (8), by replacing the linear functions in Equation (7) with Equation (12), such that hybrid rules’ inference can be solved by PSL, as we did in our empirical evaluation study. However, such an inference approach causes the clause explosion problem, as we discussed in the introduction. To solve the problem of clause explosion, we instead define an exp-loss function to measure the violation of a rule as follows.
Definition 3.
Let ℓ(x) denote the linear function defined in Equation (12), and let α > 1 denote the base argument, which can be e or another constant. The exp-loss function is defined by:
α^{ℓ(x)} − 1.
Lemma 1.
α^{ℓ(x)} − 1 is convex when α > 1.
Proof. 
A twice-differentiable function is convex iff its Hessian matrix is positive semi-definite. Take:
f(x) = α^{ℓ(x)} − 1 = α^{Σ_{i=1}^{n} β_i x_i + d} − 1.
Computing the Hessian:
∂²f(x)/(∂x_i ∂x_j) = (ln α)² β_i β_j α^{Σ_{k=1}^{n} β_k x_k + d},
we see that it is positive semi-definite, because for any λ_i,
Σ_{i,j} (ln α)² β_i β_j α^{Σ_{k=1}^{n} β_k x_k + d} λ_i λ_j = (ln α)² α^{Σ_{k=1}^{n} β_k x_k + d} (Σ_i β_i λ_i)² ≥ 0. □
It is worth pointing out that we chose the exp-loss function to measure the violation of a rule for the following reasons:
  • The exp-loss is a natural extension of the hinge-loss function defined in Equation (8). The exponential power ℓ(x) guarantees a greater loss when a violation of the rule occurs. On the other hand, even though the function is not zero when the rule is satisfied (e.g., if α = e, the loss is e^{ℓ(x)} − 1, which is negative rather than zero when ℓ(x) < 0), the value of the exp-loss and its gradient become very small in the negative interval, which can be regarded as a soft version of the max(·) constraint in Equation (8).
  • As shown in the following section, the exp-loss function enables the scalable inference based on function decomposition. It can effectively address the challenge of the explosion of grounded clauses.
Let V denote the set of unknown variables for inference. Given a set of hybrid knowledge rules r and the weight w j with respect to r j r , the inference target is to minimize the sum of all weighted loss functions generated by all clauses as follows:
argmin_{V ∈ [0,1]^n} Σ_{j=1}^{|r|} w_j Σ_{i=1}^{g(r_j)} (α^{ℓ_j(x_i)} − 1),
where g(·) denotes the grounding operation and x_i denotes the set of variables in the i-th clause. According to Equation (15), each rule r_j contributes g(r_j) clauses to its loss, which are generated by replacing the predicate variables in the first-order relations with the possible instances in the data. This process is known as grounding in the existing MLN solutions, which is usually implemented by a series of database join operations. Our framework performs grounding for inference optimization, while the MLN performs grounding to generate a factor graph.
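The sketch below (our own illustration with hypothetical data) evaluates the objective of Equation (15) given precomputed distances ℓ_j(x_i) of the grounded clauses.

```python
import math

def exp_loss(distance, alpha=math.e):
    """Exp-loss alpha^{l(x)} - 1 of a single grounded clause."""
    return alpha ** distance - 1.0

def objective(rules, alpha=math.e):
    """rules: list of (weight, distances of that rule's grounded clauses)."""
    return sum(w * sum(exp_loss(d, alpha) for d in distances)
               for w, distances in rules)

# Two rules: one slightly violated clause and two satisfied (negative-distance) clauses.
print(objective([(2.0, [0.3]), (1.0, [-0.5, -1.0])]))
```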

5. Inference Optimization

5.1. Decomposition of Exponential Loss Function

In the scenario of hybrid knowledge inference, grounding the rules that involve numerical comparison between two predicate variables, such as the term "Performance(p_1) ≥ Performance(p_2)", could easily result in clause explosion. To address this issue, our solution first decomposes the rule relations into groups and then grounds them separately. We illustrate the decomposition process with a simple example as follows:
Example 1.
Given the rule Frequency(y_1) ≥ Frequency(y_2) → FastCPU(y_1) ≥ FastCPU(y_2), its loss function (according to Equations (12) and (13)) can be represented by:
α^{Fr(y_2) − Fr(y_1) + Fc(y_1) − Fc(y_2) − 2},
where Fr and Fc denote the predicates Frequency and FastCPU, respectively. The total loss of the rule is estimated by the sum of all the grounded loss functions as follows:
Σ_{i=1}^{n} Σ_{j=1}^{n} α^{Fr(y_j) − Fr(y_i) + Fc(y_i) − Fc(y_j) − 2}.
It is noteworthy that the total sum of the loss can be decomposed into:
Σ_{i=1}^{n} α^{−Fr(y_i) + Fc(y_i)} · Σ_{j=1}^{n} α^{Fr(y_j) − Fc(y_j)} · α^{−2}.
Suppose that y_i has n instances. Compared to the original form in Equation (16), which requires a computational time of O(n²), computing the loss function in Equation (18) only requires O(2n).
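The following numerical check (our own sketch with random values) verifies the decomposition of Example 1: the O(n²) double sum over all grounded pairs equals the product of two O(n) sums times a constant factor, so the loss can be evaluated in linear time.

```python
import math
import random

n, alpha = 200, math.e
fr = [random.random() for _ in range(n)]   # Frequency(y_i)
fc = [random.random() for _ in range(n)]   # FastCPU(y_i)

# Direct O(n^2) evaluation of the grounded loss (Equation (17)).
direct = sum(alpha ** (fr[j] - fr[i] + fc[i] - fc[j] - 2)
             for i in range(n) for j in range(n))

# Decomposed O(n) evaluation (Equation (18)).
left = sum(alpha ** (-fr[i] + fc[i]) for i in range(n))
right = sum(alpha ** (fr[j] - fc[j]) for j in range(n))
decomposed = left * right * alpha ** (-2)

print(math.isclose(direct, decomposed))  # True
```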
In the general case, where the hybrid rules may contain facts and share common variables, the decomposition may be more complicated. Formally, we define the irreducible groups as follows.
Definition 4.
Suppose that a rule r contains the relations R = {R_1, …, R_m} and that y_i = {y_1^i, …, y_k^i} denotes the variables in R_i. We call R_i irreducible if, for every R_j ≠ R_i, y_i ⊄ y_j; otherwise, there exists a relation R_j with y_i ⊆ y_j, and R_i can be reduced to R_j. An irreducible group consists of an irreducible relation R_i and all the relations reducible to R_i, and we denote it by R̂_i. The set of predicate variables shared by two or more irreducible relations is called a joint variable set, denoted by S.
For the decomposition of the exp-loss function, we first split a hybrid rule into multiple irreducible groups. We sketch the procedure for identifying all the irreducible groups and their joint variables in Algorithm 1. For each relation R_i, we can find its irreducible group R̂_i if the relation exists in the groups. Note that a relation might be reducible to more than one irreducible group; however, it can only be assigned to one group, and the algorithm simply assigns it to the first irreducible group it meets. An illustrative example of how to split a set of relations into irreducible groups is shown in Figure 2. In the example, the relations R(y_1) and R(y_3) can be reduced to the relations R(y_1, y_2) and R(y_2, y_3), respectively. The splitting operation results in a total of three irreducible groups. It can be observed that the relations R(y_1, y_2) and R(y_2, y_3) share the variable y_2, and that R(y_4) is disjoint from both R(y_1, y_2) and R(y_2, y_3).
Algorithm 1: Find irreducible groups and joint variables.
Input: relation set R = {R_1, …, R_m} and the predicate variable sets y_i = {y_1^i, …, y_k^i} with respect to each R_i.
Output: irreducible groups R̂ and joint variable set S.
R̂ ← {{R_1}, …, {R_m}}; S ← ∅;
for R_i in R do
  for R_j in R ∖ {R_i} do
    if y_i ⊆ y_j then
      find R̂_i ∈ R̂ where R_i ∈ R̂_i;
      find R̂_j ∈ R̂ where R_j ∈ R̂_j;
      merge the sets R̂_i and R̂_j;
    end
  end
end
for R̂_i in R̂ do
  for R̂_j in R̂ ∖ {R̂_i} do
    if S_{ij} = y_i ∩ y_j ≠ ∅ then
      add S_{ij} to S;
    end
  end
end
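A compact Python sketch of Algorithm 1 under our reading of the reconstructed conditions: a relation is merged into another relation's group when its variables are a subset of the other's, and the joint variable sets are the non-empty intersections between the variables of the resulting groups. Relation and variable names are illustrative.

```python
def find_irreducible_groups(relations):
    """relations: dict mapping a relation name to its set of predicate variables."""
    group_of = {name: {name} for name in relations}        # start from singleton groups
    for ri, vars_i in relations.items():
        for rj, vars_j in relations.items():
            if ri != rj and vars_i <= vars_j:              # R_i is reducible to R_j
                merged = group_of[ri] | group_of[rj]
                for member in merged:                      # merge the two groups
                    group_of[member] = merged
    groups = [set(g) for g in {frozenset(g) for g in group_of.values()}]
    joint_vars = []
    for a in range(len(groups)):
        for b in range(a + 1, len(groups)):
            vars_a = set().union(*(relations[r] for r in groups[a]))
            vars_b = set().union(*(relations[r] for r in groups[b]))
            shared = vars_a & vars_b
            if shared:
                joint_vars.append(shared)
    return groups, joint_vars

# Example of Figure 2: R(y1), R(y1,y2), R(y2,y3), R(y3), R(y4).
rels = {"R(y1)": {"y1"}, "R(y1,y2)": {"y1", "y2"}, "R(y2,y3)": {"y2", "y3"},
        "R(y3)": {"y3"}, "R(y4)": {"y4"}}
print(find_irreducible_groups(rels))   # three groups and the joint variable set {y2}
```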
Now, we are ready to describe how to leverage the irreducible groups R̂ for decomposition optimization. In the proposed inference framework, a first-order relation is represented by a linear function in the exponential term of a loss function. Suppose that R̂ contains k irreducible groups, denoted by R̂_j. Then, the linear function ℓ(x) can be split into k + 1 parts {ℓ_1, …, ℓ_k, c}, where ℓ_j contains the variables and their coefficients corresponding to the relations in R̂_j, and c is the constant part. Therefore, the loss function can be reformulated as follows:
loss(r) = w Σ_{i=1}^{g(r)} (α^{ℓ(x_i)} − 1) = w Σ_{i=1}^{g(r)} (Π_{j=1}^{k} α^{ℓ_j(x_i^j)} · α^c − 1),
where x_i^j denotes the variables with respect to the i-th grounded relations in R̂_j. To decompose the loss function, we first split all the clauses g(r) that share the same grounded relations on the set of joint variables S into partitions. In each partition, the grounded clauses are the combinations of all variables in the irreducible groups. As a result, the sum of clauses in a partition can be represented by the product of all the sums in each group. Without loss of generality, we assume that all irreducible relations have n instances and that the set of joint variables S has θ instances. The decomposed loss function can be stated as follows (dropping the constant terms, which do not affect the optimization):
loss(r) = w Σ_{s=1}^{θ} Π_{j=1}^{k} (Σ_{i=1}^{n} α^{ℓ_j(x_{si}^j)}) · α^c.
Now, we estimate the complexity of the loss computation. The original loss computes all combinations of the clauses of the irreducible relations, which costs O(θn^k). As shown in Equation (20), our proposed technique of function decomposition reduces the computational complexity from O(θn^k) to O(θnk).
It is noteworthy that the grouped loss function is simply an equivalent reformulation of the original loss function: each rule in the form of Equation (19) can be converted to Equation (20). According to Equations (12), (13) and (15), the expansion of the loss function is a sum of exponential functions, and all the exponential functions have linear exponents. As a result, the loss function is convex, and our proposed method can effectively find the global optimum.
Now, we provide the entire process of hybrid knowledge rule inference in Algorithm 2. The algorithm first generates variables that represent the first-order relations in the dataset and then grounds the clauses for each rule r j in the form of decomposed exp-loss functions. Finally, we use the ADMM algorithm introduced in the following subsection to optimize the sum of losses for all knowledge rules.
Algorithm 2: Inference of hybrid knowledge rules.
 Input: set of hybrid knowledge rules r, relation set R = {R_1, …, R_m}, the predicate variable sets y_i = {y_1^i, …, y_k^i} with respect to each R_i, and the instances of the dataset D.
 Output: solution V* ∈ [0, 1]^n for the inference variables V.
 Generate the set of variables x according to R and D;
 for r_j ∈ r do
   Find the irreducible groups and joint variables for r_j by Algorithm 1;
   Generate loss(r_j) for r_j (grounding) in the form of Equation (20);
 end
 Find the optimal solution V* = argmin_{V ∈ [0,1]^n} Σ_{r_j ∈ r} loss(r_j);
We provide an example comparing our framework and PSL in Figure 3. In this example, we selected a hybrid knowledge rule used in our experimental study on entity linking to demonstrate the loss functions in three scenarios: the original PSL hinge-loss and the exp-loss with and without loss decomposition. As shown in the figure, the rule consists of two first-order relations, and each relation has three instances in the dataset. Since PSL does not support hybrid rule inference, we show its loss function when the linear inequality is directly regarded as a logical variable. It is easy to observe that the original PSL loss function cannot guarantee convexity. The decomposed exp-loss function reduces the number of clauses from 3² to 2 × 3.

5.2. Parallel Optimization

Our decomposition-based method can effectively compute the loss function proposed in Equation (19). In this subsection, we demonstrate how to implement our method in the optimization process. In order to achieve efficient inference, we use the approach of parallel optimization based on the ADMM algorithm. ADMM is a distributed optimization technique that focuses on solving large-scale convex optimization problems. It is generally applicable to loss functions of the form Σ_{i=1}^{N} f_i(x), where each term f_i(x) is a convex function. The main idea of ADMM is to replace the variables in each term with independent local variables and to add constraints on these variables by the augmented Lagrangian method. ADMM iteratively optimizes the local variables and updates the consensus global variables until they converge. More details about ADMM optimization can be found in [39].
The total loss function of the inference is the sum of all clauses in the form of decomposed exp-loss functions. For simplicity, we define:
P(x) = w Π_{j=1}^{k} (Σ_{i=1}^{n} α^{ℓ_j(x_i^j)}) · α^c
as a term of the loss function, such that the total loss is the sum of all terms, which can be formulated as follows:
loss(x) = Σ_{h=1}^{H} P_h(x_h),
where H is the number of terms in the loss function. By reformulating the optimization problem with local variables and the related constraints via the augmented Lagrangian function, ADMM transforms the MAP problem into:
loss(z, γ, x) = Σ_{h=1}^{H} P_h(z_h) + Σ_{h=1}^{H} γ_h^T (z_h − x_h) + Σ_{h=1}^{H} (ρ/2) ‖z_h − x_h‖_2²,
where z_h denotes a copy of the variables in x_h, x_h denotes the variables in x that correspond to z_h, γ denotes the vector of Lagrange multipliers, and ρ > 0 denotes the step-size parameter. Each set of local variables in z is independent of the others, such that for any two sets of local variables z_h and z_{h′}, z_h ∩ z_{h′} = ∅.
The optimization process iteratively updates the following three blocks until it converges:
γ_h^t ← γ_h^{t−1} + ρ (z_h^{t−1} − x_h^{t−1}), h = 1, …, H,
z^t ← argmin_z loss(z, γ^t, x^{t−1}),
x^t ← argmin_x loss(z^t, γ^t, x).
The optimization process converges if the local variables converge to the global variables and the global variables converge at the last iteration. Specifically, the two convergence conditions can be represented by:
‖r̄^t‖_2 = (Σ_{h=1}^{H} ‖z_h^t − x_h^t‖_2²)^{1/2} ≤ ε_pri,
and:
‖s̄^t‖_2 = ρ (Σ_{i=1}^{m} K_i ‖x_i^t − x_i^{t−1}‖_2²)^{1/2} ≤ ε_dual,
where m denotes the total number of variables in x, and ‖r̄^t‖_2 and ‖s̄^t‖_2 denote the primal residual and the dual residual, respectively. ε_pri and ε_dual are feasibility tolerances for the primal and dual feasibility conditions, and K_i is the number of local copies of the variable x_i.
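The skeleton below sketches the consensus-ADMM iteration of Equations (24)–(26) under our reconstruction; it is not the authors' implementation. Each term owns local copies z_h of the global variables it touches, `solve_local` stands in for the coordinate-wise z-update described in the remainder of this subsection, and the x-update and stopping test follow the standard consensus form with the residuals of Equations (27) and (28).

```python
import numpy as np

def admm(terms, n_vars, rho=0.5, eps_pri=1e-3, eps_dual=1e-5, max_iter=500):
    """terms: list of (idx, solve_local), where idx are the global variable indices
    of a loss term and solve_local(z_h, gamma_h, x_local) returns updated locals."""
    x = np.full(n_vars, 0.5)                               # global consensus variables
    z = [x[idx].copy() for idx, _ in terms]                # local copies per term
    gamma = [np.zeros(len(idx)) for idx, _ in terms]       # Lagrange multipliers
    for _ in range(max_iter):
        x_old = x.copy()
        for h, (idx, solve_local) in enumerate(terms):
            gamma[h] += rho * (z[h] - x[idx])              # (24) dual update
            z[h] = solve_local(z[h], gamma[h], x[idx])     # (25) local z-update
        # (26) consensus x-update: average of local copies plus scaled duals
        num, cnt = np.zeros(n_vars), np.zeros(n_vars)
        for h, (idx, _) in enumerate(terms):
            num[idx] += z[h] + gamma[h] / rho
            cnt[idx] += 1.0
        x = np.clip(num / np.maximum(cnt, 1.0), 0.0, 1.0)
        primal = np.sqrt(sum(np.sum((z[h] - x[idx]) ** 2)
                             for h, (idx, _) in enumerate(terms)))   # Eq. (27)
        dual = rho * np.sqrt(np.sum(cnt * (x - x_old) ** 2))         # Eq. (28)
        if primal <= eps_pri and dual <= eps_dual:
            break
    return x
```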
Our optimization takes the same steps as shown in Equations (24) and (26). For Equation (25), we follow the traditional ADMM practice of applying the parallel optimization to each clause. However, for each clause, our method does not find the minimal result at each iteration; instead, it iteratively updates each local variable to its minimal value while fixing the values of the other variables. The gradient ∇_z loss corresponds to the vector composed of the first derivatives with respect to each element. Since the local variables are independent for each term:
P(z, γ, x) = w Π_{j=1}^{k} (Σ_{i=1}^{n} α^{ℓ_j(z_i^j)}) · α^c + γ^T (z − x) + (ρ/2) ‖z − x‖_2²,
we only demonstrate the gradient for a single term. Let z_i^j denote a local variable belonging to the irreducible group R̂_j. The partial derivative with respect to z_i^j can be represented by:
∂P(z, γ, x)/∂z_i^j = ln α · β · α^{ℓ_j(z_i^j)} · w Π_{j′≠j} (Σ_{i=1}^{n} α^{ℓ_{j′}(z_i^{j′})}) · α^c + γ_i^j + ρ (z_i^j − x_i^j),
where z_i^{j′} (j′ ≠ j) denotes the variables that are not in the same group as z_i^j. For the computation of the first term, it is obvious that every z_i^j in the same group R̂_j shares the same product over the remaining k − 1 groups. Let f_j denote Σ_{i=1}^{n} α^{ℓ_j(z_i^j)}, such that:
P(z) = w Π_{j=1}^{k} f_j · α^c = w Π_{j=1}^{k} (Σ_{i=1}^{n} α^{ℓ_j(z_i^j)}) · α^c.
In order to compute the gradient, we first compute the product P ( z ) and then compute the gradient for each variable as follows:
∂loss/∂z_i^j = ln α · β · α^{ℓ_j(z_i^j)} · P(z)/f_j + γ_i^j + ρ (z_i^j − x_i^j).
Equation (32) significantly reduces the computation in the optimization process by sharing the product P(z) across all variables.
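To illustrate the shared-product trick of Equation (32), the sketch below computes the gradients of a single decomposed term: the product P(z) over all groups is computed once, and every variable in group j reuses P(z)/f_j. The names, the single coefficient β per group, and the linear form ℓ_j(z) = β·z are simplifying assumptions of ours; the γ and ρ penalty terms of Equation (29) are omitted for brevity.

```python
import math

def term_gradients(z_groups, betas, w=1.0, c=0.0, alpha=math.e):
    """z_groups[j][i]: local variable i of irreducible group j;
    betas[j]: coefficient of the variables in group j, so l_j(z) = betas[j] * z."""
    f = [sum(alpha ** (betas[j] * z) for z in group)   # f_j = sum_i alpha^{l_j(z_i^j)}
         for j, group in enumerate(z_groups)]
    P = w * math.prod(f) * alpha ** c                  # shared product P(z), computed once
    grads = []
    for j, group in enumerate(z_groups):
        grads.append([math.log(alpha) * betas[j] * alpha ** (betas[j] * z) * P / f[j]
                      for z in group])
    return grads

print(term_gradients([[0.2, 0.7], [0.4]], betas=[1.0, -1.0]))
```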

6. Experimental Study

In this section, we empirically evaluate the performance of the proposed solution by a comparative study. We compare the NMLN to PSL, which is the state-of-the-art technique for soft logic inference. PSL has been empirically shown to have the best performance on MLN inference among the existing solutions. More importantly, to the best of our knowledge, it is the only technique that is able to infer hybrid knowledge rules, even though it cannot solve the issue of clause explosion. To enable PSL inferences on hybrid knowledge rules, we replace the linear functions in Equation (7) by Equation (12), such that the rule can be converted to a linear function, which can then be solved by PSL inference. It is noteworthy that other Gibbs sampling-based methods such as Deepdive fail on hybrid rules due to the existence of extremely high-probability states. The sampler would be trapped in a local state, which requires an unacceptable time to sample the correct distribution. We evaluated the performance of different techniques on two real applications: mobile phone ranking and entity linking. We show the statistics of the datasets in Table 2.

6.1. Comparative Study

In the comparative study, we set the number of parallel threads to 6, ε_pri = 10^{−3}, and ε_dual = 10^{−5} in all experiments. For the NMLN, we set the base of the exponential function to α = e and the step size to ρ = 0.5 by default.
Mobile phone ranking: For this experiment, we needed to rank various mobile phones by performance for users. Since the performance evaluation of mobile phones is to some extent a subjective problem, it is difficult to obtain the ground truth. Therefore, we extracted the phone ranking list from a well-known benchmark website (available online: https://benchmarks.ul.com/, accessed on 3 June 2018), which also lists the specific details of each phone, such as its CPU, memory, and size. We considered the positions of the phones in the ranking list as the annotations to evaluate the inference results. The test dataset contained 899 smart phones. We define the average distance to evaluate the quality of the inference results:
D(r, r*) = 1 − (1/N²) Σ_{i=1}^{N} |r_i − r_i*|,
where r denotes the results ranked by inference and r* denotes the annotations in the ranking list. This function takes the maximal value of one when the inference results are exactly the same as the annotations. We defined six rules, presented in Appendix A, for performance inference. The detailed results are presented in Table 3. They clearly show that the NMLN achieves prediction accuracy similar to PSL while requiring significantly less inference time. The two methods have similar accuracy because the rules used in this task are simple; thus, PSL can also give a good prediction.
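As we read Equation (33), the metric can be computed as below (ranks are 1-based positions; the helper is our own sketch, not the authors' evaluation script).

```python
def average_distance(ranks, annotated_ranks):
    """D(r, r*) = 1 - (1/N^2) * sum_i |r_i - r*_i|; equals 1 for a perfect ranking."""
    n = len(ranks)
    return 1.0 - sum(abs(r - a) for r, a in zip(ranks, annotated_ranks)) / n ** 2

print(average_distance([1, 2, 3, 4], [1, 2, 3, 4]))  # 1.0
print(average_distance([4, 3, 2, 1], [1, 2, 3, 4]))  # 0.5
```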
Entity linking: Our empirical study was conducted on three real benchmark datasets, whose details are described as follows.
  • AIDA-CONLL: This dataset was constructed based on the source of CONLL2003 [40], which contains 1393 news articles. It consists of proper noun annotations, which indicate the corresponding entities in YAGO2 [41]. In our experiments, we evaluated all approaches on its testB dataset.
  • Wiki-Sports: This dataset contains articles on the topic of sports extracted from the featured article page of Wikipedia. The mentions in the dataset are extracted from the anchor texts in the articles and annotated by the entities to which they link. We used the disambiguation pages of Wikipedia to generate the candidates for each mention. In order to avoid the leakage of label information, we excluded the corresponding Wiki pages when extracting link text for the entities.
  • Wiki-FourDomains: This dataset contains Wikipedia articles extracted on four topics: films, music, novels, and television episodes. We applied the same process as for Wiki-Sports to generate mention annotations and candidate entities.
In the experiment, we linked a mention in the articles to the YAGO2 entity with the highest inference probability. We first extracted the following six features from the YAGO2 knowledge base: prior, semantic similarity, coherence, syntax similarity, edit distance, and word2vector similarity. Note that we also eliminated the mention-entity pair candidates that are obviously not matched from the inference process; otherwise, the large number of candidates may cause PSL to run out of memory.
To show the inference capability on a set of decision rules, we made use of the annotations from 300 documents to train a random forest. For each leaf node in the forest, we generated a decision rule, which was formulated as the logic implication "X → Y", where Y is the leaf node and X is the logic conjunction of all decision nodes on the path from the root to Y. We retained in total 38 rules whose impurity (measured by Gini) was less than 0.025. In addition, we added the rule link(m, e) ≥ 0.2 for every target pair, such that candidates unconstrained by any rule take a small value. The rules are presented in Appendix B.
The detailed evaluation results are presented in Table 4. It can be observed that the NMLN performs considerably better than PSL on prediction accuracy. The experiment showed that PSL cannot converge to consensus values; thus, it cannot perform well. On inference efficiency, the NMLN also performed considerably better than PSL: the NMLN finishes within half an hour, while PSL takes more than 14 h.
Now, we provide an analysis of the experimental results. As shown in Equation (29), for each term P(x) in the loss function, ADMM transforms the term into P(z, γ, x) by replacing x with local variables z and adds constraints to ensure that the local variables converge to x. Assume that P(x) contains n variables and k irreducible groups. The number of local variables in P(z, γ, x) is then n × k. However, the original form used in PSL makes the ADMM method construct n^k local variables, which means that each global variable x_i has k copies in the NMLN, but n^{k−1} copies in PSL. As a result, although the solution found by PSL is the global optimum of the dual problem in ADMM, its local variables do not actually converge to x; thus, the NMLN outperformed PSL on all datasets.
Evaluation of convergence:
In this experiment, we compared the convergence of the two methods on the task of mobile phone ranking. The evaluation results are presented in Figure 4. According to Equations (27) and (28), the optimization process converges if the primal residual ‖r̄^t‖_2 and the dual residual ‖s̄^t‖_2 are both close to zero. It can be observed that the NMLN is able to converge quickly and stably for both conditions. In Figure 4a, the primal residual of PSL stops decreasing at the value of 64, so the method cannot satisfy either convergence condition.

6.2. Scalability

To evaluate the scalability of the NMLN, we generated synthetic data of various sizes for phone ranking inference. The detailed evaluation results are presented in Figure 5. Since the rules contain two kinds of unknown variables, FastCPU(c) and Performance(p), we generated the relations at a ratio of 0.2:0.8. Our experiments show that PSL consumes a large amount of memory; its performance falls dramatically due to memory overflow when the number of variables exceeds 4500. Compared to PSL, the NMLN scales much better as the data size increases. As shown in Figure 5a, all inference tasks are finished within two seconds by the NMLN. The NMLN spends most of the time in pre-processing when it runs on small data, such that the runtime does not increase significantly in (a). For larger data sizes (more than 5000), we also provide the log-scale performance in Figure 5b. It can be observed that the runtime scales in an approximately linear fashion. In the figure, there is a slight slowdown when the data size is greater than 10,000, which is caused by the sequential operations in the pre-processing phase.
We also present the number of iterations required by both techniques to converge in Figure 5c,d. It can be observed that the NMLN takes 36 iterations on all the tasks, with the number of variables varying from 100 to 10 M. The reason is that the average number of local variables with respect to the same global variable is always a fixed number in the NMLN. In PSL, clause explosion causes a single variable to take more local copies as the size increases.

6.3. Sensitivity Evaluation

In this subsection, we evaluate the performance sensitivity of the NMLN w.r.t. the number of parallel threads, the base of the exponential function α, and the step size ρ. In our empirical study, except for the evaluated parameter, all other parameters were set to the same values. We ran the evaluation of parallel threads on the synthetic data used for the scalability evaluation, since the number of variables has a significant impact on parallel methods. For the evaluations of the parameters α and ρ, we only present the results on the original mobile phone ranking data, because different variable sizes appear to have no effect on the results.
The evaluation results on the number of parallel threads are presented in Figure 6, in which the x-axis denotes the number of variables and the y-axis denotes the runtime as a percentage of the runtime of the non-parallel baseline (Threads = 1) on the same data. It can be observed that the runtime of parallel inference decreases significantly when the number of variables is large. Specifically, when the number of threads is set to six, the runtime of inference decreases to 23% and 27% on 1000 K and 100 K variables, respectively. However, if the number of variables is smaller than 1 K, the runtime decreases only marginally as the number of threads increases. This should not be surprising, because small tasks are not suitable for parallelization.
The evaluation results on the base of the exponential function α are presented in Figure 7, in which the parameter varies from e to 100. It can be observed that the performance of the NMLN fluctuates only marginally over a long range of α for both the primal residual and the dual residual. Therefore, NMLN inference is stable across different base values.
The evaluation results on the step size ρ are presented in Figure 8, in which the parameter varies from 0.1 to 1.0. It can be observed that a larger value of ρ leads to a faster convergence speed for the primal residual and a slower convergence speed for the dual residual. Thus, the step size ρ should be set to a moderate value (e.g., 0.5) to balance the two conditions.

7. Conclusions

Current MLN solutions cannot support knowledge inference involving arithmetic expressions. In this paper, we propose the Numerical Markov Logic Network (NMLN) to enable effective and efficient inference of hybrid knowledge involving both logic and arithmetic expressions. We define the exp-loss function as the metric that integrates arithmetic inequalities and logic formulas. By exploiting the decomposition of exp-loss functions, our method reduces the computational complexity from O(θn^k) to O(θnk), such that the inference scales well despite the issue of clause explosion. We also present a parallel solution for hybrid knowledge inference based on convex optimization. The proposed approach can achieve better prediction accuracy than the existing MLN techniques while significantly reducing the inference time.

Author Contributions

Conceptualization, P.Z. and Q.C.; methodology, P.Z.; software, P.Z.; validation, P.Z., Q.C. and B.H.; formal analysis, Q.C.; investigation, Q.C.; resources, Z.L.; data curation, B.H.; writing—original draft preparation, P.Z.; writing—review and editing, Q.C. and M.A.; visualization, P.Z.; supervision, Q.C.; project administration, Z.L.; funding acquisition, Z.L. All authors have read and agreed to the published version of the manuscript.

Funding

The work was supported by the National Key Research and Development Program of China (2018YFB1003400), National Natural Science Foundation of China (61732014, 61672432), Fundamental Research Funds for the Central Universities (Program No. 3102019DX1004) and Natural Science Basic Research Plan in Shaanxi Province of China (Program No. 2018JM6086).

Data Availability Statement

Mobile phone data at https://benchmarks.ul.com/ (accessed on 3 June 2018), AIDA-CONLL data at https://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/aida/downloads/ (accessed on 30 November 2018), Wiki-Sport and Wiki-FourDomains data at http://en.wikipedia.org/wiki/Wikipedia:Featured_articles (accessed on 10 January 2021).

Conflicts of Interest

We declare that we have no financial or personal relationships with other people or organizations that could inappropriately influence our work, and that there is no professional or other personal interest of any nature or kind in any product, service, and/or company that could be construed as influencing the position presented in, or the review of, the manuscript entitled "Numerical Markov Logic Network: A Scalable Probabilistic Framework for Hybrid Knowledge Inference", which has been approved by all authors. We further declare that the work described is original research that has not been published or submitted previously and is not under consideration for publication elsewhere, in whole or in part.

Appendix A. Knowledge Rules in the Phone Dataset

Table A1. Knowledge rules in the phone dataset.
Knowledge Rules | Weight
r_1: core(c_1) ≥ core(c_2) → fastcpu(c_1) ≥ fastcpu(c_2) | 2
r_2: frequency(c_1) ≥ frequency(c_2) → fastcpu(c_1) ≥ fastcpu(c_2) | 4
r_3: sccore(c_1) ≥ sccore(c_2) → fastcpu(c_1) ≥ fastcpu(c_2) | 1
r_4: secfrequency(c_1) ≥ secfrequency(c_2) → fastcpu(c_1) ≥ fastcpu(c_2) | 2
r_5: fastcpu(c_1) ≥ fastcpu(c_2) & hascpu(p_1, c_1) & hascpu(p_2, c_2) → performance(p_1) ≥ performance(p_2) | 1
r_6: memory(p_1) ≥ memory(p_2) → performance(p_1) ≥ performance(p_2) | 1

Appendix B. Knowledge Rules in the Aida Dataset

We show the meaning of all relations in the rules as follows:
  • prior ( m , e ) : the prior distribution computed by the number of entities linking to e.
  • anchormisim ( m , e ) : the similarity measured by mutual information according to the “hasWikipediaAnchorText” file in YAGO2.
  • anchorwvsim ( m , e ) : the similarity measured by word2vector according to the “hasWikipediaAnchorText” file in YAGO2.
  • catemisim ( m , e ) : the similarity measured by mutual information according to the “hasWikipediaCategory” file in YAGO2.
  • catewvsim ( m , e ) : the similarity measured by word2vector according to the “hasWikipediaCategory” file in YAGO2.
  • distance ( m , e ) : the edit distance-based similarity.
  • synsim ( m , e ) : the syntactical similarity computed by WordNet and YAGO2.
  • coherence ( m , e ) : we find the max candidate of ( m e n t i o n , e n t i t y ) pairs for one document, for which all entities in the set have the maximum word similarity.
Table A2. Knowledge rules in the Aida dataset.
Knowledge Rules | Weight
r 1 a n c h o r m i s i m ( m , e 1 ) a n c h o r m i s i m ( m , e 2 ) 0.008 & p r i o r ( m , e 1 ) p r i o r ( m , e 2 ) 0.033 l i n k ( m , e 1 ) l i n k ( m , e 2 ) 1.11
r 2 d i s t a n c e ( m , e 1 ) d i s t a n c e ( m , e 2 ) 0.007 & p r i o r ( m , e 1 ) p r i o r ( m , e 2 ) 0.016 l i n k ( m , e 1 ) l i n k ( m , e 2 ) 0.75
r 3 a n c h o r m i s i m ( m , e 1 ) a n c h o r m i s i m ( m , e 2 ) 0.008 & p r i o r ( m , e 1 ) p r i o r ( m , e 2 ) 0.002 l i n k ( m , e 1 ) l i n k ( m , e 2 ) 0.95
r 4 a n c h o r w v s i m ( m , e 1 ) a n c h o r w v s i m ( m , e 2 ) 0.128 & p r i o r ( m , e 1 ) p r i o r ( m , e 2 ) 0.108 l i n k ( m , e 1 ) l i n k ( m , e 2 ) 0.77
r 5 p r i o r ( m , e 1 ) p r i o r ( m , e 2 ) 0.178 l i n k ( m , e 1 ) l i n k ( m , e 2 ) 0.95
r 6 c a t e w v s i m ( m , e 1 ) c a t e w v s i m ( m , e 2 ) 0.811 & p r i o r ( m , e 1 ) p r i o r ( m , e 2 ) 0.006 l i n k ( m , e 1 ) l i n k ( m , e 2 ) 0.99
r 7 a n c h o r w v s i m ( m , e 1 ) a n c h o r w v s i m ( m , e 2 ) 0.436 & c a t e w v s i m ( m , e 1 ) c a t e w v s i m ( m , e 2 ) 0.797 & p r i o r ( m , e 1 ) p r i o r ( m , e 2 ) 0.006 l i n k ( m , e 1 ) l i n k ( m , e 2 ) 0.74
r 8 a n c h o r m i s i m ( m , e 1 ) a n c h o r m i s i m ( m , e 2 ) 0.383 & c a t e m i s i m ( m , e 1 ) c a t e m i s i m ( m , e 2 ) 0.0 & p r i o r ( m , e 1 ) p r i o r ( m , e 2 ) 0.032 l i n k ( m , e 1 ) l i n k ( m , e 2 ) 2.26
r 9 a n c h o r m i s i m ( m , e 1 ) a n c h o r m i s i m ( m , e 2 ) 0.006 & a n c h o r w v s i m ( m , e 1 ) a n c h o r w v s i m ( m , e 2 ) 0.445 & c a t e w v s i m ( m , e 1 ) c a t e w v s i m ( m , e 2 ) 0.193 l i n k ( m , e 1 ) l i n k ( m , e 2 ) 0.89
r 10 a n c h o r m i s i m ( m , e 1 ) a n c h o r m i s i m ( m , e 2 ) 0.006 & c a t e w v s i m ( m , e 1 ) c a t e w v s i m ( m , e 2 ) 0.057 & d i s t a n c e ( m , e 1 ) d i s t a n c e ( m , e 2 ) 0.41 l i n k ( m , e 1 ) l i n k ( m , e 2 ) 0.73
r 11 a n c h o r m i s i m ( m , e 1 ) a n c h o r m i s i m ( m , e 2 ) 0.008 & d i s t a n c e ( m , e 1 ) d i s t a n c e ( m , e 2 ) 0.007 & p r i o r ( m , e 1 ) p r i o r ( m , e 2 ) 0.016 l i n k ( m , e 1 ) l i n k ( m , e 2 ) 1.97
r 12 a n c h o r m i s i m ( m , e 1 ) a n c h o r m i s i m ( m , e 2 ) 0.013 & d i s t a n c e ( m , e 1 ) d i s t a n c e ( m , e 2 ) 0.01 l i n k ( m , e 1 ) l i n k ( m , e 2 ) 0.9
r 13 a n c h o r m i s i m ( m , e 1 ) a n c h o r m i s i m ( m , e 2 ) 0.006 & c a t e w v s i m ( m , e 1 ) c a t e w v s i m ( m , e 2 ) 0.057 & d i s t a n c e ( m , e 1 ) d i s t a n c e ( m , e 2 ) 0.41 l i n k ( m , e 1 ) l i n k ( m , e 2 ) 0.73
r 14 a n c h o r m i s i m ( m , e 1 ) a n c h o r m i s i m ( m , e 2 ) 0.006 & c a t e m i s i m ( m , e 1 ) c a t e m i s i m ( m , e 2 ) 0.172 & d i s t a n c e ( m , e 1 ) d i s t a n c e ( m , e 2 ) 0.007 l i n k ( m , e 1 ) l i n k ( m , e 2 ) 0.95
r 15 a n c h o r m i s i m ( m , e 1 ) a n c h o r m i s i m ( m , e 2 ) 0.092 & c a t e m i s i m ( m , e 1 ) c a t e m i s i m ( m , e 2 ) 0.0 & d i s t a n c e ( m , e 1 ) d i s t a n c e ( m , e 2 ) 0.007 l i n k ( m , e 1 ) l i n k ( m , e 2 ) 0.95
r 16 c a t e w v s i m ( m , e 1 ) c a t e w v s i m ( m , e 2 ) 0.811 & p r i o r ( m , e 1 ) p r i o r ( m , e 2 ) 0.173 l i n k ( m , e 1 ) l i n k ( m , e 2 ) 1.21
r 17 a n c h o r m i s i m ( m , e 1 ) a n c h o r m i s i m ( m , e 2 ) 0.006 & c a t e m i s i m ( m , e 1 ) c a t e m i s i m ( m , e 2 ) 0.189 l i n k ( m , e 1 ) l i n k ( m , e 2 ) 0.82
r 18 a n c h o r w v s i m ( m , e 1 ) a n c h o r w v s i m ( m , e 2 ) 0.131 & d i s t a n c e ( m , e 1 ) d i s t a n c e ( m , e 2 ) 0.354 l i n k ( m , e 1 ) l i n k ( m , e 2 ) 0.84
r 19 a n c h o r w v s i m ( m , e 1 ) a n c h o r w v s i m ( m , e 2 ) 0.809 & a n c h o r w v s i m ( m , e 1 ) a n c h o r w v s i m ( m , e 2 ) 0.131 & p r i o r ( m , e 1 ) p r i o r ( m , e 2 ) 0.094 l i n k ( m , e 1 ) l i n k ( m , e 2 ) 1.04
r 20 a n c h o r m i s i m ( m , e 1 ) a n c h o r m i s i m ( m , e 2 ) 0.008 & c a t e w v s i m ( m , e 1 ) c a t e w v s i m ( m , e 2 ) 0.984 & p r i o r ( m , e 1 ) p r i o r ( m , e 2 ) 0.003 l i n k ( m , e 1 ) l i n k ( m , e 2 ) 1.58
r 21 a n c h o r m i s i m ( m , e 1 ) a n c h o r m i s i m ( m , e 2 ) 0.01 & a n c h o r m i s i m ( m , e 1 ) a n c h o r m i s i m ( m , e 2 ) 0.008 & c a t e m i s i m ( m , e 1 ) c a t e m i s i m ( m , e 2 ) 0.172 & p r i o r ( m , e 1 ) p r i o r ( m , e 2 ) 0.003 l i n k ( m , e 1 ) l i n k ( m , e 2 ) 5.0
r 22 a n c h o r w v s i m ( m , e 1 ) a n c h o r w v s i m ( m , e 2 ) 0.185 & c a t e m i s i m ( m , e 1 ) c a t e m i s i m ( m , e 2 ) 0.061 & d i s t a n c e ( m , e 1 ) d i s t a n c e ( m , e 2 ) 0.007 l i n k ( m , e 1 ) l i n k ( m , e 2 ) 1.59 :
r 23 a n c h o r w v s i m ( m , e 1 ) a n c h o r w v s i m ( m , e 2 ) 0.09 & c a t e w v s i m ( m , e 1 ) c a t e w v s i m ( m , e 2 ) 0.095 & d i s t a n c e ( m , e 1 ) d i s t a n c e ( m , e 2 ) 0.389 & p r i o r ( m , e 1 ) p r i o r ( m , e 2 ) 0.124 l i n k ( m , e 1 ) l i n k ( m , e 2 ) 1.69
r 24 a n c h o r w v s i m ( m , e 1 ) a n c h o r w v s i m ( m , e 2 ) 0.09 & d i s t a n c e ( m , e 1 ) d i s t a n c e ( m , e 2 ) 0.272 & p r i o r ( m , e 1 ) p r i o r ( m , e 2 ) 0.007 l i n k ( m , e 1 ) l i n k ( m , e 2 ) 1.3
r 25 a n c h o r w v s i m ( m , e 1 ) a n c h o r w v s i m ( m , e 2 ) 0.093 & d i s t a n c e ( m , e 1 ) d i s t a n c e ( m , e 2 ) 0.298 & p r i o r ( m , e 1 ) p r i o r ( m , e 2 ) 0.077 l i n k ( m , e 1 ) l i n k ( m , e 2 ) 3.29
r 26 a n c h o r m i s i m ( m , e 1 ) a n c h o r m i s i m ( m , e 2 ) 0.008 & c a t e m i s i m ( m , e 1 ) c a t e m i s i m ( m , e 2 ) 0.0 & c a t e w v s i m ( m , e 1 ) c a t e w v s i m ( m , e 2 ) 0.163 l i n k ( m , e 1 ) l i n k ( m , e 2 ) 3.28
r 27 a n c h o r w v s i m ( m , e 1 ) a n c h o r w v s i m ( m , e 2 ) 0.229 & c a t e m i s i m ( m , e 1 ) c a t e m i s i m ( m , e 2 ) 0.0 & p r i o r ( m , e 1 ) p r i o r ( m , e 2 ) 0.012 l i n k ( m , e 1 ) l i n k ( m , e 2 ) 1.43
r 28 a n c h o r m i s i m ( m , e 1 ) a n c h o r m i s i m ( m , e 2 ) 0.031 & d i s t a n c e ( m , e 1 ) d i s t a n c e ( m , e 2 ) 0.007 l i n k ( m , e 1 ) l i n k ( m , e 2 ) 5.0
r 29 a n c h o r m i s i m ( m , e 1 ) a n c h o r m i s i m ( m , e 2 ) 0.006 & a n c h o r w v s i m ( m , e 1 ) a n c h o r w v s i m ( m , e 2 ) 0.093 & d i s t a n c e ( m , e 1 ) d i s t a n c e ( m , e 2 ) 0.007 & s y n s i m ( m , e 1 ) s y n s i m ( m , e 2 ) 0.158 l i n k ( m , e 1 ) l i n k ( m , e 2 ) 5.0
r 30 a n c h o r m i s i m ( m , e 1 ) a n c h o r m i s i m ( m , e 2 ) - 0.119 & c a t e m i s i m ( m , e 1 ) c a t e m i s i m ( m , e 2 ) 0.011 & c a t e w v s i m ( m , e 1 ) c a t e w v s i m ( m , e 2 ) 0.816 & p r i o r ( m , e 1 ) p r i o r ( m , e 2 ) 0.006 l i n k ( m , e 1 ) l i n k ( m , e 2 ) 1.58
r 31 a n c h o r m i s i m ( m , e 1 ) a n c h o r m i s i m ( m , e 2 ) 0.893 & d i s t a n c e ( m , e 1 ) d i s t a n c e ( m , e 2 ) 0.397 & p r i o r ( m , e 1 ) p r i o r ( m , e 2 ) 0.006 l i n k ( m , e 1 ) l i n k ( m , e 2 ) 5.0
r 32 a n c h o r m i s i m ( m , e 1 ) a n c h o r m i s i m ( m , e 2 ) 0.119 & c a t e m i s i m ( m , e 1 ) c a t e m i s i m ( m , e 2 ) 0.15 & d i s t a n c e ( m , e 1 ) d i s t a n c e ( m , e 2 ) 0.397 & p r i o r ( m , e 1 ) p r i o r ( m , e 2 ) 0.006 l i n k ( m , e 1 ) l i n k ( m , e 2 ) 1.28
r 33 c a t e w v s i m ( m , e 1 ) c a t e w v s i m ( m , e 2 ) 0.796 & d i s t a n c e ( m , e 1 ) d i s t a n c e ( m , e 2 ) 0.573 & p r i o r ( m , e 1 ) p r i o r ( m , e 2 ) 0.003 l i n k ( m , e 1 ) l i n k ( m , e 2 ) 1.15
r 34 c a t e w v s i m ( m , e 1 ) c a t e w v s i m ( m , e 2 ) 0.804 & d i s t a n c e ( m , e 1 ) d i s t a n c e ( m , e 2 ) 0.397 & p r i o r ( m , e 1 ) p r i o r ( m , e 2 ) 0.085 l i n k ( m , e 1 ) l i n k ( m , e 2 ) 1.3
r 35 c a t e m i s i m ( m , e 1 ) c a t e m i s i m ( m , e 2 ) 0.17 & c a t e w v s i m ( m , e 1 ) c a t e w v s i m ( m , e 2 ) 0.807 & d i s t a n c e ( m , e 1 ) d i s t a n c e ( m , e 2 ) 0.006 & p r i o r ( m , e 1 ) p r i o r ( m , e 2 ) 0.033 l i n k ( m , e 1 ) l i n k ( m , e 2 ) 1.33
r 36 c a t e m i s i m ( m , e 1 ) c a t e m i s i m ( m , e 2 ) 0.012 & c a t e w v s i m ( m , e 1 ) c a t e w v s i m ( m , e 2 ) 0.99 & d i s t a n c e ( m , e 1 ) d i s t a n c e ( m , e 2 ) 0.006 & p r i o r ( m , e 1 ) p r i o r ( m , e 2 ) 0.032 l i n k ( m , e 1 ) l i n k ( m , e 2 ) 0.96
r 37 a n c h o r m i s i m ( m , e 1 ) a n c h o r m i s i m ( m , e 2 ) 0.362 & d i s t a n c e ( m , e 1 ) d i s t a n c e ( m , e 2 ) 0.006 & p r i o r ( m , e 1 ) p r i o r ( m , e 2 ) 0.162 l i n k ( m , e 1 ) l i n k ( m , e 2 ) 2.88
r 38 c o h e r e n c e ( m , e ) l i n k ( m , e ) 1
r 39 l i n k ( m , e ) 0.2 1

References

  1. Richardson, M.; Domingos, P.M. Markov logic networks. Mach. Learn. 2006, 62, 107–136.
  2. Banerjee, O.; El Ghaoui, L.; d’Aspremont, A. Model Selection Through Sparse Maximum Likelihood Estimation for Multivariate Gaussian or Binary Data. J. Mach. Learn. Res. 2008, 9, 485–516.
  3. Dong, X.L.; Gabrilovich, E.; Heitz, G.; Horn, W.; Murphy, K.; Sun, S.; Zhang, W. From Data Fusion to Knowledge Fusion. PVLDB 2014, 7, 881–892.
  4. Jiang, S.; Lowd, D.; Dou, D. Learning to Refine an Automatically Extracted Knowledge Base Using Markov Logic. In Proceedings of the 12th IEEE International Conference on Data Mining, ICDM 2012, Brussels, Belgium, 10–13 December 2012; pp. 912–917.
  5. Zhang, C.; Ré, C.; Cafarella, M.J.; Shin, J.; Wang, F.; Wu, S. DeepDive: Declarative knowledge base construction. Commun. ACM 2017, 60, 93–102.
  6. Singla, P.; Domingos, P.M. Entity Resolution with Markov Logic. In Proceedings of the 6th IEEE International Conference on Data Mining (ICDM 2006), Hong Kong, China, 18–22 December 2006; pp. 572–582.
  7. Niu, F.; Ré, C.; Doan, A.; Shavlik, J.W. Tuffy: Scaling up Statistical Inference in Markov Logic Networks using an RDBMS. PVLDB 2011, 4, 373–384.
  8. Chen, Y.; Wang, D.Z. Knowledge expansion over probabilistic knowledge bases. In Proceedings of the International Conference on Management of Data, SIGMOD 2014, Snowbird, UT, USA, 22–27 June 2014; pp. 649–660.
  9. Sa, C.D.; Ratner, A.; Ré, C.; Shin, J.; Wang, F.; Wu, S.; Zhang, C. Incremental knowledge base construction using DeepDive. VLDB J. 2017, 26, 81–105.
  10. Bach, S.H.; Broecheler, M.; Huang, B.; Getoor, L. Hinge-Loss Markov Random Fields and Probabilistic Soft Logic. J. Mach. Learn. Res. 2017, 18, 109:1–109:67.
  11. Wick, M.L.; McCallum, A.; Miklau, G. Scalable Probabilistic Databases with Factor Graphs and MCMC. PVLDB 2010, 3, 794–804.
  12. Zhang, C.; Ré, C. Towards High-throughput Gibbs Sampling at Scale: A Study Across Storage Managers. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, SIGMOD’13, New York, NY, USA, 22–27 June 2013; pp. 397–408.
  13. Krapu, C.; Borsuk, M. Probabilistic programming: A review for environmental modellers. Environ. Model. Softw. 2019, 114, 40–48.
  14. Salvatier, J.; Wiecki, T.V.; Fonnesbeck, C.; Elkan, C. Probabilistic programming in Python using PyMC3. PeerJ Comput. Sci. 2016, 2.
  15. Tran, D.; Kucukelbir, A.; Dieng, A.B.; Rudolph, M.R.; Liang, D.; Blei, D.M. Edward: A library for probabilistic modeling, inference, and criticism. arXiv 2016, arXiv:1610.09787.
  16. Bingham, E.; Chen, J.P.; Jankowiak, M.; Obermeyer, F.; Pradhan, N.; Karaletsos, T.; Singh, R.; Szerlip, P.A.; Horsfall, P.; Goodman, N.D. Pyro: Deep Universal Probabilistic Programming. J. Mach. Learn. Res. 2019, 20, 28:1–28:6.
  17. Zhong, P.; Li, Z.; Chen, Q.; Wang, Y.; Wang, L.; Ahmed, M.H.M.; Fan, F. POOLSIDE: An Online Probabilistic Knowledge Base for Shopping Decision Support. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, CIKM 2017, Singapore, 6–10 November 2017; pp. 2559–2562.
  18. Gutiérrez-Basulto, V.; Jung, J.C.; Kuzelka, O. Quantified Markov Logic Networks. In Proceedings of the Sixteenth International Conference on Principles of Knowledge Representation and Reasoning, KR 2018, Tempe, AZ, USA, 30 October–2 November 2018; pp. 602–612.
  19. Sabek, I.; Musleh, M.; Mokbel, M.F. Flash in Action: Scalable Spatial Data Analysis Using Markov Logic Networks. PVLDB 2019, 12, 1834–1837.
  20. Gayathri, K.; Easwarakumar, K.; Elias, S. Probabilistic ontology based activity recognition in smart homes using Markov Logic Network. Knowl. Based Syst. 2017, 121, 173–184.
  21. Schoenfisch, J.; Meilicke, C.; von Stülpnagel, J.; Ortmann, J.; Stuckenschmidt, H. Root cause analysis in IT infrastructures using ontologies and abduction in Markov Logic Networks. Inf. Syst. 2018, 74, 103–116.
  22. Kennington, C.; Schlangen, D. Situated incremental natural language understanding using Markov Logic Networks. Comput. Speech Lang. 2014, 28, 240–255.
  23. Ge, C.; Gao, Y.; Miao, X.; Yao, B.; Wang, H. A Hybrid Data Cleaning Framework Using Markov Logic Networks. IEEE Trans. Knowl. Data Eng. 2020, 1.
  24. Sabek, I. Adopting Markov Logic Networks for Big Spatial Data and Applications. In Proceedings of the International Conference on Very Large Data Bases (VLDB), Los Angeles, CA, USA, 26–30 August 2019.
  25. Hao, W.; Menglin, J.; Guohui, T.; Qing, M.; Guoliang, L. R-KG: A Novel Method for Implementing a Robot Intelligent Service. AI 2020, 1, 117–140.
  26. Sarkhel, S.; Venugopal, D.; Singla, P.; Gogate, V. Lifted MAP Inference for Markov Logic Networks. In Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics, AISTATS 2014, Reykjavik, Iceland, 22–25 April 2014; pp. 859–867.
  27. Beedkar, K.; Corro, L.D.; Gemulla, R. Fully Parallel Inference in Markov Logic Networks. In Proceedings of the Datenbanksysteme für Business, Technologie und Web (BTW), 15. Fachtagung des GI-Fachbereichs “Datenbanken und Informationssysteme” (DBIS), Magdeburg, Germany, 11–15 March 2013; pp. 205–224.
  28. Zhou, X.; Chen, Y.; Wang, D.Z. ArchimedesOne: Query Processing over Probabilistic Knowledge Bases. PVLDB 2016, 9, 1461–1464.
  29. Sun, Z.; Zhao, Y.; Wei, Z.; Zhang, W.; Wang, J. Scalable learning and inference in Markov logic networks. Int. J. Approx. Reason. 2017, 82, 39–55.
  30. Singla, P.; Domingos, P.M. Discriminative Training of Markov Logic Networks. In Proceedings of the Twentieth National Conference on Artificial Intelligence and the Seventeenth Innovative Applications of Artificial Intelligence Conference, Pittsburgh, PA, USA, 9–13 July 2005; pp. 868–873.
  31. Lowd, D.; Domingos, P.M. Efficient Weight Learning for Markov Logic Networks. In Proceedings of Knowledge Discovery in Databases: PKDD 2007, 11th European Conference on Principles and Practice of Knowledge Discovery in Databases, Warsaw, Poland, 17–21 September 2007; pp. 200–211.
  32. Huynh, T.N.; Mooney, R.J. Max-Margin Weight Learning for Markov Logic Networks. In Proceedings of Machine Learning and Knowledge Discovery in Databases, European Conference, ECML PKDD 2009, Bled, Slovenia, 7–11 September 2009; pp. 564–579.
  33. Kok, S.; Domingos, P.M. Learning the structure of Markov logic networks. In Proceedings of the Twenty-Second International Conference on Machine Learning (ICML 2005), Bonn, Germany, 7–11 August 2005; pp. 441–448.
  34. Mihalkova, L.; Mooney, R.J. Bottom-up learning of Markov logic network structure. In Proceedings of the Twenty-Fourth International Conference on Machine Learning (ICML 2007), Corvallis, OR, USA, 20–24 June 2007; pp. 625–632.
  35. Khot, T.; Natarajan, S.; Kersting, K.; Shavlik, J. Gradient-based boosting for statistical relational learning: The Markov logic network and missing data cases. Mach. Learn. 2015, 100.
  36. Marra, G.; Kuzelka, O. Neural Markov Logic Networks. arXiv 2019, arXiv:1905.13462.
  37. Wang, J.; Domingos, P.M. Hybrid Markov Logic Networks. In Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence, AAAI 2008, Chicago, IL, USA, 13–17 July 2008; pp. 1106–1111.
  38. Klir, G.J.; Yuan, B. Fuzzy Sets and Fuzzy Logic—Theory and Applications; Prentice Hall: Upper Saddle River, NJ, USA, 1995.
  39. Boyd, S.P.; Parikh, N.; Chu, E.; Peleato, B.; Eckstein, J. Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers. Found. Trends Mach. Learn. 2011, 3, 1–122.
  40. Sang, E.F.T.K.; Meulder, F.D. Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition. In Proceedings of the Seventh Conference on Natural Language Learning, CoNLL 2003, Edmonton, AB, Canada, 31 May–1 June 2003; pp. 142–147.
  41. Hoffart, J.; Suchanek, F.M.; Berberich, K.; Lewis-Kelham, E.; de Melo, G.; Weikum, G. YAGO2: Exploring and querying world knowledge in time, space, context, and many languages. In Proceedings of the 20th International Conference on World Wide Web, WWW 2011, Hyderabad, India, 28 March–1 April 2011; pp. 229–232.
Figure 1. Metric functions: an example: (a) truth value function; (b) exp-loss function.
Figure 2. Example of relation decomposition.
Figure 3. Example of the comparison of the proposed framework. PSL, Probabilistic Soft Logic.
Figure 4. Evaluation of convergence on NMLN vs. PSL: (a) primal residual; (b) dual residual.
Figure 5. Scalability evaluation on NMLN vs. PSL: (a) runtime; (b) runtime (log scale); (c) iterations to converge; (d) iterations (log scale).
Figure 6. Sensitivity evaluation w.r.t. the number of parallel threads.
Figure 7. Sensitivity evaluation w.r.t. α: (a) primal residual; (b) dual residual.
Figure 8. Sensitivity evaluation w.r.t. ρ: (a) primal residual; (b) dual residual.
Table 1. Examples of knowledge rules.

Knowledge Rules | Size
r1: Frequency(c) ∧ Core(c) → FastCPU(c) | n
r2: Performance(c) ≤ 0.2 | n
r3: FastCPU(c) ≥ 0.9 ∧ HasCPU(p, c) ∧ Memory(p) ≥ 0.8 → Performance(p) | n
r4: Performance(p1) ≥ Performance(p2) ∧ Similarprice(p1, p2) → Performancecost(p1) ≥ Performancecost(p2) | n²
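To make the Size column concrete, the following small Python sketch (using hypothetical constant names that are not taken from the paper) enumerates the groundings of a rule with one free variable versus a rule with two free variables, which is why r1–r3 each produce n ground rules while r4 produces n².

```python
from itertools import product

# Illustrative domain constants; names are hypothetical, with n = 3 here.
cpus = ["c1", "c2", "c3"]
phones = ["p1", "p2", "p3"]

# A rule with a single free variable (e.g., r1 over c) grounds once per constant: n groundings.
groundings_r1 = [("r1", c) for c in cpus]

# A rule with two free variables (e.g., r4 over p1, p2) grounds over all ordered pairs: n^2 groundings.
groundings_r4 = [("r4", p1, p2) for p1, p2 in product(phones, repeat=2)]

print(len(groundings_r1), len(groundings_r4))  # prints "3 9", i.e. n and n^2 as in the Size column
```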
Table 2. Statistics of the test datasets.

Dataset | Total No. of Variables | No. of Non-Matches | No. of Matches
Mobile Phone | 1058 | – | –
AIDA-CONLL | 728,225 | 713,113 | 15,112
Wiki-Sport | 28,244 | 24,244 | 4000
Wiki-FourDomains | 23,828 | 19,318 | 4510
Table 3. Evaluation results on mobile phone ranking. NMLN, Numerical Markov Logic Network.

Method | Distance_avg | Grounding | Inference | Total
NMLN | 0.857 | 0.13 s | 0.45 s | 2.09 s
PSL | 0.853 | 34.7 s | 37.9 s | 73.9 s
Table 4. Evaluation results on entity linking.

Method | In-KB acc | Grounding | Inference | Total
AIDA-CONLL
NMLN | 0.805 | 278 s | 1344 s | 1745 s
PSL | 0.708 | 25,636 s | 25,566 s | 51,661 s
Wiki-Sport
NMLN | 0.865 | 23 s | 162 s | 201 s
PSL | 0.826 | 1889 s | 889 s | 2793 s
Wiki-FourDomains
NMLN | 0.893 | 14 s | 138 s | 164 s
PSL | 0.876 | 1196 s | 545 s | 1753 s
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
