In this section, we first introduce our evaluation indicators and datasets, and give the structure-specific configuration of our model. We then report the results of IntME on four benchmark datasets. Finally, we set up experiments to demonstrate the superiority of our approach and the advantages of our structural choices.
4.1. Evaluation Indicator Selection
We use the general metrics MR, MRR, and H@N to evaluate model performance. MR (mean rank) is the average rank of the correct tail entity's probability or score in the prediction results; a lower MR indicates a better prediction. H@N (Hits@N) is the proportion of test cases in which the real tail entity ranks within the top N predictions; N usually takes the values 10, 3, and 1, giving H@10, H@3, and H@1, and higher values indicate better model performance. MRR (mean reciprocal rank) is the average of the inverse ranks of the correct tail entity in the prediction results; unlike MR, a higher value indicates a better model.
Given the set of ranks $\{\mathrm{rank}_i\}_{i=1}^{|S|}$ obtained over the test triplet set $S$, MRR, MR, and Hits@N (H@N) are calculated by Equations (12)–(14):

$$\mathrm{MRR} = \frac{1}{|S|}\sum_{i=1}^{|S|}\frac{1}{\mathrm{rank}_i} \qquad (12)$$

$$\mathrm{MR} = \frac{1}{|S|}\sum_{i=1}^{|S|}\mathrm{rank}_i \qquad (13)$$

$$\mathrm{H@}N = \frac{1}{|S|}\sum_{i=1}^{|S|}\mathbb{1}\left[\mathrm{rank}_i \le N\right] \qquad (14)$$
Given a triplet in the test set, we filter out candidate triplets that already exist in the training and validation sets by applying the filtered setting from [8]; we then evaluate the model's performance on the prediction results of the tail entities with the above metrics.
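To make the protocol concrete, the following minimal sketch (our own illustration, not the authors' released code) computes filtered tail ranks and then the metrics of Equations (12)–(14); the `scores` matrix and `known_tails` filter sets are hypothetical placeholders for the model's outputs and the train/validation filter.

```python
import numpy as np

def filtered_metrics(scores, targets, known_tails, ns=(10, 3, 1)):
    """Compute filtered MRR, MR, and H@N for tail prediction.

    scores      : (num_queries, num_entities) model scores per (h, r) query
    targets     : (num_queries,) index of the correct tail entity
    known_tails : per-query array of all tails known true for (h, r)
    """
    ranks = []
    for i, t in enumerate(targets):
        s = scores[i].astype(np.float64).copy()
        # Filtered setting [8]: mask every other known true tail of this query
        mask = np.setdiff1d(known_tails[i], [t])
        s[mask] = -np.inf
        # 1-based rank of the correct tail: count entities scoring higher
        ranks.append(int((s > s[t]).sum()) + 1)
    ranks = np.asarray(ranks, dtype=np.float64)
    metrics = {"MRR": float((1.0 / ranks).mean()),  # Equation (12)
               "MR": float(ranks.mean())}           # Equation (13)
    for n in ns:                                    # Equation (14)
        metrics[f"H@{n}"] = float((ranks <= n).mean())
    return metrics
```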
4.2. Datasets
To validate the performance and robustness of our improved model on different types of datasets, we select representative datasets based on graph size and the number of relation categories. FB15K [8] and WN18 [8] are two classical datasets on which many previous excellent works have been evaluated, and FB15k-237 [32] and WN18RR [1], their respective subsets, are more capable of validating model performance, so we select FB15k-237 and WN18RR as representatives of medium-sized datasets. YAGO3-10, a subset of YAGO [2], has a large number of entities and contains a large number of triplets, so we select it as a representative of large datasets. Alyawarra Kinship [33] can be considered representative of a micro dataset compared to the above three, so we select these four datasets for our experiments. The detailed dataset information is shown in Table 2.
The whole Freebase has 1.9 billion triplets, and FB15k-237 is its subset with nearly 15,000 entities and 237 relation types; as a subgraph of Freebase, it has the corresponding information of a father knowledge graph. WN18RR is a subgraph of WordNet with 40,943 entities and 11 types of relations. The size of FB15k-237 and WN18RR entities and the number of types of relations can well verify the expression ability of the model for explicit and latent knowledge. YAGO3-10 has 123,182 entities and 37 relations, and its size is very suitable for verifying the performance of the model under a large number of knowledge systems compared with the previous two datasets. Kinship has only 104 entities and 25 relations, which can verify the learning ability of our model for surface knowledge.
4.4. Experiment Results
We first compare the results of several benchmark models with our IntME after training on FB15k-237 and WN18RR. The benchmark models are chosen from distance-based models (i.e., TransE [8], KMAE [34]), bilinear-based models (i.e., DistMult [13], ComplEx [14]), and convolution-based networks (i.e., ConvE [15], InteractE [18], JointE [17], HypER [25]). It is clear from Table 4 that IntME outperforms all the benchmark models on FB15k-237 and WN18RR.
The performance of IntME on FB15k-237 far exceeds that of ConvE and HypER, with a considerable improvement over InteractE: about 2% on MRR, 0.8% on H@10, 2.1% on H@3, and 1.9% on H@1. Compared with JointE, a state-of-the-art convolution-based model, IntME also demonstrates excellent performance.
From the results on WN18RR, IntME also shows an overwhelming superiority. Unlike JointE, which falls behind DistMult on MR, IntME also performs well on MR. While outperforming ConvE and HypER, the addition of Path 2 allows IntME to outperform InteractE comprehensively, improving MRR from 0.469 to 0.475, MR from 5039 to 4055, and H@10 and H@3 by about 2.6% and 2.7%, respectively.
We re-run the open-source code of InteractE on FB15k-237 and WN18RR and compare it with our improved model IntME. The variation of the best MRR-based performance on the validation set is shown in Figure 5 and Figure 6. As can be clearly seen in Figure 5, IntME takes an overall lead over InteractE after about the 30th epoch, and the gap is very obvious. For the WN18RR results in Figure 6, InteractE performs checked feature reshaping four times, while IntME uses it only once. Moreover, InteractE's open-source code uses a convolution kernel size of 11 for WN18RR, while we use only 9, so we train more slowly over the first 80 epochs; after 80 epochs, however, both H@10 and H@3 surpass InteractE, and after 150 epochs MRR surpasses it as well.
Next, we utilize YAGO3-10 to validate the performance of our model for link prediction on large datasets, and we select several bilinear models, DistMult [13] and ComplEx [14], and several benchmark convolution-based models, ConvE [15], HypER [25], JointE [17], and InteractE [18], as our baselines. The experimental results are shown in Table 5.
On YAGO3-10, our model still performs well; it is optimal on two metrics, MRR and H@1, with a performance that is no weaker than JointE and stronger than the other baselines. Compared to InteractE, IntME increases MRR by 0.7%, H@10 by 0.2%, H@3 by 0.3%, and H@1 by 1.4%. Our model is slightly weaker than JointE on the H@10 and H@3 metrics, which we attribute to it not being quite strong enough in feasible interactions; if we continued to explore channel scaling reshaping, we believe it could reach or even surpass the level of JointE. Furthermore, benefitting from its powerful nonlinear fitting capability, the neural network is much more powerful than a linear model for latent knowledge processing.
Kinship, the smallest of the four datasets, is appropriate for evaluating the model's ability to extract explicit knowledge. We pick several models, including ComplEx [14], ConvE [15], RotatE [21], HAKE [35], and InteractE [18], as baselines. When applying Path 2, IntME exhibits highly extraordinary results in this test of explicit knowledge recognition capability; the results are displayed in Table 6. InteractE is the worst-performing model in the results. IntME outperforms all other baselines in MRR, H@10, and H@3. ComplEx scores highest on H@1, with IntME following. Our model improves on InteractE by 12.2% on MRR, 2.6% on H@10, 7.9% on H@3, and 20.3% on H@1, respectively.
In summary, IntME shows excellent performance on all four datasets. The analysis across datasets of various sizes supports our expectations of what IntME can achieve in link prediction. Meanwhile, it demonstrates that IntME can serve as a state-of-the-art convolution-based model on medium and large knowledge graphs.
4.5. Ablation Study
We design an ablation study to examine whether the individual modules of our model actually work.
To verify whether channel scaling reshaping is able to locate the interaction difference between reasonable interactions and the output features of Path 1 through channel element adjustment, we set up two sets of experiments on FB15k-237 and WN18RR for verification. To simplify the factorization of the 200-dimensional embedding, we only consider square feature maps, i.e., equal width and height in Equation (5). Thus, we set up three groups of square feature maps of different shapes. We implement them on FB15k-237 and WN18RR with the optimal hyper-parameters, and the results are shown in Table 7.
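To make the factorization concrete, the sketch below reshapes a 200-dimensional embedding into square feature maps for several channel counts. Since the paper's actual shape choices are those listed in Table 7, the factorizations here (2, 8, and 50 channels) are merely our own illustrative examples, and the helper name is hypothetical.

```python
import torch

def channel_scaling_reshape(emb: torch.Tensor, channels: int) -> torch.Tensor:
    """Reshape a flat embedding into square feature maps.

    emb      : (batch, d) flat embedding, here d = 200
    channels : number of channels C; d / C must be a perfect square
    returns  : (batch, C, s, s) with s = sqrt(d / C)
    """
    batch, d = emb.shape
    assert d % channels == 0, "channel count must divide the embedding dim"
    side = int((d // channels) ** 0.5)
    assert channels * side * side == d, "d / C must be a perfect square"
    return emb.view(batch, channels, side, side)

# Illustrative square factorizations of a 200-dim embedding (our choices):
# 2 x (10 x 10), 8 x (5 x 5), 50 x (2 x 2)
emb = torch.randn(32, 200)
for c in (2, 8, 50):
    print(c, channel_scaling_reshape(emb, c).shape)
```

Varying the channel count trades off the number of feature maps against the size of each map, which changes how many cross-element interactions a fixed-size convolution kernel can cover.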
The results on FB15k-237 indicate that the three shapes perform differently, with one of them clearly exhibiting the best performance, which indicates the benefit of improved interactions for IntME. In contrast, the best results on WN18RR come from a different shape than on FB15k-237, which suggests that increasing interactions should be carried out reasonably rather than arbitrarily.
As a result, channel scaling reshaping can produce plausible interaction improvements without requiring extra parameters and can quantify the difference through adjustment of the number of channel elements.
In Section 3, we employ an element reordering function, which, in brief, reorders both the row and column elements of the matrix according to different permutations. We anticipate that it can enhance the possibility of interaction between different elements. Therefore, we perform controlled trials on FB15k-237 and WN18RR to verify this anticipation. The structures of the models in the trials are all optimally chosen, and the results are shown in Table 8.
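A minimal sketch of this kind of element reordering is shown below, assuming it is implemented as independent permutations of the rows and columns of the reshaped feature matrix; the exact permutations used by IntME are defined in Section 3, so the random ones here are placeholders.

```python
import torch

def reorder_elements(x: torch.Tensor, row_perm: torch.Tensor,
                     col_perm: torch.Tensor) -> torch.Tensor:
    """Reorder rows and columns of a (batch, H, W) feature matrix.

    row_perm : permutation of range(H)
    col_perm : permutation of range(W)
    """
    return x[:, row_perm][:, :, col_perm]

# Placeholder permutations; IntME's actual permutations come from Section 3.
x = torch.arange(2 * 4 * 5, dtype=torch.float32).view(2, 4, 5)
row_perm = torch.randperm(4)
col_perm = torch.randperm(5)
y = reorder_elements(x, row_perm, col_perm)
print(y.shape)  # torch.Size([2, 4, 5]); same elements, shuffled positions
```

By scattering elements that were originally far apart into the same convolution window, such a permutation gives element pairs that would otherwise never meet a chance to interact.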
According to the results, when element reordering is not applied, the performance of IntME on FB15k-237 is essentially identical to that of InteractE, with little improvement, and the improvement is not significant on WN18RR either. When it is applied, the results of IntME improve highly significantly on FB15k-237 and slightly on WN18RR; in particular, FB15k-237's results improve from 0.354 to 0.360 on MRR, from 0.536 to 0.543 on H@10, from 0.388 to 0.395 on H@3, and from 0.263 to 0.267 on H@1.
In summary, element reordering can effectively enhance the interaction quality through disordered element interactions.
To verify the necessity of using Path 1 and Path 2 together in IntME, we implement an ablation study on FB15k-237 and WN18RR. The results are displayed in Table 9.
We can see from the results that it is necessary to employ Path 1 and Path 2 together. Without Path 1, Path 2 loses some of its capability to extract potential knowledge. The decrease is less significant on FB15k-237, but on WN18RR, MRR decreases from 0.475 to 0.449, H@10 from 0.545 to 0.532, H@3 from 0.495 to 0.478, and H@1 from 0.436 to 0.397. Without Path 2, IntME loses much of its capability to extract explicit knowledge.
In summary, IntME is only complete when both Path 1, which represents the initial value of the interactions, and Path 2, which carries explicit knowledge information and the difference between more reasonable interactions and the initial value, are used.
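As a rough illustration of why the two paths complement each other, the schematic below combines a convolutional scoring path with a shallow linear path by summing their outputs before scoring against all entities. This is our own reading of the two-path design under assumed dimensions, not the authors' exact architecture from Section 3.

```python
import torch
import torch.nn as nn

class TwoPathScorer(nn.Module):
    """Schematic two-path scorer (our sketch, not IntME's exact design).

    Path 1: convolution over reshaped embeddings (latent knowledge).
    Path 2: shallow linear interaction (explicit knowledge).
    """

    def __init__(self, d: int = 200, channels: int = 8, side: int = 5):
        super().__init__()
        assert channels * side * side == d
        self.channels, self.side = channels, side
        self.conv = nn.Conv2d(2 * channels, 32, kernel_size=3, padding=1)
        self.proj = nn.Linear(32 * side * side, d)  # Path 1 output
        self.lin = nn.Linear(2 * d, d)              # Path 2 output

    def forward(self, head, rel, entity_emb):
        b = head.size(0)
        maps = torch.cat([head, rel], dim=1).view(
            b, 2 * self.channels, self.side, self.side)
        p1 = self.proj(torch.relu(self.conv(maps)).view(b, -1))
        p2 = self.lin(torch.cat([head, rel], dim=1))
        # Sum the two paths, then score against all entity embeddings
        return (p1 + p2) @ entity_emb.t()
```

Dropping either term of the sum in `forward` mimics the ablations of Table 9: without the convolutional path the model loses latent-knowledge capacity, and without the linear path it loses explicit-knowledge capacity.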
4.6. Training Cost
Our model has the advantage of not only robust generalization ability but also low training cost. InteractE, a state-of-the-art convolution-based model using external common filters, and JointE, which utilizes internal alternate filters and is also a state-of-the-art model, are well suited as baselines for a training cost comparison. To ensure the consistency of the comparison results, we set a batch size of 256 and choose the 1-N strategy on FB15k-237, WN18RR, and YAGO3-10. There is no open-source code for JointE, so we rewrote its code in our environment according to [17].
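For reference, the 1-N strategy (as used in ConvE [15]) scores each (head, relation) query against all entities at once and trains with binary cross-entropy over the full entity set. A minimal sketch of one such training step, under our own naming and a generic `model(heads, rels)` interface, is shown below.

```python
import torch.nn.functional as F

def one_to_n_step(model, optimizer, heads, rels, tail_multihot):
    """One 1-N training step: each (h, r) query scores all entities at once.

    tail_multihot : (batch, num_entities), 1 at every true tail of (h, r)
    """
    optimizer.zero_grad()
    scores = model(heads, rels)  # (batch, num_entities) logits
    loss = F.binary_cross_entropy_with_logits(scores, tail_multihot)
    loss.backward()
    optimizer.step()
    return loss.item()
```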
As shown in Figure 7, our model offers only a minor improvement in training cost on FB15k-237 but brings a more powerful performance than InteractE, while it takes even less training cost than InteractE with optimal hyper-parameters on WN18RR and YAGO3-10. This benefits from the shallow linear operations in Path 2, the single feature map fusion in Path 1, and the complementary feature extraction capabilities of the two paths. InteractE employs depth-wise circular convolution, which incurs much of its cost in I/O operations rather than in computation, and the internal alternate convolution filters in JointE cost several or even tens of times the I/O overhead of the above models, so that its time cost per epoch is much higher. MFAE [36] also has the advantage of low training cost, but our model achieves better performance than it at a lower training cost; the comparison is displayed in Table 10, all of which again proves the superiority of our model structure.