5.3.1. Classification of Nodes
In our experiments, we use the classical metric, accuracy, to verify the effectiveness of MCHIN. We perform experiments on three real-world datasets: DBLP, Yelp, and IMDb. For more convincing results, we randomly select labeled objects as prior knowledge and classify the rest of the data in the DBLP dataset. Because the Yelp dataset is much sparser than DBLP, we randomly select the proportion of labeled objects as . In the IMDb dataset, we choose a ratio of labeled objects similar to that of the Yelp dataset.
In our work, indicates the selection of the important types of links during the ranking process. As discussed in Reference [17], we set . These settings are sufficient to verify the validity of MCHIN.
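The evaluation protocol above (randomly label a proportion of the nodes, classify the rest, report accuracy) can be sketched as follows. This is an illustrative assumption, not the paper's actual pipeline: the embeddings are synthetic, and a nearest-labeled-neighbor rule stands in for the real classifier.

```python
import numpy as np

def eval_accuracy(embeddings, labels, labeled_ratio, seed=0):
    """Randomly mark `labeled_ratio` of the nodes as prior knowledge and
    classify the rest by their nearest labeled neighbor in embedding
    space. Returns accuracy on the unlabeled nodes. (Hypothetical
    stand-in for the paper's classifier.)"""
    rng = np.random.default_rng(seed)
    n = len(labels)
    labeled = rng.random(n) < labeled_ratio
    if labeled.sum() == 0 or labeled.all():
        raise ValueError("labeled_ratio leaves no train or test nodes")
    # Pairwise distances from each unlabeled node to each labeled node.
    dists = np.linalg.norm(
        embeddings[~labeled][:, None, :] - embeddings[labeled][None, :, :],
        axis=2)
    pred = labels[labeled][dists.argmin(axis=1)]
    return (pred == labels[~labeled]).mean()

# Toy data: two well-separated clusters, so 1-NN should do well.
rng = np.random.default_rng(42)
emb = np.vstack([rng.normal(0, 0.1, (50, 8)), rng.normal(3, 0.1, (50, 8))])
y = np.array([0] * 50 + [1] * 50)
acc = eval_accuracy(emb, y, labeled_ratio=0.1)
print(acc)
```

Varying `labeled_ratio` over the same grid as in the tables reproduces the "accuracy vs. proportion of labeled objects" curves reported for each method.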
The results are shown in Table 3, Table 4 and Table 5, with the highest performance in bold. In DBLP, when the proportion of labeled nodes is only , the accuracy of MCHIN reaches up to , outperforming Deepwalk by more than and the best baseline models by . As the proportion of labeled objects increases, the proposed MCHIN stays far ahead of the other comparison methods; its accuracy reaches when of the objects are labeled. In fact, our proposed model MCHIN is significantly more effective than MCHIN-ori, especially when fewer labels are available, because MCHIN uses extended meta graphs.
In Table 4, our method consistently outperforms the other baselines on the Yelp dataset as well. The results of the LLGC, wvRN, RankClass and Deepwalk methods are quite similar to each other. The reasons are twofold: (1) these four baseline methods are designed for homogeneous information networks, so they lose the rich semantics and useful structural information when applied to HINs; (2) none of the baseline approaches consider extra label information obtained through the different types of models in HINs. When extending the meta graphs, MCHIN augments the prior label sets. This improves the performance more than the MCHIN-ori method on average.
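The text states that extending the meta graphs lets MCHIN augment the prior label sets, but the exact rule is not given here. Below is a minimal, hypothetical neighbor-voting sketch of how such augmentation could work: an unlabeled node whose meta-graph neighbors agree strongly on one prior label adopts it. The adjacency, threshold, and all names are assumptions for illustration.

```python
import numpy as np

def augment_labels(adj, prior, threshold=0.5):
    """Hypothetical sketch of prior-label augmentation. `adj` is an
    (n, n) meta-graph adjacency matrix; `prior` holds class ids, with
    -1 marking unlabeled nodes. An unlabeled node whose labeled
    neighbors vote for one class with at least `threshold` of the
    total vote weight adopts that class. Returns an augmented copy."""
    n = len(prior)
    classes = sorted(c for c in set(prior) if c != -1)
    # votes[i, j] = weighted count of i's labeled neighbors in class j.
    votes = np.zeros((n, len(classes)))
    for j, c in enumerate(classes):
        votes[:, j] = adj @ (prior == c)
    total = votes.sum(axis=1)
    out = prior.copy()
    for i in range(n):
        if prior[i] == -1 and total[i] > 0:
            j = votes[i].argmax()
            if votes[i, j] / total[i] >= threshold:
                out[i] = classes[j]
    return out

# Toy chain of 5 nodes: the two prior labels spread to their neighbors.
prior = np.array([0, -1, -1, 1, -1])
adj = np.array([[0, 1, 0, 0, 0],
                [1, 0, 1, 0, 0],
                [0, 1, 0, 1, 0],
                [0, 0, 1, 0, 1],
                [0, 0, 0, 1, 0]])
print(augment_labels(adj, prior))  # → [0 0 1 1 1]
```

The larger prior set then feeds the classifier, which is consistent with the observed gain of MCHIN over MCHIN-ori at low label ratios.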
From Table 5, we can see that the effectiveness of MCHIN is higher than the best performance obtained by the baseline methods, by to , as the proportion of labeled objects changes from to . LLGC, wvRN, Deepwalk, and SDNE are designed for homogeneous information networks, so none of them can capture the rich semantic information in HINs; accordingly, their results are significantly lower than the others. Both the RankClass and HIN2Vec algorithms are applied to HINs and can gain more semantic information from the different types of relationships. Using meta paths in HIN2Vec effectively improves accuracy. However, the performance of MCHIN-ori is still up to higher than HIN2Vec, owing to the added meta graphs, when the proportion of labeled objects is . Furthermore, when we extend the meta graphs, the accuracy increases by when the proportion of labeled objects is .
5.3.2. Comparison of Algorithms Using Single Meta Path on DBLP
Meta graphs contain various meta paths, which represent different semantics in HINs. To verify the impact of meta graphs, it is essential to compare them with their constituent meta paths. Due to space limitations, we only report the performance of meta graphs on the DBLP dataset. In this paper, we compare the performance of four meta paths, , and the meta graphs on the DBLP dataset.
Figure 5a–e shows the accuracy of author classification based on the meta paths and meta graphs. We can see that our model MCHIN outperforms all the baselines on every meta path, and its performance remains stable under different ratios of training data. Across all the methods, the meta path APCPA performs better than APA and APAPA, as it can capture more semantic information in HINs.
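One way to see why APCPA carries more semantics than APA is through commuting matrices: chaining the type-level incidence matrices along a meta path yields a path-count matrix between authors, and the conference hop adds connections that co-authorship alone misses. A toy sketch, with all matrices hypothetical:

```python
import numpy as np

# Hypothetical incidence matrices for a tiny DBLP-like network.
# A_ap[i, j] = 1 if author i wrote paper j;
# A_pc[j, k] = 1 if paper j appeared at conference k.
A_ap = np.array([[1, 1, 0],
                 [0, 1, 0],
                 [0, 0, 1]])
A_pc = np.array([[1, 0],
                 [1, 0],
                 [0, 1]])

# Commuting matrix of APA (author–paper–author): co-authorship counts.
M_apa = A_ap @ A_ap.T
# Commuting matrix of APCPA (author–paper–conference–paper–author):
# counts author pairs connected through a shared conference.
M_apcpa = A_ap @ A_pc @ A_pc.T @ A_ap.T

print(M_apa[0, 1])    # 1: authors 0 and 1 share one paper
print(M_apcpa[0, 1])  # 2: they also co-occur at a conference via two paper pairs
```

The denser, higher-count APCPA matrix reflects the extra semantic signal (venue co-occurrence) that makes it the strongest of the meta paths in Figure 5.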
We further compare the performance of MCHIN based on the meta graphs with that based on all four meta paths. From Figure 5e, we can see that the meta graph performs the best, outperforming the other meta paths by at least . This shows that meta graphs can capture more semantic information than meta paths.
5.3.3. Learning Weights of Meta Graphs in MCHIN
From our experiments and comparisons, we can see that different meta graphs express different semantics. To utilize rich semantics of meta graphs, it is possible to weight meta graphs differently and assign higher weights to meta graphs with higher impacts on accuracy.
Figure 6 shows the performance of MCHIN under meta paths and meta graphs on DBLP. MCHIN determines the relative effectiveness of the meta graphs through its weight-assignment mechanism.
Table 6 lists the weights of the meta graphs computed by MCHIN. MCHIN assigns the highest weight to the meta graph , as performs the best in the classification; this matches the intuition. On the other hand, the weight of APA is only around 0.01∼0.15, due to its poor ability to capture the main features for the classification task. The weights of the meta graphs on the Yelp dataset are also shown in Table 6. Because the labels of the links between restaurants are related to categories, our model assigns higher weights, 0.30∼0.40, to than to . The meta graph achieves the highest weight among all the meta graphs. In the IMDb dataset, has a higher weight on average than , as expresses more semantics than .
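Given learned weights like those in Table 6, the per-meta-graph scores can be fused into a single similarity. The paper does not spell out the fusion step here; below is a minimal sketch assuming a simple normalized linear combination, with the matrices and weight values purely illustrative.

```python
import numpy as np

def combine(meta_sims, weights):
    """Linearly combine per-meta-graph similarity matrices using the
    weights learned by the model (supplied by hand in this sketch).
    Weights are normalized to sum to 1 so the result's scale is stable."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return sum(wi * s for wi, s in zip(w, meta_sims))

# Two hypothetical 3x3 meta-graph similarity matrices.
s1 = np.eye(3)          # e.g., a highly discriminative meta graph
s2 = np.ones((3, 3))    # e.g., a weak, uninformative one
combined = combine([s1, s2], weights=[0.35, 0.05])  # illustrative weights
print(combined)
```

A meta graph with a high learned weight (like the best one in Table 6) then dominates the fused similarity, while a low-weight one such as APA contributes little.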