1. Introduction
Non-negative matrix factorization (NMF) is a relatively recent method of matrix factorization [1]. Since D.D. Lee et al. proposed it as a new feature-subspace method in Nature in 1999, NMF [2] has been widely used in image analysis, text clustering, data mining, speech processing and other areas. As research has deepened, many applications have developed, from early single-structure feature analysis to the joint mining of multiple network structures and the layered analysis of multi-source information. In addition, abundant data indicate that models based on a single pairwise interaction may not capture the complex dependencies between network nodes [3]. The interaction of users with both a video and its enrichments yields a great deal of explicit and implicit relevance feedback, which enables some systems to provide personalized and rich multimedia content [4]. In an article published in Nature Physics, Lambiotte, Rosvall et al. [5] described the shortcomings of traditional network models and existing idealized higher-order models, and discussed the important role that multi-layer network models play in analyzing the various interactions of many real complex systems.
In the analysis and mining of multi-relational data, methods based on network representation learning have attracted the attention of many scholars because of their excellent performance in many practical tasks. Network representation learning [6] (also known as network embedding) is an effective network analysis method: while preserving network structure information, it embeds graphs into a low-dimensional, compact space. For complex multi-level data, however, a single NMF decomposition of a single-layer network is easily affected by severely corrupted data and prone to local minima, and it cannot express the characteristics of the data from multiple angles, so the decomposition results are not always satisfactory [7].
Given an effective low-dimensional representation of high-order, multi-layer complex data, traditional vector-based machine learning and deep learning algorithms can operate directly on the embedded space to complete network analysis tasks efficiently, which greatly enriches the choice of algorithms and models for network mining tasks [8,9]. Therefore, how to use an effective deep network structure for hierarchical feature extraction from complex data, and how to combine the advantages of non-negative matrix factorization and deep networks, have important practical implications for the collaborative discovery of knowledge from multiple information sources.
Therefore, in this paper, NMF is extended into a multi-layer NMF structure, and the feature representation of the input data is realized through a complex hierarchical structure. By adding regularization constraints to each layer, the essential features of the data are obtained by characterizing the feature transformation layer by layer; on this basis, MSDA-NMF, a multi-layer NMF model combining a deep autoencoder with NMF, is constructed. The proposed method can effectively improve the detection and prediction accuracy for social groups in complex social management systems. The main contributions can be summarized as follows: 1. The model integrates the multi-layer structural features of deep autoencoding; 2. The model introduces a multi-layer NMF structure, which can effectively use a complex hierarchical structure to represent the features of the input data; 3. Evaluation on multiple data sets shows that the proposed method is superior to existing multi-layer NMF methods.
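To make the layer-wise idea concrete, here is a minimal greedy sketch of a two-layer factorization X ≈ W1·W2·H2 using standard Lee–Seung multiplicative updates. This illustrates only the multi-layer NMF skeleton; the autoencoder coupling and per-layer regularization terms of MSDA-NMF are omitted, and all names here are illustrative.

```python
import numpy as np

def nmf(X, k, iters=200, eps=1e-9, seed=0):
    """Basic NMF, X ~ W @ H, via Lee-Seung multiplicative updates."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(iters):
        # Multiplicative updates keep W and H non-negative throughout.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

def multilayer_nmf(X, dims):
    """Greedy layer-wise factorization: X ~ W1 @ W2 @ ... @ Wp @ Hp,
    obtained by repeatedly factorizing the previous layer's H."""
    Ws, H = [], X
    for k in dims:
        W, H = nmf(H, k)
        Ws.append(W)
    return Ws, H

X = np.random.default_rng(1).random((30, 20))  # toy non-negative data
Ws, H = multilayer_nmf(X, dims=[10, 5])
rel_err = np.linalg.norm(X - Ws[0] @ Ws[1] @ H) / np.linalg.norm(X)
print(rel_err)
```

In the full model, each layer's factors would additionally be tied to the autoencoder's hidden representations and regularized, rather than fitted greedily as here.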
The rest of the paper is organized as follows: Section 2 discusses related work. Section 3 introduces the proposed method through model description and model optimization. Section 4 presents the experimental results and discusses them through parameter sensitivity analysis, multiclassification, node clustering, and link prediction experiments. Finally, Section 5 concludes the paper.
4. Results and Discussion
In this paper, the MSDA-NMF model is constructed using a complex hierarchical structure to realize the feature representation of the input data. It is compared against eight baseline methods on eight different data sets. The experiments are performed on a computer running Windows 7 with a 3.10 GHz CPU and 32.00 GB of RAM.
4.1. Experimental Objects
The parameters of the MSDA-NMF model include three hyperparameters, the dimensions of the different layers, and a convergence coefficient. In this complex network experiment, we first fix initial values for these parameters and then search around them to obtain the optimal parameters of the model. Here, the number of clusters, K, is a variable that changes according to the labels in each data set.
4.2. Data Sets and Comparison Methods
This section provides a brief overview of the open data sets and the state-of-the-art complex network representation learning models used in various fields.
4.2.1. Data Set
We evaluate the task of multi-label node classification on four popular networks. To further verify the clustering and link prediction robustness of our model, we also use three data sets with ground-truth labels. The statistical characteristics of the data sets are shown in Table 2:
GR-QC, Hep-TH, Hep-PH [36]: These collaboration and citation networks are extracted from arXiv and cover three different fields (general relativity and quantum cosmology, theory of high-energy physics, and phenomenology of high-energy physics). In the collaboration network, vertices represent authors, and an edge connects two authors who have co-authored a scientific paper. The GR-QC data set is the smallest graph, with 19,309 nodes (5855 authors, 13,454 articles) and 26,169 edges. The HEP-TH data set covers papers from January 1993 to April 2003 (124 months); since this period starts at the beginning of arXiv, it essentially represents the entire history of the HEP-TH section, and the citation graph covers all citations in the data set, with N = 29,555 nodes and E = 352,807 edges. If a paper i cites a paper j, the graph contains a directed edge from i to j. If a paper cites, or is cited by, a paper outside the data set, the graph contains no information about that citation. The HEP-PH data set is a second citation graph, taken from the HEP-PH section of arXiv, covering all citations in the data set with N = 30,501 nodes and E = 347,268 edges.
Open Academic Graph (OAG) [37]: This undirected author collaboration network is formed from an openly available academic graph indexed from Microsoft Academic and the AMiner website, and includes 67,768,244 authors and 895,368,962 collaboration edges. The labels of the network vertices represent each author's top research areas; the network contains 19 areas (labels), and since authors may publish in different areas, related vertices can have multiple labels.
Polblog [38]: Polblog is a network of blogs on American politics, where nodes are blogs. There are 1335 blogs, including 676 liberal and 659 conservative blog pages. An edge exists between two blogs if one contains a web link to the other. The labels here refer to the different political orientations of the blogs.
Orkut [39]: Orkut is an online social network in which nodes are users and edges are created according to users' friendship relations. Orkut has many highlighted communities, including student communities, events, interest-based groups and varsity teams.
Livejournal [39]: The nodes in the Livejournal data set represent bloggers; if two bloggers are friends, there is an edge between them. Bloggers are divided into groups based on their friendships and labeled with categories such as culture, entertainment, expression, fans, life/style, life/support, games, sports, student life and technology.
Wikipedia [40]: This is a word co-occurrence network built from Wikipedia text. The class labels of this network are part-of-speech (POS) tags inferred by the Stanford POS tagger.
4.2.2. Control Methods
We compare the proposed MSDA-NMF model, based on a three-layer NMF, with eight state-of-the-art network embedding methods, described below.
M-NMF [41]: M-NMF combines the community structure and the second-order proximity of nodes within an NMF framework to learn node embeddings. The node representations are required to be consistent with the community structure of the network, and an auxiliary community representation matrix links the local features (first- and second-order similarity) with the community structure features, so that both are jointly optimized in a single objective. The embedding dimension in the experiment is set to 128, and the other parameters are set according to the original paper.
NetMF [34]: NetMF proves that skip-gram models with negative sampling (DeepWalk, PTE and LINE) can be viewed as implicitly factorizing closed-form matrices, and demonstrates its superiority over DeepWalk and LINE in traditional network analysis and mining tasks.
AROPE [42]: Within a singular value decomposition framework, this method shifts the embedding vectors across proximities of arbitrary order and thereby learns the higher-order proximity of nodes, revealing their underlying relations.
DeepWalk [43]: DeepWalk generates random walks from each node and treats these node sequences as sentences in a language model; it then learns the embedding vectors with the Skip-Gram model. In the experiment, the parameters are set according to the original paper.
Node2vec [44]: This method extends DeepWalk with a biased random walk. All parameters are the default settings of the algorithm, except that two bias parameters, p and q, are introduced to guide the random walk process.
LINE [45]: LINE learns node embeddings through two loss functions that preserve first- and second-order proximity separately. The standard parameter setup is applied by default in this article, with the negative sampling ratio set to 5.
SDNE [28]: SDNE uses a deep autoencoder with a semi-supervised architecture to jointly optimize the first- and second-order similarity of nodes, with an explicit objective function designed to preserve the network structure. In the experiment, the parameters are set according to the original paper.
GAE [30]: The GAE model has advantages in link prediction tasks on citation networks. The algorithm is based on a variational autoencoder and uses the same convolution structure as a GCN.
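Several of the baselines above (DeepWalk, Node2vec) build their training corpus from truncated random walks. A minimal sketch of the unbiased walk generation used by DeepWalk follows; Skip-Gram training itself is typically delegated to a word2vec implementation, and Node2vec would additionally bias the neighbor choice with its p and q parameters. Function and variable names here are illustrative.

```python
import random

def random_walks(adj, num_walks=10, walk_length=8, seed=0):
    """Generate DeepWalk-style truncated random walks.

    adj: dict mapping each node to a list of its neighbors.
    Returns a list of walks (lists of nodes) to be fed to a
    Skip-Gram model as "sentences".
    """
    rng = random.Random(seed)
    walks = []
    for _ in range(num_walks):
        nodes = list(adj)
        rng.shuffle(nodes)          # one pass over all nodes per epoch
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                nbrs = adj[walk[-1]]
                if not nbrs:        # dead end: stop this walk early
                    break
                walk.append(rng.choice(nbrs))
            walks.append(walk)
    return walks

# Toy 4-node cycle graph
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
walks = random_walks(adj, num_walks=2, walk_length=5)
print(len(walks), walks[0])
```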
4.2.3. Evaluation Index
The robustness of the proposed model is verified by the following evaluation indicators:
NMI [46]: The agreement between the communities found by an algorithm and the ground-truth communities is an important measure of community detection accuracy. The NMI value generally lies between 0 and 1: the higher the value, the more accurate the detection result, and a value of 1 means the result coincides exactly with the labeled communities. The formula is as follows:
AUC [47]: The AUC is defined as the area under the receiver operating characteristic (ROC) curve and was originally used to evaluate the performance of a classifier. In link prediction, given the scores of the unobserved edges, the AUC is the probability that a randomly chosen missing edge (an edge in the probe set) receives a higher score than a randomly chosen nonexistent edge. In practice, because of time complexity, we compare randomly sampled pairs rather than sorting all unobserved edges. Specifically, at each step we randomly select one missing edge and one nonexistent edge and compare their scores. If, among n independent comparisons, the missing edge has the higher score n′ times and the two scores are equal n″ times, then the AUC can be defined as:
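The elided expression is the standard sampled estimator: writing $n'$ for the number of comparisons in which the missing edge scores higher and $n''$ for the number of ties, out of $n$ comparisons in total,

```latex
\mathrm{AUC} \;=\; \frac{n' + 0.5\, n''}{n}.
```

An AUC of 0.5 corresponds to a random ranking, and values approaching 1 indicate increasingly accurate link prediction.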
4.3. Parameter Sensitivity Analysis
This section analyzes the effects of the hyperparameters and the layer-wise embedding dimensions of the MSDA-NMF model on clustering performance on real networks. To determine the specific parameters of the model, we first fix all parameters except the two under study on the OAG data set, and then verify the effect of each by varying those two. Taking the OAG data set as an example, we explore the influence of each parameter by varying it while keeping the others fixed; for instance, we observe the effect of one layer dimension by varying it over 100, 200, 300, 400 and 500 while the remaining parameters are held constant.
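The one-at-a-time search described above can be sketched as follows. The `evaluate` function is a hypothetical stand-in for a full train-and-cluster run returning an NMI score, and the parameter names are illustrative, not the paper's actual symbols.

```python
def grid_search_one_at_a_time(defaults, grid, evaluate):
    """Vary one hyperparameter at a time over its grid, keeping the
    others fixed at their defaults; return the best setting found."""
    best_params, best_nmi = dict(defaults), evaluate(defaults)
    for name, values in grid.items():
        for v in values:
            params = {**defaults, name: v}
            nmi = evaluate(params)
            if nmi > best_nmi:
                best_params, best_nmi = params, nmi
    return best_params, best_nmi

# Toy stand-in: the "NMI" peaks when dim1 is near 200, mirroring the
# kind of unimodal sensitivity curve reported in the paper.
evaluate = lambda p: 1.0 - abs(p["dim1"] - 200) / 500 - abs(p["dim2"] - 170) / 500
defaults = {"dim1": 100, "dim2": 100}
grid = {"dim1": [100, 200, 300, 400, 500], "dim2": [50, 100, 170, 250]}
best, score = grid_search_one_at_a_time(defaults, grid, evaluate)
print(best)
```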
Figure 2 shows the influence of the embedding dimensions of the different layers on the clustering performance for the three data sets. The lighter the color of a point in the figure, the larger the NMI value at that point's coordinates: the NMI value is largest at the yellow points and smallest at the purple points. The size of a point likewise indicates its NMI value: the larger the point, the greater its NMI value.
Figure 3a–c shows the performance of the NMI as these parameters change.
From the figures, we can see that:
1. In Figure 2, NMI is largest when the first-layer dimension is approximately 200, the second approximately 170, and the third 150. When the other dimensions are held fixed, NMI decreases as the varied dimension increases, so a lower dimension does not imply a lower NMI value.
2. In Figure 3a:
(1) When the first parameter is less than 30 and the second is greater than 80 and less than 100, the robustness of the model is worst.
(2) When the first parameter is greater than 40 and less than 80, the NMI value tends to be stable as the second parameter increases up to 50.
(3) When the first parameter is greater than 50 and the second is less than 30, the model obtains better clustering performance.
3. In Figure 3b, NMI does not change much while the parameter lies within a certain range, indicating that the clustering performance is relatively stable as the parameter increases within that range.
4. In Figure 3c, we notice that, within a particular range, NMI tends to be stable as both parameters increase, and NMI reaches its maximum value when both parameters lie in a certain range.
As for the relationship between these parameters and the clustering evaluation index NMI, the experimental results show that, with the settings identified above, more effective network structure features can be obtained even in a low-dimensional space. We therefore use these parameter settings in the following experiments.
4.4. Multiclassification Experiment
Table 3 illustrates the experimental results.
In the HEP-TH dataset, the micro and macro results of the MSDA-NMF () model are just slightly below the optimal values across all the comparison models, but better than those of every comparison model except the optimal one. The micro result of the MSDA-NMF () model is better than those of all the comparison models, including AROPE, which has the highest micro result among them. In the macro comparison, the macro result of this model is only slightly lower than the optimal value of the GAE model, but still better than those of the other comparison models.
Comparing the multiclassification results of the MSDA-NMF () and MSDA-NMF () models on the HEP-TH dataset, the results of the MSDA-NMF () model are lower than those of the three-layer model. It can be seen that the more intermediate layers the MSDA-NMF model has, the more effective it is in the classification experiments. On the other citation network data sets, OAG and HEP-PH, the performance of the three-layer model proposed in this paper is also superior to that of the other models.
In addition, the micro and macro results of the MSDA-NMF () model proposed in this paper also outperform the other models on the HEP-PH dataset, and its macro indicator is higher than those of the comparison models on the OAG dataset. On the HEP-TH, OAG and HEP-PH data sets, the micro-F1 and macro-F1 multiclassification performance of MSDA-NMF is better than that of NetMF, GAE and other popular feature models, which demonstrates the effectiveness of our network embedding model.
A special case is that the AROPE model outperforms the MSDA-NMF method on the micro-F1 and macro-F1 metrics in our comparison on the Wikipedia data set. This is probably because Wikipedia is a dense word co-occurrence network, so a relatively low-order proximity is sufficient to characterize its structure; as a result, the MSDA-NMF method, which is based only on matrix factorization, performs comparatively poorly in the classification task on such a data set.
4.5. Node Clustering Experiment
In this section, node clustering performance is assessed with the typical metric of normalized mutual information (NMI). We use real data sets (Polblog, Livejournal and Orkut) to assess the clustering performance of the model. The NMI varies from 0 to 1, with a larger value indicating better clustering performance. To verify the clustering effect of the model, the standard K-means algorithm is used; because the initial values have a significant impact on the clustering result, the clustering is repeated 50 times and the average value is taken as the result.
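For reference, the NMI score used to rate each clustering run can be computed directly from two label assignments. The following is a plain-Python sketch of the standard definition 2·I(A;B)/(H(A)+H(B)); in practice a library routine such as scikit-learn's `normalized_mutual_info_score` would be used.

```python
from collections import Counter
from math import log

def nmi(labels_a, labels_b):
    """Normalized mutual information: 2*I(A;B) / (H(A) + H(B))."""
    n = len(labels_a)
    ca, cb = Counter(labels_a), Counter(labels_b)
    cab = Counter(zip(labels_a, labels_b))
    # Entropies of the two partitions.
    h_a = -sum(c / n * log(c / n) for c in ca.values())
    h_b = -sum(c / n * log(c / n) for c in cb.values())
    # Mutual information from the joint label counts.
    i_ab = sum(c / n * log((c / n) / ((ca[a] / n) * (cb[b] / n)))
               for (a, b), c in cab.items())
    if h_a + h_b == 0:          # both partitions trivial
        return 1.0
    return 2 * i_ab / (h_a + h_b)

# Identical partitions (up to relabeling) give NMI = 1.
print(nmi([0, 0, 1, 1], [1, 1, 0, 0]))
```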
Figure 4 demonstrates the clustering ability of nodes with related NMI. It can be seen from the figure that:
1. On the Polblog and Orkut data sets, MSDA-NMF () and MSDA-NMF () obtain better NMI results than all the comparison models. In particular, on the Polblog data set, the MSDA-NMF () and MSDA-NMF () models have a clear advantage in NMI even over DeepWalk, the best of the comparison models. This is because our method integrates lower-order structural features with multi-layer features to capture diverse and comprehensive structural characteristics of the network, and can obtain better NMI values on data sets with fewer categories.
2. The proposed model also achieves good results on the Orkut data set, which has fewer categories. A special case is the Livejournal data set, which has a relatively large number of categories: there, the NMI value obtained by our model is slightly lower than that of the GAE model. The main reason is that GAE has the same convolution structure as GCN and is based on a variational autoencoder, so GAE has better robustness on such networks.
3. SDNE and LINE only preserve the proximity between network nodes and cannot effectively preserve the community structure. The random-walk-based DeepWalk and Node2vec can better capture second-order and higher-order similarity. Although AROPE can capture the similarity of different nodes and more global structural information as the proximity order increases, its omission of community structure means that the algorithm ignores modular information. Moreover, for sparse networks and networks without a prominent community structure, the modularity term of the NMF-based model is constrained by the pairwise similarity of nodes, so its performance is relatively low.
This paper also compares the performance of the three-layer and two-layer MSDA-NMF models in the node clustering experiments. The results show that the NMI values obtained by the MSDA-NMF () model are higher than those obtained by the MSDA-NMF () model on all data sets, which confirms the validity of the deeper MSDA-NMF () model for network embedding.
4.6. Link Prediction Experiment
Link prediction evaluates a model's accuracy by predicting which pairs of nodes are likely to form edges and comparing the predictions with the actually deleted edges. In the experiment, we randomly hide different proportions of the edges as test data and use the remaining edges to train the node embedding vectors. We evaluate the effectiveness of our model with the typical AUC (area under the curve) metric.
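This evaluation protocol can be sketched as follows. The scores here are hypothetical; a real run would score node pairs with the learned embeddings, for example via their inner product.

```python
import random

def auc_link_prediction(scores, missing_edges, nonexistent_edges, n=10000, seed=0):
    """Sampled AUC: probability that a hidden (missing) edge outscores
    a randomly chosen nonexistent edge, counting ties as half."""
    rng = random.Random(seed)
    hits = ties = 0
    for _ in range(n):
        e_miss = rng.choice(missing_edges)
        e_none = rng.choice(nonexistent_edges)
        if scores[e_miss] > scores[e_none]:
            hits += 1
        elif scores[e_miss] == scores[e_none]:
            ties += 1
    return (hits + 0.5 * ties) / n

# Toy scores where every hidden edge outscores every nonexistent edge.
scores = {('a', 'b'): 0.9, ('c', 'd'): 0.8, ('a', 'c'): 0.1, ('b', 'd'): 0.2}
auc = auc_link_prediction(scores, [('a', 'b'), ('c', 'd')], [('a', 'c'), ('b', 'd')])
print(auc)  # 1.0, since hidden edges always win the comparison
```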
To verify the validity of our proposed model, we first delete a fixed proportion of the edges on all network data sets. As shown in Table 4:
1. Compared with the other algorithms, MSDA-NMF () improves the prediction performance over GAE, the optimal prediction model among the baselines.
2. The MSDA-NMF () variant shows prediction performance slightly below that of the optimal model among the baselines.
3. On the Orkut and GR-QC data sets:
(1) The proposed MSDA-NMF () and MSDA-NMF () models obtain the optimal prediction results.
(2) However, on the GR-QC data set, the accuracy of the MSDA-NMF () model is higher than that of the MSDA-NMF () model.
What is remarkable here is that, on the Polblog data set, neither of the methods proposed in this paper obtains the optimal prediction result. As shown in Table 2, the Polblog data set has a small number of categories, which prevents the model from obtaining better prediction results through its hierarchical operation.
Specifically, to further verify the influence of the proportion of training data on the model, we ran tests on the Livejournal and Orkut data sets. The results in Figure 5 show that our method retains a certain superiority over all the mainstream methods when different proportions of edges are removed from the two data sets. Because the networks have different structural characteristics, on the Livejournal data set the performance is already close to the optimal level with only part of the edges remaining, and MSDA-NMF () obtains prediction results similar to those of MSDA-NMF (). It can be seen that our method performs excellently in link prediction, indicating that the network embedding results preserve the structural characteristics of the data sets.