In addition to thoroughly exploring interactions among users, we also consider inherent individual characteristics, such as personal activity level and personal traits.
For user relationship prediction, classification is performed using sentiment reversal, social relationship, personal characteristics, and personal activity level. The goal is to classify each user interaction as either a friendly relationship or an adversarial relationship.
The overall implementation follows the approach outlined above. Finally, logistic regression is employed to fuse all extracted features and predict the potential signed relationship between users, which is either positive or negative.
Topic Relationship Prediction Based on Meta-Path Similarity
This subsection focuses on predicting signed relationships between topics. From the original heterogeneous information network, we extract the edges that represent historical sentiment relationships between users and topics, i.e., the positive and negative user-topic sentiment links. These edges form the original input network. By constructing meta-paths, we mine the rich semantic information in the user sentiment network and obtain the coupling and competition between topics, from which a signed network representing relationships between topics is constructed.
Non-controversial Topic Mining: This section analyzes the possible correlations between topics in line with the main objectives of the paper, namely, coupling and competition arising from user behavior. Coupled topics tend to receive similar sentiment attitudes from the same group of users, while competing topics usually receive opposing sentiment attitudes. These relationships serve as contextual factors that help discover users’ unknown attitudes towards topics in subsequent tasks. Many methods assume that mining node relationships only requires considering the two nodes jointly, and thus neglect the inherent properties of each node. In this context, however, user emotional states may depend on the nature of the topic itself. Naskar et al. [32], experimenting with topics related to various terrorist attacks, such as the Syrian terrorist attacks, indicated that users maintain highly negative sentiment attitudes towards topics related to terrorist attacks. The sentiment evolution of such topics deviates from the average level of general topics. Therefore, it is crucial to consider such topics, which have special properties, separately.
Because the inherent nature of a topic can bias user sentiment, topics towards which users tend to exhibit consistent attitudes are considered non-controversial topics. Non-controversial topics include strongly positive and strongly negative topics. Strongly positive topics are those for which users participating in discussions generally maintain a positive attitude, such as HappyNationalDay and WinterOlympicsSmoothOpening. Strongly negative topics are those for which users participating in discussions generally maintain a negative attitude, such as TerroristAttack and EasternAirlines MU5735 Crash.
To avoid the law-of-small-numbers problem caused by sample sparsity, we define topics with more than 10 users participating in discussions in the historical sentiment data as candidate topics. The set of non-controversial topics is mined by calculating the information entropy of the candidate topics. From the original heterogeneous information network, the edges representing positive and negative sentiment relationships can be extracted. Information entropy measures the diversity of user attitudes towards each topic, i.e., the degree to which the attitudes tend to be consistent. Larger information entropy indicates more balanced user attitudes, while smaller information entropy indicates that the topic has strong special properties that bias user sentiment. The information entropy $H(v)$ of topic $v$ is calculated as follows:

$$H(v) = -\sum_{s \in \{+,-\}} p_s(v) \log_2 p_s(v)$$

where $p_s(v)$ represents the proportion of users with positive ($s = +$) or negative ($s = -$) sentiment among all users participating in topic $v$, and $H(v)$ is the information entropy of topic $v$, ranging from 0 to 1. Topics with information entropy below a set threshold are considered non-controversial, and the sentiment polarity of these topics is determined. Topics with different sentiment polarities are competitive, while topics with the same sentiment polarity are coupled, which determines the correlation between non-controversial topics.
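The entropy-based screening described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `threshold` parameter is hypothetical (its value is not recoverable from the text), and assigning polarity by majority vote is an assumption, since the text only states that the polarity "is determined".

```python
import math

MIN_USERS = 10  # candidate topics need more than 10 participating users (from the text)


def topic_entropy(n_pos, n_neg):
    """Base-2 information entropy of the positive/negative attitudes towards one topic."""
    total = n_pos + n_neg
    entropy = 0.0
    for count in (n_pos, n_neg):
        if count > 0:  # the term 0 * log(0) is taken as 0
            p = count / total
            entropy -= p * math.log2(p)
    return entropy


def mine_non_controversial(counts, threshold):
    """Return {topic: +1 or -1} for candidate topics with entropy below threshold.

    counts maps topic -> (n_pos, n_neg); threshold is a hypothetical parameter.
    Polarity is assigned by majority attitude (an assumption).
    """
    result = {}
    for topic, (n_pos, n_neg) in counts.items():
        if n_pos + n_neg <= MIN_USERS:
            continue  # too few users: not a candidate topic
        if topic_entropy(n_pos, n_neg) < threshold:
            result[topic] = 1 if n_pos > n_neg else -1
    return result


def non_controversial_relation(polarity_a, polarity_b):
    """Same polarity -> coupled (+1); different polarity -> competitive (-1)."""
    return 1 if polarity_a == polarity_b else -1
```

A perfectly split topic has entropy 1 and a unanimous topic has entropy 0, so a lower threshold selects only topics on which users agree more strongly.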
The heterogeneous signed information network is defined as follows.
Two node types are defined, where u represents user-type nodes and v represents topic-type nodes. Their initial node embeddings are one-hot encodings based on their respective attribute features. Figure 3 illustrates the various relationships and their corresponding symbolic representations.
Edge types consist of six relationships in three semantic spaces. In the user–user relationship space, the two edge types represent friendship relationships and antagonistic relationships between user-type nodes u. In the user–topic relationship space, they represent positive sentiment links (indicating that user u supports topic v) and negative sentiment links (indicating that user u opposes topic v). In the topic–topic relationship space, they represent competitive relationships and coupling relationships between topic-type nodes v.
Since each relationship corresponds to fixed node types, this heterogeneous signed network can be represented by a single adjacency matrix, in which a dedicated entry value indicates that the relationship between the corresponding nodes is unknown.
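One possible in-memory encoding of such a network is sketched below. It is an illustration only: the class name, the space labels, the use of +1/-1 for the two edge types in each space, the value 0 for unknown relationships, and the symmetry of user–user and topic–topic edges are all assumptions rather than details given in the text.

```python
UNKNOWN = 0  # assumed placeholder for an unobserved relationship


class SignedHeteroNetwork:
    """Sparse signed adjacency, stored separately per semantic space."""

    SPACES = ("user-user", "user-topic", "topic-topic")

    def __init__(self):
        self.edges = {space: {} for space in self.SPACES}

    def add_edge(self, space, src, dst, sign):
        """Store a signed edge; sign is +1 or -1."""
        if space not in self.edges:
            raise ValueError(f"unknown semantic space: {space}")
        if sign not in (1, -1):
            raise ValueError("sign must be +1 or -1")
        self.edges[space][(src, dst)] = sign
        if space != "user-topic":
            # user-user and topic-topic edges are assumed to be symmetric
            self.edges[space][(dst, src)] = sign

    def sign(self, space, src, dst):
        """Return +1, -1, or UNKNOWN for the given node pair."""
        return self.edges[space].get((src, dst), UNKNOWN)
```

Keeping the three spaces separate reflects the observation that each relationship connects fixed node types, so a lookup never has to guess which semantics a sign carries.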
Mining Relationships Between Controversial Topics: Most existing methods start from topic attributes, using clustering based on feature similarity to find similar topics and thereby obtain potential signed relationships between topics. This approach has two drawbacks: first, it focuses only on positive relationships between topics and overlooks negative ones; second, topics may be similar in attributes yet negatively related, which cannot be determined by attribute similarity alone.
The coupling and competition of topics are determined by user attitudes, which makes an analysis based on users’ historical sentiment data reasonable. Both the user involved and the sign of each sentiment link must be taken into account. Path-based methods preserve node feature information along the paths and retain different semantic relationships under different path patterns, so they are often used for semantic extraction between nodes in heterogeneous networks. However, considering all topics requires calculating the similarity between every pair of topics, which involves expensive matrix multiplications and increases time complexity. Therefore, we impose a requirement on candidate topic neighbors to prune the matrix multiplication. When extracting path instances, the second-order reachable neighbors of each topic i with a reachable path count greater than 5 are selected as candidate neighbors.
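The candidate-neighbor pruning can be sketched as below. The sketch assumes that a "reachable path" between two topics is a topic–user–topic path through a shared user, counted regardless of sentiment sign, and that sentiment links are available as `(user, topic, sign)` triples; both the function name and this input layout are hypothetical.

```python
from collections import Counter, defaultdict

MIN_PATHS = 5  # a neighbor needs more than 5 reachable paths (from the text)


def candidate_neighbors(sentiment_links):
    """Map each topic to its second-order neighbors with more than MIN_PATHS paths.

    sentiment_links is an iterable of (user, topic, sign) triples.
    """
    topics_of_user = defaultdict(set)
    for user, topic, _sign in sentiment_links:
        topics_of_user[user].add(topic)

    # Each user shared by two topics contributes one topic-user-topic path.
    path_counts = defaultdict(Counter)
    for topics in topics_of_user.values():
        for topic_i in topics:
            for topic_j in topics:
                if topic_i != topic_j:
                    path_counts[topic_i][topic_j] += 1

    return {
        topic_i: {topic_j for topic_j, n in counter.items() if n > MIN_PATHS}
        for topic_i, counter in path_counts.items()
    }
```

Iterating over each user's topic set avoids a dense topic-by-topic matrix product, and only pairs that pass the count filter need a similarity computation afterwards.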
In the heterogeneous information network defined above, nodes are of user type u or topic type v, and there are three edge types, one of which is the unsigned social relationship between user-type nodes. Since the sentiment links carry different signed semantics, four meta-path patterns are defined:
Meta-path Pattern One: The same user expresses a positive attitude towards both topics.
Meta-path Pattern Two: The same user expresses a negative attitude towards both topics.
Meta-path Pattern Three: The same user expresses a positive attitude towards the starting topic and a negative attitude towards the ending topic.
Meta-path Pattern Four: The same user expresses a negative attitude towards the starting topic and a positive attitude towards the ending topic.
The four meta-path patterns can be summarized into two types of paths: symmetric meta-paths and asymmetric meta-paths. Symmetric meta-paths refer to instances where the same user expresses similar sentiments towards two different topics, indicating a coupling relationship between topics. Asymmetric meta-paths, on the other hand, refer to instances where the same user expresses opposite sentiments towards two different topics, indicating a competitive relationship between topics. The similarity between meta-paths is calculated based on these two path types:
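Counting path instances for the four patterns can be sketched as follows. The sketch assumes each user holds at most one signed attitude (+1 or -1) per topic and that links are given as `(user, topic, sign)` triples; the pattern labels and function name are illustrative.

```python
from collections import Counter, defaultdict

# (sign towards starting topic, sign towards ending topic) -> pattern label
PATTERNS = {(1, 1): "one", (-1, -1): "two", (1, -1): "three", (-1, 1): "four"}
SYMMETRIC = ("one", "two")       # same attitude: evidence of coupling
ASYMMETRIC = ("three", "four")   # opposite attitudes: evidence of competition


def count_meta_paths(sentiment_links):
    """Count topic-user-topic path instances per meta-path pattern.

    Returns {pattern: Counter({(start_topic, end_topic): n_instances})}.
    """
    attitudes = defaultdict(dict)
    for user, topic, sign in sentiment_links:
        attitudes[user][topic] = sign

    counts = {name: Counter() for name in PATTERNS.values()}
    for topics in attitudes.values():
        for start, sign_start in topics.items():
            for end, sign_end in topics.items():
                if start != end:
                    counts[PATTERNS[(sign_start, sign_end)]][(start, end)] += 1
    return counts
```

Patterns three and four mirror each other: an instance of pattern three from one topic to another is an instance of pattern four in the reverse direction.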
Calculation of Topic Coupling Based on Symmetric Meta-paths: For a symmetric meta-path pattern, we count the total number of path instances from topic i to topic j, as well as the total number of path instances from topic i to itself and from topic j to itself, all under the same meta-path pattern. The latter two values represent the total reachable paths of topics i and j back to themselves in that path pattern.
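The three path counts above are the ingredients of the standard PathSim measure, which the next paragraph indicates is the method used for symmetric meta-paths. A minimal sketch follows, assuming the usual PathSim form (twice the cross count divided by the sum of the two self counts) and assuming that, for a topic–user–topic pattern, each user linked to a topic contributes one self-path; the input layout is hypothetical.

```python
def pathsim(users_by_topic, topic_i, topic_j):
    """PathSim between two topics under one symmetric meta-path pattern.

    users_by_topic maps topic -> set of users holding the pattern's attitude
    (all positive for pattern one, all negative for pattern two).
    """
    users_i = users_by_topic.get(topic_i, set())
    users_j = users_by_topic.get(topic_j, set())
    paths_ij = len(users_i & users_j)  # paths from topic i to topic j
    paths_ii = len(users_i)            # paths from topic i back to itself
    paths_jj = len(users_j)            # paths from topic j back to itself
    denominator = paths_ii + paths_jj
    return 2 * paths_ij / denominator if denominator else 0.0
```

The score is 1 when both topics are supported (or opposed) by exactly the same users and 0 when they share none, so it normalizes for topic popularity.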
Calculation of Topic Competition Based on Asymmetric Meta-paths: Because the two links in an asymmetric meta-path carry different semantics, the PathSim-based method is not applicable, so the HeteSim method is used instead. HeteSim measures the probability that the two end nodes meet under path P. It is computed as the product of the reachable probability matrices of the left and right halves of the path pattern, with the midpoint type M as the boundary, where the adjacency matrices involved are normalized along the row direction.
After this encounter probability is calculated, it is normalized.
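For a topic–user–topic path, the midpoint type is the user, so each half of the path is a single row-normalized topic-to-user step. The sketch below follows the commonly cited HeteSim formulation, including its cosine-style normalization; whether the authors normalize in exactly this way is an assumption, and the function names are illustrative.

```python
import math


def row_normalize(users):
    """Uniform transition probabilities from a topic to its linked users."""
    return {user: 1.0 / len(users) for user in users} if users else {}


def hetesim(start_users, end_users):
    """Normalized meeting probability of two topics at the user midpoint.

    start_users: users linked to the starting topic by the first link type
                 (e.g. positive attitude in pattern three).
    end_users:   users linked to the ending topic by the second link type
                 (e.g. negative attitude in pattern three).
    """
    left = row_normalize(start_users)
    right = row_normalize(end_users)
    # Probability that walkers from both ends meet at the same user.
    meet = sum(p * right[user] for user, p in left.items() if user in right)
    norm = math.sqrt(sum(p * p for p in left.values()))
    norm *= math.sqrt(sum(p * p for p in right.values()))
    return meet / norm if norm else 0.0
```

With uniform transition probabilities, the normalized score reduces to the number of shared users divided by the geometric mean of the two user-set sizes.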
For each topic i, we first check whether its second-order neighbor topic j satisfies the criterion for candidate topic neighbors. If topic j qualifies as a candidate topic neighbor, the coupling degree and the competition degree of topics i and j are calculated using the two methods above and combined to obtain the meta-path similarity.
Feature fusion: For topic relationship prediction, the relationship Q between topics is classified based on the fusion of topic characteristics and meta-path similarity. In this fusion, ⊙ is the exclusive NOR (XNOR) operator, yielding one for identical elements and zero for different ones. Finally, logistic regression is employed to fuse and classify all relevant features, completing the prediction of potential signed relationships between topics, namely the coupling relation and the competitive relation.
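The fusion step can be illustrated with a small pure-Python logistic regression. This is a sketch under stated assumptions, not the authors' model: the two-feature layout (an equality indicator over topic polarities plus a signed meta-path similarity), the label convention (1 for coupling, 0 for competition), and the training hyperparameters are all hypothetical.

```python
import math


def xnor(a, b):
    """Return 1 for identical elements and 0 for different ones."""
    return 1 if a == b else 0


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(samples, labels, lr=0.5, epochs=500):
    """Fit logistic regression by batch gradient descent."""
    n_features = len(samples[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(samples, labels):
            error = sigmoid(bias + sum(w * v for w, v in zip(weights, x))) - y
            for k, v in enumerate(x):
                grad_w[k] += error * v
            grad_b += error
        weights = [w - lr * g / len(samples) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(samples)
    return weights, bias


def predict_coupling(weights, bias, x):
    """Probability that a topic pair with feature vector x is coupled."""
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))
```

A pair would then be labeled as coupled when the predicted probability is at least 0.5 and as competitive otherwise; in practice a library implementation of logistic regression would replace the hand-written training loop.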