This section describes the main methodology used in this paper. Section 3.1 shows how to create an ego network for each user. Section 3.2 computes the persistent homology of each ego network, visualizes it as a persistence diagram and barcodes, and explains the difference between active and inactive users. Section 3.3 explains the classification method that combines persistent homology and machine learning. The main methodology of this paper is as follows:
3.2. Persistent Homology in the Ego Network
After obtaining the ego network of each node, we compute the persistent homology of each ego network to extract its topological features, and then visualize these features as persistence diagrams and persistence barcodes. The process is as follows.
- (1) Construct Simplicial Complex:
Convert the ego network into a series of Rips complexes. Rips complexes capture the topological structure of the network by adding higher-order simplices (such as triangles).
For example, the simplicial complex constructed from the ego network in Figure 1 is represented as the following list of (simplex, filtration value) pairs: ([0, 1], 1.0), ([1, 2], 1.0), ([2, 3], 1.0), ([4, 5], 1.0), ([5, 6], 1.0), ([6, 7], 1.0), ([7, 8], 1.0), ([0, 2], 2.0), ([1, 3], 2.0), ([3, 4], 2.0), ([4, 6], 2.0), ([5, 7], 2.0), ([6, 8], 2.0), ([0], 0.0), ([1], 0.0), ([2], 0.0), ([3], 0.0), ([4], 0.0), ([5], 0.0), ([6], 0.0), ([7], 0.0), ([8], 0.0).
- (2) Compute Persistent Homology:
For each simplicial complex, compute its corresponding homology groups. Homology groups describe the connected components and cycles of the topological space. Persistent homology tracks how these homology groups change across the filtration (i.e., as the simplicial complex grows).
- (3) Generate Persistence Diagrams and Barcodes:
Persistence diagrams and barcodes are two methods for visualizing the results of persistent homology. The persistence diagram and persistence barcode for the ego network described above are shown in Figure 2b.
Persistence Diagram: Each point represents a homology class. The horizontal axis indicates the birth time (when the class first appears in a simplicial complex), and the vertical axis indicates the death time (when the class disappears). Homology classes with longer lifespans appear farther from the diagonal in the diagram, while short-lived classes lie close to it.
Persistence Barcodes: Each bar represents a homology class. The horizontal axis indicates the time range (birth to death), and the vertical axis indicates the index of the homology class. Longer bars correspond to homology classes with longer lifespans.
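The three steps above can be illustrated on the Figure 1 example with a minimal sketch. It assumes only the vertices and edges listed in step (1), and uses a union-find structure to compute the 0-dimensional bars and to record when 1-dimensional classes (cycles) are born. The function name `graph_persistence` is hypothetical, and this is a stand-in for a full persistent homology library, not the implementation used in this paper.

```python
import math

# (simplex, filtration value) pairs copied from the Figure 1 example.
FILTRATION = [
    ([0, 1], 1.0), ([1, 2], 1.0), ([2, 3], 1.0), ([4, 5], 1.0),
    ([5, 6], 1.0), ([6, 7], 1.0), ([7, 8], 1.0),
    ([0, 2], 2.0), ([1, 3], 2.0), ([3, 4], 2.0),
    ([4, 6], 2.0), ([5, 7], 2.0), ([6, 8], 2.0),
    ([0], 0.0), ([1], 0.0), ([2], 0.0), ([3], 0.0), ([4], 0.0),
    ([5], 0.0), ([6], 0.0), ([7], 0.0), ([8], 0.0),
]


def graph_persistence(filtration):
    """Return (H0 bars, birth values of 1-cycles) for a vertex/edge filtration."""
    parent = {}  # union-find forest over vertices
    birth = {}   # filtration value at which each vertex appears

    def find(v):
        # Follow parent links to the root, halving the path on the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    h0_bars, h1_births = [], []
    # Process simplices by filtration value, vertices before edges on ties.
    ordered = sorted(filtration, key=lambda item: (item[1], len(item[0])))
    for simplex, value in ordered:
        if len(simplex) == 1:
            vertex = simplex[0]
            parent[vertex] = vertex
            birth[vertex] = value
        elif len(simplex) == 2:
            a, b = find(simplex[0]), find(simplex[1])
            if a == b:
                # The edge closes a loop: a 1-dimensional class is born.
                h1_births.append(value)
            else:
                # Elder rule: the younger component dies at this value.
                young, old = (a, b) if birth[a] > birth[b] else (b, a)
                h0_bars.append((birth[young], value))
                parent[young] = old
    # Components that never merge persist forever.
    roots = {find(v) for v in parent}
    h0_bars.extend((birth[r], math.inf) for r in roots)
    return h0_bars, h1_births


h0_bars, h1_births = graph_persistence(FILTRATION)
for bar_birth, bar_death in sorted(h0_bars, key=lambda bar: bar[1]):
    print(f"H0 bar: [{bar_birth}, {bar_death})")
print("1-dimensional classes born at:", h1_births)
```

On this input the sketch yields seven 0-dimensional bars [0, 1), one bar [0, 2), and one infinite bar, plus five cycles born at 2.0. Because the listed complex contains no triangles, the cycles are never killed here; in a Rips complex that includes triangles, some of them would die at a finite filtration value.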
Ego networks formed by active individuals have a different topology from those formed by inactive individuals. Because active individuals carry out more activities in the network, their ego networks have more persistent features and more high-dimensional features. The persistence diagrams and barcodes of the two groups therefore differ, as shown in Figure 2.
In Figure 2, the red dots and red bars are 0-dimensional persistent features, and the blue dots and blue bars are 1-dimensional persistent features. 0-dimensional features correspond to connected components, whereas 1-dimensional features correspond to cycles. The users in Figure 2a, with longer feature durations and more high-dimensional features, are active users, while the users in Figure 2b, with shorter feature durations and fewer high-dimensional features, are inactive users. Active and inactive users can thus be distinguished from their persistence diagrams and barcodes.
Therefore, we use the complexity of the topological features to describe a user's activity: the more complex the topological features, the more active the user.
3.3. Categorizing Users by Activity
The persistence features obtained from the ego network cannot be input directly into a machine learning model for classification, so this paper first processes the persistence feature data to obtain the persistence entropy [21], and then inputs the result into the machine learning model. Persistence entropy is an important concept in persistent homology analysis. It can be defined as the information entropy of the probability distribution of the persistence intervals in the persistence diagram. Persistence intervals are the ranges over which topological features exist, from birth to death, and they can be used to represent the evolution and change of different topological features in a dataset.
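Let $X = \{[b_i, d_i]\}_{i \in I}$ denote the set of persistence intervals in the persistence diagrams, where $[b_i, d_i]$ denotes the start and end time of the $i$-th persistence interval, and $p_i$ denotes the probability that the duration interval $[b_i, d_i]$ occurs in the dataset. The persistence entropy $E(X)$ [19] is defined as:
$$E(X) = -\sum_{i \in I} p_i \log p_i \qquad (1)$$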
Persistence entropy represents a measure of the complexity of persistent features. If the entropy value is high, it means persistence features are more complex; if the entropy value is low, it means the feature distribution is simpler.
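As a minimal sketch of this definition, the function below computes the entropy of a list of (birth, death) intervals. It assumes the common convention that the probability of an interval is its length divided by the total length of all intervals, that the natural logarithm is used, and that infinite bars are discarded; none of these choices is stated in the text, and the function name `persistence_entropy` is hypothetical.

```python
import math


def persistence_entropy(intervals):
    """Shannon entropy of the normalized lengths of (birth, death) intervals."""
    # Assumption: drop infinite bars and zero-length bars before normalizing.
    lengths = [d - b for b, d in intervals if math.isfinite(d) and d > b]
    total = sum(lengths)
    if total == 0:
        return 0.0  # no finite bars: treat the entropy as zero
    # Assumption: p_i is the share of the total lifetime held by interval i.
    return -sum((l / total) * math.log(l / total) for l in lengths)


# Example: 0-dimensional bars of a small ego network (illustrative values).
bars = [(0.0, 1.0)] * 7 + [(0.0, 2.0), (0.0, math.inf)]
print(persistence_entropy(bars))
```

Under these assumptions, n bars of equal length give the maximum value log(n), while a single dominant bar drives the entropy toward zero, which matches the reading of high entropy as complex and low entropy as simple.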
Based on the persistence entropy, we define the Norm Entropy NE(X):
$$NE(X) = \frac{E(X) - \mu}{\sigma} \qquad (2)$$
where $E(X)$ is the persistence entropy of $X$, $\mu$ is the mean of the entropy values, and $\sigma$ is their standard deviation. Equation (2) describes the complexity of the topological features, which means that it can describe how active a node is.
NE(X) reflects the distribution of the different topological features in the persistence diagram. A high NE(X) indicates that the topology of the dataset is diverse and complex, which corresponds to an active user; a mid-range NE(X) corresponds to a general user; and a low NE(X) indicates that the topology of the dataset is relatively simple and consistent, which corresponds to an inactive user.
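The normalization can be sketched as follows, assuming that Norm Entropy is a standard score of the entropy values across all ego networks, as the description of μ and σ suggests. Whether the population or sample standard deviation is intended is not stated; the sketch uses the population form, and the function name `norm_entropy` is hypothetical.

```python
from statistics import mean, pstdev


def norm_entropy(entropies):
    """Standardize entropy values: subtract the mean, divide by the std."""
    mu = mean(entropies)
    sigma = pstdev(entropies)  # assumption: population standard deviation
    if sigma == 0:
        # All ego networks have the same entropy: nothing to separate.
        return [0.0 for _ in entropies]
    return [(e - mu) / sigma for e in entropies]


# Illustrative entropy values for three ego networks (not from the paper).
print(norm_entropy([1.0, 2.0, 3.0]))
```

Values above zero then mark ego networks that are more complex than average, and values below zero mark simpler ones.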
After obtaining the NE(X) of each ego network, we input the 0-dimensional, 1-dimensional, and 2-dimensional NE(X) of each ego network into the machine learning model as a three-dimensional feature vector. We also tried using only the 1- and 2-dimensional NE(X) as the feature vector. Finally, we compared the classification results of these two input methods.
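The two input variants can be sketched as a simple column selection. The data layout (a mapping from user to per-dimension NE(X) values), the user names, the numbers, and the function name `feature_vectors` are all hypothetical, and the classifier itself is left out because the text does not name a specific model at this point.

```python
def feature_vectors(ne_by_user, dims=(0, 1, 2)):
    """Build one feature vector per user from the chosen homology dimensions."""
    return {user: [ne[d] for d in dims] for user, ne in ne_by_user.items()}


# Hypothetical NE(X) values keyed by homology dimension.
ne_by_user = {
    "user_a": {0: 1.2, 1: 0.8, 2: 0.5},
    "user_b": {0: -0.9, 1: -1.1, 2: -0.4},
}

full_input = feature_vectors(ne_by_user)                  # dimensions 0, 1, 2
reduced_input = feature_vectors(ne_by_user, dims=(1, 2))  # dimensions 1, 2 only
print(full_input)
print(reduced_input)
```

Either dictionary can then be turned into a feature matrix for the chosen classifier, so that the two variants are compared on the same users and labels.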