1. Introduction
Random walks are ubiquitous in the fields of mathematics, physics, and biology, and are interesting from both theoretical and practical perspectives. The basis of random walk theory can be traced back to the irregular motion of individual pollen particles, also known as Brownian motion. Random walks have been studied for many decades on both regular lattices and on graphs with different structures [1]. As one of the most fundamental types of stochastic processes, they can be used to describe numerous phenomena, such as percolation [2], diffusion [3], biological systems [4], and opinion dynamics [5]. The simplest random walk model of movement is uncorrelated and unbiased, with a constant time interval and step size for every step. We call it the random walk prototype. In this context, uncorrelated means that the direction of movement is completely independent of the previous directions moved: the location after each step depends only on the location in the previous step, i.e., the process is Markovian with regard to location. Unbiased means that there is no preferred direction: the direction moved at each step is completely random. Assuming that movement in any direction is allowed, this process is essentially Brownian motion, and such models can be shown to produce the standard diffusion equation, or heat equation. Different modifications of the same simple underlying prototype, such as correlated random walks [6], biased random walks [7], and Lévy walks [8], have been proposed for use in different contexts.
Online social networks, such as Twitter, Facebook, WeChat, and Bitcoin, provide users with new and interesting ways to connect with one another. The evolution analysis of social networks mainly focuses on two aspects. One is devoted to understanding the mechanisms of network formation, such as triadic closure [9], growth and preferential attachment [10], homophily [11], reciprocity [12], and structural balance [13,14,15,16]. The other is dedicated to modeling the generation of real-world social networks with specific topological structure, such as scale-invariant degree distributions [17,18,19] and the small-world property [20].
Both mechanism analysis and modeling are devoted to understanding the driving rules of network formation and evolution, and have a wide range of real-life applications such as viral marketing [21], personalized recommendation [22], and social mobilization [23]. Despite many endeavors on the microscopic analysis of the edge-by-edge evolution of large online social networks [24], there are still numerous challenges which have to be addressed [25].
In this paper, considering a walker moving on an infinite one-dimensional uniform lattice (i.e., a line split into discrete points), we address this problem by introducing a simple discrete time random walk model. We analytically investigate the stochastic generation process of signed edges of real-world online signed social networks. The main goal of our theoretical analysis is to provide an analytical explanation for the signed edge dynamic evolution of a network under the simple random walk process.
Specifically, we assume that the generating process of the positive and negative edges of an empty network is a discrete time random walk. Within any unit time interval τ, the empty network gains h edges. Therefore, the generating process of signed edges can be characterized by a discrete random walk model on a straight line. Considering the limit τ → 0, h → 0, we find that the incremental dynamics of signed edges can be characterized by a one-dimensional thermal diffusion equation, a parabolic partial differential equation related to Brownian motion and the Gaussian process. Furthermore, we verify that the growth of signed edges in a dynamic signed network follows a Gaussian distribution.
A central issue in the study of network structure evolution, and the source of many notable problems in the literature, is the confusion between the observed pattern and the underlying process that generated it. In this article, we take a different approach: rather than focusing only on the global network structure and then hypothesizing about what kind of microscopic edge-generating behavior would reproduce the observed macroscopic pattern, we explore signed edge evolution from the micro-level perspective of a simple random walk, and empirically validate how the micro-level (local Brownian motion) evolution mechanism gives rise to the global emergent pattern of the Gaussian process. From the perspective of statistical physics, the result reveals that the signed edge growth dynamics can be regarded as a thermodynamic diffusion process. Moreover, our findings supplement earlier network models based on degree preferential attachment or node-centered mechanisms.
Our work will aid in understanding the generating mechanism of network systems with positive and negative links, and has profound potential applications, such as information spreading, evolutionary games [26], trust transmission, and dynamic structural balance [27] on signed complex networks.
The remainder of this paper is organized as follows. Section 2 presents our proposed method. Section 2.1 provides the incremental dynamic model of signed edges behind the simple random walk. Section 2.2 presents the continuum limit of the one-dimensional random walk, and illustrates that the growth process of signed edges is analogous to a thermal diffusion process. In Section 2.3, Brownian motion and the Gaussian process are proposed as a theorem to describe the signed edge increasing process. In Section 3, we validate that the theoretical stochastic analysis is consistent with the observed edge increments of a real-world temporal signed network. Finally, Section 4 concludes the paper and briefly explores directions for future research.
2. Proposed Method
2.1. Random Walk on One-Dimensional Uniform Lattice
We assume that the generating process of the positive and negative edges of an empty network is a discrete time random walk on an infinite one-dimensional uniform lattice. Within any unit time interval τ, the empty network gains h signed edges (all positive, all negative, or mixed). To keep the model concise and intuitive, the discrete time random walk in this paper satisfies the following conditions:
- (a) Homogeneity: the probability of a walk depends only on the displacement and is independent of the starting point of the walk.
- (b) Independence: each walk is an independent random walk, and the direction of the current movement (right “+” denotes the increment of positive edges, left “−” represents the increment of negative edges) is completely independent of the previous directions moved.
- (c) Universality: the time interval τ and step size h of each random walk are fixed.
- (d) Unbiased: there is no preferred direction, the direction moved at each step is completely random, and the probabilities of moving right “+” (generating positive edges) and moving left “−” (generating negative edges) are equal.
Now, we explain the generating process of signed network edges based on the above four assumptions of discrete one-dimension random walks. We set the right side of the origin on the straight line to be the positive direction. The origin here contains two meanings. Case I: an empty network with neither negative nor positive edges. Case II: a non-empty network with the same number of positive and negative edges.
It is assumed that after experiencing time nτ, the network completes the process of increasing the number of positive edges from ih to jh, and the probability of this event occurring is defined as P (ih − jh, nτ). Homogeneity suggests that P (ih − jh, nτ) only depends on the increment of positive edges |i − j| and n. Independent increase means that the current probability of positive or negative edge generation does not depend on the results of the previous step. Universality implies that a null signed network starts from x = 0 edges, and then adds fixed h edges in a fixed time interval τ. Unbiased means that the edge generation is assumed to be fully random, so the probabilities of adding both positive and negative edges are 1/2.
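The four conditions can be illustrated with a short simulation. The following is a minimal sketch, not the authors' implementation: the function name `simulate_signed_edge_walk` and the chosen parameter values are hypothetical, and each step is assumed to add exactly h positive or h negative edges with probability 1/2 each.

```python
import random


def simulate_signed_edge_walk(n_steps, h=1, seed=None):
    """Return (positive_edges, negative_edges) after n_steps unbiased steps.

    Assumption for illustration: every step adds exactly h edges, all
    positive (move right) or all negative (move left), each with
    probability 1/2 and independently of all previous steps.
    """
    rng = random.Random(seed)
    positive = 0
    negative = 0
    for _ in range(n_steps):
        if rng.random() < 0.5:
            positive += h  # step to the right: h new positive edges
        else:
            negative += h  # step to the left: h new negative edges
    return positive, negative


pos, neg = simulate_signed_edge_walk(n_steps=1000, h=1, seed=42)
displacement = pos - neg  # position of the walker on the lattice
```

After n steps the network always holds nh edges in total, and the walker's position is the difference between the positive and negative edge counts.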
Without loss of generality, we start from time t = 0. According to homogeneity, P (ih − jh, nτ) depends only on the positive edge increment m = |i − j| and on n. After the time interval nτ, ignoring broken edges, the network has obtained nh edges. Assuming that the number of positive edges is kh and the number of negative edges is (n − k)h, we have kh − (n − k)h = |i − j|h. This result rests on the universality condition. In the case kh > (n − k)h, i.e., the number of positive edges exceeds that of the negative edges, we arrive at k = (n + m)/2, with m = i − j > 0. According to the unbiased condition, after n time steps, the probability of an empty network obtaining kh positive edges and (n − k)h negative edges is given by

$$P(mh, n\tau) = \binom{n}{k}\left(\frac{1}{2}\right)^{n} = \frac{n!}{\left(\frac{n+m}{2}\right)!\,\left(\frac{n-m}{2}\right)!}\left(\frac{1}{2}\right)^{n}, \tag{1}$$

where m = i − j > 0 and k = (n + m)/2.
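Since Equation (1) is a form of the binomial distribution, it can be evaluated directly. The sketch below assumes the standard binomial form with success probability 1/2 and k = (n + m)/2; the helper name `walk_probability` is hypothetical, and the lattice units h and τ are dropped because only m and n matter.

```python
from math import comb


def walk_probability(m, n):
    """Probability that positive minus negative steps equals m after n steps.

    Assumes an unbiased walk, so k = (n + m) / 2 steps must be positive;
    positions with the wrong parity or with |m| > n are unreachable.
    """
    if abs(m) > n or (n + m) % 2 != 0:
        return 0.0
    k = (n + m) // 2
    return comb(n, k) / 2 ** n


# The probabilities over all reachable positions sum to one.
total = sum(walk_probability(m, 10) for m in range(-10, 11))
```

For example, after n = 4 steps a net increment of m = 2 requires k = 3 positive steps, so the probability is 4/16.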
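According to the total probability rule, we have

$$P\big((i-j)h, n\tau\big) = \frac{1}{2}\,P\big((i-j-1)h, (n-1)\tau\big) + \frac{1}{2}\,P\big((i-j+1)h, (n-1)\tau\big). \tag{2}$$

Because P ((i − j)h, nτ) depends only on m and n, let P ((i − j)h, nτ) = $P_{m,n}$; Equation (2) is then transformed into

$$P_{m,n} = \frac{1}{2}\,P_{m-1,n-1} + \frac{1}{2}\,P_{m+1,n-1}. \tag{3}$$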
Now, we obtain a discrete second-order partial differential equation (PDE), with initial value condition
and boundary condition
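The one-step recurrence can be iterated numerically and compared with the binomial probabilities. This is a minimal sketch under two assumptions that are not stated explicitly in the text above: the recurrence has the usual unbiased form, in which the probability at position m after n steps is half the probability at m − 1 plus half the probability at m + 1 after n − 1 steps, and the walk starts at the origin with probability one. The function name `evolve` is hypothetical.

```python
from math import comb


def evolve(n_steps):
    """Iterate the unbiased one-step recurrence starting from the origin.

    Returns a dict mapping position m to its probability after n_steps.
    Each entry passes half of its probability to m - 1 and half to m + 1,
    which is equivalent to P[m, n] = (P[m-1, n-1] + P[m+1, n-1]) / 2.
    """
    dist = {0: 1.0}  # assumed initial condition: all mass at m = 0
    for _ in range(n_steps):
        nxt = {}
        for m, p in dist.items():
            nxt[m - 1] = nxt.get(m - 1, 0.0) + 0.5 * p
            nxt[m + 1] = nxt.get(m + 1, 0.0) + 0.5 * p
        dist = nxt
    return dist


dist = evolve(8)
# Largest deviation from the closed-form binomial probabilities.
max_error = max(abs(p - comb(8, (8 + m) // 2) / 2 ** 8) for m, p in dist.items())
```

Under these assumptions the iterated distribution agrees with the binomial form to floating-point precision.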
2.2. Continual Random Walk and Diffusion Equation
Let x = (i − j)h and (n − 1)τ = t; the positive edge incremental process can then be described by a random variable X(t). At time t, the probability of a network having a positive edge increment x is P (x, t), i.e., P (x, t) is the time-dependent probability distribution corresponding to the random variable X(t). Furthermore, based on Equations (2) and (3), we arrive at a second-order difference equation,

$$P(x, t+\tau) = \frac{1}{2}\,P(x-h, t) + \frac{1}{2}\,P(x+h, t). \tag{6}$$
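Subtracting P (x, t) and dividing by τ on both sides of the equation simultaneously, we have

$$\frac{P(x, t+\tau) - P(x, t)}{\tau} = D\,\frac{P(x+h, t) - 2P(x, t) + P(x-h, t)}{h^{2}}, \tag{7}$$

where D = h²/(2τ) is a constant. On the other hand, with the conditions τ → 0, h → 0, Equation (7) is reduced to the continuous description of P (x, t) as shown in Equation (8),

$$\frac{\partial P(x, t)}{\partial t} = D\,\frac{\partial^{2} P(x, t)}{\partial x^{2}}. \tag{8}$$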
Alternatively, the time-dependent distribution P (x, t) can be determined by considering the effective probability density function (PDF) q (x, t) with appropriate Dirac boundary conditions.
Then, we obtain the classic diffusion equation, a parabolic partial differential equation that is closely linked with Brownian motion:

$$\frac{\partial q(x, t)}{\partial t} = D\,\frac{\partial^{2} q(x, t)}{\partial x^{2}}. \tag{9}$$
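The passage from the discrete walk to the diffusion equation can be made explicit with a Taylor expansion. The derivation below is a sketch under stated assumptions: the one-step update is taken to be P(x, t + τ) = ½P(x − h, t) + ½P(x + h, t), P is assumed smooth enough to expand, and the ratio D = h²/(2τ) is held fixed as τ and h tend to zero.

```latex
\begin{align*}
P(x, t+\tau) &= P(x,t) + \tau\,\frac{\partial P}{\partial t} + O(\tau^{2}), \\
P(x \pm h, t) &= P(x,t) \pm h\,\frac{\partial P}{\partial x}
  + \frac{h^{2}}{2}\,\frac{\partial^{2} P}{\partial x^{2}}
  \pm \frac{h^{3}}{6}\,\frac{\partial^{3} P}{\partial x^{3}} + O(h^{4}).
\end{align*}
% Insert both expansions into the one-step update; the odd-order terms cancel:
\begin{align*}
\tau\,\frac{\partial P}{\partial t} + O(\tau^{2})
  &= \frac{h^{2}}{2}\,\frac{\partial^{2} P}{\partial x^{2}} + O(h^{4}), \\
\frac{\partial P}{\partial t}
  &= \frac{h^{2}}{2\tau}\,\frac{\partial^{2} P}{\partial x^{2}}
     + O(\tau) + O\!\left(\frac{h^{4}}{\tau}\right)
  \;\longrightarrow\; D\,\frac{\partial^{2} P}{\partial x^{2}}
  \quad (\tau \to 0,\ h \to 0).
\end{align*}
```

With D fixed, the remainder h⁴/τ equals 2Dh² and therefore vanishes in the limit, which is why the two limits must be taken together rather than independently.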
2.3. Brownian Motion and Gaussian Process
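With initial Dirac boundary conditions, the diffusion equation, Equation (9), for the positive edge increment random process X(t) can be summarized as solving the following Cauchy problem:

$$\begin{cases} \dfrac{\partial q(x, t)}{\partial t} = D\,\dfrac{\partial^{2} q(x, t)}{\partial x^{2}}, & -\infty < x < \infty,\ t > 0, \\[2mm] q(x, 0) = \delta(x), & -\infty < x < \infty, \end{cases} \tag{10}$$

where δ(x) is the Dirac δ function.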
Theorem 1. The positive edge incremental process X(t) of a signed network is a Brownian motion which satisfies the diffusion Equation (9) with probability density function (PDF) q (x, t) and diffusion constant D. With initial condition q (x, 0) = δ(x), Equation (10) has the fundamental solution

$$q(x, t) = \frac{1}{\sqrt{4\pi D t}}\exp\left(-\frac{x^{2}}{4Dt}\right). \tag{11}$$
Proof. Recall that Equation (1) is a form of the binomial distribution, with mean displacement 0 and variance nh². For large n, this converges to a Gaussian distribution; therefore, after a sufficiently large amount of time t = nτ, the location x = mh of the walker is normally distributed with mean 0 and variance h²t/τ. Taking the limit τ, h → 0 such that h²/τ = 2D gives the PDF in Equation (11) for the location of the walker after time t. For this one-dimensional solution, ⟨X(t)⟩ = 0, and it is easy to show that ⟨X²(t)⟩ = 2Dt. Note that this is the fundamental solution of the diffusion equation, Equation (10) [28], and X(t) ∼ N (0, 2Dt) is the displacement of a Brownian particle at time t, describing the increment of the signed edges at time t. □
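The convergence used in the proof can be checked numerically. The sketch below compares the exact binomial walk distribution with a Gaussian density of mean 0 and variance 2Dt. It assumes D = h²/(2τ), the value implied by the variance h²t/τ together with ⟨X²(t)⟩ = 2Dt; the values of h, τ, and n are illustrative only.

```python
from math import comb, exp, pi, sqrt

h, tau = 0.1, 0.005        # illustrative step size and time interval
D = h * h / (2 * tau)      # assumed diffusion constant, D = h^2 / (2 tau)
n = 400                    # number of steps
t = n * tau                # elapsed time


def discrete_density(m):
    """Walk probability at x = m*h divided by the spacing 2h between reachable sites."""
    return comb(n, (n + m) // 2) / 2 ** n / (2 * h)


def gaussian_density(x):
    """Gaussian density with mean 0 and variance 2*D*t."""
    return exp(-x * x / (4 * D * t)) / sqrt(4 * pi * D * t)


# After an even number of steps only even m are reachable.
max_gap = max(
    abs(discrete_density(m) - gaussian_density(m * h))
    for m in range(-n, n + 1, 2)
)
```

The division by 2h converts lattice probabilities into a density, since after n steps only every second lattice site can be occupied.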
The theorem reflects the absence of a preferred direction or bias, and exhibits the standard property of a diffusive process that ⟨X²(t)⟩ increases linearly with time. In physical terms, the probability density function for finding the diffusing particle at position X(t) at some time t is Gaussian. From a more mathematical viewpoint, the Gaussian emerges as the limit distribution of a sum of independent, identically distributed random variables (the steps of the random walk) with finite variance, and in that sense, it has a universal character.
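The linear growth of the second moment can be verified exactly for the discrete walk. The following sketch assumes the unbiased binomial step distribution and computes the mean square displacement by direct summation; the function name `second_moment` is hypothetical.

```python
from math import comb


def second_moment(n, h=1.0):
    """Exact mean square displacement of the unbiased walk after n steps.

    Sums (m*h)^2 weighted by the binomial probability of net increment m.
    """
    return sum(
        (m * h) ** 2 * comb(n, (n + m) // 2) / 2 ** n
        for m in range(-n, n + 1, 2)
    )


msd_50 = second_moment(50)
msd_100 = second_moment(100)
```

The result is n·h², so doubling the number of steps doubles the mean square displacement; with t = nτ this is the discrete counterpart of a second moment proportional to t.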
It is well known that a Brownian motion X(t) corresponds to a Gaussian process P (x, t).
Therefore, we conclude that the positive edges incremental process can be explained by Brownian motion and diffusive processes.
Similar to positive edge incremental process X(t), according to symmetry, we can derive that the negative edge incremental process Y(t) is a Brownian motion based on an infinite one-dimensional uniform lattice model. In fact, after nτ time steps, we have negative edges increasing (n − k)h − kh = |i − j|h, and obtain k = (n − m)/2 with kh < (n − k)h. Again according to the unbiased rule, after n time steps, the probability of an empty network obtaining kh positive edges and (n − k)h negative edges is given by Equation (1), which still holds with k = (n − m)/2. Therefore, we confirm that the negative edge incremental process Y(t) of a network is also a Brownian motion which satisfies Theorem 1.
Furthermore, according to the independent and unbiased conditions (each walk is an independent random walk, and the probability of generating positive edges and the probability of generating negative edges are equal), X(t) and Y(t) can be regarded as two independent stochastic processes; therefore, X(t) + Y(t) is also a Brownian motion. With this, we conclude that the total edge incremental process of a signed network also satisfies the thermal diffusion Equation (9) and follows the Gaussian process of Equation (11).
3. Empirical Analysis
In this section, to explore the signed edge incremental dynamic process, we investigate a real-world online signed network, Soc-RedditHyperlinks (Reddit for short). Reddit is a hyperlink network representing the directed connections between two subreddits (a subreddit is a community on Reddit). Users on Reddit form and join interest-based communities called subreddits (e.g., ‘r/Documentaries’ or ‘r/StarWars’), and within these communities, they post and comment on content (e.g., images, videos, links to articles, etc.), and then construct links (labelled by −1 or +1) which are explicitly positive or negative. Our analysis was conducted with publicly available Reddit data [29], and the relevant code and data are available on the project website [30]; a partial dataset is also accessible from the data collection site SNAP: Social network: Reddit Hyperlink Network (stanford.edu). In our investigation, the signed network covers the period from 31 December 2013 16:39:00 to 30 April 2017 16:52:00, as shown in Table 1. To explore the signed edge incremental dynamic process, we divide the period of the dataset into 305 time stamps.
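Dividing a timestamped edge list into time windows can be sketched as follows. This is an illustration only: the text does not state how the 305 time stamps were chosen, so equal-width windows are an assumption, and the function name `count_signed_edges` and the three toy records are hypothetical placeholders rather than Reddit data.

```python
from datetime import datetime


def count_signed_edges(records, start, end, n_bins):
    """Count positive and negative edges in n_bins equal-width time windows.

    records: iterable of (timestamp, sign) pairs with sign equal to +1 or -1.
    Returns two lists of length n_bins: positive counts and negative counts.
    """
    width = (end - start) / n_bins
    positive = [0] * n_bins
    negative = [0] * n_bins
    for timestamp, sign in records:
        if not (start <= timestamp <= end):
            continue  # ignore edges outside the observation period
        # The final instant is assigned to the last window.
        index = min(int((timestamp - start) / width), n_bins - 1)
        if sign > 0:
            positive[index] += 1
        else:
            negative[index] += 1
    return positive, negative


start = datetime(2013, 12, 31, 16, 39)
end = datetime(2017, 4, 30, 16, 52)
toy_records = [  # hypothetical edges, for illustration only
    (datetime(2014, 1, 1), +1),
    (datetime(2014, 1, 2), -1),
    (datetime(2017, 4, 30, 16, 52), +1),
]
pos_counts, neg_counts = count_signed_edges(toy_records, start, end, 305)
```

The per-window counts play the role of the positive and negative edge increments; their sum per window gives the total increment.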
Figure 1a–c depict the positive, negative, and total signed edge incremental processes based on the samples from the Reddit network dataset. Figure 1a shows the displacement observation curve of Brownian motion X(t) at time t, describing the increment of the positive signed edges at time t. Similarly, Figure 1b,c correspond to the growth paths of the negative signed edges Y(t) and the total signed edges X(t) + Y(t) at time t, respectively. Meanwhile, we note that the number of positive edges in real networks is much higher than that of negative edges, i.e., i − j > 0 is almost surely true, which is consistent with the empirical observations of structural balance [15,31]. Figure 1d–f show the frequency histograms of the incremental observation data of the signed edges. The red solid lines are the Gaussian probability density fitting curves. Figure 1 suggests that the signed edge incremental process results in a Gaussian probability density. Figure 2 provides the plots of normal distribution tests on the sequences of signed edge increments. As we observed, except for some deviations in the two tails, the agreement between the Gaussian PDFs and the empirical probability density distributions is excellent over the entire displayed time range.
Furthermore, to verify the normality of Figure 1d–f, we randomly select 305 samples from the normal distributions N (11,078, 2432.22), N (−11,078, 2432.22), and N (14,090.3, 3065.83), respectively. Then, we conduct t-tests on both the empirical data and the random samples. At a significance level of 5%, the two-sample t-tests of the three groups (empirical PDF (Figure 1d) and N (11,078, 2432.22), empirical PDF (Figure 1e) and N (−11,078, 2432.22), and empirical PDF (Figure 1f) and N (14,090.3, 3065.83)) all accepted the normality assumption of the empirical data.
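A complementary normality check can be sketched with the Jarque–Bera statistic, which is a different technique from the t-tests described above and is used here only as an illustration. The sample below is synthetic placeholder data drawn from a standard normal distribution, not the Reddit increments; the function name `jarque_bera` is hypothetical, and 5.991 is the 95th percentile of the chi-square distribution with two degrees of freedom.

```python
from statistics import NormalDist, fmean


def jarque_bera(sample):
    """Jarque-Bera statistic n/6 * (S^2 + K^2/4).

    S is the sample skewness and K the sample excess kurtosis, both
    computed from central moments with denominator n.
    """
    n = len(sample)
    mean = fmean(sample)
    m2 = fmean((x - mean) ** 2 for x in sample)
    m3 = fmean((x - mean) ** 3 for x in sample)
    m4 = fmean((x - mean) ** 4 for x in sample)
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return n / 6.0 * (skewness ** 2 + excess_kurtosis ** 2 / 4.0)


CHI2_2DOF_5PCT = 5.991  # critical value at the 5% significance level

# Synthetic placeholder sample of 305 values (not empirical data).
increments = NormalDist(mu=0.0, sigma=1.0).samples(305, seed=1)
statistic = jarque_bera(increments)
looks_gaussian = statistic < CHI2_2DOF_5PCT
```

A statistic below the critical value means that the skewness and kurtosis of the sample give no evidence against normality at the 5% level; the chi-square approximation is asymptotic, so it is only rough for a few hundred samples.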
Our theoretical analysis is confirmed by the empirical result of real-world online signed social networks. The incremental sequence of signed edges as a simple random walk naturally leads to the evolutionary model of signed edges as a Brownian motion and approximate Gaussian distribution density profile.
4. Conclusions
Most studies on the evolution of real-world networks, such as the World Wide Web, online social networks, and biological networks, have focused on static macroscopic properties, such as communities, degree distributions, diameter, and clustering coefficient. However, one of the main issues of network structure evolution, and the source of many notable problems in the literature, is the confusion between the observed pattern and the underlying process that generated it.
In this paper, our aim is twofold: to introduce the mathematics behind random walks in a straightforward manner and to explain how such models can be used to aid our understanding of signed edge evolutionary processes. We introduce the mathematical theory behind the simple random walk and explain how this relates to Brownian motion and diffusive processes in general. The proposed theorem suggests how a simple random walk model can be used to describe such sequences of signed edge increment. The obtained analytical results are corroborated by direct real-world signed network empirical observation.
It is useful to note that the number of positive edges in real networks is much higher than the number of negative edges. This motivates the search for a more appropriate random walk model to describe the growth process of signed edges, for example, biased random walks (BRWs) [32,33]. BRWs can describe paths that contain a consistent bias in the preferred direction or towards a given target, due to the bias in the probability of moving in the preferred direction. For BRWs, the unbiased condition (the probability p of generating positive edges and the probability q of generating negative edges are equal) in the simple random walk no longer holds, and p > q is more in line with the growth pattern of signed edges in real-world networks. Recently, attention has been paid to structure entropy measures applied to signed networks [34]. Combining the signed edge evolutionary model with structure entropy is a potential avenue for future research.
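The effect of the bias p > q can be illustrated with a minimal simulation. This sketch is an assumption-laden illustration of a biased random walk, not a model proposed in this paper: the function names `biased_walk` and `expected_displacement`, the value p = 0.7, and the run sizes are all hypothetical.

```python
import random


def biased_walk(n_steps, p, h=1, seed=None):
    """Final position after n_steps; each step is +h with probability p, else -h."""
    rng = random.Random(seed)
    x = 0
    for _ in range(n_steps):
        x += h if rng.random() < p else -h
    return x


def expected_displacement(n_steps, p, h=1):
    """Mean position n*h*(p - q) of the biased walk, with q = 1 - p."""
    return n_steps * h * (2 * p - 1)


# Average over independent walks with a bias towards positive edges.
runs = [biased_walk(1000, p=0.7, seed=s) for s in range(200)]
average = sum(runs) / len(runs)
drift = expected_displacement(1000, 0.7)
```

For p > q the mean position grows linearly with the number of steps, so positive edges accumulate faster than negative ones, while p = q = 1/2 recovers the unbiased walk with zero drift.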