1. Introduction
Graphs are used extensively to model complex real-world systems. Examples are as diverse as the World Wide Web, rail networks, electrical networks, communication networks, social media networks, protein graphs, and models of the human brain. Perhaps the first and simplest model of random graphs was proposed by Erdos and Renyi around 1960 [1]: for every pair of vertices, an edge is added with probability p. The Erdos–Renyi model is a purely combinatorial model of random graphs in which edges are present independently of one another. Depending on the range in which p lies, Erdos–Renyi graphs behave differently in terms of the number of connected components, expansion properties, spectral distribution, etc. The model has been studied extensively in the literature (see [2,3]). However, it is not well suited to settings in which the connectivity of nodes is determined by physical proximity. Gilbert initiated the study of random geometric graphs (RGGs) [4]. In his model, points are placed in the plane according to a Poisson point process, and two points are joined by an edge if the Euclidean distance between them is less than r. In subsequent years, random geometric graphs were studied very extensively because of their wide applicability. They have been studied in higher dimensions and with other ambient spaces, such as hypercubes and Euclidean spheres. Several variants of the model have been explored, such as soft RGGs, directed RGGs [5], dense RGGs [6], and translation-invariant RGGs, to name a few. The random geometric graph model has found applications in a wide variety of areas, such as wireless networks [7], consensus [8], robot motion planning [9], virus spreading [10], and protein interaction [11]. Instead of giving a long list of related works, we point the reader to an excellent survey [12] for modern developments on the topic, and to [13] for the historical development.
We arrived at the random covering graph (RCG) model by an entirely different route. The covering process underlying our definition of the RCG model is quite common in the study of integer lattices and, more generally, in the geometry of numbers. Specifically, a similar process was used by Ajtai–Kumar–Sivakumar [14] in their sieving algorithm for the shortest vector problem. While analyzing certain properties of the sieving process, we found it convenient to define the underlying graph as in the RCG definition.
Our definition of RCGs is similar to that of random geometric graphs (RGGs) but has a crucial difference. We pick points uniformly at random from the Euclidean ball of unit radius in n-dimensional space. We retain a random point only if it is at distance at least r from all the points chosen so far (this is the crucial difference from the standard RGG model). To ensure that this process terminates, we use a stopping criterion based on the expected number of random points we need to pick before we find one that is at distance at least r from all the points chosen so far. We put an edge between two chosen vertices if they are at distance at most . We can no longer add a new point to our collection exactly when every point inside the unit ball is at distance at most r from some chosen point, that is, when the balls of radius r centered at the chosen points cover the unit ball. This is why we call our model random covering graphs. We believe that imposing a lower bound on the distance between the chosen points links the model intrinsically with the geometry of the underlying Euclidean space. The model has both global aspects (e.g., it can be used to study random packing densities or random covering densities) and local aspects (e.g., we observe an interesting pattern in the degree distribution of RCGs, which might be connected with the kissing number; as we note in the paper, the kissing number is clearly a local property).
First, we bound the number of vertices and edges of RCGs using a simple packing argument. Empirically, we observe that the logarithm of the expected number of vertices and the logarithm of the expected number of edges of RCGs with parameter r are linear functions of . Next, we study the degree distribution of RCGs while varying the parameter r. We observe two distinct bulges in the degree distribution in dimensions 2, 3, 4, and 5. The second bulge is also visible in dimension 6 on careful inspection, but it is not prominent. As r decreases, the second maximum becomes smaller and shifts away from the first local maximum. Understanding the reason behind this behavior, particularly the two-lobe structure of the degree distribution, is a very interesting mathematical problem.
Interestingly, the first local maxima are located at degrees close to the kissing number in the respective dimensions. The kissing number of n-dimensional Euclidean space is the largest number of non-overlapping unit-radius n-dimensional balls that can be placed so that each touches a central unit-radius ball. The term kissing is derived from the game of billiards, where it is used to indicate the touching of balls. The kissing number problem has a very interesting history: the debate on the kissing number of 3-dimensional space dates back to Newton and Gregory. At that time, the answer for 3-dimensional space was not known; it was either 12 or 13. Even today, exact kissing numbers are known only for dimensions 1, 2, 3, 4, 8, and 24 [15,16,17], where they are, respectively, 2, 6, 12, 24, 240, and 196,560. For other dimensions, only lower and upper bounds are known (see [15,18] for the current best-known upper and lower bounds). All of the above results except those for dimensions 1 and 2 rely on deep mathematics, and some of the proofs even require computer assistance. Clearly, the properties of RCGs are linked with the geometric properties of the underlying space. We believe that insisting on a lower bound on the distance between the chosen points strengthens this linkage compared with general geometric random graphs.
The first local maxima of the degree distribution graph for RCG are close to kissing numbers in dimensions 2, 3, 4 and 5. Does it indicate some relation between RCG and the kissing number problem? Even if there might not be a direct connection with the kissing number, we believe that the mathematical study of this model certainly sheds some light on the local geometric properties of Euclidean space. Moreover, it appears that even to study the combinatorial properties of the RCGs such as planarity (in general, finding the genus of the graph), coloring number of the graphs, size of the sparsest cut, maximum flow, etc., the interplay between geometric and combinatorial techniques would be needed, which makes the model quite interesting.
It turns out that both the degree distribution and the eigenvalue distribution of RCGs deviate from those of Erdos–Renyi random graphs and general geometric random graphs (which is interesting in itself, since RCGs are also geometric random graphs, of a special kind). Regarding degree distributions, power-law or log-power-law distributions have been observed in several practically useful networks [19,20]. Moreover, the degree distribution of RCGs has a “heavy-tailed”, right-skewed shape, a common feature of many practical networks [19,20,21]. The spectrum of RCGs is closer to a power-law distribution, unlike other random graph models, whose spectra are close to Wigner’s semicircle distribution [22,23,24].
We note that in recent work [6], the authors emphasized a point similar to the one we make in the current work, about a geometric random graph model that deviates from Erdos–Renyi and general geometric random graphs in terms of various parameters. It will be interesting to further investigate how effectively the RCG model captures complex networks arising in practice. In the current work, our main motivation is to introduce the model and to motivate its further study. There is substantial scope for new experiments evaluating various properties of the model, such as the conductance of the graph, its expansion properties, the average number of triangles, clustering coefficients, etc.
The following is the organization of the paper. We start with the preliminary section, followed by a section in which we define the random covering graph model. In the next two sections, we summarize our experimental findings and compare and contrast them with those of other random graph models. We conclude with a summary and discussion.
2. Preliminaries
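We begin with some basic definitions and preliminaries that are used throughout the paper. The definition of random geometric graphs and the new family of graphs we define both rely on sampling points from high-dimensional Euclidean balls. First, we recall some related definitions. Let $\mathbb{R}$ denote the set of real numbers. For any $k$-tuple $x = (x_1, \dots, x_k) \in \mathbb{R}^k$ and real number $p \ge 1$, the $\ell_p$ norm of $x$, denoted by $\|x\|_p$, is defined as
$$\|x\|_p = \Big(\sum_{i=1}^{k} |x_i|^p\Big)^{1/p},$$
where $|x_i|$ denotes the absolute value of $x_i$. $\mathbb{R}^k$ together with the $\ell_p$ norm forms a metric space (with distance $\|x - y\|_p$), and any vectors $x, y \in \mathbb{R}^k$ satisfy the triangle inequality $\|x + y\|_p \le \|x\|_p + \|y\|_p$; for $1 < p < \infty$, equality holds only if one of $x, y$ is a nonnegative multiple of the other. For $x, y \in \mathbb{R}^k$, the distance between $x$ and $y$ with respect to the $\ell_p$ norm is $\|x - y\|_p$. For $x \in \mathbb{R}^k$ and a positive real number $r$, the ball of radius $r$ with respect to the $\ell_p$ norm centered at $x$ is defined as
$$B_p(x, r) = \{\, y \in \mathbb{R}^k : \|x - y\|_p \le r \,\},$$
so $B_p(x, r)$ is simply the collection of all points at distance at most $r$ from $x$. In most of the paper, we work with Euclidean space and the associated $\ell_2$ norm. For ease of notation, unless stated otherwise, by $\|\cdot\|$ and $B(x, r)$ we mean $\|\cdot\|_2$ and $B_2(x, r)$, respectively.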
There are several practical scenarios where data items of interest are naturally expressed as vectors in high dimensions: for example, a product may be associated with a tuple whose entries are attributes of the product, an image may be represented by a vector of its features, and a word may be expressed as a high-dimensional vector using word embeddings such as Word2Vec. Our geometric intuition is formed in 2 and 3 dimensions and is often misleading in higher dimensions. For example, if we pick a point uniformly at random in the $d$-dimensional unit ball centered at the origin, then with high probability its distance from the origin is between $1 - c/d$ and 1, where $c$ is an absolute constant independent of $d$. That is, with high probability, the point lies in the outer annulus of width $c/d$. To give another example, independent uniformly random unit vectors in high dimensions are nearly orthogonal to each other with high probability, contrary to our intuition in 2D or 3D. One has to be careful when geometrically interpreting high-dimensional data. For a thorough treatment of various properties of high-dimensional space, we refer to [25,26].
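Both high-dimensional phenomena above can be checked numerically. The sketch below is illustrative (function names are ours, not from the paper) and uses only the Python standard library. Because the volume of a ball scales as radius$^d$, the fraction of the $d$-dimensional unit ball lying inside radius $1 - c/d$ is exactly $(1 - c/d)^d$, which tends to $e^{-c}$; the second function estimates the average $|\cos|$ of the angle between independent random unit vectors, which shrinks roughly like $1/\sqrt{d}$.

```python
import math
import random


def inner_fraction(d: int, c: float) -> float:
    """Fraction of the d-dimensional unit ball inside radius 1 - c/d (requires 0 <= c <= d)."""
    # Volume scales as radius**d, so the fraction is (1 - c/d)**d, which tends to exp(-c).
    return (1.0 - c / d) ** d


def random_unit_vector(d: int, rng: random.Random) -> list[float]:
    """Uniform point on the unit sphere: a normalized standard Gaussian vector."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(t * t for t in g))
    return [t / norm for t in g]


def mean_abs_cosine(d: int, pairs: int, rng: random.Random) -> float:
    """Average |<u, v>| over independent pairs of random unit vectors in R^d."""
    total = 0.0
    for _ in range(pairs):
        u = random_unit_vector(d, rng)
        v = random_unit_vector(d, rng)
        total += abs(sum(a * b for a, b in zip(u, v)))
    return total / pairs


if __name__ == "__main__":
    rng = random.Random(0)
    for d in (3, 10, 100, 1000):
        print(d, round(inner_fraction(d, 3.0), 4), round(mean_abs_cosine(d, 200, rng), 3))
```

As $d$ grows, the first column approaches $e^{-3} \approx 0.0498$ and the second column shrinks toward 0.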
The volume of the Euclidean ball of radius $r$ in $\mathbb{R}^n$ is
$$V_n(r) = \frac{\pi^{n/2}}{\Gamma\!\left(\frac{n}{2} + 1\right)}\, r^n,$$
where $\Gamma$ is Euler’s gamma function, which extends the usual factorial function to non-integer arguments. For a positive integer $n$, $\Gamma(n) = (n-1)!$, and $\Gamma(1/2) = \sqrt{\pi}$. A simple method to sample points uniformly at random from a low-dimensional unit ball centered at the origin is as follows. Choose each coordinate $x_i$ of the point uniformly at random in the range $[-1, 1]$. This amounts to choosing the point $x$ uniformly at random from the cube $[-1, 1]^n$ of side length 2 centered at the origin. If the point falls outside the unit ball, we discard it and draw a fresh sample in the same way, until we obtain a point inside the unit ball. This simple Monte Carlo method works in low dimensions but soon becomes computationally infeasible, because it discards too many points: the unit ball occupies a vanishingly small fraction of its bounding cube ($V_n(1)/2^n \to 0$ as $n \to \infty$). There are several more sophisticated ways to sample points efficiently and uniformly from the $n$-dimensional unit ball. We discuss a suitable efficient sampling procedure in the next section.
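A minimal sketch of the volume formula and of the naive rejection sampler described above (function names are illustrative). A single draw from the cube is accepted with probability $V_n(1)/2^n$, so the expected number of draws per accepted point is $2^n/V_n(1)$, which grows very quickly with $n$.

```python
import math
import random


def ball_volume(n: int, r: float = 1.0) -> float:
    """V_n(r) = pi^(n/2) r^n / Gamma(n/2 + 1)."""
    return math.pi ** (n / 2) * r ** n / math.gamma(n / 2 + 1)


def acceptance_rate(n: int) -> float:
    """Probability that a uniform point of the cube [-1, 1]^n lies in the unit ball."""
    return ball_volume(n) / 2 ** n


def rejection_sample(n: int, rng: random.Random, max_tries: int = 10**6) -> tuple[list[float], int]:
    """Naive Monte Carlo sampler: draw from the cube until the point lands in the unit ball."""
    for tries in range(1, max_tries + 1):
        x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        if sum(t * t for t in x) <= 1.0:
            return x, tries
    raise RuntimeError("too many rejections; use a better sampler in high dimensions")


if __name__ == "__main__":
    for n in (2, 5, 10, 20):
        # Expected number of draws per accepted point is 1 / acceptance_rate(n).
        print(n, f"{acceptance_rate(n):.3e}", f"{1 / acceptance_rate(n):.3e}")
```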
We recall some basic probability distributions useful for the paper.
Definition 1 (Wigner’s Semicircle Distribution). For a positive real number $R$, the Wigner semicircle distribution is the probability distribution on $[-R, R]$ whose probability density function is
$$f(x) = \frac{2}{\pi R^2}\sqrt{R^2 - x^2}$$
for $-R \le x \le R$, and $f(x) = 0$ if $|x| > R$.
Definition 2 (Power Law Distribution). A power law has the form $y = k\,x^{\alpha}$, where $x$ and $y$ are the variables of interest, $k$ is a constant, and $\alpha$ is the exponent.
Next, we recall some basic terminology related to graphs. An undirected graph $G$ is a pair $G = (V, E)$, where the finite set $V$ is called the set of vertices and $E$ is a collection of unordered pairs of vertices. Elements of $E$ are called edges. For an edge $e = \{i, j\} \in E$, the vertices $i$ and $j$ are called the endpoints of the edge, and $e$ is said to be incident on $i$ and $j$. The degree of a vertex $i \in V$ is the number of edges incident on $i$. The adjacency matrix $A$ of $G$ is the $|V| \times |V|$ matrix whose $(i, j)$th entry is 1 if $\{i, j\} \in E$ and 0 otherwise. For undirected graphs, the adjacency matrix is symmetric and therefore has real eigenvalues. We denote the eigenvalues by $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_{|V|}$. The collection of eigenvalues is called the spectrum of the graph. The spectrum encodes a great deal of information about the graph. For example, for $d$-regular graphs, $\lambda_1 = d$; moreover, $\lambda_2 < d$ if and only if the graph is connected, and, for a connected graph, $\lambda_{|V|} = -d$ if and only if the graph is bipartite. For basic properties of graph spectra, we refer to [27]; see [28] for applications to computer science.
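The regular-graph facts can be checked on small examples without a linear-algebra library by using the known closed form for cycles: the $n$-cycle is 2-regular and has eigenvalues $2\cos(2\pi k/n)$ with eigenvectors $(\cos(2\pi k j/n))_{j}$, $k = 0, \dots, n-1$. The sketch below (helper names are illustrative) verifies these eigenpairs for $C_6$, which is connected and bipartite, and shows that the disjoint union of two triangles, which is 2-regular but disconnected, has the eigenvalue 2 with multiplicity at least two.

```python
import math


def cycle_adjacency(n: int) -> list[list[int]]:
    """Adjacency matrix of the n-cycle (n >= 3)."""
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        A[i][(i + 1) % n] = A[(i + 1) % n][i] = 1
    return A


def disjoint_union(A: list[list[int]], B: list[list[int]]) -> list[list[int]]:
    """Block-diagonal adjacency matrix of the disjoint union of two graphs."""
    n, m = len(A), len(B)
    C = [[0] * (n + m) for _ in range(n + m)]
    for i in range(n):
        C[i][:n] = A[i]
    for i in range(m):
        C[n + i][n:] = B[i]
    return C


def is_eigenpair(A, v, lam, tol=1e-9) -> bool:
    """Check A v = lam v entrywise."""
    Av = [sum(a * b for a, b in zip(row, v)) for row in A]
    return all(abs(x - lam * y) <= tol for x, y in zip(Av, v))


def cycle_spectrum(n: int) -> list[float]:
    """Eigenvalues of the n-cycle in non-increasing order."""
    return sorted((2 * math.cos(2 * math.pi * k / n) for k in range(n)), reverse=True)


if __name__ == "__main__":
    n = 6
    A = cycle_adjacency(n)
    for k in range(n):
        v = [math.cos(2 * math.pi * k * j / n) for j in range(n)]
        assert is_eigenpair(A, v, 2 * math.cos(2 * math.pi * k / n))
    # For C_6: lambda_1 = 2 (= degree), lambda_2 < 2 (connected), lambda_6 = -2 (bipartite).
    print([round(x, 3) for x in cycle_spectrum(n)])
```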
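Let $D$ be the diagonal matrix whose $(i, i)$th entry is the degree of vertex $i$. Assuming that no vertex is isolated, let $\hat{A} = D^{-1/2} A D^{-1/2}$ be the normalized adjacency matrix. Let $L = D - A$ be the Laplacian matrix and $\mathcal{L} = I - D^{-1/2} A D^{-1/2}$ be the normalized Laplacian matrix [29].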
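A small pure-Python sketch of these three matrices, assuming the standard definitions $\hat{A} = D^{-1/2} A D^{-1/2}$, $L = D - A$, and $\mathcal{L} = I - \hat{A}$, and assuming every vertex has positive degree so that $D^{-1/2}$ exists. It checks two standard facts on a star graph: the all-ones vector lies in the kernel of $L$, and the vector $D^{1/2}\mathbf{1}$ lies in the kernel of $\mathcal{L}$. Function names are illustrative.

```python
import math


def graph_matrices(A):
    """Return (degrees, L, A_hat, L_norm) for a symmetric 0/1 adjacency matrix A.

    L = D - A, A_hat = D^{-1/2} A D^{-1/2}, L_norm = I - A_hat.
    """
    n = len(A)
    deg = [sum(row) for row in A]
    if min(deg) == 0:
        raise ValueError("isolated vertex: D^{-1/2} is undefined")
    inv_sqrt = [1.0 / math.sqrt(k) for k in deg]
    lap = [[(deg[i] if i == j else 0) - A[i][j] for j in range(n)] for i in range(n)]
    a_hat = [[inv_sqrt[i] * A[i][j] * inv_sqrt[j] for j in range(n)] for i in range(n)]
    l_norm = [[(1.0 if i == j else 0.0) - a_hat[i][j] for j in range(n)] for i in range(n)]
    return deg, lap, a_hat, l_norm


def matvec(M, v):
    """Matrix-vector product for a list-of-lists matrix."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


if __name__ == "__main__":
    # Star graph: vertex 0 joined to vertices 1, 2, 3.
    star = [[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]
    deg, lap, a_hat, l_norm = graph_matrices(star)
    print(matvec(lap, [1, 1, 1, 1]))                    # all zeros
    print(matvec(l_norm, [math.sqrt(k) for k in deg]))  # zeros up to rounding
```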
Next, we define random graph models useful for the discussion in the paper.
Definition 3 (Erdos–Renyi Random Graphs). The Erdos–Renyi random graph $G(n, p)$ is defined as follows. The graph has $n$ vertices, and for each pair of vertices $i, j$, there is an edge between $i$ and $j$ with probability $p$, independently of all other pairs.
Depending on thresholds on $p$, the graph may have many connected components, it may have one giant connected component, or it may be connected. We are mainly interested in the connected regime. It is well known that the degree distribution of the Erdos–Renyi graph is binomial (see [25]) and that its eigenvalue distribution follows Wigner’s semicircle law (see [30]). For a thorough discussion of various properties of the model, we refer to Section 8 of [25].
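A minimal sampler for $G(n, p)$, with illustrative names. It stores the graph as adjacency sets and compares the empirical mean degree with $(n-1)p$, the mean of the Binomial$(n-1, p)$ degree distribution. Checking the semicircle law for the spectrum would additionally require an eigenvalue solver and is not attempted here.

```python
import random


def erdos_renyi(n: int, p: float, rng: random.Random) -> list[set[int]]:
    """Sample G(n, p): each of the n(n-1)/2 pairs is an edge independently with probability p."""
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def degree_histogram(adj: list[set[int]]) -> dict[int, int]:
    """Map each degree to the number of vertices having that degree."""
    hist: dict[int, int] = {}
    for nbrs in adj:
        hist[len(nbrs)] = hist.get(len(nbrs), 0) + 1
    return hist


if __name__ == "__main__":
    n, p = 400, 0.05
    adj = erdos_renyi(n, p, random.Random(0))
    mean_degree = sum(len(nbrs) for nbrs in adj) / n
    print(f"empirical mean degree {mean_degree:.2f}, expected (n-1)p = {(n - 1) * p:.2f}")
    print(sorted(degree_histogram(adj).items()))
```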
Definition 4 (Random Geometric Graphs). A random geometric graph is defined as follows. Choose $n$ points uniformly at random from the unit ball $B(0, 1)$ in dimension $d$, and put an edge between two vertices $x$ and $y$ if $\|x - y\| \le r$.
We refer to [13] for a comprehensive treatment of geometric random graphs.
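A minimal sketch of a random geometric graph, assuming points drawn uniformly from the unit ball $B(0, 1)$ and an edge whenever $\|x - y\| \le r$. Uniform points are generated with the Gaussian-based sampler described in Section 3; names are illustrative, and the all-pairs distance loop is quadratic in $n$, which is adequate only for small graphs.

```python
import math
import random


def sample_unit_ball(d: int, rng: random.Random) -> list[float]:
    """Uniform point in the d-dimensional unit ball: Gaussian direction, radius u**(1/d)."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(t * t for t in g))
    scale = rng.random() ** (1.0 / d) / norm
    return [scale * t for t in g]


def random_geometric_graph(n: int, d: int, r: float, rng: random.Random):
    """Return (points, edges) with an edge whenever two points are within distance r."""
    pts = [sample_unit_ball(d, rng) for _ in range(n)]
    edges = [(i, j) for i in range(n) for j in range(i + 1, n)
             if math.dist(pts[i], pts[j]) <= r]
    return pts, edges


if __name__ == "__main__":
    pts, edges = random_geometric_graph(200, 3, 0.4, random.Random(0))
    print(len(pts), "vertices,", len(edges), "edges")
```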
3. Random Covering Graphs
In this section, we define our new family of random geometric graphs, parameterized by the dimension $d$ of the Euclidean space and a parameter $r$ that takes real values between 0 and 1.
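Sampling from the unit n-ball: Before describing the new family of graphs, we describe a standard efficient process to sample points uniformly from an $n$-dimensional ball [25]:
Step 1. For $i = 1$ to $n$, let $x_i$ be chosen independently according to the Gaussian distribution $N(0, 1)$ with mean $0$ and variance $1$, and let $x = (x_1, \dots, x_n)$.
Step 2. Choose $u$ uniformly at random in $[0, 1]$.
Step 3. Let $y = u^{1/n}\, x / \|x\|$.
The vector $y$ is uniformly distributed inside the $n$-dimensional ball of radius 1 centered at the origin. In Step 1, because the Gaussian distribution is rotationally symmetric, $x/\|x\|$ is a uniformly random point on the surface of the unit sphere centered at the origin. In Step 3, we scale it by $u^{1/n}$ to obtain a point $y$ inside the unit ball; the exponent $1/n$ accounts for the fact that the volume of a ball of radius $s$ grows as $s^n$.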
Figure 1 demonstrates the random samples chosen from the 2D ball.
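Steps 1–3 translate directly into code. In the sketch below (names are illustrative), the empirical check uses the fact that for a uniform point $y$ in the unit $n$-ball, $\Pr[\|y\| \le s] = s^n$; for $n = 3$ and $s = 1/2$ this probability is $1/8$.

```python
import math
import random


def sample_unit_ball(n: int, rng: random.Random) -> list[float]:
    """Steps 1-3: Gaussian vector x, uniform u in [0, 1], y = u**(1/n) * x / ||x||."""
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]   # Step 1
    u = rng.random()                                # Step 2
    norm = math.sqrt(sum(t * t for t in x))
    return [u ** (1.0 / n) * t / norm for t in x]   # Step 3


def fraction_within(n: int, s: float, samples: int, rng: random.Random) -> float:
    """Empirical Pr[||y|| <= s] for y drawn by sample_unit_ball."""
    hits = sum(1 for _ in range(samples)
               if math.sqrt(sum(t * t for t in sample_unit_ball(n, rng))) <= s)
    return hits / samples


if __name__ == "__main__":
    rng = random.Random(0)
    print(fraction_within(3, 0.5, 20000, rng), "vs exact", 0.5 ** 3)
```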
Random covering graph construction: The random covering graph family is defined as follows.
Step 1. Let $S = \emptyset$.
Step 2. Choose $x$ uniformly at random from $B(0, 1)$.
Step 3. Repeat Step 2 until $\|x - s\| \ge r$ for all points $s \in S$. If Step 2 has to be repeated more than the threshold number of times, go to Step 5; otherwise, go to Step 4.
Step 4. Include $x$ in $S$ and go to Step 2.
Step 5. Define the graph $G$ whose vertex set is $S$, and put an edge between two vertices of $G$ if .
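The construction can be sketched in Python as follows. This is an illustrative sketch, not the authors’ implementation: the stopping threshold of Step 3 and the edge rule of Step 5 are supplied through the hypothetical parameters `max_failures` (the number of consecutive rejected samples after which the process stops) and `edge_radius` (vertices at distance at most `edge_radius` are joined), and the example values below are arbitrary.

```python
import math
import random


def sample_unit_ball(d: int, rng: random.Random) -> list[float]:
    """Uniform point in the d-dimensional unit ball (Gaussian direction, radius u**(1/d))."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(t * t for t in g))
    scale = rng.random() ** (1.0 / d) / norm
    return [scale * t for t in g]


def random_covering_graph(d, r, edge_radius, max_failures, rng):
    """Return (S, edges). edge_radius and max_failures are hypothetical parameters."""
    S: list[list[float]] = []                      # Step 1: S is empty
    failures = 0
    while failures <= max_failures:                # stop once Step 2 was repeated > max_failures times
        x = sample_unit_ball(d, rng)               # Step 2
        if all(math.dist(x, s) >= r for s in S):   # Step 3: x is a fresh point
            S.append(x)                            # Step 4: include x in S
            failures = 0
        else:
            failures += 1                          # not fresh: repeat Step 2
    # Step 5: put an edge between vertices at distance at most edge_radius.
    edges = [(i, j) for i in range(len(S)) for j in range(i + 1, len(S))
             if math.dist(S[i], S[j]) <= edge_radius]
    return S, edges


if __name__ == "__main__":
    S, edges = random_covering_graph(d=2, r=0.3, edge_radius=0.6, max_failures=500,
                                     rng=random.Random(0))
    print(len(S), "vertices,", len(edges), "edges")
```

Counting only consecutive failures means the threshold is reset each time a fresh point is found, which matches the reading that Step 3 bounds the number of repetitions of Step 2 for the current candidate.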
In other words, we repeatedly sample points uniformly from the unit ball, and whenever we obtain a fresh point, that is, a point at distance at least $r$ from all the points chosen so far, we add it to $S$. To obtain a process with bounded running time, we need to detect when we are unlikely to obtain a fresh point anymore. For this, we use the threshold in Step 3, which is based on the geometric distribution. Let $V$ denote the union of all balls of radius $r$ centered at the points chosen so far, intersected with the unit ball centered at the origin, that is,
$$V = \Big(\bigcup_{s \in S} B(s, r)\Big) \cap B(0, 1).$$
Intuitively, if $\mathrm{vol}(V)$ is close to $\mathrm{vol}(B(0, 1))$, where $\mathrm{vol}(T)$ denotes the volume of $T$, then it is unlikely that we will find a fresh point anymore. Now suppose $\mathrm{vol}(B(0, 1) \setminus V) = q \cdot \mathrm{vol}(B(0, 1))$ for some $q > 0$. Then the expected number of times we need to pick a point uniformly at random from $B(0, 1)$ until we obtain a fresh point (one at distance at least $r$ from all the points chosen so far) is $1/q$; this follows from the basic properties of the geometric distribution. This gives an intuitive justification for our choice of threshold in Step 3.
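The geometric-distribution argument can be checked by simulation. If a fraction $q$ of the unit ball is still uncovered, each uniform draw is fresh independently with probability $q$, so the number of draws until a fresh point is geometric with mean $1/q$, and $T$ consecutive non-fresh draws occur with probability $(1 - q)^T$. The sketch below (names are illustrative) treats $q$ as known, whereas the covering process itself never computes it.

```python
import random


def draws_until_fresh(q: float, rng: random.Random) -> int:
    """Number of uniform draws until the first fresh point, when each draw is fresh w.p. q."""
    draws = 1
    while rng.random() >= q:
        draws += 1
    return draws


def mean_draws(q: float, runs: int, rng: random.Random) -> float:
    """Monte Carlo estimate of the expected number of draws (should be close to 1/q)."""
    return sum(draws_until_fresh(q, rng) for _ in range(runs)) / runs


def prob_all_fail(q: float, t: int) -> float:
    """Probability that t consecutive draws are all non-fresh."""
    return (1.0 - q) ** t


if __name__ == "__main__":
    rng = random.Random(0)
    for q in (0.5, 0.1, 0.01):
        print(q, round(mean_draws(q, 10000, rng), 2), "expected", 1 / q)
```

Read in reverse, the last function suggests why a large threshold works: seeing, say, 500 consecutive failures is very unlikely unless $q$ is already small.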
Note that if there are no fresh points, then for every point $y \in B(0, 1)$ there is $s \in S$ such that $\|y - s\| < r$; that is, $B(0, 1) \subseteq \bigcup_{s \in S} B(s, r)$, so the union of the balls of radius $r$ placed at the points of $S$ covers the entire unit ball $B(0, 1)$. This is why we call our family the random covering graph family.
The above construction has a couple of important features. In general geometric graphs, two vertices are connected if they are at a small distance from each other, whereas in random covering graphs the process ensures that, although vertices connected by an edge are at a “small” distance from each other, they are never at a “too small” distance: every pair of chosen points is at distance at least r. This is the key feature distinguishing the covering graph family from other random graph models, and it is the crucial reason behind the deviations we observe in the degree distribution and the eigenvalue distribution compared with general random geometric graphs. Typically, for Erdos–Renyi graphs as well as general geometric random graphs, one is also interested in aspects such as the number of connected components, the threshold beyond which a giant connected component appears, etc. We note that covering graphs are almost always connected. To understand the behavior of our model in the regime where the graph may have several connected components, we can set a smaller threshold in Step 3 above. We summarize this in the observation below.
Observation 1. The random covering graph is connected with very high probability. Random covering graphs with multiple connected components can be obtained by setting a smaller threshold in Step 3 above.
Claim 1. The number of vertices, degree of any vertex of , are upper-bounded by and , respectively.
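Proof. For any two vertices $x, y$ of $G$, we know that $\|x - y\| \ge r$. All the vertices lie inside $B(0, 1)$. Place balls of radius $r/2$ centered at each vertex of $G$. From the triangle inequality, it follows that $B(x, r/2)$ and $B(y, r/2)$ have disjoint interiors for distinct vertices $x, y$. Again, as all vertices of $G$ lie inside $B(0, 1)$, the triangle inequality gives $B(x, r/2) \subseteq B(0, 1 + r/2)$. That is, all the small balls are disjoint and contained inside the ball of radius $1 + r/2$ centered at the origin. So a simple packing argument (comparing volumes, which scale as radius$^d$) implies that the number of vertices of $G$ is upper-bounded by $\left(\frac{1 + r/2}{r/2}\right)^d = \left(1 + \frac{2}{r}\right)^d$.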
Suppose x is a vertex of G and are neighbors of x in G. So, we have . Clearly, from the definition of G. Again, using triangle inequality, we see that balls of radius placed at are disjoint and contained in . This implies . □
So the number of vertices is upper bounded by an exponential function of , whereas the maximum degree is upper bounded by which is a constant (independent of r).
We have experimentally computed the average number of vertices and average number of edges for various values of
r. To compute the average, we run the same experiment repeatedly for a “large” number of iterations with fresh random bits and compute the average value of the number of vertices and edges across the iterations. We plot the logarithm of the average number of vertices against
; it is clear from the graph that
, where
is the average number of vertices. Similarly, the linearity of
with respect to
is clear from the graph in
Figure 2, where
is the average number of edges. The linear approximation improves as the dimension increases. Below, we give graphs only for the
case just to demonstrate the nature of the graph.
Observation 2. Let be a random covering graph with vertex set V and edge set E, then .
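The averaging procedure behind these plots can be sketched as follows. The covering process is an illustrative re-implementation with two hypothetical parameters that are not fixed in this section: `edge_radius_factor` (edges join vertices within `edge_radius_factor * r`) and `max_failures` (the Step 3 threshold). The number of runs is also arbitrary. The function returns the average numbers of vertices and edges, whose logarithms would then be plotted.

```python
import math
import random


def sample_unit_ball(d, rng):
    """Uniform point in the d-dimensional unit ball (Gaussian direction, radius u**(1/d))."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(t * t for t in g))
    scale = rng.random() ** (1.0 / d) / norm
    return [scale * t for t in g]


def rcg_counts(d, r, edge_radius, max_failures, rng):
    """One run of the sketched covering process; returns (#vertices, #edges)."""
    S, failures = [], 0
    while failures <= max_failures:
        x = sample_unit_ball(d, rng)
        if all(math.dist(x, s) >= r for s in S):
            S.append(x)
            failures = 0
        else:
            failures += 1
    m = sum(1 for i in range(len(S)) for j in range(i + 1, len(S))
            if math.dist(S[i], S[j]) <= edge_radius)
    return len(S), m


def average_counts(d, r, edge_radius_factor=2.0, max_failures=200, runs=10, seed=0):
    """Average #vertices and #edges over independent runs (edge_radius_factor is hypothetical)."""
    rng = random.Random(seed)
    results = [rcg_counts(d, r, edge_radius_factor * r, max_failures, rng) for _ in range(runs)]
    avg_v = sum(v for v, _ in results) / runs
    avg_e = sum(e for _, e in results) / runs
    return avg_v, avg_e


if __name__ == "__main__":
    for r in (0.6, 0.5, 0.4, 0.3):
        avg_v, avg_e = average_counts(2, r)
        print(r, round(avg_v, 1), round(avg_e, 1), round(math.log(avg_v), 3))
```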
4. Degree Distribution
In this section, we contrast and compare the degree distribution of Erdos–Renyi random graphs as well as general geometric random graphs with the degree distribution of random covering graphs.
Let $G \sim G(n, p)$ be an Erdos–Renyi graph. Since $p$ is the probability of an edge being present, the expected degree of any vertex is clearly $(n - 1)p$. It follows easily that the degree distribution is given by
$$\Pr[\deg(v) = k] = \binom{n-1}{k} p^k (1 - p)^{n - 1 - k}, \quad k = 0, 1, \dots, n - 1.$$
Using tail inequalities such as the Chernoff bound, one can show that this binomial distribution decays exponentially fast as we move away from the mean. Similar behavior has been observed for general geometric graphs. Figure 3 shows our experimental observations of the degree distribution of random geometric graphs for varying parameter $r$.
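The exponential decay of the Binomial$(n-1, p)$ degree distribution can be checked numerically against the multiplicative Chernoff bound $\Pr[X \ge (1+\delta)\mu] \le \exp(-\delta^2\mu/(2+\delta))$ for $\delta > 0$, where $\mu = (n-1)p$. The sketch below uses only the standard library, and the parameter values are arbitrary.

```python
import math


def binomial_pmf(m: int, p: float, k: int) -> float:
    """Pr[X = k] for X ~ Binomial(m, p)."""
    return math.comb(m, k) * p ** k * (1 - p) ** (m - k)


def upper_tail(m: int, p: float, k0: int) -> float:
    """Pr[X >= k0] for X ~ Binomial(m, p)."""
    return sum(binomial_pmf(m, p, k) for k in range(k0, m + 1))


def chernoff_bound(mu: float, delta: float) -> float:
    """Multiplicative Chernoff bound on Pr[X >= (1 + delta) mu], for delta > 0."""
    return math.exp(-delta ** 2 * mu / (2 + delta))


if __name__ == "__main__":
    n, p = 200, 0.05              # the degree of a vertex in G(n, p) is Binomial(n - 1, p)
    m, mu = n - 1, (n - 1) * p
    for delta in (0.5, 1.0, 2.0):
        k0 = math.ceil((1 + delta) * mu)   # smallest integer >= (1 + delta) mu
        print(delta, f"{upper_tail(m, p, k0):.2e}", f"{chernoff_bound(mu, delta):.2e}")
```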
To obtain the degree distribution, we run the random experiment a large number of times with fresh random bits. In each iteration, we compute the degree histogram by counting the number of vertices of each degree. Then we average the histograms across iterations. The number of iterations is chosen to be large enough for the averaged distribution to stabilize. We repeat the experiment for values of $r$ in a certain interval, each time changing $r$ by a small constant. The interval is chosen so that it captures the interesting features of the distribution while keeping the computation time reasonable. We cannot choose $r$ too small, since the number of vertices grows rapidly as $r$ decreases (see Section 3). Figure 4 shows the distributions observed in dimensions 2 to 6.
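The histogram averaging described above can be sketched as follows. As before, the covering process is an illustrative re-implementation with the hypothetical parameters `edge_radius` and `max_failures`, and the number of runs is arbitrary. The helper `local_maxima` returns the degrees at which the averaged histogram has a local maximum, one simple way to locate the lobes discussed below; noisy histograms would likely need smoothing first.

```python
import math
import random
from collections import Counter


def sample_unit_ball(d, rng):
    """Uniform point in the d-dimensional unit ball (Gaussian direction, radius u**(1/d))."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(t * t for t in g))
    scale = rng.random() ** (1.0 / d) / norm
    return [scale * t for t in g]


def rcg_degrees(d, r, edge_radius, max_failures, rng):
    """One run of the sketched covering process; returns the list of vertex degrees."""
    S, failures = [], 0
    while failures <= max_failures:
        x = sample_unit_ball(d, rng)
        if all(math.dist(x, s) >= r for s in S):
            S.append(x)
            failures = 0
        else:
            failures += 1
    deg = [0] * len(S)
    for i in range(len(S)):
        for j in range(i + 1, len(S)):
            if math.dist(S[i], S[j]) <= edge_radius:
                deg[i] += 1
                deg[j] += 1
    return deg


def average_degree_histogram(d, r, edge_radius, max_failures, runs, seed=0):
    """Average, over independent runs, of the number of vertices having each degree."""
    rng = random.Random(seed)
    total = Counter()
    for _ in range(runs):
        total.update(Counter(rcg_degrees(d, r, edge_radius, max_failures, rng)))
    return {k: total[k] / runs for k in sorted(total)}


def local_maxima(hist):
    """Degrees k with hist[k] > hist[k-1] and hist[k] >= hist[k+1] (missing degrees count as 0)."""
    return [k for k in hist
            if hist[k] > hist.get(k - 1, 0.0) and hist[k] >= hist.get(k + 1, 0.0)]


if __name__ == "__main__":
    hist = average_degree_histogram(d=2, r=0.2, edge_radius=0.4, max_failures=200, runs=5)
    print(hist)
    print("local maxima at degrees", local_maxima(hist))
```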
As remarked in the introduction, the degree distributions of several graphs arising in practice do not drop sharply away from the mean degree. Instead, they decay slowly, resulting in a broader distribution, which is referred to in the literature as a “heavy tail” or “fat tail” [24]. The degree distribution of random covering graphs is quite broad, as seen in Figure 4. This suggests the possibility of using covering graphs to model real-world networks. Even though covering graphs are almost always connected, one can model disconnected graphs by setting an appropriately smaller threshold in the stopping criterion in Step 3, as noted in Observation 1.
Shape of degree distribution and plausible connection with kissing number:
As Figure 4 shows, the degree distribution of random covering graphs has two lobes. Understanding the mathematical reason behind this shape is a very interesting problem. As we increase the dimension, the second lobe moves to the right, and the local maximum corresponding to the second lobe drops. When defining covering graphs, we additionally required that the chosen points not be too close to each other; this requirement must be crucially connected to the observed shape. From Claim 1, we know that the degree of any vertex is upper bounded by a constant independent of r, yet the distribution is concentrated around quite small values (e.g., 7 to 8 in 2D, and 11 to 18 in 3D). It is interesting to note that the two bulges emerge in the region around the kissing number in the respective dimension. The kissing number in dimension d is the largest number of unit spheres that can be placed touching a central unit sphere such that no two of them overlap. The problem has an interesting history, dating back to Newton and Gregory. We refer to the excellent book by Conway and Sloane [15] and the article [17] for the history of the problem and related results. At that time, the answer for 3-dimensional space was not known; it was either 12 or 13. Even today, exact kissing numbers are known only for dimensions 1, 2, 3, 4, 8, and 24 [15,16,17], where they are, respectively, 2, 6, 12, 24, 240, and 196,560. For other dimensions, only lower and upper bounds are known (see [15,18] for the current best-known upper and lower bounds). All of the above results except those for dimensions 1 and 2 rely on deep mathematics, and some of the proofs even require computer assistance.
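As a small worked example of the definition in dimension 2, where the kissing number is 6, the sketch below places six unit circles with centers $2(\cos(k\pi/3), \sin(k\pi/3))$ around a unit circle at the origin and checks that each touches the central circle and no two overlap. A seventh cannot be added: the centers of two touching circles lie at distance 2 from the origin and at least 2 from each other, so the angle between them, seen from the origin, is at least $2\arcsin(1/2) = \pi/3$, and at most $2\pi/(\pi/3) = 6$ such angles fit. This angular argument is specific to the plane; names are illustrative.

```python
import math
from itertools import combinations


def hexagonal_kissing_configuration() -> list[tuple[float, float]]:
    """Centers of six unit circles touching a unit circle centered at the origin."""
    return [(2 * math.cos(k * math.pi / 3), 2 * math.sin(k * math.pi / 3)) for k in range(6)]


def is_kissing_configuration(centers, tol=1e-9) -> bool:
    """Each center at distance 2 from the origin (touching); pairwise distances >= 2 (no overlap)."""
    touching = all(abs(math.hypot(x, y) - 2) <= tol for x, y in centers)
    disjoint = all(math.dist(a, b) >= 2 - tol for a, b in combinations(centers, 2))
    return touching and disjoint


def planar_angular_bound() -> int:
    """Centers of two touching circles subtend an angle >= 2*asin(1/2) at the origin."""
    return math.floor(2 * math.pi / (2 * math.asin(0.5)) + 1e-9)


if __name__ == "__main__":
    centers = hexagonal_kissing_configuration()
    print(is_kissing_configuration(centers), planar_angular_bound())
```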
Clearly, the properties of RCGs are linked with the geometric properties of the underlying space. We believe that insisting on a lower bound on the distance between the chosen points strengthens this linkage compared with general geometric random graphs.
The first local maxima of the degree distribution of RCGs are close to the kissing numbers in dimensions 2, 3, 4, and 5. Does this indicate some relation between RCGs and the kissing number problem? It might be quite speculative to claim a connection between the kissing number problem and the degree distribution of random covering graphs; nevertheless, it would definitely be worthwhile to explore the plausibility of such a connection. Even if there is no direct connection with the kissing number, we believe that the mathematical study of this model will shed some light on the local geometric properties of Euclidean space. It is an interesting problem from a purely mathematical perspective.
Observation 3. Unlike the degree distributions of general geometric random graphs, the degree distributions of random covering graphs have two bulges. For dimensions 2, 3, 4, and 5, they are located near the kissing number of the respective dimension.