1. Introduction
Networks naturally arise in many real-world situations. Examples range from macro-structures, such as social networks [1] and economic trade networks [2], to microscopic structures involving protein–protein interactions [3], transcriptional regulation networks [4], gene regulatory networks [5], and both biological and artificial neural networks [6,7]. While real-world networks are often modeled as directed graphs, their computational analysis is not only challenging but also far more cumbersome and restricted than that of undirected graphs. Thus, it would be helpful to analyze and solve certain classes of problems for directed graphs from an undirected graph framework. Here, we tackle this problem by using notions and principles of Category Theory (CT) within a graph context. Put simply, CT studies abstract structures and their relations. These structures, or categories, are composed of a collection of things we call “objects”, and a collection of relations between two objects that we call “morphisms”. Originally, CT developed within pure mathematics; much more recently, it started to be used broadly across the natural sciences and engineering, including applications in machine learning and artificial neural networks [8], biological networks [9,10], and social networks [11].
The way we bridge a directed graph framework with an undirected one is by first considering a category of undirected graphs that encode the notion of direction. This category, which we call the prime graphs category, has as objects undirected graphs equipped with a ‘prime labeling’, and as morphisms undirected graph morphisms that preserve the prime labeling. Indeed, it is the prime labeling that provides a notion of direction over the structure of the undirected graph, allowing us to define a unique direction when transforming to a directed graph, and vice versa. With this in mind, we construct a bijective functor that relates the category of simple directed graphs with the category of prime graphs, such that the notion of direction is preserved. It is worth mentioning that one can always relate a directed graph to an undirected graph in a trivial way: by simply considering the underlying structure and ‘forgetting’ the direction. This correspondence gives rise to a ‘forgetful’ functor which is not invertible. In [12], Miller provides one of the first nontrivial transformations between simple directed graphs and undirected graphs by the construction of “gadgets”. These gadgets are used to encode the notion of direction, just as our prime label does. However, each gadget adds seven nodes to the corresponding undirected graph, while in our framework, we add only a single prime labeled node for each node of the directed graph. Additionally, we are the first to address the problem of converting directed graph morphisms into undirected graph morphisms while preserving the notion of direction. This latter aspect is crucial for both of our applications: in network alignment, the labeling and its preservation through prime graph morphisms ensure the mapping between appropriate nodes through the node similarity metric, while in spectral graph clustering, the labeling plays a role in determining edge weights.
Network alignment is a technique that allows us to compare two networks. This is performed by “putting one on top of the other” in such a way that the structure (or topology) of the networks being compared is preserved as much as possible, and the similarity between the networks is quantified. To date, several network alignment tools exist for undirected networks (see [13,14,15,16]), but to the best of our knowledge, none exist for directed graphs. Our framework, hence, proposes to perform network alignment on directed graphs via their corresponding prime graphs (which are undirected). Within this line of applications, we show the efficacy of our approach empirically by using synthetically generated pairs of networks whose pairwise similarity is known and controlled by the graph generator’s pairwise correlation coefficient. Our results in Section 3.1.1 show that there is a strong statistical correspondence between the generated networks and their resulting pairwise network similarity scores.
Spectral clustering is a widely used and robust technique that considers the spectrum (the eigenvalues and eigenvectors) of the graph Laplacian matrix to partition the nodes of a graph into clusters. More precisely, one can cluster the nodes of a graph by sorting the components of the eigenvector corresponding to the second-smallest eigenvalue of the graph Laplacian matrix. Initially, spectral clustering was developed for undirected graphs using their adjacency and Laplacian matrices [17]. Later, the technique was extended to directed graphs [18], where a transition probability matrix (that is, a random walk) is used to overcome the asymmetry found in their adjacency and Laplacian matrices. There are heuristic techniques that circumvent the latter construction by making the adjacency matrix of a directed graph symmetric, which yields a symmetric graph Laplacian matrix [19] that is then used to find the cuts of the directed graph. However, while these techniques show empirically that the resulting cuts are the same, they do not prove equivalence, as we do. Precisely, we prove that our framework preserves minimum cuts and, consequently, that it preserves clusters between directed graphs and their respective prime graph counterparts.
The paper is organized as follows. Section 2 gives preliminary notions about directed and undirected graphs from a category theory perspective. We describe the category of prime graphs and prime graph morphisms. We also present the concepts and theoretical results used in the application of our framework. In Section 3, we discuss the applications of network alignment and spectral clustering for directed graphs via functoriality. Finally, Section 4 gives some concluding remarks.
2. Methods
Mathematically, a graph is a structure that consists of a set of vertices $V$ (nodes) and a set $E$ of pairs of vertices, which we call edges (connections). Within this context, we say that a graph is undirected if its edges are connections without directionality. A graph is directed if all its edges are connections with a direction. Now, given an undirected graph, we represent an edge between the nodes $u$ and $v$ by the unordered pair $\{u, v\}$, equivalently $\{v, u\}$. Instead, for a directed graph, an edge from node $u$ to node $v$ is represented by the ordered pair $(u, v)$; conceptually, we can think of this edge as an arrow with initial node $u$ and terminal node $v$. Also, we say that the nodes $u$ and $v$ are neighbors if there is at least one edge connecting nodes $u$ and $v$. Throughout this work, we will consider simple directed graphs, that is, directed graphs with no multiple edges and no self-loops.
Intuitively, a graph morphism is a function between the vertex sets that preserves the structure or topology of the graphs; that is, it preserves the edges under transformation:
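Definition 1. Let $G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$ be two undirected graphs. An undirected graph morphism $f: G_1 \to G_2$ is a function $f: V_1 \to V_2$ that maps adjacent vertices in $G_1$ into adjacent vertices in $G_2$. Algebraically, this means that for any pair of vertices $u, v \in V_1$ with $\{u, v\} \in E_1$, we have $\{f(u), f(v)\} \in E_2$.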
For two directed graphs $G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$, we say that a function $f: V_1 \to V_2$ is a directed graph morphism if $f$ maps initial nodes into initial nodes, and terminal nodes into terminal nodes; that is, $(f(u), f(v)) \in E_2$ whenever $(u, v) \in E_1$. One can verify that the composition of two graph morphisms always yields another graph morphism, in both the undirected and directed cases. Further, the composition is an associative operation that has a neutral element, called the identity morphism (which coincides with the identity map). With this in mind, one can show that the collection of all undirected graphs and all undirected graph morphisms (UndGraph) forms a category. Similarly, the collection of all simple directed graphs and all directed graph morphisms (DGraph) forms a category. As we see next, the category DGraph is isomorphic to a subcategory of UndGraph; this subcategory, in fact, is the category of prime graphs (PGraph) that we define.
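As a rough illustration of the two notions of morphism, the following Python sketch checks the adjacency-preservation condition for a vertex map given as a dictionary. The function names and the edge-list representation are assumptions made for this example; they are not part of the original construction.

```python
def is_undirected_morphism(edges_g, edges_h, f):
    """Return True if f sends every edge {u, v} of G to an edge {f(u), f(v)} of H."""
    target = {frozenset(e) for e in edges_h}
    return all(frozenset((f[u], f[v])) in target for u, v in edges_g)


def is_directed_morphism(arcs_g, arcs_h, f):
    """Return True if f sends every arc (u, v) of G to an arc (f(u), f(v)) of H."""
    target = set(arcs_h)
    return all((f[u], f[v]) in target for u, v in arcs_g)


# Hypothetical example: fold the directed path a -> b -> c onto two vertices.
path = [("a", "b"), ("b", "c")]
f = {"a": 1, "b": 2, "c": 1}

print(is_directed_morphism(path, [(1, 2), (2, 1)], f))  # both orientations exist in H
print(is_directed_morphism(path, [(1, 2)], f))          # the arc (2, 1) is missing in H
print(is_undirected_morphism(path, [(1, 2)], f))        # direction is ignored
```

The last two calls use the same vertex map and the same target edge: the map fails as a directed morphism but passes as an undirected one, which is the information lost by simply forgetting direction.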
Before describing the isomorphism between DGraph and PGraph, we will detail the category of prime graphs and its connection to simple directed graphs. Conceptually, a prime graph is an undirected graph which admits a ‘prime labeling’ on its set of nodes (Figure 1). By admitting a ‘prime labeling’, we mean that there exists a labeling function on the vertex set with the following two properties: any prime labeled node has only non-prime labeled nodes as neighbors (and vice versa), and any prime labeled node is always adjacent to its non-prime labeled counterpart node.
Definition 2. Let be an undirected graph with vertices, and let . We say that admits a prime labeling if there exists a bijective function such that for , one has the following three cases:
- (i) If and , then for each neighbor u of v, for some . We visualize this as
- (ii) If and , then for each neighbor u of v, for some . We visualize this as
- (iii) For each , if and , then there exists a neighbor of v such that and .
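The conditions of a prime labeling can be sketched as a small validity check. The Python function below is a minimal sketch under stated assumptions: each label is encoded as a pair `(index, is_primed)` with indices `1..n`, and the graph is given as an adjacency dictionary. Both the encoding and the name `is_prime_labeling` are hypothetical and chosen only for illustration.

```python
def is_prime_labeling(adj, label):
    """Check the prime labeling conditions on an undirected graph.

    adj:   dict mapping each vertex to the set of its neighbours.
    label: dict mapping each vertex to a pair (index, is_primed).
    """
    if set(adj) != set(label) or len(adj) % 2:
        return False
    n = len(adj) // 2
    expected = {(i, primed) for i in range(1, n + 1) for primed in (False, True)}
    # Bijectivity: the 2n vertices must use each of the 2n labels exactly once.
    if set(label.values()) != expected:
        return False
    vertex_of = {lab: v for v, lab in label.items()}
    for v, neighbours in adj.items():
        index, primed = label[v]
        # Neighbours must carry the opposite kind of label (prime vs. non-prime).
        if any(label[u][1] == primed for u in neighbours):
            return False
        # Every vertex must be adjacent to its counterpart with the same index.
        if vertex_of[(index, not primed)] not in neighbours:
            return False
    return True


# Hypothetical example with n = 2.
label = {"a": (1, False), "a'": (1, True), "b": (2, False), "b'": (2, True)}
good = {"a": {"a'", "b'"}, "a'": {"a"}, "b": {"b'"}, "b'": {"b", "a"}}
bad = {"a": {"a'", "b"}, "a'": {"a"}, "b": {"b'", "a"}, "b'": {"b"}}

print(is_prime_labeling(good, label))
print(is_prime_labeling(bad, label))  # a and b are both non-prime yet adjacent
```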
For illustrative purposes, consider the following graphs:
The graph on the left is not a prime graph, as the prime labeled nodes are connected to each other. For the same reasons, the graph on the right is not a prime graph; however, this last graph becomes a prime graph if we endow it with the following prime labeling:
We also observe that one can visualize a prime graph as an undirected bipartite graph. This is a consequence of the definition of a prime labeling, as prime labeled nodes only have non-prime labeled nodes as neighbors, and vice versa (see Figure 2).
Now, due to condition
, we can naturally induce a directed graph from a prime graph. In this case, the direction of an arrow will be given by the ‘prime’ labeled vertex. For instance, if we consider the linear undirected graph G, we can give it a prime labeling which induces a directed graph having a directed edge from an initial vertex to a terminal vertex.
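The passage from a prime graph to a directed graph can be sketched in a few lines of Python. This is only a sketch under assumptions: labels are encoded as pairs `(index, is_primed)`, the function name `prime_to_directed` is hypothetical, and the orientation convention (an edge between non-prime `i` and prime `j'` becomes an arrow from `i` to `j`) is taken from the remark that prime labeled nodes encode incoming edges.

```python
def prime_to_directed(edges, label):
    """Recover a directed graph from a prime labeled undirected graph.

    edges: iterable of undirected edges (u, w).
    label: dict mapping each vertex to a pair (index, is_primed).
    Returns (nodes, arcs), with one node per label index.
    """
    nodes = sorted({index for index, _ in label.values()})
    arcs = set()
    for u, w in edges:
        (i, u_primed), (j, w_primed) = label[u], label[w]
        if u_primed == w_primed:
            raise ValueError("an edge must join a prime and a non-prime vertex")
        if u_primed:
            i, j = j, i  # now i is the non-prime index and j the prime index
        if i != j:       # edges between counterparts carry no arrow
            arcs.add((i, j))
    return nodes, arcs


# Hypothetical linear graph x - y - z - w labeled 1', 1, 2', 2.
edges = [("x", "y"), ("y", "z"), ("z", "w")]
label = {"x": (1, True), "y": (1, False), "z": (2, True), "w": (2, False)}
print(prime_to_directed(edges, label))  # ([1, 2], {(1, 2)})
```

In this example the only edge joining different indices is the middle one, so the path on four vertices collapses to a single arrow from node 1 to node 2.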
In terms of morphisms, we can think of a prime graph morphism as an undirected graph morphism that preserves the prime and non-prime labelings. Formally:
Definition 3. Let and be two prime graphs, with labeling functions and , respectively. A prime graph morphism is an undirected graph morphism that satisfies the following conditions:
- (i) (Non-prime label preservation) If with for some , then is such that for .
- (ii) (Prime label preservation) If with for some , then is such that for .
- (iii) If are adjacent vertices with and , then one always has that
The above can be rephrased by saying that a prime graph morphism is an undirected graph morphism compatible with the non-prime and prime labelings. From an algebraic perspective, this compatibility condition means that, for each prime graph morphism
, there exists a function
making
a commutative diagram; that is,
.
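The compatibility condition can be made concrete with a short Python check. The sketch below reads the definition as requiring three things: edges are preserved, prime and non-prime vertices keep their kind, and the vertex map induces a well-defined function on label indices (so counterparts are sent to counterparts). This reading, the `(index, is_primed)` label encoding, and the function names are assumptions made for illustration.

```python
def preserves_edges(edges_g, edges_h, f):
    """Undirected graph morphism condition for the vertex map f."""
    target = {frozenset(e) for e in edges_h}
    return all(frozenset((f[u], f[v])) in target for u, v in edges_g)


def is_prime_graph_morphism(edges_g, label_g, edges_h, label_h, f):
    """Check edge preservation plus compatibility with the labelings."""
    if not preserves_edges(edges_g, edges_h, f):
        return False
    index_map = {}  # induced function on label indices
    for v, (i, primed) in label_g.items():
        j, image_primed = label_h[f[v]]
        if image_primed != primed:            # prime / non-prime kind must be kept
            return False
        if index_map.setdefault(i, j) != j:   # v and its counterpart must agree
            return False
    return True


# Hypothetical prime graph of two isolated nodes, mapped to itself.
edges = [("a", "a'"), ("b", "b'")]
label = {"a": (1, False), "a'": (1, True), "b": (2, False), "b'": (2, True)}
identity = {v: v for v in label}
twist = {"a": "a", "a'": "a'", "b": "a'", "b'": "a"}

print(is_prime_graph_morphism(edges, label, edges, label, identity))
print(preserves_edges(edges, edges, twist))                        # still an undirected morphism
print(is_prime_graph_morphism(edges, label, edges, label, twist))  # labels are not preserved
```

The map `twist` preserves adjacency but sends the non-prime vertex `b` to a prime vertex, so it is an undirected graph morphism that is not a prime graph morphism.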
Furthermore, the composition of two prime graph morphisms results in a prime graph morphism itself. This follows from the fact that the composition of undirected graph morphisms is a closed operation, and also from the fact that the composition of prime graph morphisms preserves the prime and non-prime labeling conditions. Expressed in diagrams, this latter aspect is equivalent to saying that the commutativity of the large diagram is a consequence of the commutativity of the smaller diagrams:
Additionally, as the composition operation of undirected graph morphisms is an associative operation, it follows that the composition of prime graph morphisms is also an associative operation. Further, for any prime graph , its identity prime graph morphism coincides with the identity map defined on the undirected graph . Considering the above, one has that the collection of prime graphs, along with the collection of prime graph morphisms, form a category.
Theorem 1. The collection of all prime graphs, and all prime graph morphisms, defines a category denoted by PGraph.
It is worth noting that not every undirected graph morphism induces a prime graph morphism. To see this, let us consider the following prime graphs:
Then, the function with correspondence rule , , , , , and is an undirected graph morphism which is not a prime graph morphism. In fact, this illustrates that the category PGraph is not a full subcategory of UndGraph. Consequently, both the graph isomorphism problem and the subgraph isomorphism problem for prime graphs differ from their counterparts for undirected graphs, whose complexity is known to be open and NP-complete, respectively.
2.1. DGraph and PGraph Are Isomorphic Categories
The first part of this subsection describes the functors and . The second part of this subsection shows that and are inverse of each other.
2.1.1. The Functors and
The following two propositions describe the assignment on objects and morphisms of functor :
Proposition 1. Let be a directed graph in . Then, induces a prime graph in .
Proof of Proposition 1. Let be a finite directed graph in DGraph. Without loss of generality, let us suppose that is represented by the set . Thus, by denoting for each , the prime graph has vertex set
Now, to define the edge set , we will consider the following two cases:
- (i) for each , the tuple defines an edge in ;
- (ii) for in , we have that defines an edge in if, and only if, there exists a directed edge .
We claim that
admits an
I-labeling. Let
and
be the non-prime and prime sets, respectively. We define the labeling function
as follows. For each
,
Clearly,
is a bijective function. Moreover, based on how
is defined, condition
of Definition 2 is satisfied. Thus, it suffices to show that
equipped by
satisfies conditions
and
. Now, if
is such that
, then
for
. Hence, as the incident edges to vertex
have the form
—that is, when
—or
, it follows that the neighbors of
also have the form
, with
, or
. In any case, one has that
for any possible neighbor
of
v. Likewise, if
for some
, then
. Again, based on how the undirected graph
is defined, one has that the incident edges to vertex
in
are either of the form
(namely, when
) or
. Thus, the neighbors of
can be either
(for some
) or
, which in turn implies that
for any possible neighbor
of
v. Therefore,
defines a prime labeling function on
. □
Figure 3 displays the correspondence between a directed graph and its corresponding prime graph. Notice that the prime labeled nodes encode the notion of incoming edges of the corresponding directed graph.
Proposition 2. Let and be two directed graphs in DGraph, and let be a directed graph morphism. Then, f induces a prime graph morphism between the prime graphs and .
Proof of Proposition 2. Let
, and let us take
, a directed graph morphism. Without loss of generality, assume that the vertex sets are
and
, respectively. Then, by Proposition 1, the vertex set of the corresponding prime graphs
and
are given by
and
respectively. With this in mind, we define
by
where
is the prime labeled vertex in
such that
. We claim that
is a prime graph morphism. Indeed, for any edge
, with
, it corresponds to a directed edge
in
. Thus, as
f preserves adjacencies—in the directed case—we have that
, which in turn implies that
. Considering this, one has that
Moreover, for edges of the form
, one obtains
Thus,
preserves adjacencies. Now, based on how
is defined, it is clear that
is compatible with the labeling, as it preserves the prime and non-prime labelings:
. Therefore,
is a prime graph morphism. □
To better visualize Proposition 2, let us consider a directed graph morphism
f that maps a directed edge
into a directed edge
:
Then, in the prime graph context, one has a prime graph morphism
mapping
Remark 1. If is a directed graph, then its corresponding prime graph satisfies that
This follows from the fact that an edge in has either the form (corresponding to a directed edge in ) or the form .
Considering the results from above, we have the following:
Theorem 2. The map that assigns to each object the object and to each morphism the morphism defines a functor from DGraph to PGraph.
Proof of Theorem 2. Observe that the object and morphism assignments of functor are exactly Proposition 1 and Proposition 2, respectively. Thus, it suffices to show that preserves identity morphisms and composition of morphisms.
The fact that preserves identity morphisms follows from Proposition 2 as coincides with the identity map . To see that preserves compositions, we will show that, given the directed graph morphisms and , one has that . Following the notation so far, we will denote the vertex set of the directed graph by , the vertex set of the directed graph by , and the vertex set of the directed graph by . Then, by Proposition 1, the vertex sets of their corresponding prime graphs are given by
Now, on the one hand, the image under functor
of the directed graph morphism defined by the composition
is the prime graph morphism
, whose correspondence rule is given by
Here,
denotes the prime labeled vertex on
such that
.
On the other hand, the image of
f under
is the prime graph morphism
given by
while the image of
g under
is the prime graph morphism
given by
Note that, as prime graph morphisms preserve the prime vertex and non-prime vertex labelings, one then has that
and
. This way, when considering the composite morphism
, one has that, for each
,
whereas for each ,
Hence,
and
have the same correspondence rule, and hence,
. Consequently,
that is,
preserves compositions of morphisms. □
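The two functor laws can be checked on small examples. In the Python sketch below, a prime graph vertex is encoded as a pair `(v, is_primed)`, and a directed graph morphism `f` is lifted by sending `v` to `f(v)` and the prime copy of `v` to the prime copy of `f(v)`. The encoding and the names `lift` and `compose` are assumptions made for this illustration.

```python
def lift(f):
    """Lift a vertex map f to prime graph vertices encoded as (vertex, is_primed)."""
    return {(v, primed): (w, primed) for v, w in f.items() for primed in (False, True)}


def compose(g, f):
    """Return the composite map 'g after f' as a dictionary."""
    return {x: g[f[x]] for x in f}


# Hypothetical vertex maps between three directed graphs.
f = {"a": 1, "b": 2, "c": 1}
g = {1: "x", 2: "y"}
identity = {v: v for v in f}

# Composition is preserved: lifting g∘f equals composing the lifts.
print(lift(compose(g, f)) == compose(lift(g), lift(f)))
# Identities are preserved: the lift of an identity map is an identity map.
print(all(image == vertex for vertex, image in lift(identity).items()))
```

This only tests the laws on one example; the proof above is what establishes them in general.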
Remark 2. The functor preserves topological features. For instance, connectivity is preserved under . Also, if is a complete directed graph, then its corresponding prime graph is a complete bipartite graph.
On the other hand, the next two propositions describe the assignment on objects and morphisms of functor :
Proposition 3. Let be a prime graph. Then, induces a simple directed graph .
Proof of Proposition 3. Let be a prime graph, and let be its labeling function, with . If the vertex set of is given by
then, we define the vertex set of the directed graph
, induced by
, as
In order to define the edge set
, we must consider all incident edges to the set of prime label vertices
in
. For instance, if
is adjacent to
in
(with
), then we will obtain a directed edge in
from vertex
towards vertex
. In case
is the only adjacent vertex to
in
, then the corresponding vertex
in
will not have incoming directed edges. In other words, the set
is given by
Proposition 4. Let and let be a prime graph morphism. Then, g induces a directed graph morphism between their corresponding directed graphs and .
Proof of Proposition 4. Let , and let be a prime graph morphism. Without loss of generality, let us assume that and . Then, by Proposition 3, the vertex sets and are given by and , respectively. With this in mind, we define the map as follows. For each , we set . We claim that preserves adjacencies. Indeed, given , we have that . Thus, as g is a prime graph morphism, we obtain that . Moreover, since g is compatible with the prime and non-prime labelings, it follows that , from which we have that . Therefore,
showing that
is a directed graph morphism. □
Considering the above, we have the following:
Theorem 3. The map that assigns to each object the object , and to each morphism the morphism is a functor.
Proof of Theorem 3. By Propositions 3 and 4, it suffices to show that the map preserves identity morphisms and composition of morphisms.
The fact that
preserves identity morphisms follows from Proposition 4, as
coincides with the identity map
. To see that
preserves compositions, we will prove that, given the prime graph morphisms
and
, one has that
. Without loss of generality, let us suppose that
Following along with the notation, we will denote the vertex sets of the directed graphs
and
by
,
and
, respectively. Now, on the one hand, the image under
of the prime graph morphism
is the directed graph morphism
, whose correspondence rule is given by
On the other hand, the directed graph morphism
is defined by
, for all
. Likewise, the directed graph morphism
is defined by
, for all
. Therefore, the composition
is a directed graph morphism such that, for each vertex
,
The above shows that
and
have the same correspondence rule. Since both functions have the same domain and codomain, we obtain that
. Therefore,
that is,
preserves compositions of morphisms. □
2.1.2. The Functors and Are Isomorphisms
Recall that an isomorphism between two categories and is a functor that is a bijection on both objects and morphisms. In other words, a functor is an isomorphism if, and only if, there exists a functor for which the compositions and are the identity functors and , respectively. In this case, we say that the categories and are isomorphic.
Proposition 5. Let be a directed graph in DGraph. Then
Proof of Proposition 5. Let . Without loss of generality, we can assume that . Then, the prime graph has the vertex set
and the edge set
If we now consider the directed graph induced by
, that is,
, then, the vertex set
is given by
and its edge set by
In this way, as
if, and only if,
, it follows that
if, and only if,
. Moreover, since
and
have the same vertex set
, we can conclude that
. Therefore,
□
Proposition 6. Let , and let be a directed graph morphism. Then,
Proof of Proposition 6. Let
and
be directed graphs, and let
be a directed graph morphism. Without loss of generality, assume that
. Then, by Theorem 2, we obtain the prime graphs
and
, along with the prime graph morphism
defined by
Here,
. Now, as
for
, it suffices to show that the functions
and
f have the same correspondence rule, since both have the same domain and codomain sets. By considering
, we see that
, for each vertex
. Therefore,
, and thus,
□
Corollary 1. The functor and the functor satisfy
On the other hand, we have the following results:
Proposition 7. Let be a prime graph. Then
Proof of Proposition 7. Let . Without loss of generality, assume that the vertex set is given by the set . Then is the directed graph with vertex set , and edge set
Now, if we denote by
the prime graph induced by
, that is,
, then,
, and
Notice that
if, and only if,
, which holds if, and only if,
. Since both graphs
and
have as vertex set the set
, we conclude that
. Therefore,
□
Proposition 8. Let be a prime graph morphism. Then
Proof of Proposition 8. Let be a prime graph morphism, and let us suppose that . By Theorem 3, we have the directed graphs and , along with the directed graph morphism defined by , for each . Now, as
for
, it suffices to show that
and
f have the same correspondence rule, as these have the same domain and codomain sets. By considering that
is such that
we see that
and
f have the same correspondence rule. Thus,
, which in turn implies that
□
Corollary 2. The functor and the functor satisfy
Therefore, by Corollaries 1 and 2, we have the following:
Theorem 4. The categories DGraph and PGraph are isomorphic.
We highlight that the property of functoriality on graph morphisms is what allowed us to extend network alignment techniques for undirected graphs to directed graphs. As previously mentioned, an alignment of two networks consists of a mapping between the nodes of the compared networks. In doing this, one aims to preserve as much of the structure (or topology) between the considered networks as possible. Thus, when transforming two directed graphs into their corresponding prime graphs, the way we relate these prime graphs is by defining a prime graph morphism. This prime graph morphism preserves the topology and the labeling conditions, which ultimately gives the notion of direction. Consequently, by functoriality, these prime graph morphisms correspond to the directed graph morphisms used when aligning the compared directed graphs.
2.2. Prime Transformation Algorithm
This section outlines an algorithm (Algorithm 1) to convert a directed graph to a prime graph, reflecting definitions and constructions found in the previous section. Then, we discuss the time and space complexity for creating and storing a prime graph from a directed graph. Note that the algorithm does not consider edge weights because weighting schemes can vary based on application.
The input to Algorithm 1 is a directed graph, and the output is its corresponding prime graph. The algorithm operates on each node and each edge of the directed graph. As such, the algorithm’s time complexity is $O(n + e)$, where $e$ is the number of edges and $n$ is the number of nodes in the directed graph. While additional space is required to store the prime graph, the space necessary scales according to $O(n + e)$. The extra space needed to store a prime graph results from the additional nodes and edges it contains relative to its directed graph counterpart. For prime graphs, the number of nodes is always double that of directed graphs because each node in the directed graph spawns an additional prime node. Additionally, an edge is created between each node and its prime counterpart; therefore, $n$ additional edges are required.
Algorithm 1 Prime Transformation Algorithm
Input: DGraph
Output: PGraph
procedure MakePGraph
    ▹ Initialize edge list
    ▹ Initialize node list
    ▹ Initialize an empty prime graph
    for n in dirNodes do:
        ▹ store the label of node n
        ▹ create a label for n’s prime node
        ▹ add prime node
        ▹ add non-prime node
        ▹ add an edge between prime and non-prime node pairs
    for e in dirEdges do:
        ▹ store edge head label
        ▹ store edge tail label
        ▹ store the tail node’s prime label
        ▹ add edge to PGraph using labels
        ▹ add edge to PGraph using labels
    return PGraph
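A possible Python rendering of the procedure is sketched below. It is not the authors' implementation: the function name `make_pgraph`, the string naming scheme for prime nodes (appending an apostrophe), and the set-based graph representation are assumptions. The orientation convention, where a directed edge from `source` to `target` becomes the undirected edge between `source` and the prime node of `target`, follows the observation that prime nodes encode incoming edges.

```python
def make_pgraph(dir_nodes, dir_edges):
    """Build the prime graph of a simple directed graph (no self-loops).

    dir_nodes: iterable of node labels.
    dir_edges: iterable of (source, target) pairs.
    Returns (nodes, edges), where edges is a set of two-element frozensets.
    """
    p_nodes, p_edges = set(), set()
    for n in dir_nodes:
        label = str(n)        # label of node n
        prime = label + "'"   # assumed naming scheme for n's prime node
        p_nodes.update((label, prime))
        # Each node is joined to its own prime counterpart.
        p_edges.add(frozenset((label, prime)))
    for source, target in dir_edges:
        # Assumed convention: the prime node of the target collects incoming edges.
        p_edges.add(frozenset((str(source), str(target) + "'")))
    return p_nodes, p_edges


# Hypothetical directed graph with n = 3 nodes and e = 3 edges.
nodes, edges = make_pgraph([1, 2, 3], [(1, 2), (2, 3), (1, 3)])
print(len(nodes), len(edges))  # 6 6
```

The example reflects the counts discussed above: the prime graph has $2n = 6$ nodes and $n + e = 6$ edges, and the two loops visit each node and each edge once.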
4. Discussion
This work shows a novel construction that reinforces the power of CT as a tool to formalize structures and their relations. In this case, we use CT to bridge a directed graph framework to an undirected graph framework, so that not only directionality is preserved but also several topological features. This bridge enables the use of undirected graph techniques to obtain information from systems that are represented as directed graphs. Both the computational and space complexities of the transformations are , where N is the number of nodes in the network. As an empirical demonstration, we provide a new option to perform network alignment for directed graphs. This is relevant since, to the best of our knowledge, network alignment tools do not exist for a directed graph setting. Furthermore, our transformation does preserve network similarity between directed graphs and their prime graph counterparts; we attained an value of (with a corresponding p-value of ) between the network aligner results, i.e., the similarity metric, and a known graph generation correlation coefficient. Because we proved that our construction leads to an invertible transformation, there is only one prime graph that describes a simple directed graph and vice versa; as such, and in that sense, our transformation is error-free. Be that as it may, our transformation does not mitigate errors inherent in postprocessing the resultant graphs, for example, not achieving an value of 1 in the network alignment task.
Although the process of making an adjacency matrix of a directed graph symmetric is not new [19], nor is transforming a directed graph into an undirected graph [12], our framework is an advance. We proved that the minimum cuts are preserved when going from a directed graph framework to a prime graph framework and vice versa. These results, in turn, imply that clusters are preserved when moving from one setting to the other. As a proof of concept, we demonstrated cluster preservation empirically by generating a directed SBM network with known intra-block and inter-block connectivity.
While this work is a step towards a new application of existing network alignment tools, there is much left in this area to be explored in future work. Adoption of this technique may be limited by and rely upon showing additional mathematical proofs for commonly used techniques on graphs, for example, answering how the existing undirected node and edge similarity metrics might be skewed by the prime graph transformation. Another avenue for the application of prime graphs is to take advantage of their bipartite nature in problems such as the graph isomorphism problem for directed graphs. It is worthwhile studying the complexity of checking for equivalence between arbitrarily labeled DGraphs and PGraphs. Lastly, a categorical bridge between a multidirected graph setting and a prime graph setting might unlock new ways to study highly complex data.