A Methodology for the Analysis of Collaboration Networks with Higher-Order Interactions

Aguirre-Guerrero, Daniela; Bernal-Jaquez, Roberto

doi:10.3390/math11102265


by

Daniela Aguirre-Guerrero

and

Roberto Bernal-Jaquez

^*

Department of Applied Mathematics and Systems, Universidad Autónoma Metropolitana, Cuajimalpa, Mexico City 05348, Mexico

^*

Author to whom correspondence should be addressed.

Mathematics 2023, 11(10), 2265; https://doi.org/10.3390/math11102265

Submission received: 15 April 2023 / Revised: 4 May 2023 / Accepted: 7 May 2023 / Published: 12 May 2023

(This article belongs to the Special Issue Feature Papers in Complex Networks and Their Applications)


Abstract

Scientific research often involves collaboration among researchers, and coauthorship networks are a common means of exploring these collaborations. However, traditional coauthorship networks represent coauthorship relations using simple links, i.e., pairwise interactions, which fail to capture the strength of scientific collaborations in either small or large groups. In this study, we propose a novel methodology to address this issue, which involves using a multilayer network model that captures the strength of coauthorship relations and employs a convergence index to identify the collaboration order in which these properties converge. We apply this methodology to investigate the collaborative behavior of researchers in the context of the three main public universities in Mexico over the last decade, using Scopus data as the primary source of information. Our study reveals that community structure emerges in low-order collaborations, and higher-order collaborations lead to increased clustering and centrality measures. Our methodology provides a comprehensive and insightful way of analyzing scientific collaborations and sheds light on the dynamics of scientific collaboration, providing a valuable tool for future studies. Our proposed model and convergence index can be applied to other scientific domains to better capture the strength of collaborations among researchers.

Keywords:

coauthorship networks; scientific collaboration networks; higher-order interactions; multilayer network model; collaboration patterns; Mexico

MSC:

91D30

1. Introduction

Coauthorship networks have become an increasingly popular way to study scientific collaborations and the dynamics of scientific research. In recent years, complex network analysis has emerged as a powerful tool for analyzing such networks. Complex network analysis provides a set of quantitative measures that can be used to describe the structure and dynamics of coauthorship networks and can help to identify key actors, communities, and patterns of collaboration within the network [1,2,3]. However, traditional coauthorship networks only capture the existence of coauthorship relations through pairwise interactions. This approach fails to capture the strength of scientific collaborations in either small or large groups. To address this limitation, we propose a novel methodology that uses a multilayer network model to capture the strength of coauthorship relations and employs a convergence index to identify the collaboration order in which these properties converge. In this study, we apply our proposed methodology to analyze coauthorship networks in the context of the three main public universities in Mexico over the last decade, using the Scopus database as our primary source of information. Mexico’s public universities are a suitable case study for several reasons. Mexico has a rich tradition in scientific research and development, with several leading universities contributing to the production of high-quality research. In particular, the three main public universities in Mexico—Universidad Nacional Autónoma de México (UNAM), Instituto Politécnico Nacional (IPN), and Universidad Autónoma Metropolitana (UAM)—have been at the forefront of scientific research in the country. These universities have produced a large number of research publications in different fields and have contributed significantly to the development of science and technology in Mexico. 
In this context, recent studies have attempted to identify and analyze the emerging collaboration patterns among the scientific community in Mexico through complex network analysis [4,5,6,7,8].

Using our methodology, in contrast to [4,5,6,7,8], we employ a multilayer network model to represent coauthorship networks, which captures higher-order interactions that cannot be represented in traditional coauthorship networks [9,10,11]. The multilayer network model can be defined as a tuple

M = (G, C)

, where

G

is the set of graphs

G_{i} = (V_{i}, E_{i})

. In this model, each layer

G_{i}

represents the coauthorship network of articles cowritten by exactly i authors. The nodes

V_{i}

in each layer represent authors, intralayer edges in

E_{i}

represent coauthorship relations, and interlayer edges in

C

connect the same author at different layers. This approach gives a microscopic view of different orders of collaboration and allows us to explore the evolution and emergence of collaboration patterns.

As stated before, we also propose a novel convergence index to identify the level (order) of collaboration in which the global properties of collaboration networks converge. Specifically, we examine the convergence of basic network metrics, the assortativity coefficient, network connectivity, and community structure. The study is expected to provide insights into the collaborative behavior of researchers in the three main public universities in Mexico and provide a better understanding of the structure and dynamics of their coauthorship networks. These insights may help to identify areas of scientific research that require further attention and help foster collaborations among researchers in the three universities.

The remaining parts of the paper proceed as follows: Section 2 is concerned with the data and methodology used for this study. Section 3 presents the findings of the research, focusing on the four categories of the analyzed metrics: basic metrics, degree metrics, network connectivity, and community structure. Finally, Section 4 gives a brief summary and a discussion of the implications of the findings and future research.

2. Data and Methods

2.1. Data

Data included in this study were retrieved from the Scopus database via the Scopus API (see http://api.elsevier.com and http://www.scopus.com accessed on 7 March 2023) using the pybliometrics library for Python [12]. We searched for articles published between 2012 and 2021, where at least one coauthor is affiliated with one of the following three major Mexican universities (although the rest of the coauthors may have different affiliations): the National Autonomous University of Mexico (UNAM), the National Polytechnic Institute (IPN), and the Metropolitan Autonomous University (UAM). The data collected consist of metadata built from 74,400 articles, which were classified into three categories: UNAM, IPN, and UAM. An article is in the UNAM category if its listed affiliations (the listed affiliations of an article include every affiliation associated with each of its coauthors) include UNAM; the categories IPN and UAM are defined similarly. Note that an article can be classified into more than one of the aforementioned categories. The distribution of articles among the mentioned universities is as follows: UNAM 51,520; IPN 23,624; and UAM 2394, see Figure 1. It is important to mention that this distribution is related to the distribution of associate and full professors per university, which according to the 2021 faculty registries is as follows: UNAM 8510; IPN 5758; and UAM 2673 [13,14,15].

In order to analyze the collaboration networks across different subject areas, we consider the All Science Journal Classification (ASJC) system, which is used in the Scopus database to classify journals and conference proceedings under the following four subject areas: life sciences, physical sciences, health sciences, and social sciences. Figure 1 shows the distribution of the analyzed articles among the universities considered and subject areas. Since some articles are classified into more than one university and subject area, adding the total number of articles of the different areas results in a number that is greater than the total number of articles for this period.

Another important piece of information about the analyzed data is the coauthor count per article. In this study, articles with 100 or more coauthors are excluded because Scopus reports them all as 100-coauthor papers, which makes it impossible to distinguish, for instance, an article with 102 coauthors from one with 105. Since the excluded articles account for less than 0.01% of the articles retrieved, this exclusion has no impact on the findings of this study.

Remark 1.

Let M denote the set of analyzed articles and let A be a random variable denoting the coauthor count of an article in M. Here, and subsequently,

P [A]

denotes the distribution function of A,

P [A \leq a]

denotes the cumulative distribution function of A, and

P [A > a]

denotes the complementary cumulative distribution function of A.
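The three functions named in Remark 1 can be computed directly from a list of coauthor counts. The following is a minimal sketch in Python; the function name `coauthor_distributions` and the toy data are hypothetical and are not part of the original study.

```python
from collections import Counter

def coauthor_distributions(coauthor_counts):
    """Return (pmf, cdf, ccdf) as dicts keyed by the coauthor count a."""
    n = len(coauthor_counts)
    freq = Counter(coauthor_counts)
    pmf, cdf, ccdf = {}, {}, {}
    cumulative = 0
    for a in sorted(freq):
        cumulative += freq[a]
        pmf[a] = freq[a] / n              # P[A = a]
        cdf[a] = cumulative / n           # P[A <= a]
        ccdf[a] = (n - cumulative) / n    # P[A > a]
    return pmf, cdf, ccdf

# Hypothetical toy data: coauthor counts of six articles.
counts = [1, 2, 2, 3, 3, 8]
pmf, cdf, ccdf = coauthor_distributions(counts)
```

Plotting `ccdf` on log-log axes would give a curve of the kind discussed for Figure 2.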

Figure 2 shows the complementary cumulative distribution functions

P [A > a]

for the coauthor count in the cases of the three universities and subject areas analyzed. On the log–log scale, most of the distributions consist of three parts, each with an approximately linear fit and a different slope. Therefore, the distribution functions could be fitted by piecewise functions that follow power laws with different exponents. These characteristics indicate that the distributions of coauthors per article are scale-free: a small number of articles have a high number of coauthors, while most articles have a moderate number of coauthors.

The following subsection describes the multilayer collaboration network model used in our analysis, which allows us to estimate the level of collaboration at which different properties of the collaboration network converge.

2.2. Multilayer Coauthorship Networks

Based on the article classification described in the previous section, we mapped the sets of articles into multilayer coauthorship networks for each subject area and university. Formally, a multilayer network is defined as follows.

Definition 1.

A multilayer network is given by a tuple

M = (G, C)

, where

G

is a set of graphs

G_{i} = (V_{i}, E_{i})

, such that each

G_{i}

denotes the i-th layer of

M

, and thus,

E_{i}

denotes the set of edges connecting nodes in the same layer

G_{i}

; these edges are called intralayer edges. Meanwhile,

C

is the set of edges connecting nodes of different layers, which are called interlayer edges [16].

We applied the representation of multilayer coauthorship networks proposed in [10], where each layer

G_{i}

is the coauthorship network of articles cowritten by exactly i authors. Then, nodes in

V_{i}

represent authors, intralayer edges

E_{i}

represent coauthorship relations, and interlayer edges in

C

connect the same author at different layers.
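As an illustration of this representation, the sketch below maps a list of articles to layers and interlayer edges using only the Python standard library. The function names, the data layout (each article as a list of author identifiers), and the five toy articles are assumptions made for this example; they do not reproduce the authors' implementation or the data of Figure 3.

```python
from itertools import combinations

def build_layers(articles):
    """Map articles (lists of author ids) to layers {i: (V_i, E_i)}."""
    layers = {}
    for authors in articles:
        authors = sorted(set(authors))
        i = len(authors)                        # layer index = coauthor count
        nodes, edges = layers.setdefault(i, (set(), set()))
        nodes.update(authors)
        edges.update(combinations(authors, 2))  # one edge per coauthor pair
    return layers

def interlayer_edges(layers):
    """Connect every author to its copies in the other layers."""
    C = set()
    for i, j in combinations(sorted(layers), 2):
        for author in layers[i][0] & layers[j][0]:
            C.add(((author, i), (author, j)))
    return C

# Hypothetical toy data: five articles.
articles = [["a"], ["a", "b"], ["b", "c"], ["a", "c", "d"], ["d", "e", "f"]]
layers = build_layers(articles)
C = interlayer_edges(layers)
```

Here layer 1 contains a single isolated author, layer 2 contains two coauthorship edges, and layer 3 contains the six edges of two three-author articles.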

The advantage of modeling scientific collaboration as multilayer networks lies in the fact that these networks capture the strength of scientific collaborations by modeling them as higher-order interactions [17]. If we assume that the strength of coauthorship relations of an article is inversely proportional to its number of coauthors, then nodes belonging to a layer

G_{i}

have i-order interactions, and the strength of their links is inversely proportional to i. Moreover, the union of consecutive layers of

M

gives a microscopic view of different orders of collaboration. In particular, the coauthorship network of articles cowritten by at most i authors results from the union of layers from 1 to i. This network is called the i-th projection network of

M

.

Definition 2.

Let

M = (G, C)

be a multilayer network, where

G

is the set of layers

G_{i} = (V_{i}, E_{i})

and

C

is the set of interlayer edges. The i-th projection network of

M

is given by the union of consecutive layers from

G_{1}

to

G_{i}

, and is defined as

P_{i} (M) = (\mathcal{V}_{i}, \mathcal{E}_{i}), where \mathcal{V}_{i} = ⋃_{1 \leq j \leq i} V_{j} and \mathcal{E}_{i} = ⋃_{1 \leq j \leq i} E_{j} .

(1)

Remark 2.

Let

M

be a multilayer coauthorship network. The i-th layer of

M

, i.e.,

G_{i}

, is given by the coauthorship network of articles with exactly i coauthors. The i-th projection network of

M

, i.e.,

P_{i} (M)

, is given by the coauthorship network of articles with at most i coauthors.
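A projection network can be obtained by taking the union of node and edge sets over the layers up to i, as in Equation (1). The sketch below assumes layers are stored as a dictionary `{i: (V_i, E_i)}`; this layout, the function name, and the toy layers are hypothetical.

```python
def projection(layers, i):
    """Return (nodes, edges) of P_i(M): the union of layers 1..i."""
    nodes, edges = set(), set()
    for j, (V_j, E_j) in layers.items():
        if j <= i:
            nodes |= V_j
            edges |= E_j
    return nodes, edges

# Hypothetical toy layers built from five articles.
layers = {
    1: ({"a"}, set()),
    2: ({"a", "b", "c"}, {("a", "b"), ("b", "c")}),
    3: ({"a", "c", "d", "e", "f"},
        {("a", "c"), ("a", "d"), ("c", "d"),
         ("d", "e"), ("d", "f"), ("e", "f")}),
}
P2_nodes, P2_edges = projection(layers, 2)
P3_nodes, P3_edges = projection(layers, 3)
```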

Figure 3 presents an example of a multilayer coauthorship network

M

and its projection networks

P_{i} (M)

, in which 5 articles are mapped.

In this work, we modeled the multilayer coauthorship networks and their projection networks for each subject area of the analyzed universities. Our goal is to estimate the order of collaboration in which different global network properties converge. In the next subsection, we propose a convergence index for the global properties of collaboration networks.

2.3. Convergence Index

The authors in [10] define the convergence layer and maturation index of a global network property as follows.

Definition 3.

Let

P_{i} (M)

be a projection network of a multilayer network

M

and let

x (i)

be the value of some global network property x at

P_{i} (M)

, e.g., the network diameter. Then, the convergence layer of x denoted by

ℓ_{\tilde{x}}

is the minimal layer of

M

that satisfies

\frac{| x (ℓ_{\tilde{x}}) - x (i) |}{x (i)} \leq ϵ, \forall i \geq ℓ_{\tilde{x}},

(2)

where

ϵ = 0.05

. Hereafter,

\tilde{x} = x (ℓ_{\tilde{x}})

denotes the convergence value of x in

M

.
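The search for the convergence layer in Equation (2) can be sketched as a scan over candidate layers. In this sketch, the property values are given as a dictionary `{i: x(i)}`; the function name and the toy values are hypothetical. Two further assumptions are made: the absolute value of x(i) is used in the denominator so that negative-valued properties can be handled, and any layer with x(i) = 0 is treated as failing the test.

```python
def convergence_layer(x, eps=0.05):
    """Return the minimal layer l such that |x(l) - x(i)| / |x(i)| <= eps
    for every i >= l, or None if no layer qualifies."""
    orders = sorted(x)
    for pos, l in enumerate(orders):
        tail = orders[pos:]
        if all(x[i] != 0 and abs(x[l] - x[i]) / abs(x[i]) <= eps
               for i in tail):
            return l
    return None

# Hypothetical values of some property x across projection networks.
x = {1: 10.0, 2: 14.0, 3: 18.0, 4: 20.0, 5: 20.5, 6: 20.4}
l_conv = convergence_layer(x)   # layer 4 in this toy example
x_conv = x[l_conv]              # convergence value of x
```

Note that the last layer always satisfies the condition when its value is nonzero, so a property that keeps changing simply has its convergence layer at or near the maximal layer.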

Definition 4.

Let

M

be a multilayer network and let

ℓ_{max}

be the maximal layer of

M

. Let x be a global network property and let

ℓ_{\tilde{x}}

be the convergence layer of x in

M

. Then, the maturation index of x in

M

is given by

ℓ_{\tilde{x}} / ℓ_{max}

.

In the context of scientific collaboration networks, the convergence layer

ℓ_{\tilde{x}}

of a property x indicates that the evolution of x in the consecutive coauthorship networks

P_{i} (M)

from

i = 1

reaches a steady state in the coauthorship network

P_{ℓ_{\tilde{x}}} (M)

. Thus, articles cowritten by more than

ℓ_{\tilde{x}}

authors are irrelevant for the evolution of the values taken by property x. On the other hand, the intention of the maturation index

ℓ_{\tilde{x}} / ℓ_{max}

is to give an idea of the fraction of layers needed to reach a stable value for any property x. However, we must interpret this index with caution because the value of this index is quite unstable. To illustrate this instability, consider that for some property x maturation is reached at layer 8 out of 80, giving a maturation index value of

0.1

. Now, if a single paper with 800 authors is added, the maximal layer becomes 800 and the maturation index drops to 8/800 = 0.01, even though only one paper has changed.

In order to have a stable index that we can use in the context of collaboration networks, we take into account the percentage of analyzed articles that make up the collaboration network

P_{ℓ_{\tilde{x}}} (M)

. We propose the following convergence index.

Definition 5.

Let M be a set of articles and let

P [A \leq a]

be the cumulative distribution function of the coauthor count of an article in M (Remark 1). Let

M

and

P_{i} (M)

be the multilayer and projection networks formed by M, respectively (Definitions 1 and 2). Let x be a global network property with maturation index

ℓ_{\tilde{x}} / ℓ_{max}

in

M

(Definitions 3 and 4). We define the convergence index of x in

M

as the average of the maturation index and the proportion of the analyzed articles up to

P_{ℓ_{\tilde{x}}} (M)

, i.e.,

C I_{x} = \frac{(ℓ_{\tilde{x}} / ℓ_{max}) + P [A \leq ℓ_{\tilde{x}}]}{2} .

(3)
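The following sketch evaluates Equation (3) and revisits the instability example given above for the maturation index. All numbers are hypothetical: 1000 articles with a maximal layer of 80, a convergence layer assumed to be 8, and 950 articles with at most 8 coauthors. It is also assumed that the convergence layer stays at 8 after the extra paper is added.

```python
def maturation_index(l_conv, l_max):
    """Fraction of layers needed to reach the convergence layer."""
    return l_conv / l_max

def convergence_index(l_conv, l_max, coauthor_counts):
    """Average of the maturation index and P[A <= l_conv], Equation (3)."""
    share = sum(1 for a in coauthor_counts if a <= l_conv) / len(coauthor_counts)
    return (maturation_index(l_conv, l_max) + share) / 2

# Hypothetical article set: 950 small, 49 medium, 1 large collaboration.
counts = [2] * 950 + [20] * 49 + [80]
l_conv = 8  # assumed convergence layer

mi_before = maturation_index(l_conv, max(counts))
ci_before = convergence_index(l_conv, max(counts), counts)

# Add a single hypothetical paper with 800 authors.
counts_after = counts + [800]
mi_after = maturation_index(l_conv, max(counts_after))
ci_after = convergence_index(l_conv, max(counts_after), counts_after)
```

In this toy case the maturation index falls by a factor of ten (from 0.1 to 0.01), whereas the convergence index moves only from about 0.525 to about 0.48, because the share of articles up to the convergence layer barely changes.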

Note that the convergence index

C I_{x}

can be computed for any global network property x. To illustrate the use of

C I_{x}

in our study case, we chose a set of global network metrics that are relevant in the context of coauthorship networks. These metrics are described in the next subsection.

2.4. Network Metrics

In this section, we present the network metrics whose evolution was followed across the projection networks. The analyzed metrics are explained in the context of coauthorship networks and classified into basic metrics, degree metrics, network connectivity, and community structure. For further details about these network metrics, refer to [18,19,20].

2.4.1. Basic Metrics

Nodes, Edges, and Density

As explained above, nodes and edges in a coauthorship network represent authors and coauthorship relations, respectively. In this study, we analyze the evolution of the number of nodes, denoted by N, and the number of edges, denoted by L. We also analyze the evolution of the network density, which indicates the proportion of edges in the network with respect to all possible edges. The network density is defined as follows:

D = \frac{2 L}{N (N - 1)} .

(4)

If a coauthorship network is formed only by single-author articles, as in

P_{1} (M)

, then

L = 0

and

D = 0

. In contrast, if every pair of authors has cowritten an article, then

L = N (N - 1) / 2

, and D reaches its maximum value, i.e.,

D = 1

. Networks with low density are said to be sparse, while networks with high density are said to be dense. Scientific collaboration networks usually are sparse.
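As a quick check of Equation (4), the sketch below computes the density from the node and edge counts. The function name is hypothetical, and returning 0 for networks with fewer than two nodes is a convention assumed here, since the formula is undefined in that case.

```python
def density(n_nodes, n_edges):
    """Network density D = 2L / (N (N - 1)), Equation (4)."""
    if n_nodes < 2:
        return 0.0  # assumed convention for degenerate networks
    return 2 * n_edges / (n_nodes * (n_nodes - 1))

d_empty = density(3, 0)     # only single-author articles: no edges
d_complete = density(4, 6)  # every pair of four authors has coauthored
d_sparse = density(5, 2)
```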

Diameter

Let i and j be two nodes in a network. A path between i and j is a sequence of links that connects them, and the number of links in a path is called the path length. The distance between two nodes i and j, denoted by

d_{i j}

, is equal to the length of the shortest path between them. If there is no path between two nodes, we say that the distance between them is infinite. In a coauthorship network, the distance between two authors indicates the minimum number of coauthorship relations, i.e., articles, that link them. A popular distance measure in coauthorship networks is the Erdős number, which gives the distance between a given author and the great mathematician Paul Erdős. In this study, we analyze the evolution of the diameter, denoted by d and given by the distance between the two most distant nodes in the network, i.e., the longest of all shortest paths.
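For an unweighted network, distances and the diameter can be obtained with breadth-first search. The sketch below assumes the network is connected (for instance, a largest connected component) and is stored as an adjacency dictionary; the function names and the toy network are hypothetical.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path lengths from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def diameter(adj):
    """Largest shortest-path distance; assumes adj is connected."""
    return max(max(bfs_distances(adj, s).values()) for s in adj)

# Hypothetical connected network: path a-b-c-d with e attached to b.
adj = {
    "a": {"b"},
    "b": {"a", "c", "e"},
    "c": {"b", "d"},
    "d": {"c"},
    "e": {"b"},
}
d = diameter(adj)
```

Running one search per node costs on the order of N (N + L) operations, which may be slow for large components; this sketch favors clarity over speed.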

2.4.2. Degree Metrics

The degree of a node i, denoted by

k_{i}

, is given by the number of nodes that are adjacent to it. In a coauthorship network, the degree of an author indicates their number of coauthors. Coauthorship networks commonly exhibit the scale-free property, which holds that their degree distribution follows a power law. This means that most authors have a low degree, while a small set of authors has a high degree. This property results from the well-known preferential attachment process that, in the context of coauthorship networks, refers to new authors' tendency to collaborate with high-degree authors.

In this study, we analyze the degree distribution functions, i.e.,

P [K]

, of the whole of the coauthorship networks, i.e.,

P_{ℓ_{max}} (M)

. In addition, we analyze the evolution of the average node degree, i.e.,

〈 k 〉

, and the assortativity coefficient, i.e., r, across all the projection networks. The assortativity coefficient is given by the Pearson correlation coefficient of the degrees of adjacent nodes and indicates the tendency of nodes to be connected with other nodes that have a similar degree [21]. The assortativity coefficient of a network is defined as follows:

r = \frac{tr (e) - \| e^{2} \|}{1 - \| e^{2} \|},

(5)

where \| e^{2} \| denotes the sum of all entries of the matrix e^{2}, and e is the mixing matrix whose entries

e_{i j}

indicate the proportion of edges whose incident nodes have degrees i and j, respectively. If

r = 1

, then the network is said to be perfectly assortative, i.e., nodes with the same degree are adjacent. In general, if

r > 0

, then the network is assortative, i.e., nodes with a similar degree tend to be connected. In contrast, if

r < 0

, then the network is disassortative, i.e., low-degree nodes tend to be connected with high-degree nodes. Otherwise, if

r = 0

, then there is no correlation between the degree of adjacent nodes, and the network is called neutral.
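The sketch below evaluates Equation (5) literally: it builds the mixing matrix e over the degree values found at the two ends of each edge and then applies the formula. The function name and the toy edge lists are hypothetical. Because each degree value is treated here as a separate category, library routines that compute a Pearson correlation over degree values may return different numbers for the same network.

```python
from collections import defaultdict

def assortativity_eq5(edges):
    """r = (tr(e) - ||e^2||) / (1 - ||e^2||) for an undirected edge list."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Count each undirected edge in both directions so that e is symmetric.
    count = defaultdict(int)
    for u, v in edges:
        count[(degree[u], degree[v])] += 1
        count[(degree[v], degree[u])] += 1
    total = 2 * len(edges)
    e = {key: c / total for key, c in count.items()}
    trace = sum(p for (i, j), p in e.items() if i == j)
    row = defaultdict(float)
    for (i, j), p in e.items():
        row[i] += p
    # For symmetric e, the sum of all entries of e^2 equals the sum of
    # squared row sums.
    sum_e2 = sum(a * a for a in row.values())
    if sum_e2 == 1:
        return float("nan")  # all nodes share one degree: r is undefined
    return (trace - sum_e2) / (1 - sum_e2)

# Hypothetical star: one hub linked to three degree-1 authors.
r_star = assortativity_eq5([("hub", "a"), ("hub", "b"), ("hub", "c")])
# Hypothetical triangle plus a separate pair: equal degrees always meet.
r_split = assortativity_eq5([("a", "b"), ("b", "c"), ("a", "c"), ("d", "e")])
```

The star gives r = -1 (low-degree nodes attach only to the high-degree hub), while the second network gives r = 1.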

2.4.3. Network Connectivity

Coauthorship networks are usually disconnected, i.e., they consist of a set of connected components. Since some network properties, such as the diameter, are only meaningful in connected networks, the largest connected component is extracted and these analyses are carried out on it. In this study, we analyze the evolution of the number of connected components, i.e.,

| S |

, the relative size of the largest connected component, i.e.,

N_{G}

, and the proportion of isolated nodes in the network, i.e.,

I_{N}

.
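The three connectivity metrics can be computed with a single traversal of the network. In the sketch below, the function names and the toy network are hypothetical, and isolated nodes are assumed to count as connected components of size one.

```python
def connected_components(nodes, edges):
    """Return the connected components as a list of node sets."""
    adj = {u: set() for u in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], {start}
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    component.add(v)
                    stack.append(v)
        components.append(component)
    return components

def connectivity_metrics(nodes, edges):
    """Number of components, relative giant size, share of isolated nodes."""
    comps = connected_components(nodes, edges)
    n = len(nodes)
    return {
        "S": len(comps),
        "N_G": max(len(c) for c in comps) / n,
        "I_N": sum(1 for c in comps if len(c) == 1) / n,
    }

# Hypothetical network: a chain of three authors, a pair, and one
# author who only published alone.
nodes = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("b", "c"), ("d", "e")]
metrics = connectivity_metrics(nodes, edges)
```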

2.4.4. Community Structure

Community detection algorithms compute a network partition consisting of locally dense connected subgraphs called communities. In coauthorship networks, communities represent groups of highly collaborative authors. Let

n_{C}

be the number of communities detected in a network with N nodes and L links. The number of nodes in a community

C_{C}

is denoted by

N_{C}

, and the number of links connecting nodes in

C_{C}

is denoted by

L_{C}

. In this study, we analyze the evolution of the number of communities, i.e.,

n_{C}

, the relative size of the largest community, i.e.,

max (N_{C})

, and the average relative community size, i.e.,

〈 N_{C} 〉

.

In addition to the size and number of communities detected, it is important to measure the quality of the community partition. It is well-known that random networks have binomial degree distribution, and thus, these networks do not exhibit a community structure. In contrast, a single and well-defined community is a locally dense subgraph with degree distribution similar to the distribution of a random network. Therefore, the quality of a community partition can be measured by comparing the density of each community with the density of a hypothetical random graph with the same nodes. This measure is called modularity and is given by

M = \sum_{i = 1}^{n_{C}} [\frac{L_{i}}{L} - {(\frac{k_{i}}{2 L})}^{2}],

(6)

where

k_{i}

denotes the sum of degrees of nodes belonging to a community

C_{i}

. If the community partition consists of just a single community, given by the whole network, then

M = 0

due to

L_{i} = L

and

k_{i} = 2 L

. On the other hand, if each node defines a community, then

M < 0

due to

L_{i} = 0

for every community

C_{i}

in the partition. Finally, the optimal community partition is the one with the highest modularity, which cannot exceed one. We used the Louvain algorithm to detect communities by optimizing the partition modularity [22].
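To make Equation (6) concrete, the sketch below evaluates the modularity of a given partition and checks the two limiting cases mentioned above. It only scores a partition supplied by the caller; it does not implement the Louvain algorithm. The function name, the `community_of` mapping, and the toy network are hypothetical.

```python
def modularity(edges, community_of):
    """Modularity of a partition, Equation (6).
    community_of maps each node to a community label."""
    L = len(edges)
    internal = {}    # L_i: links with both ends in community i
    degree_sum = {}  # k_i: sum of degrees of the nodes in community i
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / L - (k / (2 * L)) ** 2
               for c, k in degree_sum.items())

# Hypothetical network: two triangles joined by a single bridge.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
two_teams = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_team = {n: 0 for n in "abcdef"}
singletons = {n: n for n in "abcdef"}

m_two = modularity(edges, two_teams)
m_one = modularity(edges, one_team)
m_single = modularity(edges, singletons)
```

For this toy network, the two-triangle partition scores 5/14, the single-community partition scores exactly 0, and the all-singletons partition is negative, in line with the cases described above.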

Other important metrics to estimate the quality of a community partition are the imbalance in the size of the detected communities, which is given by

I = \frac{max (N_{C})}{〈 N_{C} 〉},

(7)

and the edge-cut size, denoted by

L_{C}

and given by the proportion of edges joining communities, that is, the proportion of articles that can be seen as bridges between communities. Since communities must be densely connected subgraphs, community partition methods minimize the edge-cut size. In summary, low imbalance and small edge-cut size indicate high quality in the community partition.
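Both quality measures follow directly from a partition. The sketch below computes the imbalance of Equation (7) and the edge-cut size as the proportion of edges whose ends lie in different communities. The function name and the toy data are hypothetical; absolute community sizes are used because the ratio in Equation (7) is the same for absolute and relative sizes.

```python
def imbalance_and_edge_cut(edges, community_of):
    """Return (imbalance, edge-cut proportion) of a partition."""
    sizes = {}
    for node, c in community_of.items():
        sizes[c] = sizes.get(c, 0) + 1
    mean_size = sum(sizes.values()) / len(sizes)
    imbalance = max(sizes.values()) / mean_size
    cut = sum(1 for u, v in edges if community_of[u] != community_of[v])
    return imbalance, cut / len(edges)

# Hypothetical network: two triangles joined by a single bridge.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
balanced = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
skewed = {"a": 0, "b": 0, "c": 0, "d": 0, "e": 1, "f": 1}

imb_bal, cut_bal = imbalance_and_edge_cut(edges, balanced)
imb_skew, cut_skew = imbalance_and_edge_cut(edges, skewed)
```

The balanced partition has an imbalance of 1 and cuts one of seven edges; moving one author to the other community raises both values.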

3. Results

In this section, we present a case study of the proposed methodology, which consists of an analysis of the multilayer coauthorship networks for research publications of the three major Mexican universities: UNAM, IPN, and UAM during the 2012–2021 period. Details about this dataset can be found in Section 2.1. To begin the analysis, articles were classified according to the analyzed universities and subject areas; see Figure 1. The second step was to build the multilayer coauthorship networks

M

(and their projection networks

P_{i} (M)

) for each subject area and university. From Remark 2, a projection network

P_{i} (M)

is given by the coauthorship network of articles with at most i coauthors, where i denotes the collaboration order of

P_{i} (M)

. The next step was to identify the collaboration order i in which the global network metrics described in Section 2.4 converge. For each subject area and university, the evolution of each network metric x was tracked across the coauthorship networks

P_{i} (M)

, and then the functions

x (i)

were plotted. Finally, in the cases where the network metric converges on some collaboration order i, the convergence value

\tilde{x}

(Definition 3) and the convergence index

C I_{x}

(Definition 5) were reported.

3.1. Basic Metrics

The analysis starts with the following basic metrics: the number of nodes (N), the number of edges (L), the network density (D), and the diameter of the largest connected component (d). Figure 4 shows the evolution of these metrics across different collaboration orders i, i.e., across the coauthorship networks

P_{i} (M)

. Table 1 presents the convergence values and convergence indices for N, L, and d. Regarding N, note that in all subject areas, N converges at a lower collaboration order i at UAM and IPN than at UNAM; see Figure 4. As a consequence, UNAM has the highest (or close to the highest)

C I_{N}

in all subject areas, except for the social sciences. In this area, N converges at i less than 10, because most authors collaborate in articles with both low and high numbers of coauthors. Turning now to L, since

C I_{L}

takes values between

0.9

and 1, we can argue that L never converges, which is expected because the number of coauthorship relations L increases with the order of collaboration. In the same way, D never converges because its value depends on L. Finally,

C I_{d}

takes values between

0.31

and

0.66

. This suggests that the connection (collaboration) of distant authors occurs in low collaboration orders.

3.2. Degree Metrics

The second part of our analysis focuses on the following degree metrics: the complementary cumulative degree distribution

P [K > k]

, the average node degree

〈 k 〉

, and the assortativity coefficient r. Figure 5 shows the functions

P [K > k]

for the whole of the coauthorship networks, i.e.,

P_{ℓ_{max}} (M)

. These distribution functions indicate the proportion of authors that have collaborated with more than k authors. Similarly to the distribution of coauthor count per article (Figure 2), the log–log scale shows that the degree distributions may consist of two parts, each following a power law with a different exponent. This is consistent with the collaborative patterns of the analyzed universities obeying the preferential attachment process, as expected. Let us now turn to

〈 k 〉

and r; Figure 6 shows the evolution of these metrics across different collaboration orders i. As we can see,

〈 k 〉

and r increase steadily and thus never converge. The value of

〈 k 〉

increases as a consequence of the degree of new authors added in each

P_{i} (M)

. Regarding r, it takes its lowest values when

i < 10

, because the majority of authors (

90 %

, see Figure 5) usually collaborate with authors whose degree ranges between 1 and 10. In contrast, r increases slowly when

i \geq 10

, owing to the remaining

10 %

of authors having a high degree and usually collaborating with authors that also have a high degree, although some of these authors also keep low-order collaborations. This trend is more marked in the social sciences, where r increases rapidly and reaches the highest assortativity in the universities analyzed. This suggests that in the social sciences, high-degree authors only collaborate with high-degree authors.

3.3. Network Connectivity

The next step of our analysis corresponds to the following network connectivity metrics: the relative size of the largest connected component

N_{G}

, the number of connected components

| S |

, and the proportion of isolated nodes

I_{N}

. Figure 7 shows the evolution of these metrics across different collaboration orders i, while Table 2 presents the convergence values and indices for

N_{G}

,

| S |

, and

I_{N}

. Let us consider the functions

N_{G}

that converge in low collaboration orders

i < 10

. This is the case for the health, life, and physical sciences in UNAM; all subject areas in IPN; and physical sciences in UAM. In these cases, articles cowritten by 10 or more authors have at least one coauthor who has also collaborated with a low number of authors. Therefore, such an author adds their coauthors to the largest connected component when

i \geq 10

. Consider now the remaining cases: social sciences in UNAM and health, life, and social sciences in UAM, where the values of

N_{G}

converge for high-order collaborations. In these cases, most authors with high-order collaborations have no experience in low-order collaborations; thus, authors with low-order collaborations are not added to the largest connected component. Note that the social sciences have the lowest values of

C I_{N_{G}}

among the analyzed universities. Regarding

| S |

, Figure 7 shows that

| S |

converges in low-order collaboration

i < 10

, which is consistent with the values of

C I_{| S |}

that are lower than

0.6

. In contrast,

I_{N}

converges in

i > 20

, and then the values of

C I_{I_{N}}

are higher than

0.6

, which results from the low percentage of articles with 20 or more authors. The social sciences have the highest proportion of isolated authors, i.e.,

I_{N}

, which is

1 %

in IPN and

10 %

in UNAM and UAM. In the rest of the subject areas, the proportion of isolated authors is around

0.1 %

. It is important to mention that the proportion of isolated authors increases the value of the assortativity coefficient.

3.4. Community Structure

Finally, we analyze the community structure based on community quality metrics and community size metrics. Figure 8 shows the evolution of the following community quality metrics: the modularity M, the imbalance I, and the edge-cut size

L_{C}

. Since

M > 0.8

in all subject areas and universities, the community partitions detected have high quality, making them a good representation of the emergent research teams. However, note that M drops steadily as i increases for all subject areas in UAM and social sciences in both UNAM and IPN, which may be correlated with the evolution of

N_{G}

. Let us remember from the previous subsection that in the mentioned areas, several authors only have high-order collaborations. In these cases, the community structure loses quality in high-order collaborations. In addition to M, Figure 8 shows that the values of I and

L_{C}

are also low, which confirms the high quality of the community partitions. Note that among all subject areas, the social sciences have the highest community quality. Turning now to the metrics of community size, Figure 9 shows the evolution of the following functions: the number of communities

n_{C}

, the relative size of the largest community

max (N_{C})

, and the average relative community size

〈 N_{C} 〉

. As we can see from Table 3, even though

max (N_{C})

varies at each collaboration order,

n_{C}

and

〈 N_{C} 〉

are already determined in low-order collaborations, i.e.,

C I_{〈 N_{C} 〉} < 0.6

. Therefore, the research teams (communities) emerge in low-order collaborations and remain in high-order collaborations.

4. Conclusions

This study presents a methodology for the analysis of collaboration networks with higher-order interactions, which allows one to get a microscopic view of the collaboration orders. The methodology also includes the proposal of a new convergence index to determine the order of collaboration at which the global properties of collaboration networks converge. To support the proposed methodology, we present a case study on the scientific collaboration of the three main Mexican universities, UNAM, UAM, and IPN, during the 2012–2021 period. The information in the research publications was retrieved from the Scopus database, and then the research articles were classified into the aforementioned universities and the following subject areas: life sciences, physical sciences, health sciences, and social sciences. After this classification, each set of articles was mapped into a multilayer coauthorship network $M$ and its projection networks $P_{i}(M)$, where each layer i of $M$ represents the coauthorship network of articles cowritten by exactly i authors, and each projection network $P_{i}(M)$ represents the coauthorship network of articles cowritten by at most i authors. Finally, the evolution of different network metrics was followed across the projection networks in order to compute their convergence values and convergence indexes. The analyzed metrics were selected for their relevance in the context of collaboration networks and are classified into the following groups: basic network metrics, degree metrics, connectivity metrics, and community metrics.
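The mapping from articles to layers and projection networks can be sketched as follows. This is only an illustration under stated assumptions: every pair of coauthors of an article is linked (a clique per article), edges are unweighted, and the helper names `build_layers` and `projection` and the five sample articles are hypothetical rather than the authors' implementation or data.

```python
from itertools import combinations


def build_layers(articles):
    """Map each collaboration order i to the nodes and edges of layer G_i.

    `articles` is an iterable of author lists; an article with i distinct
    authors contributes a clique on those authors to layer i.
    """
    layers = {}
    for authors in articles:
        team = sorted(set(authors))
        layer = layers.setdefault(len(team), {"nodes": set(), "edges": set()})
        layer["nodes"].update(team)
        layer["edges"].update(combinations(team, 2))
    return layers


def projection(layers, i):
    """Union of all layers of order at most i, i.e., the projection P_i(M)."""
    nodes, edges = set(), set()
    for order, layer in layers.items():
        if order <= i:
            nodes |= layer["nodes"]
            edges |= layer["edges"]
    return nodes, edges


# Five hypothetical articles with one, two, or three authors each.
sample = [["a"], ["a", "b"], ["b", "c"], ["a", "b", "c"], ["c", "d", "e"]]
layers = build_layers(sample)
nodes_2, edges_2 = projection(layers, 2)  # three authors, two links
nodes_3, edges_3 = projection(layers, 3)  # five authors, six links
```

Because each projection is a union of layers, the node and edge sets can only grow with i, which is what makes it meaningful to ask at which order a metric stops changing.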

The effectiveness of our methodology is evident in that our analysis reveals differences in collaborative patterns that cannot be detected with the traditional approach to collaboration networks based on pairwise interactions.

As an example, the collaborative patterns in the social sciences diverge clearly from those of the other subject areas. Our analysis reveals that the proportion of the largest connected component takes its smallest values in the social sciences: UNAM 40%, IPN 3%, and UAM 8%. In addition, the social sciences also have the largest number of connected components and the largest proportion of isolated nodes: IPN 1%, and UNAM and UAM 10%. The results indicate that social scientists usually work in small isolated groups with similar collaboration order, as shown by the highest values of the assortativity coefficient.
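The three connectivity metrics quoted here (the relative size of the largest connected component, the number of connected components, and the proportion of isolated nodes) can be computed with a short traversal. The sketch below is an assumption-laden illustration: the function name `connectivity_metrics` is hypothetical, the graph is assumed non-empty with all edge endpoints listed among the nodes, and isolated nodes are counted as components of size one, which may differ from the convention used in the study.

```python
from collections import defaultdict


def connectivity_metrics(nodes, edges):
    """Return (largest component share, number of components, isolated share)."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    seen = set()
    component_sizes = []
    for start in nodes:
        if start in seen:
            continue
        # Depth-first traversal of the component containing `start`.
        seen.add(start)
        stack = [start]
        size = 0
        while stack:
            node = stack.pop()
            size += 1
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        component_sizes.append(size)

    n = len(nodes)
    largest_share = max(component_sizes) / n
    isolated_share = sum(1 for s in component_sizes if s == 1) / n
    return largest_share, len(component_sizes), isolated_share


# Toy network: a path of three authors, a pair, and one isolated author.
toy_nodes = {"a", "b", "c", "d", "e", "f"}
toy_edges = {("a", "b"), ("b", "c"), ("d", "e")}
largest, components, isolated = connectivity_metrics(toy_nodes, toy_edges)
# largest == 0.5, components == 3, isolated == 1/6
```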

Using our methodology, the differences in collaborative patterns among universities also become evident as an effect of the differences in their disciplinary focus and research priorities.

Our methodology also permits one to draw another important conclusion for our case study: community structure emerges in low-order collaborations and remains in high-order collaborations. The convergence index of the number of communities takes values between 0.3 and 0.5. However, as the collaboration order increases, the quality of the community structure decreases, not only in the social sciences but also in all subject areas of UAM. In contrast, it increases in the health, life, and physical sciences of UNAM and IPN.

In conclusion, this study contributes to the literature on coauthorship networks by proposing a novel methodology to capture the strength of coauthorship relations and applying it to investigate the collaborative behavior of researchers in Mexican universities. By analyzing coauthorship networks in the context of Mexico’s public universities, we provide insights into the collaborative behavior of researchers in different fields and at different stages of their careers. Our proposed methodology can be applied to other scientific domains to better capture the strength of collaborations among researchers [4,5,6,7,8].

Future studies should consider the time evolution of collaborative networks using this methodology.

Author Contributions

Conceptualization, D.A.-G. and R.B.-J.; Methodology, D.A.-G. and R.B.-J.; Validation, R.B.-J.; Formal analysis, D.A.-G. and R.B.-J.; Data curation, D.A.-G.; Writing—original draft, D.A.-G.; Writing—review & editing, R.B.-J.; Visualization, D.A.-G.; Supervision, R.B.-J. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Universidad Autónoma Metropolitana Cuajimalpa.

Data Availability Statement

Data available on request from the authors.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Newman, M. Coauthorship networks and patterns of scientific collaboration. Proc. Natl. Acad. Sci. USA 2004, 101, 5200–5205.
2. Barabasi, A.L.; Jeong, H.; Neda, Z.; Ravasz, E.; Schubert, A.; Vicsek, T. Evolution of the social network of scientific collaborations. Phys. A Stat. Mech. Its Appl. 2002, 311, 590–614.
3. Newman, M.E.J. The structure of scientific collaboration networks. Proc. Natl. Acad. Sci. USA 2001, 98, 404–409.
4. Dorantes-Gilardi, R.; Ramírez-Álvarez, A.A.; Terrazas-Santamaría, D. Is there a differentiated gender effect of collaboration with super-cited authors? Evidence from junior researchers in economics. Scientometrics 2023, 128, 2317–2336.
5. González Brambila, C.N.; Olivares-Vázquez, J.L. Patterns and evolution of publication and co-authorship in Social Sciences in Mexico. Scientometrics 2021, 126, 2595–2626.
6. Lancho-Barrantes, B.S.; Cantú-Ortiz, F.J. Science in Mexico: A bibliometric analysis. Scientometrics 2019, 118, 499–517.
7. Reyes-Gonzalez, L.; Gonzalez-Brambila, C.N.; Veloso, F. Using co-authorship and citation analysis to identify research groups: A new way to assess performance. Scientometrics 2016, 108, 1171–1191.
8. Gonzalez-Brambila, C.N. Social capital in academia. Scientometrics 2014, 101, 1609–1625.
9. Lung, R.I.; Gaskó, N.; Suciu, M.A. A hypergraph model for representing scientific output. Scientometrics 2018, 117, 1361–1379.
10. Vasilyeva, E.; Kozlov, A.; Alfaro-Bittner, K.; Musatov, D.; Raigorodskii, A.M.; Perc, M.; Boccaletti, S. Multilayer representation of collaboration networks with higher-order interactions. Sci. Rep. 2021, 11, 5666.
11. Li, X.; Wang, G.; Wei, D. Dynamical evolution behavior of scientific collaboration hypernetwork. AIP Adv. 2022, 12, 115117.
12. Rose, M.E.; Kitchin, J.R. Pybliometrics: Scriptable bibliometrics using a Python interface to Scopus. SoftwareX 2019, 10, 100263.
13. Dirección General de Planeación de la UNAM. Agenda Estadística UNAM 2021. Available online: https://agendas.planeacion.unam.mx/pdf/Agenda-2021.pdf (accessed on 7 March 2023).
14. Dirección de Información Institucional del IPN. Anuario General Estadístico IPN 2021. Available online: https://www.ipn.mx/assets/files/coplaneval/docs/Evaluacion/ANUARIO_2021.pdf (accessed on 7 March 2023).
15. Unidad de Transparencia de la UAM. Anuario Estadístico UAM 2021. Available online: https://transparencia.uam.mx/inforganos/anuarios/anuario2021/anuario_estadistico_2021.pdf (accessed on 7 March 2023).
16. Boccaletti, S.; Bianconi, G.; Criado, R.; del Genio, C.; Gómez-Gardeñes, J.; Romance, M.; Sendiña-Nadal, I.; Wang, Z.; Zanin, M. The structure and dynamics of multilayer networks. Phys. Rep. 2014, 544, 1–122.
17. Networks beyond pairwise interactions: Structure and dynamics. Phys. Rep. 2020, 874, 1–92.
18. Newman, M.E.J. Networks: An Introduction; Oxford University Press: Oxford, UK; New York, NY, USA, 2010.
19. Estrada, E.; Knight, P. A First Course in Network Theory; Oxford University Press: Oxford, UK, 2015.
20. Barabási, A.L.; Pósfai, M. Network Science; Cambridge University Press: Cambridge, UK, 2016.
21. Newman, M.E.J. Mixing patterns in networks. Phys. Rev. E 2003, 67, 026126.
22. Blondel, V.D.; Guillaume, J.L.; Lambiotte, R.; Lefebvre, E. Fast unfolding of communities in large networks. J. Stat. Mech. Theory Exp. 2008, 2008, 10008.

Figure 1. Distribution of articles among the analyzed universities and subject areas.

Figure 2. Complementary cumulative distribution functions for the coauthor count in the set of articles analyzed, i.e., $P[A > a]$: the horizontal axis represents the number of coauthors per article, while the vertical axis shows the proportion of articles that have more than $a$ coauthors.

Figure 3. Multilayer coauthorship network $M$ and its projection networks $P_{i}(M)$, with 5 articles involved: $\alpha$, $\beta$, $\delta$, $\gamma$, and $\epsilon$. (a) Multilayer coauthorship network $M$, where each layer i is given by the coauthorship network $G_{i}$ of articles cowritten by exactly i authors. (b) Projection networks $P_{i}(M)$ given by the coauthorship networks of articles cowritten by at most i authors.

Figure 4. Values of basic metrics in the coauthorship networks $P_{i}(M)$ of the different universities and subject areas: the vertical axis shows the number of nodes N, the number of edges L, the network density D, and the diameter of the largest connected component d, while the horizontal axis represents the collaboration order i of the networks $P_{i}(M)$.

Figure 5. Complementary cumulative degree distribution functions, i.e., $P[K > k]$, of the coauthorship networks $P_{\ell_{\max}}(M)$: the horizontal axis represents the number of coauthors, i.e., the node degree k, while the vertical axis shows the proportion of authors, i.e., nodes, that have collaborated with more than k coauthors.

Figure 6. Values of the average node degree $\langle k \rangle$ and the assortativity coefficient r in the coauthorship networks $P_{i}(M)$ of the different universities and subject areas: the horizontal axis represents the collaboration order i of the networks $P_{i}(M)$.

Figure 7. Values of network connectivity in the coauthorship networks $P_{i}(M)$ of the different universities and subject areas: the vertical axis shows the relative size of the largest connected component $N_{G}$, the number of connected components $|S|$, and the proportion of isolated nodes $I_{N}$, while the horizontal axis represents the collaboration order i of the networks $P_{i}(M)$.

Figure 8. Values of the community quality metrics in the coauthorship networks $P_{i}(M)$ of the different universities and subject areas: the vertical axis shows the modularity M, the imbalance I, and the relative size of the edge cut $E_{C}$, while the horizontal axis represents the collaboration order i of the networks $P_{i}(M)$.

Figure 9. Values of the community size metrics in the coauthorship networks $P_{i}(M)$ of the different universities and subject areas: the vertical axis shows the number of detected communities $n_{c}$, the relative size of the largest community $\max N_{C}$, and the relative average community size $\langle N_{C} \rangle$, while the horizontal axis represents the collaboration order i of the networks $P_{i}(M)$.

Table 1. Convergence indices $CI_{x}$ and convergence values $\tilde{x}$ of the number of nodes N, the number of edges L, and the diameter of the largest connected component d.

		All	Health	Life	Physics	Social
$C I_{N}$	UNAM	$0.79$	$0.8$	$0.77$	$0.8$	$0.68$
	IPN	$0.65$	$0.71$	$0.69$	$0.61$	$0.74$
	UAM	$0.63$	$0.79$	$0.79$	$0.61$	$0.85$
$\tilde{N}$	UNAM	104,082	35,302	45,118	57,076	13,173
	IPN	44,474	13,872	23,432	26,194	3342
	UAM	6790	2350	2839	3280	1145
$C I_{L}$	UNAM	$0.97$	$0.95$	$0.97$	$0.97$	$1.0$
	IPN	$0.92$	$0.95$	$0.95$	$0.97$	$1.0$
	UAM	$0.9$	$1.0$	$1.0$	$1.0$	$1.0$
$\tilde{L}$	UNAM	1,112,429	349,995	403,989	637,090	64,111
	IPN	323,264	102,398	172,398	162,240	16,869
	UAM	39,327	18,437	21,138	12,439	3424
$C I_{d}$	UNAM	$0.45$	$0.57$	$0.49$	$0.36$	$0.6$
	IPN	$0.49$	$0.59$	$0.47$	$0.48$	$0.31$
	UAM	$0.46$	$0.66$	$0.6$	$0.57$	$0.59$
$\tilde{d}$	UNAM	24	19	16	22	31
	IPN	17	17	16	16	8
	UAM	23	16	16	21	2

Table 2. Convergence indices $CI_{x}$ and convergence values $\tilde{x}$ of the relative size of the largest connected component $N_{G}$, the number of connected components $|S|$, and the proportion of isolated nodes $I_{N}$.

		All	Health	Life	Physics	Social
$C I_{N_{G}}$	UNAM	$0.5$	$0.53$	$0.46$	$0.46$	$0.68$
	IPN	$0.45$	$0.54$	$0.42$	$0.43$	$0.74$
	UAM	$0.61$	$0.79$	$0.65$	$0.57$	$0.95$
$\tilde{N_{G}}$	UNAM	$0.876$	$0.8057$	$0.895$	$0.882$	$0.4026$
	IPN	$0.91$	$0.8125$	$0.9$	$0.874$	$0.0338$
	UAM	$0.467$	$0.4863$	$0.5137$	$0.2225$	$0.0885$
$C I_{|S|}$	UNAM	$0.47$	$0.5$	$0.51$	$0.48$	$0.39$
	IPN	$0.53$	$0.59$	$0.59$	$0.53$	$0.49$
	UAM	$0.35$	$0.38$	$0.48$	$0.47$	$0.57$
$\tilde{|S|}$	UNAM	3308	964	590	1027	2705
	IPN	599	330	241	455	414
	UAM	653	149	157	334	348
$C I_{I_{N}}$	UNAM	$0.8$	$0.8$	$0.82$	$0.81$	$0.69$
	IPN	$0.65$	$0.72$	$0.69$	$0.63$	$0.74$
	UAM	$0.65$	$0.79$	$0.79$	$0.61$	$0.85$
$\tilde{I_{N}}$	UNAM	$0.0148$	$0.00294$	$0.00115$	$0.00372$	$0.1091$
	IPN	$0.00166$	$0.00093$	$0.00043$	$0.00141$	$0.01556$
	UAM	$0.02089$	$0.00468$	$0.00176$	$0.00884$	$0.1092$

Table 3. Convergence indices and convergence values of the number of communities $n_{c}$ and the average relative community size $\langle N_{C} \rangle$.

		All	Health	Life	Physics	Social
$C I_{n_{c}}$	UNAM	$0.45$	$0.45$	$0.49$	$0.46$	$0.39$
	IPN	$0.49$	$0.52$	$0.42$	$0.51$	$0.49$
	UAM	$0.42$	$0.56$	$0.39$	$0.47$	$0.57$
$n_{c}$	UNAM	3422	1046	667	1116	2708
	IPN	675	387	315	523	415
	UAM	674	162	173	345	348
$C I_{\langle N_{C} \rangle}$	UNAM	$0.45$	$0.42$	$0.49$	$0.46$	$0.39$
	IPN	$0.49$	$0.52$	$0.42$	$0.51$	$0.49$
	UAM	$0.42$	$0.56$	$0.39$	$0.47$	$0.57$
$\langle N_{C} \rangle$	UNAM	$0.00029$	$0.00093$	$0.0015$	$0.0009$	$0.00037$
	IPN	$0.00148$	$0.00258$	$0.00317$	$0.00191$	$0.00241$
	UAM	$0.00148$	$0.00617$	$0.00578$	$0.0029$	$0.00287$

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
