Analysis of Effects on Scientific Impact Indicators Based on Coevolution of Coauthorship and Citation Networks

Xue, Haobai

doi:10.3390/info15100597

Information and Intelligence Department, University Town Library of Shenzhen, 2239 Lishui Road, Nanshan District, Shenzhen 518055, China

Information 2024, 15(10), 597; https://doi.org/10.3390/info15100597

Submission received: 30 July 2024 / Revised: 12 September 2024 / Accepted: 19 September 2024 / Published: 30 September 2024

(This article belongs to the Special Issue Advances in Data and Network Sciences Applied to Computational Social Science)

Abstract:

This study investigates the coevolution of coauthorship and citation networks and their influence on scientific metrics such as the h-index and journal impact factors. Using a preferential attachment mechanism, we developed a model that integrated these networks and validated it with data from the American Physical Society (APS). While the correlations between reference counts, paper lifetime, and team sizes with scientific impact metrics are well-known, our findings demonstrate how these relationships vary depending on specific model parameters. For instance, increasing reference counts or reducing paper lifetime significantly boosts both journal impact factors and h-indexes, while expanding team sizes without adding new authors can artificially inflate h-indexes. These results highlight potential vulnerabilities in commonly used metrics and emphasize the value of modeling and simulation for improving bibliometric evaluations.

Keywords:

coauthorship network; citation network; coevolution; simulation experiment; impact factors; h-index; bibliometrics

1. Introduction

1.1. Research Background and Significance

The rapid development and widespread adoption of digital humanities tools have significantly transformed research methods in library and information science [1,2]. In bibliometrics, the use of software tools such as CiteSpace (https://citespace.podia.com/), VOSviewer (https://www.vosviewer.com/), and Bibexcel (https://homepage.univie.ac.at/juan.gorraiz/bibexcel/) has shifted the focus from traditional statistical approaches based on bibliometric laws to more detailed analyses of author collaboration networks, citation networks, and the temporal evolution of research frontiers [3,4]. While these tools excel at analyzing real-world data, they have limitations when it comes to simulating and predicting future trends or exploring hypothetical scenarios [5].

Simulation methods offer an important opportunity to overcome these limitations by generating virtual data that can be compared with real-world data. These simulations can reveal the underlying mechanisms driving macro-level phenomena, reduce uncertainty, and enable more accurate predictions about future developments or extreme scenarios [6]. For example, Bai et al. [7] introduced the SCIRank model, which integrates collaboration and citation networks, demonstrating how such models can provide a more precise understanding of the evolving dynamics of scientific collaboration and its impact. Although these methods are commonly applied in fields like network science, complex systems, and social computing, they remain underused in bibliometrics where they have the potential to significantly deepen our understanding of how collaboration and citation networks influence scientific impact [8].

These models are particularly relevant to the goals of scientometrics, which views science as a social network of researchers who produce and validate knowledge [8,9]. However, despite the clear link between network models and bibliometric research, there is still a lack of studies that fully integrate author collaboration networks, citation networks, and the traditional laws of bibliometrics. Addressing this gap through the use of simulation models could lead to a more comprehensive understanding of the complex relationships between collaboration, citation behavior, and scientific impact.

1.2. Literature Review

1.2.1. Citation Network

Research on citation networks has a long history. Price was the first to propose a generation mechanism for citation networks, hypothesizing the following: (1) Papers are continuously published; (2) The probability of being cited is proportional to the number of citations a paper has already received. This introduces the concept of cumulative advantage [10]. Barabási later formalized these hypotheses as “growth” and “preferential attachment”, leading to the creation of the Barabási–Albert model, which was extensively analyzed and simulated [11]. However, the B-A model does not account for variations in paper quality, so Bianconi introduced the concept of “fitness”, developing the Bianconi–Barabási model, which allows later-published, high-quality papers to surpass earlier, less impactful ones in citation count [12]. Additionally, a paper’s likelihood of being cited decreases over time as its novelty fades with the publication of subsequent related papers. This aging effect is often modeled using a negative exponential [13,14] or log-normal function [15]. The model that combines these four mechanisms—growth, preferential attachment, fitness, and aging—is known as the minimal citation model [16], and it will be used for citation network modeling in this study. Song et al. [17] expand on these models by examining the structural and temporal characteristics of negative links in citation networks, highlighting how these less-studied aspects can influence the dynamics of scientific impact. Research has demonstrated that normalizing citation distributions across different disciplines yields a unified distribution curve [18], suggesting that citation distributions and the minimal citation model have universal applicability.

While quantitative models provide valuable insights into citation networks, they often overlook the social, cultural, and institutional factors shaping citation behaviors. Recent studies emphasize that citation patterns are influenced not only by the intrinsic “fitness” of a paper but also by qualitative aspects such as collaboration dynamics, peer pressure, and strategic behaviors. Brunton [19] highlights how citation networks can be manipulated through practices like self-citation and coercive citation, creating a distorted view of scholarly impact. Herteliu et al. [20] further explore the power dynamics within citation networks, showing how editorial influence can skew citation patterns, reflecting institutional pressures rather than purely academic merit. Yousefi Nooraie et al. [21] advocate for a mixed-methods approach, blending quantitative metrics with qualitative insights to capture the complexity of citation networks, where personal motivations and collaborative relationships shape citation trajectories. Bologna [22] argues that open citations enable more transparent evaluations of research impact, complementing quantitative measures like the impact factor by considering the broader social and practical uses of research. By integrating these qualitative dimensions, we provide a more comprehensive analysis that accounts for the social and institutional contexts influencing citation behaviors, complementing the limitations of purely quantitative models.

1.2.2. Coauthorship Network

Research on author collaboration networks is relatively limited compared to citation networks [23]. Unlike citation networks where nodes represent papers and links are directed from new papers to existing ones, coauthorship networks represent authors as nodes with undirected links, allowing for new connections between existing nodes [24]. Newman conducted empirical studies on static author collaboration networks, using bibliographic data from various disciplines [23,25,26], while Tomassini examined the formation and evolution of these networks over time, using bibliographic data across different time slices [27]. In a simulation, Barabási proposed a collaborator network model with two key assumptions: (1) New authors join the network at a constant rate; (2) The probability of collaboration between authors is proportional to the product of their existing collaborations [24]. However, this model focuses solely on authors without establishing a direct link between authors and papers. To address this, a paper team assembly model should be developed to simulate internal collaboration among authors for each paper, updating the entire collaborator network in real time [28]. Guimera proposed a team assembly model based on team size, author seniority, and diversity parameters [29,30]. Yang et al. [31] contribute to this discourse by analyzing the evolution patterns of scientific knowledge through fine-granularity citation networks, offering a more detailed understanding of how individual collaborations can shape broader network dynamics. This study simplifies and integrates Guimera’s model with Barabási’s collaborator network model to simulate the evolution of author collaboration networks.

The structure and evolution of coauthorship networks are shaped not only by mathematical models but also by social dynamics such as interdisciplinary connections, power dynamics, and institutional settings. Pessoa et al. [32] highlight the key role of interdisciplinary collaborations in strengthening coauthorship networks, especially in the Brazilian academic context, where cross-disciplinary ties enhance network cohesion. Orzechowski [33] explores the asymmetry of social interactions, showing how power imbalances and individual prominence influence link predictability, with senior researchers and dominant institutions attracting more coauthors. Pelacho [34] reveals that citizen science contributes to new forms of collaboration through grassroots efforts and interdisciplinary engagement, while Chen [35] emphasizes that internal collaboration within scientific teams is shaped by organizational factors and shared goals. Hancean [36] adds that in elite European circles, social and professional dynamics, including long-standing relationships and reputational capital, significantly impact network formation. These qualitative dimensions offer a more comprehensive view of coauthorship networks, which recognizes that collaboration is influenced by social, cultural, and institutional factors beyond structural mechanisms.

1.2.3. Coevolution of Both Networks

Although modeling the coevolution of citation and coauthorship networks offers valuable insights, research in this area is still relatively limited [37]. Börner’s TARL model [38] incorporates the Matthew effect by favoring highly cited papers in random citation behavior, resulting in a fat-tailed citation distribution, but is limited by its assumption that each author produces a fixed number of papers annually. Xie et al. [39] introduced a graphical model that also integrates the Matthew effect, where older, more influential leaders attract more collaborators and citations. Singh et al. [37] explored the interdependence between evolving citation and coauthorship networks, finding that coauthorship proximity significantly influences citation patterns, with closer coauthors more likely to exchange citations and the effect diminishing rapidly with increased network distance. Building on these findings, Zhang et al. [40] demonstrate how novelty and bibliometric factors, combined with academic network influences, can significantly affect citation counts, further evidencing the complex interplay between these networks.

The coevolution of citation and coauthorship networks is shaped not only by quantitative mechanisms but also by social dynamics and collaboration patterns. Liu et al. [41] highlight how changes in one network, such as new collaborations, can influence the other, emphasizing the interdependence between collaboration and citation in shaping scholarly impact. Bai et al. [7] introduce the SCIRank model, showing how strong collaborative ties drive citation performance, with high-impact papers enhancing both metrics. Xie et al. [39] further illustrate how the Matthew effect operates across both networks, with prominent authors gaining more citations and collaborators, reinforcing their influence. Muñoz-Muñoz [42] explores how the thematic focus in specific research areas, like violence against women, shapes co-citation and coauthorship patterns, while Singh et al. [37] emphasize the role of personal and professional relationships in shaping these networks, driven not only by merit but also by social ties and institutional affiliations. These qualitative dimensions provide a more comprehensive understanding of how social and collaborative factors drive the coevolution of citation and coauthorship networks, complementing traditional quantitative models.

1.2.4. Scientific Impact Indicators

Scientific impact indicators, such as the h-index [43] for authors and the impact factor [44] for journals, typically take into account multiple factors like output and impact. This complexity makes them challenging to analyze in depth using qualitative methods. However, these indicators are often quantifiable, making them well-suited for quantitative studies through simulation, and several such studies have been conducted. For instance, Zhou et al. [45,46] combined a citation model with a peer review model to explore how factors such as review cycles and citation counts affect journal impact factors. Guns and Rousseau [47] used computer simulations to study the evolution of the h-index over time, finding that it generally grows linearly, with occasional S-shaped patterns. Ionescu and Chopard [48] investigated the distribution of the h-index within groups, while Medo and Cimini [6] conducted a comparative analysis of common scientific impact indicators, highlighting their respective strengths. However, these studies rely on simplified models and do not fully integrate citation and author collaboration networks to perform coevolution simulations.

To address the qualitative aspects of scientific impact indicators, it is important to consider how metrics like the h-index and journal impact factor (JIF) interact with broader social and disciplinary contexts. Bornmann et al. [49] critique the inappropriate use of JIF for evaluating individual articles and highlight alternative metrics, such as the SCImago Journal Rank and Eigenfactor, which focus on citation quality. Harzing [50] notes the limitations of relying solely on traditional databases like Web of Science and advocates for including Google Scholar and Scopus for broader coverage. McKiernan et al. [51] warn that overemphasis on JIF in promotions and tenure evaluations can distort the true impact of research, urging the use of alternative metrics that reflect societal and interdisciplinary relevance. Fire et al. [52] invoke “Goodhart’s Law”, where metrics like JIF and h-index become targets for optimization, sometimes at the expense of genuine contributions, emphasizing the need for balancing quantitative and qualitative evaluation. Chapman et al. [53] discuss how focusing on metrics can incentivize “gaming” behaviors, such as self-citation, and argue for qualitative evaluations that assess research contributions beyond citation counts. Waltman [54] reviews citation impact indicators, stressing that while useful, they often fail to capture interdisciplinary relevance and societal impact, underscoring the need to include qualitative factors for a fuller understanding of scientific impact.

1.3. Theoretical and Practical Implications

This study provides contributions to both the theoretical framework of bibliometrics and the practical application of scientific impact indicators.

1. Theoretical Implications:

Advancement of Coevolution Models: By integrating coauthorship and citation networks, this study enhances the understanding of how these networks evolve together and influence scientific metrics. This provides a more nuanced view compared to existing models, offering new insights into the micro-mechanisms underlying scientific collaboration and impact.

Refinement of Bibliometric Indicators: The findings suggest that current scientific impact indicators, such as the h-index and journal impact factors, can be influenced or even manipulated under certain conditions. This challenges the robustness of these metrics and calls for the development of more resilient and accurate indicators grounded in the dynamic interactions captured by the coevolution model.

2. Practical Implications:

Improving Research Management: The study’s results can inform the development of new tools and methodologies for assessing scientific impact, which can be applied in research management and academic evaluation. By understanding the potential vulnerabilities of current metrics, institutions can develop more reliable strategies for evaluating research quality and impact.

Development of Digital Humanities Tools: The versatile model proposed in this study can be adapted to create digital tools that aid in the visualization and analysis of complex scientific networks. These tools can enhance the ability of researchers and librarians to explore and understand the dynamics of scientific collaboration and citation patterns.

2. Model Formulation and Validation

2.1. APS Database

The coevolution model of coauthorship and citations relies on and is validated against the American Physical Society (APS) dataset, which is extensively employed in bibliometric studies [6,13,15,55] and publicly available through Ref. [56]. The APS dataset comprises two subsets: citing article pairs and article metadata. Citing article pairs consist of pairs of APS papers, with one paper citing another, making them suitable for constructing citation networks. Article metadata, on the other hand, includes fundamental details such as the DOI, authors, and publication date of every APS paper, facilitating the construction of coauthorship networks. In this study, we exclusively consider citation pairs in which both the citing and cited papers fall within the article metadata subset. This choice ensures that the total number of references always equals the total number of citations.
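The filtering rule described above can be sketched in a few lines of Python. This is only an illustration: the DOIs are made up, and `filter_citation_pairs` is a hypothetical helper rather than part of any APS tooling. It shows why restricting pairs to papers in the metadata keeps reference and citation totals equal.

```python
from collections import Counter

def filter_citation_pairs(pairs, known_dois):
    """Keep only (citing, cited) pairs whose two papers both appear in the metadata."""
    known = set(known_dois)
    return [(src, dst) for src, dst in pairs if src in known and dst in known]

# Toy data: made-up DOIs, not real APS records.
metadata_dois = ["10.1103/A", "10.1103/B", "10.1103/C"]
raw_pairs = [
    ("10.1103/B", "10.1103/A"),
    ("10.1103/C", "10.1103/A"),
    ("10.1103/C", "10.9999/X"),  # cited paper is outside the metadata: dropped
]

kept = filter_citation_pairs(raw_pairs, metadata_dois)
references = Counter(src for src, _ in kept)  # references made by each paper
citations = Counter(dst for _, dst in kept)   # citations received by each paper

# Every kept pair adds exactly one reference and one citation, so totals agree.
total_references = sum(references.values())
total_citations = sum(citations.values())
```

Because each retained pair contributes one outgoing reference and one incoming citation, the two totals match by construction at any point in time.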

The APS datasets cover materials from 1893 to the end of 2021, providing a continuous span of 129 years of empirical data. For the subsequent simulation, we have chosen a time length of $T = 13$ years, with each simulated year corresponding to approximately 10 years of empirical data. Although the APS datasets consist of 19 journals, this paper does not focus on comparing indicators across different journals. Instead, we treat the APS datasets as a unified virtual journal, where all references and citations occur within this virtual journal. Consequently, the simulation models only one journal with 12 issues per year.

2.2. Growth of Number of Papers and Number of Authors

The APS datasets comprise approximately 0.7 million papers and 0.5 million authors at the end of 2021. Figure 1a illustrates the annual growth of accumulated papers and authors. Utilizing the exponential growth model $P_{t} = \alpha \exp(\beta t)$ on the cumulative paper number from Figure 1a, the estimated annual growth rate $\beta$ is determined to be 6.36%. In this model, the initial year’s 12 issues each contain $N_{1} = 10$ papers, and with each subsequent year ($t$ increasing), $N_{t}$ increases by one paper. Consequently, by the end of the 13th year, each issue contains $N_{13} = 22$ papers. These issue arrangements correspond to an annual paper growth rate of 6.68%, aligning closely with empirical results. Therefore, a total of $P = 2496$ papers will be modeled in the subsequent simulation.
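The total follows directly from the issue schedule: $P = 12 \sum_{t=1}^{13} (9 + t) = 12 \times 208 = 2496$. A minimal Python sketch of that bookkeeping is given below; the function names are illustrative choices, not taken from the study.

```python
def papers_per_issue(year, n_first=10):
    """Papers in each issue of a given simulation year (years start at 1)."""
    return n_first + (year - 1)

def yearly_paper_counts(n_years=13, issues_per_year=12, n_first=10):
    """Total papers published in each simulation year."""
    return [issues_per_year * papers_per_issue(t, n_first)
            for t in range(1, n_years + 1)]

counts = yearly_paper_counts()
total_papers = sum(counts)  # 12 * (10 + 11 + ... + 22)
```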

Examining Figure 1a, it becomes evident that the cumulative author number also follows an exponential increase over time. By plotting the cumulative author number against the cumulative paper number and performing linear fitting $y = k x$ for the data (Figure 1b), it is observed that, on average, with each new paper increment, approximately $k = 0.679$ new authors are added to the existing author list. Since each paper may involve multiple authors (e.g., $m$ authors), each author will be assessed independently whether he/she is a newcomer (with probability $p$) or an incumbent (with probability $1 - p$). As the number of newcomers within a single paper follows a binomial distribution, the expectation of newcomers for each paper is given by

$$k = m p \tag{1}$$

where $m$ is the average team size ($m = 3.54$ for APS datasets). Consequently, the probability of selecting newcomers can be calculated as $p = k / m = 0.192$.
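As a quick numerical check of Equation (1), the sketch below solves for $p$ and draws newcomers slot by slot. The team size of 4 and the sample count are arbitrary illustrative choices, since the empirical mean of 3.54 is not an integer.

```python
import random

K_NEW_AUTHORS_PER_PAPER = 0.679  # slope k of the linear fit (Figure 1b)
MEAN_TEAM_SIZE = 3.54            # average team size m in the APS data

def newcomer_probability(k=K_NEW_AUTHORS_PER_PAPER, m=MEAN_TEAM_SIZE):
    """Solve k = m * p for p."""
    return k / m

def count_newcomers(team_size, p, rng):
    """Draw each author slot independently; return how many are newcomers."""
    return sum(1 for _ in range(team_size) if rng.random() < p)

p = newcomer_probability()
rng = random.Random(0)
samples = [count_newcomers(4, p, rng) for _ in range(20000)]
mean_newcomers = sum(samples) / len(samples)  # should be close to 4 * p
```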

In our model, we assume a fixed probability for the introduction of new authors in the coauthorship network. This assumption is grounded in both theoretical and empirical considerations. Theoretically, our approach aligns with the Simon–Yule model [57,58], a well-established framework for understanding the growth and distribution dynamics in networks.

The Simon–Yule model is based on two key assumptions:

(1) The probability that the $(k + 1)$-st word is a word that has already appeared exactly $i$ times is proportional to $i f(i, k)$, that is, to the total number of occurrences of all the words that have appeared exactly $i$ times.
(2) There is a constant probability, $\alpha$, that the $(k + 1)$-st word be a new word, that is, a word that has not occurred in the first $k$ words.
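The two assumptions can be illustrated with a short simulation. This is a generic sketch of a Simon–Yule process, not the coauthorship model itself; the step count and the value of `alpha` are arbitrary.

```python
import random
from collections import Counter

def simon_yule(n_steps, alpha, rng):
    """Return the occurrence count of each word after n_steps draws."""
    sequence = [0]      # the first word is necessarily new
    next_word = 1
    for _ in range(n_steps - 1):
        if rng.random() < alpha:
            # Second assumption: a brand-new word with constant probability.
            sequence.append(next_word)
            next_word += 1
        else:
            # First assumption: copying a uniformly chosen earlier token selects
            # a word with probability proportional to its occurrence count.
            sequence.append(rng.choice(sequence))
    return Counter(sequence)

counts = simon_yule(5000, alpha=0.2, rng=random.Random(1))
n_distinct = len(counts)  # expected to be near alpha * n_steps
```

Copying a random earlier token is a convenient way to realize the proportionality rule without computing weights explicitly.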

In the context of coauthorship networks, the assumptions of the Simon–Yule model translate to a fixed probability for the introduction of new authors (analogous to new words) and a higher likelihood of selecting authors with more collaborations (analogous to words with more occurrences) for new collaborations. This theoretical foundation supports the use of a fixed entry probability in our model, providing a robust basis even if the entry rate was variable [58,59]. This connection enhances the robustness of our model by situating it within a recognized theoretical framework.

Empirical data, as shown in Figure 1b, support this assumption by demonstrating a strong linear relationship between the accumulated number of papers and the accumulated number of authors, with an $R^{2}$ value of approximately 0.998. This high $R^{2}$ value indicates that our model captures the essential behavior observed in real-world scenarios, justifying the use of a fixed probability for new author introduction.

While this assumption does not capture every nuance, such as variations across disciplines or over time, it provides a robust and tractable foundation for our analysis. By maintaining simplicity, our model remains computationally efficient and scalable, allowing for extensive simulations that yield valuable insights into the general behavior of scientific networks.

We acknowledge that this simplification introduces certain limitations. For instance, the model may not fully reflect the dynamics of author introduction in fields where the rate of new author entry varies significantly. Future research could explore dynamic probabilities that adjust based on factors like discipline, time period, or network size, providing a more nuanced understanding of author introduction. Nonetheless, when comparing our approach with other models, we find that while some may offer more complexity, the simplicity and empirical support of our method allow for efficient and reliable simulations that remain aligned with observed empirical patterns.

2.3. Paper Team Assembly

A paper team refers to a collective of researchers who coauthor a paper, and their names collectively appear in the authors’ field of a paper. Recent studies indicate that the average size of paper teams increases over time [28] and the distribution of paper team sizes exhibits a fat-tailed pattern [60]. Additionally, research by Bornmann [61] shows that larger coauthor teams tend to attract more citations, further emphasizing the impact of collaboration on scientific output. This trend is similarly observed in the APS dataset, as indicated by the blue circles in Figure 2. As previously mentioned in Section 2.1, one simulation year corresponds to roughly 10 actual years of APS metadata. Consequently, the data on paper team sizes in the APS datasets are divided into 13 intervals based on their publication date. The team size distribution of the $i$-th interval ($i = 1, 2, \dots, 13$) is utilized to generate the distribution for the corresponding simulation year. The simulated results are represented by the red squares in Figure 2.
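One way to realize this interval-based resampling is sketched below. It assumes fixed 10-year bins starting in 1893, with the last bin absorbing the remaining years, which may differ from the exact binning used in the study. The records are toy values.

```python
import random
from collections import defaultdict

FIRST_YEAR, YEARS_PER_INTERVAL, N_INTERVALS = 1893, 10, 13

def interval_index(pub_year):
    """Map a publication year to a simulation-year index in 0..12."""
    return min((pub_year - FIRST_YEAR) // YEARS_PER_INTERVAL, N_INTERVALS - 1)

def group_team_sizes(records):
    """records: iterable of (publication_year, team_size) pairs."""
    groups = defaultdict(list)
    for year, size in records:
        groups[interval_index(year)].append(size)
    return groups

def sample_team_size(groups, sim_year, rng):
    """Draw a team size from the empirical sizes of the matching interval."""
    return rng.choice(groups[sim_year])

# Toy records (year, team size); not real APS metadata.
records = [(1895, 1), (1899, 2), (1960, 2), (1965, 3), (2015, 4), (2020, 7)]
groups = group_team_sizes(records)
drawn = sample_team_size(groups, 12, random.Random(0))
```

Sampling uniformly from the raw list of sizes in an interval reproduces that interval's empirical distribution without fitting a parametric form.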

When observing Figure 2a, it is evident that at any time, the average paper team size in the simulated results closely aligns with the empirical data, displaying identical distributions. The black dots represent the annual fluctuations in average paper team sizes in the APS empirical data. However, in Figure 2b, a subtle distinction is noted in the distribution of the simulated results compared to the empirical one, indicating a higher occurrence of papers with smaller team sizes in the simulation. This discrepancy arises from the fact that the paper growth rate for each interval is $\beta_{10} = 10 \beta = 63.6\%$, whereas the growth rate for each simulation year is only 6.68%. Consequently, the distribution of empirical data is influenced more by the later intervals, resulting in higher proportions of papers with larger team sizes.

As mentioned in Section 2.2, the probability of selecting incumbents as authors is $1 - p$. If an incumbent is to be selected, the preferential attachment mechanism will be employed to determine which incumbent will be chosen. As demonstrated in Ref. [24], the probability $\Pi(k)$ of a newcomer collaborating with an incumbent with connectivity $k$ can be expressed as $\Pi(k) \propto k^{\nu}$, where $\nu \leq 1$. Meanwhile, the probability $\Pi(k_{1}, k_{2})$ of an incumbent with connectivity $k_{1}$ collaborating with another incumbent with connectivity $k_{2}$ can be factorized into the product $k_{1} k_{2}$, expressed as $\Pi(k_{1}, k_{2}) \propto k_{1} k_{2}$. Therefore, in this simulation, the probability $\Pi(k)$ to select an incumbent with connectivity $k$ is set as:

$$\Pi(k) = (1 - p) \frac{k}{\sum_{i \in A_{t}} k_{i}} \tag{2}$$

where $p$ represents the probability of selecting newcomers, $k_{i}$ signifies the connectivity of each incumbent, and $A_{t}$ denotes the list of incumbents at time $t$. Given that incumbents often engage in repeated collaborations, characterized by the parameter $q$ in Ref. [29], the connectivity $k_{i}$ in this simulation refers to the accumulated number of collaborations rather than the number of collaborators an author has. For authors with no collaborations, an initial connectivity $k_{0} = 1$ is assigned to ensure each author has a finite initial probability of being selected for the first time. Utilizing an adjacency matrix to record collaboration numbers for each pair of authors, the coauthorship network can be established, as further discussed in Section 2.5. In the event a new author is selected with a probability of $p$, he/she will be added to the incumbent list $A_{t}$, and the adjacency matrix will be updated accordingly.
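The selection rule of Equation (2) might be implemented along the following lines. This is a sketch under stated assumptions, not the study's code. An author cannot fill two slots of the same team. An author with zero collaborations gets weight $k_{0} = 1$. Each team member gains one collaboration per teammate. The names `assemble_team` and `selection_weight` are illustrative.

```python
import random

K0 = 1  # initial connectivity for authors with no collaborations yet

def selection_weight(n_collaborations):
    """Connectivity used for preferential attachment (assumption: k0 if zero)."""
    return n_collaborations if n_collaborations > 0 else K0

def assemble_team(collaborations, team_size, p, rng):
    """Fill team_size author slots and update `collaborations` in place.

    collaborations: list where index = author id and value = accumulated
    number of collaborations of that author.
    """
    team = []
    for _ in range(team_size):
        incumbents = [a for a in range(len(collaborations)) if a not in team]
        if not incumbents or rng.random() < p:
            collaborations.append(0)              # newcomer joins the incumbent list
            team.append(len(collaborations) - 1)
        else:
            weights = [selection_weight(collaborations[a]) for a in incumbents]
            team.append(rng.choices(incumbents, weights=weights, k=1)[0])
    for author in team:                           # one collaboration per teammate
        collaborations[author] += team_size - 1
    return team

rng = random.Random(0)
collaborations = []
first_team = assemble_team(collaborations, 3, 0.192, rng)
```

When the incumbent list is empty or exhausted, the sketch falls back to creating a newcomer, which is why the first team consists entirely of new authors.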

2.4. Author Ability and Paper Quality

Recent research suggests that each scientist may possess a hidden intrinsic parameter, denoted as $Q$, which characterizes their ability to transform a random idea into works with varying impacts [55]. An author with a high $Q$-factor consistently experiences a successful career, regardless of the novelty of the projects or ideas they engage with [16]. The $Q$-factor has been demonstrated to be relatively independent of author productivity [55]. Consequently, when a new author publishes their first paper, a random $Q$-factor is assigned. In this simulation, a log-normal distribution with $\mu = 0.93$ and $\sigma = 0.46$ is assumed for the $Q$-factor, aligning with the data in Ref. [55], which is also based on the APS datasets. The distribution of authors’ abilities ($Q$-factor) is depicted in Figure 3a. As author ability is a continuous parameter, it is divided into 40 bins, and the binned results are illustrated as red squares in Figure 3a.

Once a paper team $a_{i}$ is assembled, and the ability of each member $Q_{j}$ ($j \in a_{i}$) is determined, the quality of the paper can be computed as $\eta_{i} = \delta \left(\max_{j \in a_{i}} Q_{j}\right)$, where $\delta$ represents a multiplicative noise term uniformly distributed in $[1 - \delta^{*}, 1 + \delta^{*}]$, introducing additional randomness to the paper creation process [6]. The distribution of the papers’ quality is visualized in Figure 3b, where a log-normal fitting is applied to the results, represented by the blue line in Figure 3b.
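A compact sketch of the ability and quality draws follows. The log-normal parameters come from the text, whereas the noise half-width `delta_star = 0.1` used in the example call is purely a placeholder, since its value is not stated in this section.

```python
import random

MU, SIGMA = 0.93, 0.46  # log-normal parameters of the Q-factor given in the text

def draw_ability(rng):
    """Sample an author's Q-factor from the assumed log-normal distribution."""
    return rng.lognormvariate(MU, SIGMA)

def paper_quality(team_abilities, delta_star, rng):
    """eta = delta * max(Q_j), with delta uniform in [1 - delta*, 1 + delta*]."""
    noise = rng.uniform(1.0 - delta_star, 1.0 + delta_star)
    return noise * max(team_abilities)

rng = random.Random(0)
abilities = [draw_ability(rng) for _ in range(3)]
eta = paper_quality(abilities, delta_star=0.1, rng=rng)  # delta_star is a placeholder
```

Taking the maximum means that a single high-ability author determines the team's baseline, and the noise term then perturbs it by at most a fraction `delta_star`.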

2.5. Coauthorship Network

Upon assembling a paper team, all its members essentially form a complete graph, prompting the update of the adjacency matrix of collaborations by incrementing each corresponding element by one. In this matrix $A$, each element $A_{i, j}$ represents the number of collaborations between author $i$ and author $j$. Simultaneously, the coauthorship network, essentially the collaborators’ network, can be constructed by replacing the elements $A_{i, j}$ of the collaborations’ network with 0 (for zero elements) or 1 (for non-zero elements). Additionally, the incumbents’ list $A_{t}$ in Section 2.3 not only records the name or ID of an incumbent but also tracks the authored paper number (or productivity) of each author. Papers are incrementally added at each time step, and both the incumbents’ list and coauthorship network evolve accordingly.
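The matrix update and its binarization can be illustrated as follows. The sketch uses plain nested lists with a fixed number of authors for brevity, so growing the matrix as newcomers arrive is left out. The helper names are illustrative.

```python
from itertools import combinations

def add_team(collab, team):
    """Increment the collaboration count of every author pair in the team."""
    for i, j in combinations(team, 2):
        collab[i][j] += 1
        collab[j][i] += 1  # undirected network: keep the matrix symmetric

def collaborator_network(collab):
    """Binarize: 1 where two authors have collaborated at least once, else 0."""
    return [[1 if count > 0 else 0 for count in row] for row in collab]

n_authors = 4
collab = [[0] * n_authors for _ in range(n_authors)]
add_team(collab, [0, 1, 2])  # a three-author paper forms a complete graph
add_team(collab, [0, 1])     # authors 0 and 1 collaborate again

binary = collaborator_network(collab)
n_collaborators_of_0 = sum(binary[0])  # distinct collaborators of author 0
n_collaborations_of_0 = sum(collab[0])  # accumulated collaborations of author 0
```

The row sums of the two matrices give the two notions of connectivity distinguished in Section 2.3: collaborators (binary) versus accumulated collaborations (counts).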

Once all $P = 2496$ papers are incorporated, the final distributions of productivity and collaborators are depicted in Figure 4. The productivity distribution essentially mirrors Lotka’s law, as evident in Figure 4a, where the simulated results closely align with empirical data. The distribution of collaborators in the simulated results also exhibits a strong match with empirical data, which is illustrated in Figure 4b, thereby validating the coauthorship network model. Both distributions clearly display fat tails, and further discussions about the network of collaborators can be explored in Refs. [23,24,25,26,27].

2.6. Reference Model

As previously mentioned in Section 2.1, the total reference number precisely matches the total citation number at any given time in both the empirical dataset and the simulation. Consequently, since the average citation number gradually increases over time, the average reference number also experiences an upward trend, as depicted by the blue circles in Figure 5. This observation aligns with studies like Ahlgren [62], which show a clear relationship between the number of references a paper includes and its likelihood of being cited. Similar to the approach in Section 2.3, the reference number data for all papers are sorted based on their publication date and evenly divided into 13 intervals. The reference number distribution for the $i$-th interval is then utilized to generate the reference number distribution for the $i$-th simulation year ($i = 1, 2, \dots, 13$). The simulation results are represented by the red squares in Figure 5. As illustrated in Figure 5a, the yearly average reference numbers in the simulated results closely align with empirical data, sharing identical distributions. However, Figure 5b indicates a subtle difference in the reference number distribution of all simulation data compared to the empirical one, with more papers exhibiting lower reference numbers. This discrepancy arises from the much higher paper growth rate for each interval (10 years) than that of each simulation year, leading the distribution of the empirical data to be influenced more by later intervals and consequently having more papers with higher reference numbers.

2.7. Citation Network

Once the reference number for each paper is determined, the citation model can be established by determining which papers cite others. The citation network model utilized in this simulation is founded on the minimal citation model initially proposed by Wang et al. [15]. In this model, the probability that paper $i$ is cited at time $t$ after publication is determined by three independent factors: preferential attachment, fitness, and aging. The equation can be expressed as follows:

$$\Pi_{i}(t) = \eta_{i}\, c_{i}^{t}\, P_{i}(t) \qquad (3)$$

where $\eta_{i}$ represents the paper’s fitness term, analogous to the paper’s quality discussed earlier in Section 2.4, which captures the community’s response to the work. $c_{i}^{t}$ is the preferential attachment term, indicating that the paper’s probability of being cited is proportional to the total number of citations it has received previously. It is noteworthy that the preferential attachment term $c_{i}^{t}$ does not precisely equal the number of citations $n_{\mathrm{cites}}(t)$. This is because we assign an initial attractiveness $c_{0} = 1$ to a new paper with zero citations, ensuring each new paper has a finite initial probability of being cited for the first time [16]. Finally, the long-term decay in a paper’s citation can be well approximated by a negative exponential aging term, expressed as $P_{i}(t) = \exp[-(t - \tau_{i})/\theta]$, where $\tau_{i}$ is the publication date of the paper $i$, and $\theta$ is a parameter characterizing the lifetime of a paper [6]. In this paper, the value of $\theta$ is set to 48 months, which is consistent with the value employed by Refs. [6,13], as their analyses are based on the same APS datasets.
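To make Equation (3) concrete, the following sketch computes the unnormalized citation weight of a paper and uses it to draw a reference list. It is an illustration under stated assumptions, not the implementation used in the study: the dictionary layout of a paper, the function names, and the choice $c_{i}^{t} = n_{\mathrm{cites}}(t) + c_{0}$ for the preferential attachment term are assumptions made here.

```python
import math
import random

THETA_MONTHS = 48.0  # paper lifetime quoted in the text
C0 = 1.0             # initial attractiveness of an uncited paper

def citation_weight(fitness, n_cites, pub_month, now_month, theta=THETA_MONTHS):
    """Unnormalized weight eta_i * c_i^t * P_i(t) from Equation (3)."""
    preferential = n_cites + C0  # assumption: c_i^t = n_cites(t) + c_0
    aging = math.exp(-(now_month - pub_month) / theta)
    return fitness * preferential * aging

def pick_references(papers, now_month, n_refs, rng):
    """Sample up to n_refs distinct earlier papers, each draw proportional to weight."""
    candidates = [p for p in papers if p["pub_month"] < now_month]
    chosen = []
    for _ in range(min(n_refs, len(candidates))):
        weights = [
            citation_weight(p["fitness"], p["cites"], p["pub_month"], now_month)
            for p in candidates
        ]
        pick = rng.choices(range(len(candidates)), weights=weights, k=1)[0]
        chosen.append(candidates.pop(pick))  # remove so a paper is cited only once
    return chosen

# Hypothetical toy corpus.
papers = [
    {"id": 0, "fitness": 1.0, "cites": 9, "pub_month": 0},
    {"id": 1, "fitness": 1.0, "cites": 0, "pub_month": 40},
    {"id": 2, "fitness": 2.0, "cites": 0, "pub_month": 47},
]
rng = random.Random(1)
refs = pick_references(papers, now_month=48, n_refs=2, rng=rng)
```

The sketch leaves the citation counts unchanged; a full simulation would increment `cites` for every chosen paper after each new publication.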

The conclusive distribution of the citation network is depicted in Figure 6a. Notably, it exhibits a fat-tailed pattern and aligns remarkably well with empirical data, thereby validating the citation network model.

2.8. Journal Impact Factor

The impact factor undergoes yearly fluctuations. When the counts of citations and papers are tallied from a given citation network, the yearly impact factor of a journal can be computed as follows:

$$\mathrm{IF}(k) = \frac{n_{\mathrm{cites}}(k, k-1) + n_{\mathrm{cites}}(k, k-2)}{n_{\mathrm{papers}}(k-1) + n_{\mathrm{papers}}(k-2)} \qquad (4)$$

where $\mathrm{IF}(k)$ denotes the impact factor of the $k$-th year; $n_{\mathrm{papers}}(k-1)$ denotes the number of papers published in the $(k-1)$-th year; and $n_{\mathrm{cites}}(k, k-1)$ denotes the number of citations received during the $k$-th year by the papers published in the $(k-1)$-th year. The terms with $k-2$ are defined analogously.
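As a hedged illustration of Equation (4), the sketch below computes the impact factor from a list of citation pairs. The data layout (a dictionary of publication years and a list of `(citing, cited)` pairs), the function name, and the convention that a citation is dated by the publication year of the citing paper are assumptions made for this example.

```python
from collections import Counter

def impact_factor(pub_year, citations, k):
    """Impact factor of year k following Equation (4).

    pub_year:  dict mapping paper id -> publication year
    citations: iterable of (citing_id, cited_id) pairs
    Returns None when no papers were published in years k-1 and k-2.
    """
    papers_per_year = Counter(pub_year.values())
    denominator = papers_per_year[k - 1] + papers_per_year[k - 2]
    if denominator == 0:
        return None
    # Count citations made in year k to papers published in k-1 or k-2.
    numerator = sum(
        1
        for citing, cited in citations
        if pub_year[citing] == k and pub_year[cited] in (k - 1, k - 2)
    )
    return numerator / denominator

# Hypothetical toy network: five papers, five citations.
pub_year = {"a": 2000, "b": 2001, "c": 2001, "d": 2002, "e": 2002}
citations = [("d", "a"), ("d", "b"), ("e", "b"), ("e", "c"), ("c", "a")]
if_2002 = impact_factor(pub_year, citations, 2002)  # 4 citations / 3 papers
```

In the toy network, the citation from `c` to `a` is made in 2001 and therefore does not count toward the 2002 impact factor.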

The fluctuation in the journal impact factor is illustrated in Figure 6b, where it can be observed that the simulated variations in the journal impact factor closely align with the empirical results of the APS dataset, thereby validating the citation network model.

2.9. h-Index

The $h$-index of an author is $h$ if $h$ of their papers have at least $h$ citations each and each of the remaining papers has at most $h$ citations. To determine $h$, an author’s publications are sorted based on their citations, arranged from the most cited to the least cited. This results in a sorted paper list denoted as $\Pi = \{\alpha_{1}, \dots, \alpha_{i}, \dots, \alpha_{n}\}$ (not to be confused with the citation probability $\Pi_{i}(t)$ in Equation (3)), where $c_{\alpha_{i}} \geq c_{\alpha_{i+1}}$ for $i \in [1, n-1]$. The $h$-index is then identified as the last position $i$ at which $c_{\alpha_{i}}$ is greater than or equal to $i$:

$$h = \max_{i} \left\{ \min_{\alpha_{i} \in \Pi} \left[ c_{\alpha_{i}},\, i \right] \right\} \qquad (5)$$
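The procedure behind Equation (5) can be sketched in a few lines. The function names are illustrative; the first function follows the "last position" description and the second evaluates the max-min form directly, so the two can be checked against each other.

```python
def h_index(citation_counts):
    """Largest position i (1-based, most cited first) with citations >= i."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for position, cites in enumerate(ranked, start=1):
        if cites >= position:
            h = position
        else:
            break  # counts are sorted, so no later position can qualify
    return h

def h_index_max_min(citation_counts):
    """Direct evaluation of Equation (5): max over i of min(c_i, i)."""
    ranked = sorted(citation_counts, reverse=True)
    return max(
        (min(cites, position) for position, cites in enumerate(ranked, start=1)),
        default=0,
    )

example = [10, 8, 5, 4, 3]
h_example = h_index(example)  # four papers have at least 4 citations each
```

The case `[3, 3, 3, 3]` gives $h = 3$ even though the remaining paper also has 3 citations, which is why the definition allows the remaining papers to have at most $h$ citations.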

The distributions and temporal variations of the $h$-index in both simulated and empirical results are illustrated in Figure 7. In Figure 7a, it is evident that the $h$-index distributions for both simulated and empirical data exhibit fat-tailed characteristics and closely align with each other. These distributions, as depicted in Figure 7a, also concur with the findings of Ref. [48], thereby validating the $h$-index outcomes from this simulation. Figure 7b presents the temporal growth of the $h$-index for the three researchers with the highest $h$-index. Notably, the general growth patterns in both simulated and empirical results are predominantly linear, consistent with the predictions in Ref. [47], adding credibility to the simulation results.

2.10. Rationale Behind Parameter Choices and Model Architecture

1. Model Architecture and Parameterization:

The architecture of our model is designed to capture the coevolution of coauthorship and citation networks by evolving both structures concurrently over time. For each new paper, a team of authors is added to the network, with team sizes ($m$) empirically determined based on trends observed in real-world datasets such as the APS dataset. Studies, including those by Guimera et al. [29], show a clear trend of increasing team sizes over time, and our model reflects this by incorporating these empirical patterns, ensuring that the coauthorship network mirrors the actual growth in collaborative team sizes.

The probability of introducing new authors ($p$) follows a fixed rate inspired by the Simon–Yule distribution [57,58,59]. This distribution captures two essential dynamics in academic networks: preferential attachment, where established authors are more likely to attract new collaborators, and the steady introduction of newcomers. This balance mirrors the real-world dynamics in scientific collaboration networks, where established researchers maintain dominance while new researchers continue to enter and integrate into the network.

For citation dynamics, we employ Wang et al.’s minimal citation model [15,16], which is widely regarded for accurately simulating citation behavior. This model integrates three key components: preferential attachment (papers with more citations are more likely to be cited), a fitness parameter (which represents the intrinsic quality of the paper), and an aging term ($\theta$) that accounts for the decline in a paper’s citation rate over time. The aging term is critical for capturing the temporal decay of relevance in scientific papers. Additionally, we base the reference count ($N$) on real-world APS data, ensuring that the citation model is grounded in observed patterns of citation behavior.

2. Impact of Parameters:

Team Size ($m$): Larger teams lead to denser coauthorship networks and contribute to higher $h$-index values for established authors, as more coauthors and papers result in a greater citation footprint. This aligns with empirical findings that show a positive correlation between team size and the impact of papers, particularly in terms of citation counts and overall scientific influence.

Newcomer Probability ($p$): The introduction of new authors at a fixed rate promotes network growth and diversity by expanding the pool of collaborators. While this increases collaboration diversity, it may also result in a lower average $h$-index as new authors typically accumulate citations more slowly. However, the inclusion of newcomers is essential for reflecting the dynamism and evolution of academic networks.

Paper Lifetime ($\theta$): The $\theta$ parameter governs how long a paper remains relevant and continues to receive citations. Longer lifetimes allow older papers to continue accumulating citations, which shifts citations away from recent papers and toward established work. As shown in Section 3.1, this lowers the journal impact factor, which counts only citations to papers from the previous two years, while increasing the share of researchers with a large $h$-index. This parameter models the lasting influence of highly cited papers, consistent with observations that influential research remains part of the academic discourse over extended periods.

Reference Count ($N$): A higher reference count increases the likelihood that a paper will be cited, reflecting the well-documented correlation between the number of references a paper includes and its subsequent citation impact. This parameter creates a feedback loop that enhances citation metrics and mirrors real-world behaviors where papers with more references tend to attract more attention and citations.
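The team-assembly step described in this section (team size $m$, newcomer probability $p$, preferential selection of incumbents) can be sketched as follows. This is a simplified illustration and not the exact procedure of the model: the function name, the rule that incumbents are chosen in proportion to their number of papers, and the exclusion of duplicate authors within a team are assumptions made here.

```python
import random

def assemble_team(paper_counts, team_size, p_new, rng):
    """Choose authors for one paper and update paper_counts in place.

    paper_counts[a] is the number of papers already written by author a;
    existing authors are assumed to have at least one paper each.
    Each slot is filled by a newcomer with probability p_new, otherwise by an
    incumbent drawn with probability proportional to the incumbent's paper count.
    """
    team = []
    for _ in range(team_size):
        incumbents = [a for a in range(len(paper_counts)) if a not in team]
        if not incumbents or rng.random() < p_new:
            paper_counts.append(0)              # a new author enters the network
            team.append(len(paper_counts) - 1)
        else:
            weights = [paper_counts[a] for a in incumbents]
            team.append(rng.choices(incumbents, weights=weights, k=1)[0])
    for author in team:
        paper_counts[author] += 1               # credit the new paper
    return team

rng = random.Random(42)
paper_counts = []
teams = [assemble_team(paper_counts, team_size=3, p_new=0.3, rng=rng) for _ in range(200)]
```

After the loop, `paper_counts` holds each author's productivity, so a histogram of it can be compared with the fat-tailed productivity distribution discussed earlier.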

2.11. Comparison with State-of-the-Art (SOTA) Baseline Models

Several state-of-the-art models, including Börner’s TARL model, Medo’s model, and Ionescu’s model, have made significant contributions to bibliometric and network analysis. However, they share a critical limitation: they assume a fixed distribution for author productivity and do not adequately model the dynamic evolution of the coauthorship network, which is essential for capturing real-world collaboration patterns.

Börner’s TARL Model: Börner et al.’s TARL (Topics, Aging, and Recursive Linking) model [38] integrates the Matthew effect and models citation behavior effectively. However, it assumes that authors produce a fixed number of papers annually, which oversimplifies the complex dynamics of scientific collaboration and does not account for variability in productivity across different authors or over time. This fixed approach limits the model’s ability to fully simulate the coevolution of citation and coauthorship networks, particularly in dynamic and evolving academic environments.

Medo’s Model: Medo’s model [6] simulates scientific impact based on a predefined productivity distribution sourced from the Microsoft Academic Search (MAS) dataset. While convenient, this model does not account for the evolving interactions within coauthorship networks, overlooking how collaborative relationships affect an author’s output. This static perspective on productivity fails to reflect the real-world variability driven by changes in team sizes, author roles, and collaboration frequency.

Ionescu’s Model: Ionescu et al.’s model [48] assumes that author productivity follows Lotka’s law, which provides a static view of productivity. Like the other models, it does not incorporate the dynamic nature of coauthorship networks, thus simplifying the complexity of how collaborations evolve and influence scientific output over time. This limitation leads to less accurate predictions of how coauthorship impacts scientific metrics like the h-index or citation count.

Our Model: In contrast, our model dynamically integrates the coauthorship network with productivity modeling, allowing the network’s structure and evolution to directly influence each author’s output. By doing so, we account for the evolving nature of collaborations, which in turn impacts key metrics such as the h-index and journal impact factor. This dynamic interaction between coauthorship and citation networks provides a more realistic representation of how scientific collaboration drives productivity and scientific impact. Our model simulates these interactions over time, offering a comprehensive framework for analyzing the coevolution of citation and coauthorship networks. This approach not only enhances the predictive power of the model but also captures the broader, real-world variability in academic productivity.

Our model addresses the limitations of the existing models by introducing a more flexible and accurate method for modeling the behavior of scientific networks. In doing so, it provides a more robust tool for both analyzing and forecasting scientific impact, ensuring that both the evolving nature of coauthorship networks and the dynamic productivity of authors are reflected in the results.

2.12. Generalizability across Disciplines

While this study primarily focuses on datasets from the physical sciences, the fundamental principles of our model—such as the introduction of new authors and the formation of collaboration networks—are applicable across a wide range of scientific disciplines [63]. To test the generalizability of our model, we considered its application to fields such as computer science, mathematics, and botany, each of which has distinct patterns of collaboration and citation.

In the field of computer science, especially in areas like artificial intelligence and software engineering, collaborations often occur in large teams. Studies by Newman [23,25,26] and Radicchi [18] demonstrate that despite larger team sizes and the prevalence of conference papers, the basic assumptions of our model regarding team size and author introduction remain applicable. While minor adjustments to the model’s parameters may be required due to the nature of conference papers, the overall structure is preserved.

In mathematics, collaboration typically occurs in smaller teams, and the publication rate is slower than in the physical sciences. Despite these differences, research by Golosovsky [64] and Newman [23,25,26] indicates that the collaborative network dynamics in mathematics still follow similar patterns, albeit with a lower frequency of new author introduction and slower citation accumulation. This supports the applicability of our model’s core dynamics in the mathematical sciences.

Botany, as a life science, presents different challenges due to its varied research topics and slower citation dynamics. However, studies by Liu [65] and Waltman [66] suggest that the fundamental structures of coauthorship networks and citation distributions are consistent across various scientific disciplines, indicating that our model could potentially be adapted to other fields, including the life sciences, with appropriate adjustments to the parameters.

Preliminary analyses suggest that our model’s predictions align well with observed patterns in these fields, although some adjustments may be necessary to accommodate field-specific dynamics. This indicates that while our model has broad applicability, further research is required to fine-tune parameters for different disciplines to enhance predictive accuracy.

3. Sensitivity Analysis and Results

The following sections provide an analysis of how variations in key parameters such as paper lifetime ($\theta$), reference count ($N$), team size ($m$), and the probability of newcomers ($p$) impact the results of the model. This sensitivity analysis evaluates the robustness of the model’s predictions under different parameter values, with a particular focus on key metrics such as the $h$-index and journal impact factor.

3.1. Paper Lifetime θ

The paper lifetime ($\theta$ in Equation (3)) is a parameter that characterizes how long a paper continues to attract citations. For instance, if $\Delta t = t - \tau_{i} = \theta$, then $P_{i}(t) = e^{-1} \approx 36.8\%$ in all cases. A larger $\theta$ implies that older papers will receive more citations. It is known that $\theta$ varies across different disciplines; for instance, mathematics tends to have a larger $\theta$ compared to biology. The effects of $\theta$ on the journal impact factor are depicted in Figure 8. It is evident that as $\theta$ increases, the journal impact factor decreases monotonically. This is because a larger $\theta$ results in more citations being attributed to older papers, particularly those published more than 2 years ago. Since the total number of citations remains constant, fewer citations are available for papers published within the last 2 years, which form the numerator of Equation (4). Consequently, the journal impact factor decreases accordingly, as illustrated in Figure 8.
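The direction of this effect can be illustrated with a deliberately simplified calculation. The sketch below keeps only the aging term of Equation (3): it ignores fitness, preferential attachment, and the growth of the literature, and it assumes equally sized yearly cohorts. The function names and the 30-year horizon are assumptions, so the numbers are qualitative only and are not the values shown in Figure 8.

```python
import math

def aging_factor(age_months, theta_months):
    """Aging term P_i(t) = exp(-(t - tau_i) / theta) from Equation (3)."""
    return math.exp(-age_months / theta_months)

def recent_share(theta_months, horizon_years=30):
    """Share of total aging weight held by papers aged one or two years.

    Aging-only toy model with equally sized yearly cohorts (an assumption).
    """
    weights = [
        aging_factor(12 * age, theta_months) for age in range(1, horizon_years + 1)
    ]
    return (weights[0] + weights[1]) / sum(weights)

one_lifetime = aging_factor(48, 48)  # about 0.368 when the age equals theta
shares = {theta: recent_share(theta) for theta in (24, 48, 96)}
```

In this toy setting the share of citations flowing to the two-year impact-factor window shrinks as $\theta$ grows, which is the mechanism the paragraph above describes.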

The impact of paper lifetime $\theta$ on the distributions of $h$-index is illustrated in Figure 9a. It is evident that a smaller $\theta$ corresponds to larger proportions of researchers with low or moderate $h$-index and smaller proportions of researchers with a large $h$-index. This is attributed to the fact that a small $\theta$ results in more citations being allocated to recently published papers, typically authored by newcomers with lower $h$-index. In contrast, a large $\theta$ leads to more citations directed at older papers, often authored by established incumbents, thus contributing to a stronger Matthew effect and resulting in a higher prevalence of researchers with large $h$-index, as observed in Figure 9a. Because researchers with low or moderate $h$-index make up the bulk of the population, the population-weighted average $h$-index is higher for smaller $\theta$, as depicted in Figure 9b.

3.2. Reference Number

As the average number of references $N$ equals the average number of citations, an increase in $N$ leads to higher $c_{i}^{t}$ in Equation (3) and consequently higher average citations. Given that the journal impact factor is directly determined by the annual citations to papers published in the preceding 2 years, a higher $N$ results in a higher journal impact factor, as depicted in Figure 10.

The $h$-index is influenced by both an author’s productivity and the citations received by each paper. While increasing the average reference number $N$ has no direct impact on an author’s productivity, it does contribute to increased citations for each published paper. Consequently, authors tend to have higher $h$-index values, as illustrated in Figure 11a. The relationship between the average $h$-index of all authors and the reference number $N$ is depicted in Figure 11b, where it is evident that the average $h$-index exhibits a monotonic increase with the reference number $N$.

3.3. Team Size m at Fixed p

The impact of team size $m$ on the journal impact factor is minimal, as only the fitness term $\eta_{i}$ in Equation (3) is slightly influenced by $m$. Consequently, our analysis will primarily focus on how $m$ affects the distributions of the $h$-index, as depicted in Figure 12. With an increase in the average team size $m$, while keeping the probability of newcomers $p$ constant, more researchers are generated with each published paper. Although a larger $m$ means that authors are selected more often per paper, the likelihood that any given researcher is selected decreases because there are more researchers. Consequently, the average number of authored papers and the average $h$-index generally remain constant. Figure 12a indicates that with more researchers, the top researcher is more likely to achieve a higher $h$-index. However, since the total number of citations remains constant, fewer citations are available for the average researcher. As a result, the distributions of small team sizes tend to be higher than those of large team sizes in low to medium $h$-index regions, as shown in Figure 12a. Because average researchers account for most of the population, the average $h$-index decreases with team size, as demonstrated in Figure 12b.

3.4. Probability of Newcomers p

Since the distributions of the author’s ability $Q$ are the same for newcomers and incumbents, the variation of $p$ has no impact on the paper’s quality $\eta_{i}$ and thus does not affect the journal impact factor. When the probability of selecting newcomers $p$ increases while keeping the average team size $m$ constant, more newcomers are generated with each published paper, and the probability of selecting incumbents as authors decreases. Therefore, as the probability of newcomers $p$ increases, the distributions of $h$-index will gradually become dominated by fresh researchers with low $h$-index, as shown in Figure 13a. It can be noted that the distributions of small $p$ tend to be higher than those of large $p$. The average $h$-index will also decrease with the increasing $p$, as demonstrated in Figure 13b.

3.5. Team Size m at Fixed k

In Section 3.3, we analyzed the impact on the $h$-index of increasing team size $m$ while keeping the probability of selecting newcomers $p$ constant. In this section, we will examine the effect of increasing team size $m$ while keeping the number of new authors generated per new paper, $k$, constant. Since a larger $m$ implies more author selections per paper, the probability of selecting a newcomer at each selection, $p$, should be reduced accordingly to uphold a constant $k$. According to Equation (1), the probabilities of selecting newcomers are $p = [0.767, 0.384, 0.192, 0.096, 0.048]$ when the team sizes are $m = [1.1, 1.6, 2.6, 5.2, 10.1]$, respectively. This case study simulates the scenario where incumbents intentionally enlarge their team size without the additional influx of newcomers. The impact of increasing $m$ while keeping $k$ constant on the distributions of $h$-index is shown in Figure 14a. It can be noted that the numbers of authors with medium to high $h$-index increase significantly with the increasing team size $m$. This is because increasing the team size $m$ while keeping the new authors per paper $k$ constant can inflate the productivity of authors, especially those with more collaborations, and thus inflate their $h$-index. Consequently, the average $h$-index increases significantly with the increasing team size $m$, as shown in Figure 14b.
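A back-of-the-envelope accounting identity, which is not one of the model equations, helps explain the difference between Sections 3.3 and 3.5. Assume $n_{\mathrm{papers}}$ papers with average team size $m$ and $k$ new authors per paper, and ignore the founding authors. The symbol $\bar{q}$ for the mean number of papers per author is introduced here for illustration only.

```latex
\bar{q}
  \;\approx\; \frac{\text{total authorships}}{\text{total authors}}
  \;=\; \frac{m \, n_{\mathrm{papers}}}{k \, n_{\mathrm{papers}}}
  \;=\; \frac{m}{k}
```

At fixed $k$, the mean productivity therefore grows in proportion to $m$, which is consistent with the inflation of the $h$-index seen in Figure 14. At fixed $p$, more new authors are generated per paper as $m$ grows, so the ratio stays roughly constant, in line with the observation in Section 3.3 that the average number of authored papers does not change much.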

4. Discussion

4.1. Interpretation of Findings in Relation to Previous Literature

The findings of this study align with previous research on the coevolution of coauthorship and citation networks while also extending existing models by highlighting the dynamic interactions between team size, newcomer probability, and paper lifetime. Foundational theories such as the Matthew effect [10] and preferential attachment [24] remain central to understanding patterns of success in academic networks. However, unlike many static models, our approach incorporates time-varying factors to simulate the coevolution of these networks in a more dynamic and realistic manner.

For instance, our results on the influence of team size and newcomer probability on $h$-index distribution are consistent with prior research [25,26,35], which emphasizes the social structure of coauthorship networks. The observed Matthew effect, where established researchers attract more collaborators and citations [39], further supports findings that established figures tend to reinforce their dominance within academic networks. However, our model advances this by integrating dynamic interactions between authorship and citation patterns, offering a more comprehensive and flexible framework for analyzing scientific impact.

Additionally, previous studies [32,34] have highlighted the importance of interdisciplinary collaborations in broadening scientific influence, a point reinforced by our findings. Cross-disciplinary research not only fosters larger citation and coauthorship networks but also enhances the diversity and reach of scientific contributions—an aspect often overlooked by traditional metrics.

This study also contributes to ongoing discussions about the manipulation of impact metrics [52]. We demonstrate how parameters such as reference count and team size can be adjusted to artificially inflate metrics like the journal impact factor and $h$-index, raising ethical concerns about the misuse of bibliometric indicators within the academic community.

4.2. Actionable Insights for Researchers, Publishers, and Policymakers

The findings of this study provide practical recommendations for researchers, publishers, and policymakers to improve scientific communication and evaluation processes.

1. For Researchers:

Our results demonstrate that manipulations such as increasing reference counts or reducing paper lifetimes can artificially inflate metrics like the journal impact factor and $h$-index. Such practices distort the true value of scientific contributions, as highlighted by previous studies [49]. To maintain the integrity of research, scholars should avoid engaging in strategies aimed purely at boosting metrics. Instead, the focus should be on producing high-quality, innovative research that naturally attracts citations over time. As noted by earlier work [19,20], practices such as self-citation or citation cartels undermine the reliability of academic evaluations and damage the credibility of the research community. Encouraging interdisciplinary collaborations can lead to broader and more diverse impacts [32,50], promoting long-term citation success rather than short-term gains.

2. For Publishers:

Publishers play a critical role in curbing unethical practices related to bibliometric indicators. The overemphasis on journal impact factor often encourages manipulative behaviors such as citation manipulation. This is an instance of Goodhart’s Law [52]: once metrics become targets to be optimized, they lose their original purpose as accurate reflections of research quality. To address this, publishers should consider incorporating alternative metrics like the SCImago Journal Rank (SJR) or Eigenfactor [49], which weight each citation by the standing of the citing journal rather than counting all citations equally. These metrics provide a more balanced and transparent assessment, helping to ensure fairness in the peer-review process. Moreover, journals should actively discourage excessive self-citations and citation cartels to uphold ethical standards in scholarly publishing [53].

3. For Policymakers:

Policymakers have a crucial role in shaping the standards by which academic work is evaluated. This study suggests that relying solely on quantitative metrics like the $h$-index or journal impact factor can lead to the unethical manipulation of these indicators. Policymakers should advocate for evaluation systems that combine traditional metrics with qualitative assessments [54], including peer reviews and the societal impact of research. By encouraging policies that promote cross-disciplinary collaboration [34], policymakers can foster more robust and impactful scientific networks. This approach not only enhances the quality of scientific contributions but also ensures that researchers are recognized for their genuine achievements rather than for gaming the system. Developing such guidelines will help shift the focus away from metric-driven evaluations, promoting a more ethical and comprehensive approach to assessing research.

4.3. Ethical Considerations in the Use of Scientific Impact Indicators

A key issue raised in this study is the potential misuse of scientific impact indicators, such as the journal impact factor and $h$-index. These metrics, when used improperly, can become targets for optimization rather than accurate measures of scholarly impact.

1. Manipulation of Metrics:

Our findings show that the journal impact factor can be artificially inflated by increasing reference numbers, and that the $h$-index can be inflated by enlarging team size without adding new authors. This aligns with the observations of previous studies [52], which warn that when metrics become the goal, they lose their effectiveness, as articulated by Goodhart’s Law. Practices such as self-citation, citation cartels, and coercive citation [19,20] are among the most common methods of manipulating these metrics, undermining their credibility. These actions distort the scientific record and place undue pressure on researchers, especially early-career scientists, to engage in unethical behavior in order to advance their careers, ultimately harming the academic ecosystem as a whole.

2. The Role of Institutions and Publishers:

Institutions and publishers play a vital role in addressing unethical practices related to scientific impact metrics. They must proactively monitor unusual citation patterns and discourage behaviors such as coercive citations and excessive self-citation [20]. Journals should implement automated tools to detect suspicious citation patterns and maintain transparency and fairness in the peer review process [37]. Additionally, establishing clear ethical guidelines, as proposed by various scholars [49], can help mitigate these negative practices. Combining traditional quantitative metrics with qualitative assessments, such as peer reviews and societal impact, can create a more balanced evaluation system, ensuring that research is assessed fairly and comprehensively.

3. Promoting Ethical Research Practices:

To safeguard the integrity of academic research, it is essential to raise awareness about the consequences of metric manipulation. Institutions and journals should provide training for researchers on ethical citation practices, emphasizing the importance of integrity in academic publishing. By fostering a culture of ethical research and establishing clear guidelines, the academic community can prevent the distortion of impact indicators and ensure that these metrics reflect genuine scientific contributions [35]. Promoting transparency and fairness in research assessments will help restore trust in the evaluation of scholarly work and uphold the reputation of the academic community.

4.4. Limitations of the Study

Despite its contributions, this study has several limitations that may affect the generalizability and applicability of its findings.

1. Disciplinary Specificity:

This study is based on data from the American Physical Society (APS) and focuses primarily on physics research. The coauthorship and citation network dynamics observed in physics may differ significantly from those in other fields such as biology, social sciences, or computer science. For example, disciplines like the social and life sciences often exhibit different collaboration patterns and citation behaviors [50], whereas interdisciplinary research in fields such as medicine or environmental science typically involves larger teams and faster citation accumulation [32]. In contrast, theoretical fields like mathematics may experience slower publication and citation growth. As a result, the conclusions drawn from this study may not directly apply to other disciplines without further validation and adjustment of model parameters.

2. Basic Citation Impact Indicators:

Another limitation is the study’s reliance on traditional citation impact metrics like the journal impact factor and $h$-index. While these metrics are widely used, they have been criticized for oversimplifying the complexity of academic impact. Specifically, they fail to account for field-specific citation patterns or the evolution of citations over time. As suggested in previous research [54], incorporating more advanced metrics, such as the field-weighted citation impact (FWCI), could provide a more nuanced understanding of scientific impact across different disciplines. Future studies should consider integrating these normalized metrics to allow for a broader, more accurate evaluation of scientific contributions.

3. Ethical Concerns and Metric Manipulation:

While this study highlights how different parameters can influence citation metrics, it also raises concerns about the potential for misuse. When metrics like the journal impact factor and $h$-index become optimization targets, they can be manipulated through practices like self-citation, citation cartels, or team size adjustments. Although these ethical issues are discussed in the study, it does not offer concrete solutions for preventing such manipulation. Future research should explore models that can detect and mitigate metric manipulation, building on existing work related to network coevolution [41]. By developing methods to safeguard the integrity of bibliometric indicators, future studies can ensure that these metrics remain reliable measures of academic impact.

4.5. Future Research Directions

This study opens up several avenues for future research:

1. Inclusion of Field- and Time-Normalized Indicators:

A key next step is to incorporate normalized citation metrics, such as the Field-Weighted Citation Impact (FWCI) or citation percentiles, into the current model. These indicators account for field-specific citation behaviors and the temporal evolution of citations, providing a more balanced assessment of scientific impact [54]. By integrating these metrics, future research can make the model more applicable across a broader range of disciplines, addressing some of the limitations highlighted in this study.

2. Cross-Disciplinary Validation:

To improve the generalizability of the model, future research should apply it to datasets from other fields, such as life sciences, engineering, or social sciences [32]. This would help determine whether the observed trends and dynamics are consistent across different disciplines with varying collaboration and publication practices. Adjusting model parameters, such as team size, collaboration frequency, and citation decay rates, to reflect the unique characteristics of each field could enhance the model’s adaptability and relevance [37].

3. Development of Ethical Evaluation Metrics:

Given the potential for manipulating traditional impact metrics, there is a need for ethical evaluation frameworks that combine both quantitative and qualitative assessments [19]. Future research should explore the development of algorithms to detect unethical practices, such as excessive self-citation or citation cartels. These algorithms could flag suspicious citation patterns, while alternative metrics, like the Eigenfactor or SCImago Journal Rank, could prioritize citation quality over quantity. This would ensure a more accurate and fair assessment of research impact.

4. Integration with Emerging Metrics:

As new forms of scholarly communication emerge, such as altmetrics, future studies should explore how these metrics can be integrated into the model. Altmetrics track non-traditional forms of impact, such as mentions in policy documents or engagement on social media, offering a more comprehensive view of a researcher’s societal influence. By incorporating altmetrics, future research could provide valuable insights into the broader societal relevance of scientific contributions, complementing traditional citation-based metrics [34,49].
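The field- and time-normalization mentioned in the first direction can be illustrated with a minimal Python sketch. It assumes the simplest form of normalization, in which a paper's citation count is divided by the mean citation count of papers from the same field and publication year. This is not the exact FWCI definition, which also conditions on document type, and the function name `normalized_citation_scores` and the record layout are hypothetical.

```python
from collections import defaultdict


def normalized_citation_scores(papers):
    """Divide each paper's citations by the mean of its (field, year) group.

    `papers` is a list of dicts with keys 'id', 'field', 'year', 'citations'
    (a hypothetical layout chosen for this illustration).
    """
    # (field, year) -> [sum of citations, number of papers]
    totals = defaultdict(lambda: [0, 0])
    for p in papers:
        key = (p["field"], p["year"])
        totals[key][0] += p["citations"]
        totals[key][1] += 1

    scores = {}
    for p in papers:
        cit_sum, count = totals[(p["field"], p["year"])]
        expected = cit_sum / count  # mean citations of comparable papers
        scores[p["id"]] = p["citations"] / expected if expected > 0 else 0.0
    return scores


# Toy data: two fields with very different citation levels.
papers = [
    {"id": "a", "field": "physics", "year": 2010, "citations": 30},
    {"id": "b", "field": "physics", "year": 2010, "citations": 10},
    {"id": "c", "field": "sociology", "year": 2010, "citations": 6},
    {"id": "d", "field": "sociology", "year": 2010, "citations": 2},
]
scores = normalized_citation_scores(papers)
```

In this toy data, papers "a" and "c" receive the same normalized score even though their raw citation counts differ by a factor of five, which is the cross-field comparability that such indicators aim for.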

By addressing these areas, future research can build on the foundation laid by this study, advancing toward a more comprehensive, ethical, and cross-disciplinary framework for assessing scientific impact.
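As one possible starting point for the detection algorithms proposed in the third direction, the sketch below computes a per-author self-citation rate from joint coauthorship and citation data and flags authors above a threshold. It is an illustrative assumption rather than a method from this study: the function names, the input layout, and the threshold of 0.5 are hypothetical, and a practical detector would need field-specific baselines.

```python
def self_citation_rates(authors_of, citations):
    """Return, per author, the share of received citations that are self-citations.

    `authors_of` maps paper id -> set of author ids.
    `citations` is an iterable of (citing_paper, cited_paper) pairs.
    A citation counts as a self-citation for an author of the cited paper
    if that author is also an author of the citing paper.
    """
    received = {}
    self_cites = {}
    for citing, cited in citations:
        shared = authors_of[citing] & authors_of[cited]
        for author in authors_of[cited]:
            received[author] = received.get(author, 0) + 1
            if author in shared:
                self_cites[author] = self_cites.get(author, 0) + 1
    return {a: self_cites.get(a, 0) / n for a, n in received.items()}


def flag_authors(rates, threshold=0.5):
    """List authors whose self-citation rate exceeds the (hypothetical) threshold."""
    return sorted(a for a, r in rates.items() if r > threshold)


# Toy network: author A repeatedly cites their own earlier papers.
authors_of = {"p1": {"A", "B"}, "p2": {"A"}, "p3": {"C"}, "p4": {"A"}}
citations = [("p2", "p1"), ("p4", "p1"), ("p3", "p1"), ("p4", "p2")]
rates = self_citation_rates(authors_of, citations)
flagged = flag_authors(rates)
```

Detecting citation cartels would require looking at reciprocal citation patterns between groups of authors, which this per-author rate does not capture.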

5. Conclusions

In this paper, a mathematical model for the team assembly and citation process is established, and the coevolution of the coauthorship and citation networks is simulated. Scientific impact indicators, such as the journal impact factor and h-index, are calculated and validated against the empirical data from the APS datasets. Parametric studies are conducted to analyze the impact of different parameters, such as the paper lifetime θ, reference number N, team size m, and the probability of selecting newcomers p, on the journal impact factor and h-index. The following can be concluded from this research:

1. By using a few simple and reasonable assumptions, the mathematical models can effectively replicate most empirical data characteristics, including the temporal dynamics and distributions of the h-index, indicating that modeling and simulation methods are reliable tools for exploring how different parameters affect scientific impact indicators.
2. Increasing the reference number N or decreasing the paper lifetime θ significantly boosts both the journal impact factor and the average h-index. Additionally, enlarging the team size m without adding new authors, or reducing the probability of selecting newcomers p, notably increases the average h-index. This implies that scientific impact indicators may have inherent weaknesses or can be manipulated by authors, making them unreliable for assessing the true quality of a paper.
3. The presented mathematical models can be easily extended to include other scientific impact indicators and scenarios. This versatility positions modeling and simulation methods as powerful tools for studying the impact of various parameters on scientific impact indicators, aiding in the development of improved indicators. Furthermore, these methods can serve as robust tools for validating underlying mechanisms or predicting different scenarios based on joint coauthorship and citation networks.
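
For readers who want to experiment with the two indicators discussed in these conclusions, the following Python sketch shows how they can be computed from a simulated or empirical citation list. It is not the MATLAB code from the Supplementary Materials. It assumes the standard definition of the h-index and a two-year journal impact factor window with the whole dataset treated as a single journal, and the function names and input layout are hypothetical.

```python
def h_index(citation_counts):
    """Largest h such that at least h papers have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def impact_factor(pub_year, citations, year):
    """Two-year impact factor for `year`, treating all papers as one journal.

    `pub_year` maps paper id -> publication year.
    `citations` is an iterable of (citing_paper, cited_paper) pairs.
    Numerator: citations made in `year` to papers from the two previous years.
    Denominator: number of papers published in those two years.
    """
    window = {year - 1, year - 2}
    citable = sum(1 for y in pub_year.values() if y in window)
    cites = sum(
        1
        for citing, cited in citations
        if pub_year[citing] == year and pub_year[cited] in window
    )
    return cites / citable if citable else 0.0


# Toy example
author_h = h_index([10, 8, 5, 4, 3])
pub_year = {"a": 2018, "b": 2019, "c": 2020, "d": 2020}
citations = [("c", "a"), ("c", "b"), ("d", "a"), ("d", "c")]
jif_2020 = impact_factor(pub_year, citations, 2020)
```

Under these definitions, a citation only counts toward the impact factor if it reaches a paper within two years of publication, and every coauthor of a paper receives its full citation count toward their own h-index. This is consistent with the mechanisms behind the second conclusion.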

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/info15100597/s1, file: Data and Code.

Funding

This research was supported by two projects: the 2024 Shenzhen Library and Information Science Research Project, “Co-evolution of Coauthorship-Citation Networks in the Context of Digital Humanities Tool Development: A Case Study of Parametric Analysis of Scientific Impact Indicators” (Project No. SWTQ2024230), and the 2023 Guangdong Provincial Library Key Research Project, “Joint Analysis and Data Governance of Papers and Patents in the Context of Smart Libraries” (Project No. GDTK23004).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All relevant data and code used in this study are provided in the Supplementary Materials, organized under the “Data and Code” folder. The code, developed in MATLAB, is available to support the reproducibility of the analysis. Due to the complexity and organization of the code, we recommend that researchers utilize tools such as AI-assisted analyzers (e.g., ChatGPT) to facilitate the interpretation of the code. We encourage other researchers to replicate the findings, and for any further inquiries or clarification, the corresponding author can be contacted directly.

Conflicts of Interest

The author declares no conflict of interest.

References

1. Su, F.; Li, S.; Liu, Q.; Zhang, Y. A bibliometric study of digital humanities research in China from 2012 to 2021. Inf. Res. Int. Electron. J. 2023, 28, 62–82.
2. Geraldo, G.; Bisset-Alvarez, E.; Pinto, M.D.d.S. Digital Humanities and the Sustainable Development Goals: A reflection for Information Science. Transinformação 2023, 35, e227210.
3. Osinska, V.; Klimas, R. Mapping science: Tools for bibliometric and altmetric studies. Inf. Res. 2021, 26, 909.
4. Yan, C.; Li, H.; Pu, R.; Deeprasert, J.; Jotikasthira, N. Knowledge mapping of research data in China: A bibliometric study using visual analysis. Libr. Hi Tech 2024, 42, 331–349.
5. Dong, J.Q. Using simulation in information systems research. J. Assoc. Inf. Syst. 2022, 23, 408–417.
6. Medo, M.; Cimini, G. Model-based evaluation of scientific impact indicators. Phys. Rev. E 2016, 94, 032312.
7. Bai, X.; Zhang, F.; Li, J.; Xu, Z.; Patoli, Z.; Lee, I. Quantifying scientific collaboration impact by exploiting collaboration-citation network. Scientometrics 2021, 126, 7993–8008.
8. Scharnhorst, A.; Börner, K.; Van den Besselaar, P. Models of Science Dynamics: Encounters between Complexity Theory and Information Sciences; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2012.
9. Zhang, H.; Guo, J.; Guo, F.; Zhang, W. Knowledge Networks, Collaboration Networks, and Local Search Behaviors. Group Organ. Manag. 2023, 10596011231203364.
10. Price, D.d.S. A general theory of bibliometric and other cumulative advantage processes. J. Am. Soc. Inf. Sci. 1976, 27, 292–306.
11. Barabási, A.-L.; Albert, R. Emergence of scaling in random networks. Science 1999, 286, 509–512.
12. Bianconi, G.; Barabási, A.-L. Competition and multiscaling in evolving networks. Europhys. Lett. 2001, 54, 436.
13. Medo, M.; Cimini, G.; Gualdi, S. Temporal effects in the growth of networks. Phys. Rev. Lett. 2011, 107, 238701.
14. Eom, Y.-H.; Fortunato, S. Characterizing and modeling citation dynamics. PLoS ONE 2011, 6, e24926.
15. Wang, D.; Song, C.; Barabási, A.-L. Quantifying long-term scientific impact. Science 2013, 342, 127–132.
16. Wang, D.; Barabási, A.-L. The Science of Science; Cambridge University Press: Cambridge, UK, 2021.
17. Song, D.; Wang, W.; Fan, Y.; Xing, Y.; Zeng, A. Quantifying the structural and temporal characteristics of negative links in signed citation networks. Inf. Process. Manag. 2022, 59, 102996.
18. Radicchi, F.; Fortunato, S.; Castellano, C. Universality of citation distributions: Toward an objective measure of scientific impact. Proc. Natl. Acad. Sci. USA 2008, 105, 17268–17272.
19. Brunton, F. Making people and influencing friends: Citation networks and the appearance of significance. In Gaming the Metrics: Misconduct and Manipulation in Academic Research; The MIT Press: Cambridge, MA, USA, 2020; p. 243.
20. Herteliu, C.; Ausloos, M.; Ileanu, B.V.; Rotundo, G.; Andrei, T. Quantitative and qualitative analysis of editor behavior through potentially coercive citations. Publications 2017, 5, 15.
21. Yousefi Nooraie, R.; Sale, J.E.; Marin, A.; Ross, L.E. Social network analysis: An example of fusion between quantitative and qualitative methods. J. Mix. Methods Res. 2020, 14, 110–124.
22. Bologna, F.; Di Iorio, A.; Peroni, S.; Poggi, F. Do open citations give insights on the qualitative peer-review evaluation in research assessments? An analysis of the Italian National Scientific Qualification. Scientometrics 2023, 128, 19–53.
23. Newman, M.E. Coauthorship networks and patterns of scientific collaboration. Proc. Natl. Acad. Sci. USA 2004, 101 (Suppl. 1), 5200–5205.
24. Barabási, A.-L.; Jeong, H.; Néda, Z.; Ravasz, E.; Schubert, A.; Vicsek, T. Evolution of the social network of scientific collaborations. Phys. A Stat. Mech. Its Appl. 2002, 311, 590–614.
25. Newman, M.E. Scientific collaboration networks. I. Network construction and fundamental results. Phys. Rev. E 2001, 64, 016131.
26. Newman, M.E. The structure of scientific collaboration networks. Proc. Natl. Acad. Sci. USA 2001, 98, 404–409.
27. Tomassini, M.; Luthi, L. Empirical analysis of the evolution of a scientific collaboration network. Phys. A Stat. Mech. Its Appl. 2007, 385, 750–764.
28. Wuchty, S.; Jones, B.F.; Uzzi, B. The increasing dominance of teams in production of knowledge. Science 2007, 316, 1036–1039.
29. Guimera, R.; Uzzi, B.; Spiro, J.; Amaral, L.A.N. Team assembly mechanisms determine collaboration network structure and team performance. Science 2005, 308, 697–702.
30. Twyman, M.; Contractor, N. Team assembly. In Strategies for Team Science Success: Handbook of Evidence-Based Principles for Cross-Disciplinary Science and Practical Lessons Learned from Health Researchers; Springer: Cham, Switzerland, 2019; pp. 217–240.
31. Yang, J.; Wu, L.; Lyu, L. Research on scientific knowledge evolution patterns based on ego-centered fine-granularity citation network. Inf. Process. Manag. 2024, 61, 103766.
32. Pessoa Junior, G.J.; Dias, T.M.; Silva, T.H.; Laender, A.H. On interdisciplinary collaborations in scientific coauthorship networks: The case of the Brazilian community. Scientometrics 2020, 124, 2341–2360.
33. Orzechowski, K.P.; Mrowinski, M.J.; Fronczak, A.; Fronczak, P. Asymmetry of social interactions and its role in link predictability: The case of coauthorship networks. J. Informetr. 2023, 17, 101405.
34. Pelacho, M.; Ruiz, G.; Sanz, F.; Tarancón, A.; Clemente-Gallardo, J. Analysis of the evolution and collaboration networks of citizen science scientific publications. Scientometrics 2021, 126, 225–257.
35. Chen, W.; Yan, Y. New components and combinations: The perspective of the internal collaboration networks of scientific teams. J. Informetr. 2023, 17, 101407.
36. Hâncean, M.-G.; Perc, M.; Lerner, J. The coauthorship networks of the most productive European researchers. Scientometrics 2021, 126, 201–224.
37. Singh, C.K.; Filho, D.V.; Jolad, S.; O’Neale, D.R. Evolution of interdependent co-authorship and citation networks. Scientometrics 2020, 125, 385–404.
38. Börner, K.; Maru, J.T.; Goldstone, R.L. The simultaneous evolution of author and paper networks. Proc. Natl. Acad. Sci. USA 2004, 101 (Suppl. S1), 5266–5273.
39. Xie, Z.; Xie, Z.; Li, M.; Li, J.; Yi, D. Modeling the coevolution between citations and coauthorship of scientific papers. Scientometrics 2017, 112, 483–507.
40. Zhang, X.; Xie, Q.; Song, M. Measuring the impact of novelty, bibliometric, and academic-network factors on citation count using a neural network. J. Informetr. 2021, 15, 101140.
41. Liu, X.F.; Chen, H.-J.; Sun, W.-J. Adaptive topological coevolution of interdependent networks: Scientific collaboration-citation networks as an example. Phys. A Stat. Mech. Its Appl. 2021, 564, 125518.
42. Muñoz-Muñoz, A.M.; Mirón-Valdivieso, M.D. Analysis of collaboration and co-citation networks between researchers studying violence involving women. Inf. Res. Int. Electron. J. 2017, 22, n2.
43. Hirsch, J.E. An index to quantify an individual’s scientific research output. Proc. Natl. Acad. Sci. USA 2005, 102, 16569–16572.
44. Garfield, E. Citation analysis as a tool in journal evaluation: Journals can be ranked by frequency and impact of citations for science policy studies. Science 1972, 178, 471–479.
45. Zhou, J.; Cai, N.; Tan, Z.-Y.; Khan, M.J. Analysis of effects to journal impact factors based on citation networks generated via social computing. IEEE Access 2019, 7, 19775–19781.
46. Zhou, J.; Feng, L.; Cai, N.; Yang, J. Modeling and simulation analysis of journal impact factor dynamics based on submission and citation rules. Complexity 2020, 2020, 3154619.
47. Guns, R.; Rousseau, R. Simulating growth of the h-index. J. Am. Soc. Inf. Sci. Technol. 2009, 60, 410–417.
48. Ionescu, G.; Chopard, B. An agent-based model for the bibliometric h-index. Eur. Phys. J. B 2013, 86, 426.
49. Bornmann, L.; Marx, W.; Gasparyan, A.Y.; Kitas, G.D. Diversity, value and limitations of the journal impact factor and alternative metrics. Rheumatol. Int. 2012, 32, 1861–1867.
50. Harzing, A.-W.; Alakangas, S. Google Scholar, Scopus and the Web of Science: A longitudinal and cross-disciplinary comparison. Scientometrics 2016, 106, 787–804.
51. McKiernan, E.C.; Schimanski, L.A.; Muñoz Nieves, C.; Matthias, L.; Niles, M.T.; Alperin, J.P. Use of the Journal Impact Factor in academic review, promotion, and tenure evaluations. eLife 2019, 8, e47338.
52. Fire, M.; Guestrin, C. Over-optimization of academic publishing metrics: Observing Goodhart’s Law in action. GigaScience 2019, 8, giz053.
53. Chapman, C.A.; Bicca-Marques, J.C.; Calvignac-Spencer, S.; Fan, P.; Fashing, P.J.; Gogarten, J.; Guo, S.; Hemingway, C.A.; Leendertz, F.; Li, B. Games academics play and their consequences: How authorship, h-index and journal impact factors are shaping the future of academia. Proc. R. Soc. B 2019, 286, 20192047.
54. Waltman, L. A review of the literature on citation impact indicators. J. Informetr. 2016, 10, 365–391.
55. Sinatra, R.; Wang, D.; Deville, P.; Song, C.; Barabási, A.-L. Quantifying the evolution of individual scientific impact. Science 2016, 354, aaf5239.
56. American Physical Society. APS Data Sets for Research. 2023. Available online: https://journals.aps.org/datasets (accessed on 29 July 2024).
57. Simon, H.A. On a class of skew distribution functions. Biometrika 1955, 42, 425–440.
58. Ijiri, Y.; Simon, H.A. Skew Distributions and the Sizes of Business Firms. R. Stat. Soc. J. Ser. A Gen. 1977, 140, 547–548.
59. Simon, H.A.; Van Wormer, T.A. Some Monte Carlo estimates of the Yule distribution. Behav. Sci. 1963, 8, 203–210.
60. Milojević, S. Principles of scientific research team formation and evolution. Proc. Natl. Acad. Sci. USA 2014, 111, 3984–3989.
61. Bornmann, L. Is collaboration among scientists related to the citation impact of papers because their quality increases with collaboration? An analysis based on data from F1000Prime and normalized citation scores. J. Assoc. Inf. Sci. Technol. 2017, 68, 1036–1047.
62. Ahlgren, P.; Colliander, C.; Sjögårde, P. Exploring the relation between referencing practices and citation impact: A large-scale study based on Web of Science data. J. Assoc. Inf. Sci. Technol. 2018, 69, 728–743.
63. Fortunato, S.; Bergstrom, C.T.; Börner, K.; Evans, J.A.; Helbing, D.; Milojević, S.; Petersen, A.M.; Radicchi, F.; Sinatra, R.; Uzzi, B. Science of science. Science 2018, 359, eaao0185.
64. Golosovsky, M. Universality of citation distributions: A new understanding. Quant. Sci. Stud. 2021, 2, 527–543.
65. Liu, P.; Xia, H. Structure and evolution of co-authorship network in an interdisciplinary research field. Scientometrics 2015, 103, 101–134.
66. Waltman, L.; van Eck, N.J.; van Raan, A.F. Universality of citation distributions revisited. J. Am. Soc. Inf. Sci. Technol. 2012, 63, 72–77.

Figure 1. Evolution of cumulative papers and authors: (a) Yearly progression; (b) Author accumulation in relation to paper accumulation.

Figure 2. Model simulations vs. APS empirical data: (a) Annual average team size increase; (b) Distribution of paper team sizes.

Figure 3. Distributions of author ability and paper quality: (a) Author ability distribution; (b) Paper quality distribution.

Figure 4. Model simulations vs. APS empirical data: (a) Researcher productivity distribution; (b) Collaborator number distribution.

Figure 5. Model simulations vs. APS empirical data: (a) Annual average reference numbers increase; (b) Reference number distribution.

Figure 6. Model simulations vs. APS empirical data: (a) Citation number distribution; (b) Temporal variation of the journal impact factor of the APS dataset.

Figure 7. Model simulation versus APS empirical data: (a) h-index distribution in the final year; (b) Temporal variation of the h-index for the top 3 researchers.

Figure 8. Impact of paper lifetime θ on journal impact factor: (a) Temporal variation of journal impact factor at different θ; (b) The journal impact factor as functions of θ at different years.

Figure 9. Impact of the paper lifetime θ on the h-index: (a) Distribution of h-index at different θ; (b) Average h-index as functions of θ at different years.

Figure 10. Impact of reference number N on journal impact factor: (a) Temporal variation of journal impact factor at different N; (b) The journal impact factor as functions of N at different years.

Figure 11. Impact of the reference number N on the h-index: (a) Distribution of h-index at different N; (b) Average h-index as functions of N at different years.

Figure 12. Impact of the average team size m on the h-index: (a) Distribution of h-index at different m; (b) Average h-index as functions of m at different years.

Figure 13. Impact of the probability of newcomers p on the h-index: (a) h-index at different p; (b) Average h-index as functions of p at different years.

Figure 14. Impact of the team size m on the h-index: (a) Distribution of h-index at different m; (b) Average h-index as functions of m at different years.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
