Article

Coping with the Inequity and Inefficiency of the H-Index: A Cross-Disciplinary Empirical Analysis

1 Dipartimento di Scienze per la Qualità della Vita, Università di Bologna, C.so d’Augusto 237, 47921 Rimini, Italy
2 Dipartimento di Scienze Statistiche “Paolo Fortunati”, Università di Bologna, 40126 Bologna, Italy
* Author to whom correspondence should be addressed.
Publications 2024, 12(2), 12; https://doi.org/10.3390/publications12020012
Submission received: 3 November 2023 / Revised: 10 April 2024 / Accepted: 15 April 2024 / Published: 22 April 2024

Abstract

This paper measures two main inefficiency features (many publications other than articles; many reciprocal citations among co-authors) and two main inequity features (more co-authors in some disciplines; more citations for authors with more experience). It constructs a representative dataset from a cross-disciplinary balanced sample (10,000 authors with at least one publication indexed in Scopus from 2006 to 2015). It estimates to what extent four additional improvements of the H-index, treated as top-down regulations (∆Hh = Hh − H(h+1), from H1, based on publications, to H5, the net per-capita per-year H-index based on articles), account for inefficiency and inequity across twenty-five disciplines and four subjects. Linear regressions and ANOVA results show that the single improvements of the H-index explain the inefficiency and inequity features considerably, though to a decreasing extent, but make these features only vaguely comparable across disciplines and subjects, while the overall improvement of the H-index (H1–H5) explains these features only marginally but makes disciplines and subjects clearly comparable, to a greater extent across subjects than across disciplines. Fitting a gamma distribution to H5 for each discipline and subject by maximum likelihood shows that the estimated probability densities and the percentages of authors with H5 ≥ 1 to H5 ≥ 3 differ across disciplines but are similar across subjects.

1. Introduction

To the best of our knowledge, few papers (e.g., [1,2]) suggest an index to evaluate interdisciplinary CVs (i.e., authors applying usual methodologies to unusual topics or vice versa). In particular, Zagonari [1] identifies the interdisciplinary percentage of any CV (i.e., articles in a discipline or subject quoted by articles in different disciplines or subjects) to be applied to the H-index characterising each author, where the H-index is chosen as an easily generated quantitative index. However, this interdisciplinary index requires a homogeneous H-index across disciplines or subjects to avoid gains for some interdisciplinary scientists (e.g., across medicine and computing) and losses for other interdisciplinary scientists (e.g., across art and mathematics) [3].
Within the vast theoretical and empirical literature on variants and extensions of the H-index (e.g., a-index, ar-index, m-quotient, raw h-rate, contemporary h-index, f-index, t-index, wu-index, maxprod index, q2-index among variants, and hw-index, hm-index, hi-index, hc-index, m-quotient, ht-index, fraction count on citation, fraction count on paper, age-based h-index among extensions) [4,5,6], some papers suggest improvements of the H-index to increase homogeneity across disciplines (e.g., [7,8,9,10]). In particular, Zagonari [9] develops an empirically validated theoretical model of a researcher’s publication goal. It provides two internal criteria for evaluating a bibliometric index (i.e., efficiency and equity), grounding these concepts in an analytical model of researchers’ incentives to maximise their H-index. It then suggests two standardisations (i.e., calculating publications per author and citations per author per year) and two guidelines (i.e., neglecting co-authors’ reciprocal citations and publications other than peer-reviewed articles), and predicts which standardisations and guidelines are most likely to succeed in achieving efficiency and equity across disciplines. The following relationships between improvements, criteria, standardisations, and guidelines are identified:
  • Inefficiency (i.e., biased incentives for research activity in terms of scientific achievements) is managed by focusing on articles instead of publications (i.e., publications include non-peer-reviewed research) (Inefficiency a, Ifa hereafter) and by using net instead of gross citations (i.e., gross citations include co-authors’ reciprocal citations) (Inefficiency b, Ifb hereafter). In terms of H-index improvements, ∆H1 = H1 − H2 deals with the overvaluation of possibly non-original research such as reviews, proceedings, and editorials, where H1 = H-index based on publications and H2 = H-index based on articles; ∆H2 = H2 − H3 deals with the overemphasis on co-authors’ reciprocal citations as a measure of actual knowledge diffusion, where H3 = H-index based on net citations for articles.
  • Inequity (i.e., rankings biased in favour of some authors and some disciplines) is managed by using a per-capita H-index to account for the different co-authorship practices prevailing in different disciplines (i.e., more co-authors in some disciplines) (Inequity a, Iqa hereafter) and by using a per-year H-index to account for the different citation periods of authors with more scientific experience (i.e., they can rely on a longer citation period) (Inequity b, Iqb hereafter). In terms of H-index improvements, ∆H3 = H3 − H4 deals with the large differences in the number of co-authors, and thus in the number of articles, that favour some disciplines, where H4 = net per-capita H-index based on articles; ∆H4 = H4 − H5 deals with the larger number of citations received by researchers with more experience, and thus with the likely undervaluation of the scientific production of researchers with less experience, where H5 = net per-capita per-year H-index based on articles.
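The chain of definitions H1 → H5 can be made concrete with a small Python sketch that computes an H-index from per-item scores and derives the five variants for a single author. It rests on assumptions that the text does not fix: the `Item` record is hypothetical, the per-capita step divides each article's net citations by its number of authors, and the per-year step further divides by an assumed citation window in years. Since the thresholds used in Section 4.2 (e.g., H5 ≥ 1.5) imply that the paper's H5 takes non-integer values, the operationalisation in Zagonari [9] must differ from this integer-valued version in at least some respects.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """One publication of an author (hypothetical record layout)."""
    is_article: bool       # peer-reviewed article vs. review, proceedings, editorial, ...
    gross_citations: int   # all citations, including co-authors' reciprocal citations
    net_citations: int     # citations excluding those made by co-authors
    n_authors: int         # number of authors of this item (>= 1)
    years_cited: int       # assumed citation window in years (>= 1)


def h_index(scores):
    """Largest h such that at least h items have a score >= h."""
    h = 0
    for rank, score in enumerate(sorted(scores, reverse=True), start=1):
        if score < rank:
            break
        h = rank
    return h


def h_variants(items):
    """Return (H1, H2, H3, H4, H5) under the assumptions stated above."""
    articles = [it for it in items if it.is_article]
    h1 = h_index([it.gross_citations for it in items])                    # all publications
    h2 = h_index([it.gross_citations for it in articles])                 # articles only
    h3 = h_index([it.net_citations for it in articles])                   # net citations
    h4 = h_index([it.net_citations / it.n_authors for it in articles])    # per capita
    h5 = h_index([it.net_citations / it.n_authors / it.years_cited        # per capita, per year
                  for it in articles])
    return h1, h2, h3, h4, h5


example_items = [
    Item(True, 40, 30, 2, 5),
    Item(True, 25, 20, 4, 4),
    Item(False, 60, 55, 1, 3),   # e.g. a highly cited review
    Item(True, 10, 8, 1, 2),
    Item(True, 3, 3, 3, 1),
]
h = h_variants(example_items)
print(h)                                    # (4, 3, 3, 3, 2)
print([h[i] - h[i + 1] for i in range(4)])  # ∆H1..∆H4: [1, 0, 0, 1]
```

In this toy profile, excluding the review lowers the index (∆H1 = 1) and the per-year step lowers it again (∆H4 = 1), while the net-citation and per-capita steps leave it unchanged.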
Note that all acronyms and variables are described in Table 1. Zagonari [9] focuses on the degree of efficiency and equity rather than on the homogeneity across disciplines that the suggested standardisations and guidelines can achieve, and it adopts a theoretical approach (although the structural model is validated in terms of means and variances) rather than a statistical approach (where reduced models are tested in terms of residuals and distributions). Note that Ifa favours authors who minimise the efforts and risks of a collaborative and creative scientific activity by relying on the large number of citations to reviews. Ifb favours authors who spend effort on risk-free networking rather than on creative scientific activity. Iqa favours authors who reduce efforts and risks by exploiting the prevailing measurement of scientific activity, based on the principle “one article with n co-authors is n articles”, and who spend effort on networking rather than on creative scientific activity. Iqb favours authors who minimise effort at no risk by exploiting the prevailing measurement of scientific activity, based on the principle “the overall sum of citations matters”, and who spend effort on networking rather than on creative scientific activity.
Within the recent empirical literature on indexes for interdisciplinary science (e.g., [11]), the purpose of this paper is to test statistically to what extent the improvements of the H-index suggested in [9], considered as top-down regulations, account for the different publication and citation habits of different disciplines and subjects, so as to enable suitable comparisons of interdisciplinary scientists [1] across disciplines and subjects. To do so, we suggest measures of inefficiency and inequity in Section 2 and construct a representative sample in Section 3. In Section 4.1, we provide results for each single H-index improvement ∆Hh based on linear regressions and analysis of variance (ANOVA); in Section 4.2, we provide results for H5 based on maximum likelihood fittings and quantile analysis, assuming that the observed H5 values for both disciplines and subjects are realisations of a gamma distribution. Section 5 discusses findings, weaknesses, and strengths, and Section 6 concludes with final remarks about methodological and practical potential.
Note that the use of H-index improvements suggested by Zagonari [9] instead of other developments will be justified in Section 5. Moreover, our observation unit will be each author (Ai), rather than journals (e.g., [12]) or institutions (e.g., [13]), since our goal is the evaluation of interdisciplinary scientists. Finally, the use of H-index improvements as policies suggested by Zagonari [9] will be justified in Section 6.
In other words, the research questions of the present study can be summarised as follows:
  • Does each single H-index improvement ∆Hh properly solve inefficiency and inequity issues?
  • Does each single H-index improvement ∆Hh spread inefficiency and inequity issues uniformly across disciplines Dj and subjects Sk?
  • Can any discipline Dj and subject Sk be distinguished from other disciplines and subjects, respectively, net of ∆Hh?
  • Does the comprehensive H-index improvement H1–H5 properly solve inefficiency and inequity issues?
  • Does the comprehensive H-index improvement H1–H5 spread inefficiency and inequity issues uniformly across disciplines Dj and subjects Sk?
  • Can any discipline Dj and subject Sk be distinguished from other disciplines and subjects, respectively, net of H1–H5?
  • Are disciplines Dj and subjects Sk characterised by similar parametric distributions for H5 (i.e., plots have similar shapes) and by similar right tails (i.e., similar percentages of authors with H5 ≥ 1, H5 ≥ 1.5, H5 ≥ 2, H5 ≥ 2.5, H5 ≥ 3)?
Note that improvements of H-indexes are taken as policies based on limited information about each single author (e.g., year of the first publication) or each discipline and subject (e.g., average number of citations per article). Moreover, the application of the interdisciplinary index to a homogeneous H-index concerns subjects to a greater extent than disciplines (i.e., truly powerful interdisciplinary research spans subjects). Accordingly, statistical analyses are presented for subjects in the Results and for disciplines in Appendices A–D. Finally, with respect to the classification of bibliometric studies in terms of internal vs. external criteria (e.g., [14]) and theoretical vs. empirical approaches (e.g., [15]), the present paper refers to external theoretical concepts (i.e., efficiency and equity, with the added concept of homogeneity across disciplines) but adopts an empirical approach. As a consequence, our results depend on the sample used: as in all other empirical studies, a theoretical proof based on internal criteria will not be attained. However, we refer to external criteria of judgment supported by the structural model validated in Zagonari [9], and we perform a statistical analysis of reduced forms of the same model on the same dataset used in Zagonari [9].
In summary, by focusing on subjects Sk, apart from ∆H2, neither any single H-index improvement nor the comprehensive H-index improvement solves inefficiency and inequity issues (i.e., answer NO to research questions 1 and 4), although the comprehensive H-index improvement makes them uniform across subjects (i.e., answer NO to research question 2 and YES to research question 5). Moreover, apart from subject S1 for Ifa and subject S4 for Ifb, all subjects are similar in terms of inefficiency and inequity (i.e., answer NO to research questions 3 and 6). Finally, subjects show similar gamma distribution fits and similar right tails (i.e., answer YES to research question 7).

2. Measuring Inefficiency and Inequity by H-Indexes

To check whether improved H-indexes solve efficiency and equity problems on average, and to what extent in each single discipline, the efficiency and equity criteria and the H-index improvements described in Section 1 are specified as follows, where Npub = number of publications, Nart = number of articles, Ngro = number of citations including co-authors’ citations, Nnet = number of citations excluding co-authors’ citations, Naut = mean number of co-authors for each author, expert = authors with their first publication before 2011 (i.e., up to 10 years of activity, from 2006 to 2015), and inexpert = authors with their first publication after 2010 (i.e., up to 5 years of activity, from 2011 to 2015):
  • Inefficiency a (i.e., many publications other than articles for each author):
$$N_{pub} - N_{art} \sim (H_1 - H_2) + D_{10} + \dots + D_j + \dots + D_{36} \tag{1}$$
  • Inefficiency b (i.e., many co-authors’ reciprocal citations for each author):
$$N_{gro} - N_{net} \sim (H_2 - H_3) + D_{10} + \dots + D_j + \dots + D_{36} \tag{2}$$
  • Inequity a (i.e., more co-authors in some disciplines):
$$N_{net} - \frac{N_{net}}{N_{aut}} \sim (H_3 - H_4) + D_{10} + \dots + D_j + \dots + D_{36} \tag{3}$$
  • Inequity b (i.e., more citations for authors with more experience):
$$\left(\frac{N_{net}}{N_{aut}}\right)^{\text{expert}} - \left(\frac{N_{net}}{N_{aut}}\right)^{\text{inexpert}} \sim (H_4 - H_5) + D_{10} + \dots + D_j + \dots + D_{36} \tag{4}$$
where issues are depicted on the left-hand side (lhs), policies and disciplines are depicted on the right-hand side (rhs), and “lhs ~ rhs” stands for “lhs depends on the variables on the rhs”. Note that Equations (1)–(4) are reduced analytical forms of the structural theoretical model specified in Zagonari [9]. In particular, each ∆Hh depicts to what extent each bias on the left-hand side is tackled by the improved H-index as a top-down regulation (i.e., ∆H1 = H1 − H2, ∆H2 = H2 − H3, ∆H3 = H3 − H4, ∆H4 = H4 − H5), whereas the dummy variables Dj (i.e., Dj takes value 1 for a discipline j and 0 for disciplines other than j) represent to what extent each discipline is not well represented by each ∆Hh. We will perform similar analyses for subjects Sk. Appendix A provides the list of the 27 disciplines used by Scopus, whereas the 4 subjects can be detailed as follows: 1. health (i.e., medicine, veterinary, nursing, dentistry, health professions), 2. life (i.e., pharmacology and toxicology, biological, neurology, agricultural, immunology), 3. physical (i.e., chemistry, physics and astronomy, mathematics, Earth and planetary, energy, environmental, materials, engineering, computing and information), and 4. social (i.e., psychology, economics and econometrics and finance, arts and humanities, business and management and accounting, decision, sociology).
Note that the gap between Npub and Nart is likely to be underestimated, since many reviews are published as articles. Moreover, estimations are based on differences both when the bias under consideration affects the number of citations only (i.e., Equations (2) and (4)) and when it affects both the number of authors and the number of citations (i.e., Equations (1) and (3)). Finally, each subsequent bias is additional to the previous one. Thus, to test the performance of the comprehensive H-index improvement in addressing the overall bias, we refer to the following equation:
$$N_{gro}^{\text{expert}} - \left(\frac{N_{net}}{N_{aut}}\right)^{\text{inexpert}} \sim (H_1 - H_5) + D_{10} + \dots + D_j + \dots + D_{36} \tag{5}$$
where the lhs represents the overall bias, since the focus is on publications for expert authors and on articles for inexpert authors. We will perform similar analyses for subjects by replacing the dummy variables for disciplines Dj with the dummy variables for subjects Sk.
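As an illustration of the left-hand sides of Equations (1)–(5), the Python sketch below computes each bias measure from per-author counts. The `AuthorCounts` record and the helper names are hypothetical, Naut is assumed to be at least 1, and the expert/inexpert terms in Equations (4) and (5) are read as differences between group means over authors. That reading is our assumption rather than a stated definition.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class AuthorCounts:
    """Per-author counts used in Section 2 (hypothetical record layout)."""
    n_pub: int        # Npub: publications of any type
    n_art: int        # Nart: articles only
    n_gro: int        # Ngro: citations including co-authors' citations
    n_net: int        # Nnet: citations excluding co-authors' citations
    n_aut: float      # Naut: mean number of co-authors, assumed >= 1
    first_year: int   # year of the author's first publication


def is_expert(a):
    """Expert = first publication before 2011."""
    return a.first_year < 2011


def per_capita_net(a):
    return a.n_net / a.n_aut


def inefficiency_a(a):
    return a.n_pub - a.n_art                 # lhs of Equation (1)


def inefficiency_b(a):
    return a.n_gro - a.n_net                 # lhs of Equation (2)


def inequity_a(a):
    return a.n_net - per_capita_net(a)       # lhs of Equation (3)


def _group_gap(authors, expert_value, inexpert_value):
    """Mean over expert authors minus mean over inexpert authors (assumed aggregation)."""
    expert = [expert_value(a) for a in authors if is_expert(a)]
    inexpert = [inexpert_value(a) for a in authors if not is_expert(a)]
    return mean(expert) - mean(inexpert)


def inequity_b(authors):
    return _group_gap(authors, per_capita_net, per_capita_net)      # lhs of Equation (4)


def gross_citations(a):
    return a.n_gro


def overall_bias(authors):
    return _group_gap(authors, gross_citations, per_capita_net)     # lhs of Equation (5)


example_authors = [
    AuthorCounts(12, 10, 100, 80, 4.0, 2007),
    AuthorCounts(5, 5, 30, 24, 2.0, 2012),
    AuthorCounts(8, 7, 60, 50, 5.0, 2009),
    AuthorCounts(3, 3, 10, 9, 3.0, 2014),
]
first = example_authors[0]
print(inefficiency_a(first), inefficiency_b(first), inequity_a(first))  # 2 20 60.0
print(inequity_b(example_authors), overall_bias(example_authors))       # 7.5 72.5
```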
Note that Zagonari [9] does not include H4 and H5. Moreover, Zagonari [1] shows that interdisciplinary science requires an additional category alongside orthodox science (i.e., authors publish in a single discipline and in many journals, and the vast majority of their citations are in few disciplines but in many different journals) and heterodox science (i.e., authors publish in a single discipline and in a few journals devoted to that discipline, so that the vast majority of their citations are in few disciplines and few journals). This category is to be combined with H5 to reduce unfair rankings between interdisciplinary scientists (i.e., authors who publish in many disciplines and journals, with the vast majority of their citations in many different disciplines and journals) across different disciplines, as well as between interdisciplinary scientists and single-discipline scientists collaborating with many authors from different disciplines. Finally, Zagonari [9] does not include quantitative results based on linear regressions or parameter estimations.

3. Constructing the Dataset

In order to obtain a representative dataset for authors, we applied the following stratified sampling. The reference population consists of authors with at least one publication in the Scopus dataset from 2006 to 2015. This population is partitioned by discipline: we used the 27 scientific disciplines suggested by Scopus [16].
By preserving the percentages of authors in each scientific discipline, 10,000 authors are then randomly extracted from the Scopus database. This design required the attribution of each author to a single discipline: we used the attribution suggested by Scopus, where an author is linked to the discipline with the largest percentage of publications.
Table 2 and Table 3 provide the summary statistics for subjects Sk, while Appendix B provides the summary statistics for disciplines Dj. Altogether, the dataset includes 1,487,866 co-authors, 507,557 papers, 31,950 journals, and 562,688 citations. The Supplementary Materials provide the histograms of H1 and H5 for both disciplines Dj and subjects Sk.
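The sampling design (preserving each discipline's share of the population, then drawing authors at random within disciplines) is proportional stratified sampling. The Python sketch below shows one way to implement it. The largest-remainder rounding rule, the seed, the helper names, and the toy population are all assumptions, since the text only states that discipline percentages were preserved.

```python
import random


def proportional_allocation(stratum_sizes, n_total):
    """Split n_total across strata in proportion to their sizes (largest-remainder rounding)."""
    population = sum(stratum_sizes.values())
    alloc = {s: n_total * size // population for s, size in stratum_sizes.items()}
    remainders = {s: n_total * size % population for s, size in stratum_sizes.items()}
    shortfall = n_total - sum(alloc.values())
    # give the leftover units to the strata with the largest remainders
    for s in sorted(remainders, key=remainders.get, reverse=True)[:shortfall]:
        alloc[s] += 1
    return alloc


def stratified_sample(authors_by_stratum, n_total, seed=0):
    """Simple random sample without replacement inside each stratum."""
    rng = random.Random(seed)
    sizes = {s: len(ids) for s, ids in authors_by_stratum.items()}
    alloc = proportional_allocation(sizes, n_total)
    return {s: rng.sample(ids, alloc[s]) for s, ids in authors_by_stratum.items()}


# Toy population: author IDs grouped by hypothetical discipline labels.
toy_population = {
    "D27": list(range(0, 600)),
    "D31": list(range(600, 900)),
    "D12": list(range(900, 1000)),
}
sample = stratified_sample(toy_population, 100)
print({s: len(ids) for s, ids in sample.items()})   # {'D27': 60, 'D31': 30, 'D12': 10}
```

Integer arithmetic for the quotas avoids floating-point rounding, so the allocation always sums exactly to the requested sample size.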

4. Results

4.1. ANOVA and Linear Regressions

To check whether the improved H-indexes solve efficiency and equity problems on average, and to what extent in each discipline, we perform ANOVA based on linear regressions [17] (see Appendix D for ANOVA based on a quasi-Poisson distribution). In particular, we translate the theoretical models presented in Section 2 into the following regression models, where Aaut takes value 1 for authors with their first publication after 2010 (i.e., inexpert authors):
  • Inefficiency a (i.e., many publications other than articles for each author):
$$N_{pub} - N_{art} \sim (H_1 - H_2) + D_{11} + \dots + D_j + \dots + D_{35} \tag{6}$$
  • Inefficiency b (i.e., many co-authors’ reciprocal citations for each author):
$$N_{gro} - N_{net} \sim (H_2 - H_3) + D_{11} + \dots + D_j + \dots + D_{35} \tag{7}$$
  • Inequity a (i.e., more co-authors in some disciplines):
$$N_{net} \sim N_{aut} + (H_3 - H_4) + D_{11} + \dots + D_j + \dots + D_{35} \tag{8}$$
  • Inequity b (i.e., more citations for authors with more experience):
$$N_{net} \sim N_{aut} + A_{aut} + (H_4 - H_5) + D_{11} + \dots + D_j + \dots + D_{35} \tag{9}$$
where ~ means that the variable on the lhs is linearly dependent on the variables on the rhs.
Note that we disregarded discipline #10 (multidisciplinary) and discipline #36 (health professions), since very few authors are attached to them in our sample (i.e., 2 authors for discipline #10 and 1 author for discipline #36). Moreover, in the model for Iqa (i.e., Equation (8)), Naut is moved to the rhs. Indeed, this inequity rests on the prediction that authors with several co-authors are likely to achieve more citations (i.e., Naut is a stochastic variable). We estimated this relationship before assessing the impact of this H-index improvement on the rhs and checking for residual heterogeneity explained by the discipline dummies. Finally, in the model for Iqb (i.e., Equation (9)), both Naut and Aaut are moved to the rhs. Indeed, this inequity rests on the prediction that expert authors with several co-authors are likely to achieve more citations (i.e., Naut and Aaut are stochastic variables). Again, we estimated these relationships before assessing the impact of this H-index improvement on the rhs and checking for residual heterogeneity explained by the discipline dummies. Thus, the overall bias can be represented by:
$$N_{gro} \sim N_{net} + N_{aut} + A_{aut} + (H_1 - H_5) + D_{11} + \dots + D_j + \dots + D_{35} \tag{10}$$
where the focus is on articles. We will perform similar analyses for subjects Sk.
Note that we check first the significance levels of the variables and then their coefficient values. Moreover, Equation (10) can be obtained by summing the terms on the rhs and lhs of Equations (6)–(9). Finally, we check for differences between specific disciplines or subjects only if their overall explanation of variability is significant.
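In the regressions above, each discipline (or subject) dummy can enter without a common intercept, so that each dummy coefficient acts as that discipline's own intercept. This is one way to read the later statements about discipline-specific "intercepts"; it is an assumption on our part, and the appendix tables may use a reference category instead. The numpy sketch below builds such a design matrix and computes OLS coefficients with classical standard errors on synthetic data; `design_matrix` and `fit_ols` are hypothetical helpers, not the authors' code.

```python
import numpy as np


def design_matrix(x_cols, groups):
    """Stack continuous regressors with one 0/1 dummy per group and no global intercept."""
    levels = sorted(set(groups))
    dummies = np.array([[1.0 if g == lev else 0.0 for lev in levels] for g in groups])
    X = np.column_stack([np.asarray(c, dtype=float) for c in x_cols] + [dummies])
    return X, levels


def fit_ols(y, X):
    """OLS coefficients, classical standard errors and coefficient covariance matrix."""
    y = np.asarray(y, dtype=float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]
    cov = (resid @ resid / dof) * np.linalg.inv(X.T @ X)
    return beta, np.sqrt(np.diag(cov)), cov


# Synthetic example: y depends on a ∆H-like regressor plus a group-specific intercept.
rng = np.random.default_rng(0)
n = 3000
groups = [f"S{i % 4 + 1}" for i in range(n)]
dh = rng.uniform(0, 5, n)
true_intercepts = {"S1": 1.0, "S2": 1.5, "S3": 0.5, "S4": -0.5}
y = 2.0 * dh + np.array([true_intercepts[g] for g in groups]) + rng.normal(0, 0.5, n)

X, levels = design_matrix([dh], groups)
beta, se, _ = fit_ols(y, X)
for name, b, s in zip(["dH"] + levels, beta, se):
    print(f"{name}: {b:.3f} (t = {b / s:.1f})")
```

With one dummy per level and no global intercept, the dummy coefficients are directly comparable with each other, which is what the pairwise comparisons between disciplines and subjects require.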
A methodological remark is worth making here. ANOVA as an inferential method (i.e., its calculated significance levels) relies on the assumption of normal distributions, whereas its descriptive statistics (i.e., estimated coefficients and explained variance) allow a simple interpretation of results. For this reason, each ANOVA table in Section 4.1 is paired with an analogous table in Appendix D based on a log-linear model involving Poisson and negative-binomial distributions; additional methodological details are provided in Appendix D.

4.1.1. Inefficiency a (Ifa) (Many Publications Other Than Articles)

All authors in our dataset have H1 = H2 (i.e., ∆H1 does not affect Inefficiency a). In other words, publications other than articles (e.g., the reviews or books that eminent authors are asked to write) do not affect their H-index. Consequently, we apply ANOVA only to disciplines and subjects. In particular, Table 4 shows that the variance explained by disciplines Dj is mildly significant but tiny, whereas Table 5 shows that the variance explained by subjects Sk is significant but tiny. Hereafter, we use “slightly significant” whenever * applies (i.e., significant at the 95% level), “mildly significant” whenever ** applies (i.e., significant at the 99% level), and “significant” whenever *** applies (i.e., significant at the 99.9% level).
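The star convention amounts to a small lookup. This Python sketch assumes the usual p-value cut-offs behind the 95%, 99%, and 99.9% levels (0.05, 0.01, 0.001) with strict inequalities. The treatment of boundary values is an assumption, since the text does not state it.

```python
def significance_label(p):
    """Wording used in Section 4.1 for a p-value (assumed cut-offs 0.001, 0.01, 0.05)."""
    if p < 0.001:
        return "significant (***)"
    if p < 0.01:
        return "mildly significant (**)"
    if p < 0.05:
        return "slightly significant (*)"
    return "not significant"


for p in (0.0004, 0.004, 0.03, 0.2):
    print(p, significance_label(p))
```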
Next, Table A4 in Appendix C shows that, apart from D27 and D35, all disciplines Dj are characterised by a percentage of publications other than articles smaller than 1. Similarly, Table 6 shows that, apart from S1, all subjects Sk are characterised by a percentage of publications other than articles smaller than 1.

4.1.2. Inefficiency b (Ifb) (Many Co-Authors’ Reciprocal Citations)

Applying ANOVA to Ifb (i.e., Equation (7)) shows that ∆H2 explains 26.79% of its variability. The remaining 73.21% can be decomposed into variance between disciplines (Table 7), which accounts for only 0.29% (0.06% between subjects in Table 8), and variance within disciplines, which accounts for the remaining 72.92% (73.15% within subjects in Table 8). Thus, Table 7 and Table 8 show that the residual variances explained by disciplines Dj and subjects Sk are slightly significant but tiny (i.e., ∆H2 makes Dj and Sk homogeneous with respect to Ifb). In other words, even though disciplines and subjects are slightly significant, they explain very little of Ifb: once the Ifb variability explained by ∆H2 is removed, the remaining variance lies within disciplines and subjects to a far greater extent than across them (72.92% > 0.29%).
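The decomposition above (∆H2 first, then disciplines, then the within-discipline residual) is a sequential, or Type I, sum-of-squares decomposition. The following minimal numpy sketch assumes that ∆H2 is entered before the group dummies, since Type I shares depend on the order of entry. It uses synthetic data rather than the paper's dataset, and the helper names are hypothetical.

```python
import numpy as np


def rss(y, X):
    """Residual sum of squares of an OLS fit of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ r)


def sequential_anova_shares(y, dh, groups):
    """Type I shares (%) of total SS: explained by dh, then by groups, then residual."""
    y = np.asarray(y, dtype=float)
    dh = np.asarray(dh, dtype=float)
    ones = np.ones(len(y))
    levels = sorted(set(groups))
    dummies = np.array([[float(g == lev) for lev in levels] for g in groups])
    tss = rss(y, ones[:, None])                          # SS around the grand mean
    rss_dh = rss(y, np.column_stack([ones, dh]))         # intercept + dh
    rss_full = rss(y, np.column_stack([dh, dummies]))    # group dummies replace the intercept
    ss_dh, ss_groups = tss - rss_dh, rss_dh - rss_full
    return tuple(100 * v / tss for v in (ss_dh, ss_groups, rss_full))


# Synthetic data in which the groups have no true effect.
rng = np.random.default_rng(1)
n = 2000
dh = rng.uniform(0, 5, n)
groups = list(rng.choice(["S1", "S2", "S3", "S4"], size=n))
y = 3.0 * dh + rng.normal(0, 1, n)
shares = sequential_anova_shares(y, dh, groups)
print([round(v, 2) for v in shares])
```

The three shares add up to 100% by construction, mirroring 26.79% + 0.29% + 72.92% in the text; the paper's figures themselves cannot be reproduced without its data.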
Next, Table A5 in Appendix C shows that, apart from D13, D27, and D31, all disciplines Dj are characterised by an insignificant intercept in the linear regression (7) (i.e., ∆H2 depicts intercepts for those disciplines), where two significant coefficients are positive and large (i.e., larger than 0.1). Similarly, Table 9 shows that, apart from S4, all subjects Sk are characterised by a significant and positive intercept in the linear regression (7) (i.e., ∆H2 depicts intercepts only for that subject), where one significant coefficient is positive and tiny (i.e., larger than 0.1).
Note that we did not detail comparisons between subjects and disciplines, since the variance explained by subjects and disciplines altogether is slightly significant and tiny.

4.1.3. Inequity a (Iqa) (More Co-Authors in Some Disciplines)

The application of ANOVA to Iqa (i.e., Equation (8)) shows that ∆H3 describes 4.38% of its variability. In particular, Table 10 shows that the variance explained by disciplines Dj is significant and small (i.e., ∆H3 does not make Iqa homogeneous across disciplines Dj). In contrast, Table 11 shows that the variance explained by subjects Sk is significant and tiny (i.e., ∆H3 makes Iqa homogeneous across subjects Sk).
Note that Table 12 for subjects and Table A6 in Appendix C for disciplines suggest that adding one co-author significantly changes an author’s number of net citations by only around 0.01. In other words, Iqa, although statistically significant, turns out to be a marginal feature for most authors (i.e., to have a substantial effect on the number of citations, an author would need to publish with several hundreds or thousands of co-authors). However, ∆H3 carries more information than the number of co-authors in explaining the number of net citations, both for subjects (i.e., 1.68 > 0.0097) and for disciplines (i.e., 1.61 > 0.0105).
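A back-of-the-envelope reading of the subject-level estimates (0.0097 net citations per additional co-author and 1.68 per unit of ∆H3) follows. It takes the linear specification of Equation (8) at face value:

$$\Delta N_{net} \approx \hat{\beta}_{N_{aut}}\,\Delta N_{aut}: \qquad 0.0097 \times 100 \approx 0.97, \qquad 0.0097 \times 1000 \approx 9.7$$

$$\frac{\hat{\beta}_{\Delta H_3}}{\hat{\beta}_{N_{aut}}} = \frac{1.68}{0.0097} \approx 173$$

So one unit of ∆H3 is associated with roughly as many net citations as about 173 additional co-authors.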
Next, Table A6 in Appendix C shows that, apart from D12, D18, D20, D21, D26, D29, D30, D34, and D35, all disciplines Dj are characterised by a significant intercept in the linear regression (8) (i.e., ∆H3 depicts intercepts only for those disciplines), where four significant coefficients are positive and large (i.e., larger than 5). Similarly, Table 12 shows that all subjects Sk are characterised by a significant and positive intercept in the linear regression (8) (i.e., ∆H3 does not depict intercepts for subjects), where two significant coefficients are positive and large (i.e., larger than 5).
Note that Table 12 and Table A6 in Appendix C show that some heterogeneity across disciplines and subjects is still not explained by ∆H3. In particular, the estimated coefficients and their standard errors suggest that Subject 1 (health) is similar to Subject 2 (life) and that Subject 3 (physical) is similar to Subject 4 (social). The significance levels reported in Table 13 support this conjecture: once the effect of ∆H3 is removed, the residual Iqa is different between Subject 1 (health) and Subject 2 (life) on one side and between Subject 4 (social) and the other subjects on the other side. Similarly, Figure S5 in the Supplementary Materials reports the significance levels of the differences between the dummies for disciplines Dj: there is evidence of residual heterogeneity only for disciplines 13, 27, and 28. This suggests that the set of disciplines could be partitioned into two groups, with disciplines 13, 27, and 28 in the first group and the other disciplines in the second.
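Table 13 and Figures S5–S7 report significance levels for differences between pairs of dummies without specifying the test. One standard choice, assumed here, is a Wald test of the difference of two coefficients. That test needs both variances and the covariance from the fitted model. The Python sketch uses a normal approximation and no multiple-comparison correction, and the estimates below are hypothetical.

```python
import math


def coefficient_difference_test(beta, cov, j, k):
    """Wald z-test of H0: beta[j] == beta[k] from an estimated covariance matrix."""
    diff = beta[j] - beta[k]
    var = cov[j][j] + cov[k][k] - 2 * cov[j][k]
    z = diff / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))      # two-sided p-value under a normal approximation
    return z, p


# Hypothetical estimates for three dummies (not taken from the paper's tables).
beta_hat = [0.0, 3.0, 1.0]
cov_hat = [
    [0.5, 0.0, 0.0],
    [0.0, 0.5, 0.0],
    [0.0, 0.0, 0.5],
]
z, p = coefficient_difference_test(beta_hat, cov_hat, 1, 2)
print(round(z, 3), round(p, 4))   # 2.0 0.0455
```

Ignoring the covariance term would misstate the variance of the difference whenever the two estimates are correlated, which is common for dummies estimated in the same regression.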

4.1.4. Inequity b (Iqb) (More Citations for Authors with More Experience)

The application of ANOVA to Iqb (i.e., Equation (9)) shows that ∆H4 explains 0.46% of its variability. In particular, Table 14 shows that the variance explained by disciplines Dj is significant and small (i.e., ∆H4 does not make Iqb homogeneous across disciplines Dj). In contrast, Table 15 shows that the variance explained by subjects Sk is significant and tiny (i.e., ∆H4 makes Iqb homogeneous across subjects Sk).
Note that, according to the estimated coefficients, an expert researcher receives on average 8.84 more net citations per article than an inexpert researcher according to Table 16 (subjects), and 8.86 more according to Table A7 in Appendix C (disciplines). Although this relation describes only 4.48% of the net citation variability (Table 14 for disciplines and Table 15 for subjects), Iqb is significantly present in our sample.
Next, Table A7 in Appendix C shows that, apart from D18 and D29, all disciplines Dj are characterised by a significant intercept in the linear regression (9) (i.e., ∆H4 depicts intercepts only for those disciplines), where ten significant coefficients are positive and large (i.e., larger than 10). Similarly, Table 16 shows that all subjects Sk are characterised by a significant and positive intercept in the linear regression (9) (i.e., ∆H4 does not depict intercepts for subjects), where two significant coefficients are positive and large (i.e., larger than 10).
Note that Table 17 shows that, once Iqb is explained by ∆H4, there is still some heterogeneity across subjects, where Subject 1 (health) is similar to Subject 2 (life) on one side and Subject 3 (physical) is similar to Subject 4 (social) on the other side. Similarly, Figure S6 in the Supplementary Materials shows that apart from disciplines 13, 28, 27, and 30, all disciplines are similar.

4.1.5. Overall Bias Including All Inefficiency and Inequity

Applying ANOVA to the overall bias (i.e., Equation (10)) shows that the comprehensive improvement H1 − H5 (denoted ∆H5 here) explains 0.04% of its variability. In particular, Table 18 shows that the variance explained by disciplines Dj is significant but negligible (i.e., ∆H5 makes Dj homogeneous with respect to the overall bias). Similarly, Table 19 shows that the variance explained by subjects Sk is slightly significant and negligible (i.e., ∆H5 makes Sk homogeneous with respect to the overall bias).
Next, Table A8 in Appendix C shows that, apart from D12, D18, D26, and D29, all disciplines Dj are characterised by a significant intercept in the linear regression (10) (i.e., ∆H5 depicts intercepts only for those disciplines), where one significant coefficient is negative and large (i.e., larger than 0.5 in absolute value). Similarly, Table 20 shows that all subjects Sk are characterised by a significant and negative intercept in the linear regression (10) (i.e., ∆H5 does not depict intercepts for subjects), where all coefficients are negative and small (i.e., smaller than 0.5 in absolute value). Note that Figure S7 in the Supplementary Materials highlights an overall homogeneity across disciplines, apart from D16 and D31 (i.e., only D16 and D31 are statistically different from the other dummies). We did not detail the differences between subjects, since the variance explained by subjects altogether is slightly significant and tiny. In other words, ∆H5 explains the overall bias across disciplines, except for those two disciplines.
Therefore, H5 turns out to be satisfactory in making the overall bias homogeneous across disciplines and subjects. In Section 4.2, we will focus on H5 to perform additional analyses. Note that the author profile within Scopus enables the calculation of H1, H2, and H3.

4.2. Maximum Likelihood Fittings

Figure 1 shows the histograms of H5 for the 25 disciplines. Figure 2 presents the maximum likelihood fittings of gamma distributions for the 25 disciplines. Table A9 in Appendix C shows the percentages of authors characterised by H5 larger than 1, 1.5, 2, 2.5, and 3 for the 25 disciplines.
Thus, disciplines are characterised by different gamma distributions and different quantiles.
Figure 3 shows the histograms of H5 for the four subjects. Figure 4 presents the maximum likelihood fittings of gamma distributions for the four subjects. Table 21 shows the percentages of authors characterised by H5 larger than 1, 1.5, 2, 2.5, and 3 for the four subjects.
Thus, subjects are characterised by similar gamma distributions and similar quantiles.
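Figures 2 and 4 report maximum likelihood gamma fits, but the text does not say which software or parameterisation was used. The pure-Python sketch below shows one standard way to obtain the gamma MLE. Given the shape k, the scale has the closed form mean/k, and k solves ln k − ψ(k) = ln(mean) − mean(ln x), here by Newton's method from Minka's closed-form starting value.

The sketch relies on two assumptions. First, non-positive H5 values are dropped, since the gamma log-likelihood is undefined at zero; the text does not state how authors with H5 = 0 were treated. Second, the data are synthetic. `fit_gamma` and `tail_percentages` are hypothetical helper names.

```python
import math
import random


def digamma(x):
    """psi(x) for x > 0: shift with psi(x) = psi(x + 1) - 1/x, then an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - f * (
        1 / 12 - f * (1 / 120 - f * (1 / 252 - f * (1 / 240 - f / 132))))


def trigamma(x):
    """psi'(x) for x > 0: shift with psi'(x) = psi'(x + 1) + 1/x**2, then an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result += 1.0 / (x * x)
        x += 1.0
    f = 1.0 / (x * x)
    return result + 1.0 / x + f / 2 + (f / x) * (
        1 / 6 - f * (1 / 30 - f * (1 / 42 - f / 30)))


def fit_gamma(values, tol=1e-10, max_iter=100):
    """Maximum likelihood (shape, scale) of a gamma distribution."""
    xs = [v for v in values if v > 0]              # assumption: non-positive values are dropped
    m = sum(xs) / len(xs)
    s = math.log(m) - sum(math.log(v) for v in xs) / len(xs)   # > 0 unless all values are equal
    k = (3 - s + math.sqrt((s - 3) ** 2 + 24 * s)) / (12 * s)  # Minka's starting value
    for _ in range(max_iter):
        # Newton step on ln(k) - psi(k) - s = 0
        step = (math.log(k) - digamma(k) - s) / (1 / k - trigamma(k))
        k -= step
        if abs(step) < tol * k:
            break
    return k, m / k                                # scale = mean / shape at the MLE


def tail_percentages(values, thresholds=(1, 1.5, 2, 2.5, 3)):
    """Percentage of authors whose H5 is at least each threshold."""
    return {t: 100 * sum(v >= t for v in values) / len(values) for t in thresholds}


rng = random.Random(1)
synthetic_h5 = [rng.gammavariate(2.0, 0.5) for _ in range(20000)]   # synthetic, not the paper's data
shape, scale = fit_gamma(synthetic_h5)
print(round(shape, 3), round(scale, 3))
print(tail_percentages(synthetic_h5))
```

The tail percentages are computed empirically, as in Table 21. The fitted shape and scale could instead be used to compute model-based tail probabilities for comparison.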

5. Discussion

We applied ANOVA and linear regressions, together with maximum likelihood fittings and quantile analyses, to answer the seven research questions specified in Section 1.
The main specific insights obtained can be summarised as follows. By focusing on disciplines Dj, apart from ∆H2, neither any single H-index improvement nor the comprehensive H-index improvement solves inefficiency and inequity issues (i.e., answer NO to research questions 1 and 4), although the comprehensive H-index improvement makes them only slightly different across disciplines (i.e., answer NO to research questions 2 and 5). Moreover, apart from D27 and D35, all disciplines are characterised by a similar level of Inefficiency a; apart from D13, D27, and D31, all disciplines are characterised by an insignificant level of Inefficiency b; apart from D12, D18, D20, D21, D26, D29, D30, D34, and D35, all disciplines are characterised by a significant level of Inequity a; apart from D18 and D29, all disciplines are characterised by a significant level of Inequity b; and apart from D12, D18, D26, and D29, all disciplines are characterised by a significant level of overall bias. Hence, some disciplines are similar while others differ in terms of inefficiency and inequity (i.e., answer YES to research questions 3 and 6). Finally, disciplines show different gamma distributions and different quantiles (i.e., answer NO to research question 7).
The main general insights can be summarised as follows. Referring to Table 18 for disciplines, the variability of the overall bias amounts to 10,162 (i.e., the sum of squares 1562 + 71 + 8529), of which 15.37% (i.e., 1562/10,162) could have been reduced by using ∆H5. The remaining bias lies within disciplines for 83.93% (i.e., 8529/10,162) and across disciplines for less than 1% (i.e., 71/10,162). Similarly, referring to Table 19 for subjects, the variability of the overall bias amounts to 10,162 (i.e., the sum of squares 1562 + 14 + 8586), of which 15.37% (i.e., 1562/10,162) could have been reduced by using ∆H5. The remaining bias lies within subjects for 84.49% (i.e., 8586/10,162) and across subjects for less than 1% (i.e., 14/10,162).
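The percentages above follow directly from the sums of squares reported in Tables 18 and 19. The short sketch below simply recomputes them; `bias_shares` and its field names are illustrative and not part of the study's software.

```python
from typing import NamedTuple


class BiasShares(NamedTuple):
    total: float        # total sum of squares of the overall bias
    policy_pct: float   # share attributable to the policy term (Delta H5)
    between_pct: float  # share across groups (disciplines or subjects)
    within_pct: float   # residual share within groups


def bias_shares(ss_policy: float, ss_between: float, ss_within: float) -> BiasShares:
    """Express each sum-of-squares component as a percentage of the total."""
    total = ss_policy + ss_between + ss_within
    return BiasShares(
        total,
        100 * ss_policy / total,
        100 * ss_between / total,
        100 * ss_within / total,
    )


# Sums of squares as reported for Table 18 (disciplines) and Table 19 (subjects)
for label, ss in (("disciplines", (1562, 71, 8529)), ("subjects", (1562, 14, 8586))):
    r = bias_shares(*ss)
    print(f"{label}: total={r.total:.0f}, policy={r.policy_pct:.2f}%, "
          f"between={r.between_pct:.2f}%, within={r.within_pct:.2f}%")
# disciplines: total=10162, policy=15.37%, between=0.70%, within=83.93%
# subjects: total=10162, policy=15.37%, between=0.14%, within=84.49%
```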
Note that the fact that Inefficiency b (i.e., many co-authors’ reciprocal citations) turns out to be statistically significant for only a few disciplines (i.e., D13, D27, and D31) could be interpreted as inequity.
Therefore, the present study shows that the net per-capita per-year H-index based on articles can be used to evaluate interdisciplinary scientists. In particular, the empirical approach adopted in the present study highlighted that the suggested improvements of the H-index, considered as policies, did not achieve efficiency and equity across disciplines and subjects in the dataset under consideration, although all suggested improvements combined produced homogeneity across subjects (i.e., a crucial feature in evaluating interdisciplinary science). Note that a homogeneous H-index across subjects is a necessary condition for a proper assessment of interdisciplinary science, whereas interdisciplinary science does not solve the theoretical and empirical problems identified for the H-index. Next, the empirical demonstration that the suggested improvements of the H-index represent a useful tool to evaluate interdisciplinary science does not imply that they will be used whenever comparisons between subjects are required (e.g., in allocating funds in interdisciplinary departments), although they can be easily implemented (e.g., an algorithm is available on Scopus.com to compute the efficiency improvements of the H-index; Zagonari [1] provides software to calculate all suggested improvements of the H-index) [18]. In other words, the adoption of homogeneity as a criterion is a political/academic decision rather than a technical/scientific issue [19].
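To make the chain of improvements concrete, the following sketch computes H1 to H5 for a single author under one plausible reading of the definitions. The reading is inferred from the variable paired with each ∆Hh in Appendix C: non-article publications for ∆H1, reciprocal citations for ∆H2, co-authors for ∆H3, and experience for ∆H4. This is not the software provided by Zagonari [1], nor the Scopus algorithm. The `Paper` fields, the net-citation counts taken as given, the division of H3 by the mean number of co-authors, and the division of H4 by career length in years are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Paper:
    is_article: bool      # False for reviews, conference papers, editorials, etc.
    gross_citations: int  # all citations received
    net_citations: int    # citations net of co-authors' reciprocal citations (taken as given)
    n_authors: int        # number of authors, including the evaluated one


def h_index(citations: Iterable[float]) -> int:
    """Largest h such that h papers have at least h citations each."""
    h = 0
    for rank, c in enumerate(sorted(citations, reverse=True), start=1):
        if c < rank:
            break
        h = rank
    return h


def h_variants(papers: List[Paper], career_years: float) -> List[float]:
    """Hypothetical H1..H5 for one author (assumed forms, see the lead-in)."""
    articles = [p for p in papers if p.is_article]
    h1 = h_index(p.gross_citations for p in papers)     # all publications
    h2 = h_index(p.gross_citations for p in articles)   # articles only
    h3 = h_index(p.net_citations for p in articles)     # net citations
    mean_authors = sum(p.n_authors for p in articles) / len(articles) if articles else 1.0
    h4 = h3 / mean_authors                              # per-capita (assumed form)
    h5 = h4 / career_years                              # per-year (assumed form)
    return [h1, h2, h3, h4, h5]


papers = [
    Paper(True, 10, 8, 2),
    Paper(True, 5, 4, 4),
    Paper(False, 3, 3, 1),
    Paper(True, 2, 1, 3),
]
h = h_variants(papers, career_years=4)
deltas = [h[i] - h[i + 1] for i in range(4)]   # Delta H1..Delta H4: Hh - H(h+1)
overall = h[0] - h[4]                          # overall improvement Delta H5 = H1 - H5
print(h, deltas, overall)
```

The per-capita and per-year steps here rescale a single h value. Rescaling each paper's citations before ranking would be an equally plausible alternative and could give different numbers.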
Nevertheless, two main limitations of the present study must be highlighted. First, the use of ANOVA and linear regressions is justified by the straightforward interpretation of their results, although these methods provide a statistical description of the sample under consideration based on the assumption of a normal distribution. However, the reference to reduced forms (see Equations (1)–(5)) of the structural model developed by Zagonari [9] (i.e., a very plausible model for authors who aim at maximising their H-index) and the consistent results (see Appendix D) obtained by applying a weighted quasi-Poisson distribution (i.e., a very plausible distribution for the stochastic phenomenon of articles’ citations) [20] also seem to support similar insights outside the sample under consideration. Second, the association of each author with a single discipline is justified by the classification of authors adopted by the Scopus dataset, although it might be too simplistic for some authors. However, a continuous classification of authors, with percentages weighting all disciplines in the publication experience of each author, would require a similar continuous classification for all journals and all articles.
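The robustness check in Appendix D relies on a quasi-Poisson specification. This keeps the Poisson mean-variance form but scales the variance by a dispersion factor φ estimated from the data. The sketch below is a minimal illustration only: it is not the weighted model actually fitted, and it omits both the fitting step and the weights. It computes the Poisson deviance, the kind of quantity reported in the Deviance columns of Tables A10 to A16, and the usual Pearson estimate of φ from observed counts y and fitted means μ. A value of φ well above 1 signals over-dispersion relative to a plain Poisson model. The function names and toy values are hypothetical.

```python
import math
from typing import Sequence


def poisson_deviance(y: Sequence[float], mu: Sequence[float]) -> float:
    """D = 2 * sum[y*log(y/mu) - (y - mu)], with y*log(y/mu) taken as 0 when y == 0."""
    total = 0.0
    for yi, mi in zip(y, mu):
        term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += term - (yi - mi)
    return 2.0 * total


def pearson_dispersion(y: Sequence[float], mu: Sequence[float], n_params: int) -> float:
    """Quasi-Poisson dispersion: Pearson chi-square divided by residual degrees of freedom."""
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi2 / (len(y) - n_params)


# Toy citation counts and fitted means (illustrative values, not data from the study)
y = [0, 2, 4]
mu = [2.0, 2.0, 2.0]
print(poisson_deviance(y, mu))         # 8*log(2), about 5.545
print(pearson_dispersion(y, mu, 1))    # (2 + 0 + 2) / (3 - 1) = 2.0
```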
Some methodological remarks are worth making here.
Improvements of the H-index other than those suggested by Zagonari [9] could have been used. In particular, we disregarded:
  • Impact factors [21]. However, this feature is misleading: a poorly cited paper published in a high-impact journal should be penalised rather than rewarded, since it wasted a prominent stage.
  • Gender differences [22]. However, this feature is irrelevant in making disciplines and subjects homogeneous.
  • Google or WoS indexes [23,24,25]. However, these datasets have been shown to be more easily manipulated than the Scopus dataset.
  • Negative citations [6]. However, this bias is likely to be negligible, since papers criticising a paper do not need to quote it many times.
  • Country differences [26]. However, this feature is irrelevant in making disciplines and subjects homogeneous.
  • Co-authorship networks [27]. However, this feature is misleading in focusing on inefficiency and inequity across authors in different disciplines and subjects.
Note that we omitted the editors’ trick of inflating citations to papers published in a journal as a precondition for publishing in it, since some journals are tightly linked to specific topics.
In contrast, we emphasised:
  • H-index dynamics. In fact, other papers focused on the same feature [11].
  • Linear regressions. However, non-linear estimations require additional assumptions (e.g., a Poisson distribution based on random and over-time independent citations for over-time constant authors) and make interpretations of results more complicated (e.g., impacts of alternative policies ∆Hh and different disciplines Dj or subjects Sk are non-additive) [6].
  • Gamma distributions. In fact, other papers used the same distribution [24].
Note that we standardised with respect to each author rather than with respect to disciplines, while possible specific features of disciplines and subjects are captured by the dummies Dj and Sk.
In summary, the main strength of the present study is its coverage of 25 scientific disciplines and four subjects. For example, Ryan [28] estimates the same variant of the H-index (i.e., H5), but on only 474 observations in five colleges. Next, the main weakness of the present study is its descriptive rather than predictive purpose: it discusses which top-down regulation could have made disciplines and subjects homogeneous in the sample rather than which top-down regulation could make them homogeneous in the future. For example, Moreira et al. [29] apply the functional form of the distribution of the asymptotic number of citations, but only to 1283 authors in seven disciplines (i.e., a similar topic but a smaller sample). Similarly, Kupper [30] applies random forests and gradient boosting machines to 111,156 authors in a single discipline, but to predict gender bias (i.e., a similar sample but a narrower topic).

6. Conclusions

The purpose of the present study was to identify an improvement of the H-index, as an easily generated quantitative index based on a readily accessible set of information, in order to enable suitable comparisons of interdisciplinary scientists. We succeeded by considering alternative H-index improvements as top-down regulations (i.e., aware that there is no single bibliometric index accounting for all biases in all disciplines and subjects) and by focusing on both disciplines and subjects (i.e., aware that differences across subjects are more important than differences across disciplines to compare interdisciplinary scientists). Indeed, the net per-capita per-year H-index based on articles does not account for the total variance, although it makes disciplines significant but irrelevant (i.e., research question 1) and subjects insignificant and irrelevant in explaining the total variance (i.e., research questions 2 and 5). Moreover, some disciplines and subjects are highlighted for some H-index improvements (i.e., research questions 3 and 6). Finally, the net per-capita per-year H-index based on articles produces similar gamma distributions and quantiles for subjects but not for disciplines (i.e., research question 7).
In fact, by suggesting a procedure to empirically evaluate alternative bibliometric indexes, we did much more than identify an H-index improvement to compare interdisciplinary scientists. Indeed, it is weak to criticise an index because it does not identify a specific award in a given year (e.g., [4,31,32]) (i.e., critiques from outside but empirical). Moreover, it is not possible to identify a bibliometric index accounting for the many different practices across disciplines (e.g., patents are useful for engineering and chemistry but inapplicable to arts or economics; the many authors in physics and the many citations in computing cannot be properly compared with the few authors in humanities and the few citations in economics) [33]. Finally, it is weak to criticise an index because it does not account for a specific feature (i.e., critiques from inside but theoretical) [34].
In other words, without any ambition to solve all the different (good and bad) practices across disciplines and subjects by relying on general information, we criticised alternative variants of the H-index in terms of external and theoretically straightforward criteria (i.e., inefficiency and inequity) by testing the improvements of the H-index as policies in achieving an external and empirically straightforward goal, namely homogeneity of disciplines and subjects (i.e., theoretical critiques from outside but empirically tested). Note that a possible change in the standards for measuring scientific production towards the net per-capita per-year H-index based on articles could foster a change in publication practices. For example, instead of adding authors as a costless practice, one could organise a network of authors in triplets, where one author appears in each triplet. However, this practice is not costless in terms of coordination efforts, and it would favour a smaller number of authors, such as department heads.
The present study could be extended by using a more recent dataset to test the same structural model behind it. However, researchers at Scopus would need to be engaged to produce a similar sample (i.e., stratified random sampling requires the complete list of authors). Moreover, the structural model we referred to in our study should be validated again. Finally, if our improved H-indexes were adopted within the Scopus framework, anyone could test the present study in any alternative period of time by referring to the same statistics for the whole population of authors.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/publications12020012/s1. Figure S1: Histograms of (log linear) H1 for disciplines Dj; Figure S2: Histograms of (log linear) H5 for disciplines Dj; Figure S3: Histograms of (log linear) H1 for subjects Sk; Figure S4: Histograms of (log linear) H5 for subjects Sk; Figure S5: Linear Regression of Nnet on Naut, ∆H3 and disciplines Dj. Significance levels on the differences between disciplines. Black = 99.9% (D13 = 13 > D28 = 4 > D27 = 3), dark-grey = 99%, light-grey = 95%, white < 95%; Figure S6: Linear Regression of Nnet on Naut, Aaut, ∆H4 and disciplines Dj. Significance levels on the differences between disciplines. Nnet = N. net citations, Naut = N. of co-authors, Aaut = 1 for young authors. Black = 99.9% (D13 = 16 > D28 = 5 > D27 = 2), dark-grey = 99%, light-grey = 95%, white < 95%; Figure S7: Linear Regression of Ngro on Nnet, Naut, Aaut, ∆H5 and disciplines Dj. Significance levels on the differences between disciplines. Ngro = N. gross citations, Nnet = N. net citations, Naut = N. of co-authors, Aaut = 1 for young authors. Black = 99.9% (D31 = 5 > D16 = 1), dark-grey = 99%, light-grey = 95%, white < 95%.

Author Contributions

The authors contributed equally to all activities required by the present study (i.e., conceptualization, methodology, validation, formal analysis, investigation, resources, writing—original draft preparation, writing—review and editing, visualization). All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study are available on request from the corresponding author, although some data are not available due to restrictions from Scopus.

Acknowledgments

We thank Alberto Zigoni and Jeroen Baas, Elsevier, for extracting the stratified sample of authors according to our suggestions and for providing us with the required set of information on the related publications.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. The List of Disciplines

Table A1. The 27 disciplines Dj suggested by Scopus.
10-Multidisciplinary
11-Agricultural and Biological Sciences
12-Arts and Humanities
13-Biochemistry, Genetics, and Molecular Biology
14-Business, Management, and Accounting
15-Chemical Engineering
16-Chemistry
17-Computer Science
18-Decision Sciences
19-Earth and Planetary Sciences
20-Economics, Econometrics, and Finance
21-Energy
22-Engineering
23-Environmental Science
24-Immunology and Microbiology
25-Materials Science
26-Mathematics
27-Medicine
28-Neuroscience
29-Nursing
30-Pharmacology, Toxicology, and Pharmaceutics
31-Physics and Astronomy
32-Psychology
33-Sociology
34-Veterinary
35-Dentistry
36-Health Professions

Appendix B. Summary Statistics for Disciplines

Table A2. Summary statistics for disciplines Dj. Notations: mean (SD) is in the first row for each discipline, median [min–max] is in the second row for each discipline; in columns, Npub = No. publications, Nart = No. articles, Naut = No. co-authors, Ngro = No. gross citations, Nnet = No. net citations.
Dj | Npub | Nart | Naut | Ngro | Nnet
11 | 5.124 (13.123) | 5.113 (13.079) | 5.742 (4.859) | 4.757 (7.608) | 4.663 (7.504)
11 | 1 [1–206] | 1 [1–206] | 5 [1–73.600] | 2 [0–62.250] | 2 [0–61.375]
12 | 1.866 (1.980) | 1.866 (1.980) | 1.565 (1.322) | 1.093 (2.847) | 1.086 (2.813)
12 | 1 [1–11] | 1 [1–11] | 1 [1–8] | 0 [0–19.600] | 0 [0–19.400]
13 | 5.717 (12.506) | 5.660 (12.390) | 10.014 (21.699) | 13.223 (27.822) | 12.919 (27.217)
13 | 2 [1–172] | 2 [1–170] | 7.263 [1–461] | 5 [0–358] | 4.800 [0–358]
14 | 3.080 (4.737) | 3.080 (4.737) | 5.279 (24.033) | 7.894 (23.229) | 7.854 (23.204)
14 | 1 [1–34] | 1 [1–34] | 3 [1–228] | 2 [0–209] | 2 [0–209]
15 | 2.989 (5.624) | 2.989 (5.624) | 4.447 (1.637) | 5.303 (11.145) | 5.267 (11.144)
15 | 1 [1–40] | 1 [1–40] | 4.364 [1–10] | 1 [0–66] | 1 [0–66]
16 | 6.083 (12.779) | 6.055 (12.718) | 5.471 (2.707) | 7.453 (15.998) | 7.383 (15.959)
16 | 2 [1–138] | 2 [1–138] | 5 [1–51] | 3 [0–256] | 2.975 [0–256]
17 | 3.843 (7.803) | 3.817 (7.793) | 4.121 (2.400) | 6.019 (14.562) | 5.988 (14.504)
17 | 1 [1–74] | 1 [1–74] | 4 [1–24] | 2 [0–130] | 1.979 [0–130]
18 | 4.800 (6.870) | 4.800 (6.870) | 2.169 (0.289) | 11.676 (7.419) | 11.151 (7.600)
18 | 2 [1–17] | 2 [1–17] | 2 [2–2.667] | 14 [2–19] | 12.667 [2–19]
19 | 5.452 (12.361) | 5.449 (12.333) | 6.793 (12.239) | 6.348 (17.586) | 6.107 (17.251)
19 | 2 [1–122] | 2 [1–121] | 5 [1–206.500] | 2.267 [0–232] | 2.142 [0–231.500]
20 | 3.605 (5.497) | 3.605 (5.497) | 2.484 (0.965) | 2.769 (4.889) | 2.735 (4.829)
20 | 2 [1–35] | 2 [1–35] | 2.229 [1–5] | 1 [0–28.500] | 1 [0–28.500]
21 | 2.053 (2.371) | 2.053 (2.371) | 4.167 (2.004) | 4.541 (6.729) | 4.526 (6.690)
21 | 1 [1–16] | 1 [1–16] | 4 [1–8.500] | 1 [0–31] | 1 [0–31]
22 | 3.999 (12.250) | 3.984 (12.215) | 4.121 (3.046) | 3.187 (6.608) | 3.150 (6.566)
22 | 1 [1–240] | 1 [1–240] | 4 [1–70.250] | 1 [0–54.250] | 1 [0–54.250]
23 | 4.940 (8.694) | 4.928 (8.671) | 5.046 (3.788) | 6.189 (10.212) | 6.101 (10.169)
23 | 1 [1–52] | 1 [1–52] | 4.750 [1–54] | 2.500 [0–86] | 2.500 [0–86]
24 | 3.495 (4.639) | 3.495 (4.639) | 7.822 (3.837) | 10.456 (17.103) | 10.312 (17.036)
24 | 1 [1–25] | 1 [1–25] | 7.500 [2–34] | 6 [0–106] | 5.750 [0–106]
25 | 5.165 (10.605) | 5.143 (10.572) | 5.452 (2.459) | 6.695 (20.817) | 6.632 (20.671)
25 | 2 [1–120] | 2 [1–120] | 5 [1–31] | 1.545 [0–221.583] | 1.500 [0–219.833]
26 | 4.797 (9.797) | 4.797 (9.797) | 2.689 (1.072) | 3.048 (9.191) | 2.959 (9.144)
26 | 2 [1–95] | 2 [1–95] | 2.667 [1–8] | 0.500 [0–97] | 0.500 [0–97]
27 | 5.180 (14.520) | 5.125 (14.247) | 13.161 (75.762) | 8.222 (25.264) | 8.093 (24.715)
27 | 1 [1–444] | 1 [1–435] | 6.500 [1–2060] | 2.400 [0–988] | 2.333 [0–965]
28 | 5.271 (7.440) | 5.250 (7.384) | 6.175 (2.885) | 13.877 (23.233) | 13.804 (23.237)
28 | 2 [1–36] | 2 [1–36] | 6 [1–18] | 6 [0–123] | 6 [0–123]
29 | 1.867 (2.134) | 1.867 (2.134) | 2.807 (1.964) | 2.622 (4.680) | 2.622 (4.680)
29 | 1 [1–9] | 1 [1–9] | 3 [1–7] | 0 [0–15] | 0 [0–15]
30 | 2.562 (3.894) | 2.550 (3.894) | 6.072 (2.479) | 3.463 (5.216) | 3.449 (5.211)
30 | 1 [1–23] | 1 [1–23] | 5.433 [2–14.500] | 1.167 [0–30.500] | 1.167 [0–30.500]
31 | 9.992 (35.266) | 9.948 (35.239) | 69.653 (323.694) | 5.945 (11.936) | 5.515 (11.206)
31 | 2 [1–468] | 2 [1–468] | 5.556 [1–2837.009] | 2 [0–110] | 2 [0–110]
32 | 2.948 (3.783) | 2.939 (3.768) | 4.938 (11.434) | 9.726 (26.189) | 9.680 (26.190)
32 | 1 [1–28] | 1 [1–28] | 4 [1–124] | 3.778 [0–212] | 3.333 [0–212]
33 | 2.899 (4.235) | 2.896 (4.221) | 2.404 (1.830) | 3.642 (7.926) | 3.606 (7.890)
33 | 1 [1–39] | 1 [1–39] | 2 [1–22] | 1 [0–99] | 1 [0–99]
34 | 2.931 (3.973) | 2.914 (3.975) | 5.890 (2.517) | 3.667 (4.845) | 3.644 (4.810)
34 | 2 [1–27] | 2 [1–27] | 5.500 [2–16] | 2 [0–20.500] | 2 [0–20]
35 | 4.113 (7.992) | 4.032 (7.735) | 5.532 (2.645) | 6.185 (8.875) | 6.097 (8.713)
35 | 1 [1–42] | 1 [1–42] | 5 [1–19] | 3.417 [0–49] | 3.417 [0–49]
Table A3. Summary statistics for disciplines Dj. Notations: mean (SD) is in the first row for each discipline, median [min-max] is in the second row for each discipline.
Dj | H1 | H2 | H3 | H4 | H5
11 | 1.761 (2.677) | 1.761 (2.677) | 1.718 (2.517) | 0.443 (0.748) | 0.207 (0.306)
11 | 1 [0–27] | 1 [0–27] | 1 [0–24] | 0.236 [0–6] | 0.125 [0–3.167]
12 | 0.549 (0.905) | 0.549 (0.905) | 0.549 (0.905) | 0.418 (0.725) | 0.168 (0.328)
12 | 0 [0–4] | 0 [0–4] | 0 [0–4] | 0 [0–3] | 0 [0–1.800]
13 | 2.615 (4.329) | 2.615 (4.329) | 2.564 (4.020) | 0.418 (0.666) | 0.227 (0.295)
13 | 1 [0–72] | 1 [0–72] | 1 [0–60] | 0.200 [0–6.479] | 0.143 [0–2.853]
14 | 1.341 (1.653) | 1.341 (1.653) | 1.318 (1.623) | 0.597 (0.812) | 0.352 (0.388)
14 | 1 [0–9] | 1 [0–9] | 1 [0–9] | 0.333 [0–4.333] | 0.250 [0–1.500]
15 | 1.022 (1.282) | 1.022 (1.282) | 1 (1.256) | 0.252 (0.350) | 0.141 (0.174)
15 | 1 [0–7] | 1 [0–7] | 1 [0–7] | 0.167 [0–1.667] | 0.083 [0–0.750]
16 | 2.216 (3.205) | 2.216 (3.205) | 2.189 (3.144) | 0.508 (0.801) | 0.261 (0.349)
16 | 1 [0–21] | 1 [0–21] | 1 [0–21] | 0.250 [0–5.767] | 0.167 [0–2.250]
17 | 1.463 (2.067) | 1.463 (2.067) | 1.455 (2.054) | 0.467 (0.733) | 0.249 (0.349)
17 | 1 [0–16] | 1 [0–16] | 1 [0–16] | 0.250 [0–5.417] | 0.167 [0–2.917]
18 | 2.600 (2.510) | 2.600 (2.510) | 2.400 (2.074) | 1.550 (1.681) | 0.679 (0.471)
18 | 2 [1–7] | 2 [1–7] | 2 [1–6] | 1 [0.500–4.500] | 0.500 [0.250–1.375]
19 | 1.892 (2.660) | 1.892 (2.660) | 1.795 (2.360) | 0.437 (0.642) | 0.216 (0.275)
19 | 1 [0–17] | 1 [0–17] | 1 [0–16] | 0.226 [0–5.083] | 0.125 [0–1.833]
20 | 1.302 (1.802) | 1.302 (1.802) | 1.302 (1.802) | 0.621 (0.927) | 0.287 (0.375)
20 | 1 [0–10] | 1 [0–10] | 1 [0–10] | 0.500 [0–6] | 0.167 [0–1.800]
21 | 0.982 (1.232) | 0.982 (1.232) | 0.982 (1.232) | 0.322 (0.620) | 0.238 (0.447)
21 | 1 [0–8] | 1 [0–8] | 1 [0–8] | 0.200 [0–4] | 0.125 [0–3]
22 | 1.101 (1.769) | 1.101 (1.769) | 1.088 (1.742) | 0.344 (0.588) | 0.181 (0.291)
22 | 1 [0–22] | 1 [0–22] | 1 [0–22] | 0.200 [0–5.333] | 0.067 [0–2.889]
23 | 1.851 (2.658) | 1.851 (2.658) | 1.826 (2.587) | 0.529 (1.096) | 0.276 (0.464)
23 | 1 [0–20] | 1 [0–20] | 1 [0–20] | 0.250 [0–12.417] | 0.167 [0–5.333]
24 | 2.103 (2.172) | 2.103 (2.172) | 2.093 (2.161) | 0.353 (0.419) | 0.187 (0.177)
24 | 1 [0–12] | 1 [0–12] | 1 [0–12] | 0.200 [0–3] | 0.125 [0–0.852]
25 | 1.764 (2.603) | 1.764 (2.603) | 1.739 (2.527) | 0.387 (0.573) | 0.203 (0.279)
25 | 1 [0–17] | 1 [0–17] | 1 [0–16] | 0.200 [0–4.267] | 0.125 [0–1.834]
26 | 1.165 (1.629) | 1.165 (1.629) | 1.135 (1.561) | 0.496 (0.687) | 0.228 (0.309)
26 | 1 [0–7] | 1 [0–7] | 1 [0–7] | 0.333 [0–4] | 0.167 [0–2]
27 | 1.935 (3.177) | 1.935 (3.177) | 1.908 (3.059) | 0.330 (0.559) | 0.173 (0.248)
27 | 1 [0–43] | 1 [0–43] | 1 [0–39] | 0.167 [0–8.367] | 0.111 [0–4.111]
28 | 2.469 (2.667) | 2.469 (2.667) | 2.458 (2.655) | 0.568 (0.737) | 0.301 (0.305)
28 | 1 [0–11] | 1 [0–11] | 1 [0–11] | 0.250 [0–3.667] | 0.200 [0–1.583]
29 | 0.467 (0.834) | 0.467 (0.834) | 0.467 (0.834) | 0.262 (0.766) | 0.130 (0.312)
29 | 0 [0–3] | 0 [0–3] | 0 [0–3] | 0 [0–3] | 0 [0–1.200]
30 | 1.062 (1.444) | 1.062 (1.444) | 1.062 (1.444) | 0.221 (0.303) | 0.119 (0.168)
30 | 1 [0–11] | 1 [0–11] | 1 [0–11] | 0.200 [0–1.667] | 0.065 [0–1]
31 | 2.490 (4.351) | 2.490 (4.351) | 2.353 (3.924) | 0.448 (0.675) | 0.212 (0.287)
31 | 1 [0–42] | 1 [0–42] | 1 [0–37] | 0.250 [0–6.500] | 0.125 [0–2.033]
32 | 1.513 (1.564) | 1.513 (1.564) | 1.496 (1.530) | 0.529 (0.673) | 0.299 (0.343)
32 | 1 [0–7] | 1 [0–7] | 1 [0–7] | 0.333 [0–5] | 0.200 [0–2.667]
33 | 1.188 (1.629) | 1.188 (1.629) | 1.179 (1.598) | 0.649 (0.873) | 0.343 (0.466)
33 | 1 [0–12] | 1 [0–12] | 1 [0–12] | 0.333 [0–5.083] | 0.179 [0–2.643]
34 | 1.190 (1.177) | 1.190 (1.177) | 1.190 (1.177) | 0.282 (0.377) | 0.153 (0.198)
34 | 1 [0–5] | 1 [0–5] | 1 [0–5] | 0.167 [0–1.833] | 0.094 [0–1]
35 | 1.871 (3.144) | 1.871 (3.144) | 1.823 (2.945) | 0.389 (0.645) | 0.218 (0.344)
35 | 1 [0–15] | 1 [0–15] | 1 [0–14] | 0.200 [0–3.351] | 0.118 [0–2.343]

Appendix C. Additional Results for Disciplines

Table A4. Number of publications Npub and of articles Nart in disciplines Dj. Nobs = No. observations; Npub − Nart = No. publications other than articles; % = (Npub − Nart)/Npub × 100.
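Discipline | Nobs | Npub | Nart | Npub − Nart | %
Discipline 11 | 691 | 3541 | 3533 | 8 | 0.23
Discipline 12 | 82 | 153 | 153 | 0 | 0.00
Discipline 13 | 860 | 4917 | 4868 | 49 | 1.00
Discipline 14 | 88 | 271 | 271 | 0 | 0.00
Discipline 15 | 91 | 272 | 272 | 0 | 0.00
Discipline 16 | 726 | 4416 | 4396 | 20 | 0.45
Discipline 17 | 268 | 1030 | 1023 | 7 | 0.68
Discipline 18 | 5 | 24 | 24 | 0 | 0.00
Discipline 19 | 332 | 1810 | 1809 | 1 | 0.06
Discipline 20 | 86 | 310 | 310 | 0 | 0.00
Discipline 21 | 57 | 117 | 117 | 0 | 0.00
Discipline 22 | 811 | 3243 | 3231 | 12 | 0.37
Discipline 23 | 235 | 1161 | 1158 | 3 | 0.26
Discipline 24 | 107 | 374 | 374 | 0 | 0.00
Discipline 25 | 449 | 2319 | 2309 | 10 | 0.43
Discipline 26 | 133 | 638 | 638 | 0 | 0.00
Discipline 27 | 3562 | 18,450 | 18,254 | 196 | 1.06
Discipline 28 | 96 | 506 | 504 | 2 | 0.40
Discipline 29 | 15 | 28 | 28 | 0 | 0.00
Discipline 30 | 80 | 205 | 204 | 1 | 0.49
Discipline 31 | 631 | 6305 | 6277 | 28 | 0.44
Discipline 32 | 115 | 339 | 338 | 1 | 0.29
Discipline 33 | 357 | 1035 | 1034 | 1 | 0.10
Discipline 34 | 58 | 170 | 169 | 1 | 0.59
Discipline 35 | 62 | 255 | 250 | 5 | 1.96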
Table A5. Linear regression of Ngro-Nnet on ∆H2 and disciplines Dj. *** = significant at 99.9%. Note that all disciplines apart from D13, D27, and D31 cannot be distinguished statistically from other disciplines.
Term | Estimate | Std. Error | t Value | p-Value | Significance
∆H2 = H2 − H3 | 1.997354 | 0.033426 | 59.755 | <2 × 10⁻¹⁶ | ***
Discipline 11 | 0.007502 | 0.035251 | 0.213 | 0.831
Discipline 12 | 0.007289 | 0.102244 | 0.071 | 0.943
Discipline 13 | 0.201967 | 0.031618 | 6.388 | 1.76 × 10⁻¹⁰ | ***
Discipline 14 | −0.005015 | 0.098700 | −0.051 | 0.959
Discipline 15 | −0.007551 | 0.097059 | −0.078 | 0.938
Discipline 16 | 0.015172 | 0.034374 | 0.441 | 0.659
Discipline 17 | 0.016534 | 0.056556 | 0.292 | 0.770
Discipline 18 | 0.126019 | 0.414111 | 0.304 | 0.761
Discipline 19 | 0.049192 | 0.050915 | 0.966 | 0.334
Discipline 20 | 0.033797 | 0.099838 | 0.339 | 0.735
Discipline 21 | 0.014254 | 0.122633 | 0.116 | 0.907
Discipline 22 | 0.010229 | 0.032514 | 0.315 | 0.753
Discipline 23 | 0.036683 | 0.060402 | 0.607 | 0.544
Discipline 24 | 0.125332 | 0.089507 | 1.400 | 0.161
Discipline 25 | 0.014624 | 0.043702 | 0.335 | 0.738
Discipline 26 | 0.028125 | 0.080289 | 0.350 | 0.726
Discipline 27 | 0.073994 | 0.015540 | 4.762 | 1.95 × 10⁻⁶ | ***
Discipline 28 | 0.052234 | 0.094496 | 0.553 | 0.580
Discipline 29 | 0.000000 | 0.239056 | 0.000 | 1.000
Discipline 30 | 0.014238 | 0.103514 | 0.138 | 0.891
Discipline 31 | 0.157578 | 0.037138 | 4.243 | 2.23 × 10⁻⁵ | ***
Discipline 32 | 0.011230 | 0.086339 | 0.130 | 0.897
Discipline 33 | 0.019254 | 0.049002 | 0.393 | 0.694
Discipline 34 | 0.023276 | 0.121571 | 0.191 | 0.848
Discipline 35 | −0.009052 | 0.117595 | −0.077 | 0.939
Table A6. Linear regression of Nnet on Naut, ∆H3, and disciplines Dj. ***, **, and * = significant at 99.9%, 99%, and 95%. Naut = No. co-authors. Note that disciplines D12, D18, D20, D21, D26, D29, D30, D34, and D35 cannot be distinguished statistically from other disciplines.
Term | Estimate | Std. Error | t Value | p-Value | Significance
Naut | 0.01051 | 0.00207 | 5.077 | 3.90 × 10⁻⁷ | ***
∆H3 = H3 − H4 | 1.61270 | 0.08130 | 19.836 | <2 × 10⁻¹⁶ | ***
Discipline 11 | 2.54715 | 0.73167 | 3.481 | 0.000501 | ***
Discipline 12 | 0.85837 | 2.10310 | 0.408 | 0.683176
Discipline 13 | 9.35233 | 0.67187 | 13.920 | <2 × 10⁻¹⁶ | ***
Discipline 14 | 6.63520 | 2.03094 | 3.267 | 0.001090 | **
Discipline 15 | 4.01396 | 1.99727 | 2.010 | 0.044488 | *
Discipline 16 | 4.61528 | 0.71962 | 6.414 | 1.49 × 10⁻¹⁰ | ***
Discipline 17 | 4.35071 | 1.16601 | 3.731 | 0.000192 | ***
Discipline 18 | 9.75739 | 8.51709 | 1.146 | 0.251978
Discipline 19 | 3.84524 | 1.05085 | 3.659 | 0.000254 | ***
Discipline 20 | 1.61078 | 2.05432 | 0.784 | 0.433004
Discipline 21 | 3.41772 | 2.52302 | 1.355 | 0.175571
Discipline 22 | 1.90671 | 0.67139 | 2.840 | 0.004521 | **
Discipline 23 | 3.95753 | 1.24667 | 3.174 | 0.001506 | **
Discipline 24 | 7.42367 | 1.84637 | 4.021 | 5.85 × 10⁻⁵ | ***
Discipline 25 | 4.39420 | 0.90529 | 4.854 | 1.23 × 10⁻⁶ | ***
Discipline 26 | 1.89982 | 1.65214 | 1.150 | 0.250206
Discipline 27 | 5.40924 | 0.34334 | 15.755 | <2 × 10⁻¹⁶ | ***
Discipline 28 | 10.69129 | 1.94963 | 5.484 | 4.27 × 10⁻⁸ | ***
Discipline 29 | 2.26224 | 4.91721 | 0.460 | 0.645479
Discipline 30 | 2.02776 | 2.13027 | 0.952 | 0.341183
Discipline 31 | 1.70973 | 0.78243 | 2.185 | 0.028901 | *
Discipline 32 | 8.06915 | 1.77757 | 4.539 | 5.71 × 10⁻⁶ | ***
Discipline 33 | 2.72659 | 1.00882 | 2.703 | 0.006889 | **
Discipline 34 | 2.11780 | 2.50168 | 0.847 | 0.397266
Discipline 35 | 3.72756 | 2.42136 | 1.539 | 0.123726
Table A7. Linear regression of Nnet on Naut, Aaut, ∆H4, and disciplines Dj. *** and ** = significant at 99.9% and 99%. Naut = No. co-authors, Aaut = 1 for inexpert authors. Note that disciplines D18 and D29 cannot be distinguished statistically from other disciplines.
Term | Estimate | Std. Error | t Value | p-Value | Significance
Naut | 0.017954 | 0.002029 | 8.847 | <2 × 10⁻¹⁶ | ***
Aaut | −8.859996 | 0.406800 | −21.780 | <2 × 10⁻¹⁶ | ***
∆H4 = H4 − H5 | −0.450222 | 0.497463 | −0.905 | 0.365468
Discipline 11 | 8.897595 | 0.765770 | 11.619 | <2 × 10⁻¹⁶ | ***
Discipline 12 | 5.924150 | 2.109108 | 2.809 | 0.004982 | **
Discipline 13 | 17.224180 | 0.693190 | 24.848 | <2 × 10⁻¹⁶ | ***
Discipline 14 | 12.400369 | 2.035827 | 6.091 | 1.16 × 10⁻⁹ | ***
Discipline 15 | 10.300187 | 2.000073 | 5.150 | 2.66 × 10⁻⁷ | ***
Discipline 16 | 11.911298 | 0.754831 | 15.780 | <2 × 10⁻¹⁶ | ***
Discipline 17 | 10.607281 | 1.186905 | 8.937 | <2 × 10⁻¹⁶ | ***
Discipline 18 | 13.276112 | 8.474326 | 1.567 | 0.117234
Discipline 19 | 10.033894 | 1.066729 | 9.406 | <2 × 10⁻¹⁶ | ***
Discipline 20 | 6.962199 | 2.061293 | 3.378 | 0.000734 | ***
Discipline 21 | 9.619067 | 2.518869 | 3.819 | 0.000135 | ***
Discipline 22 | 7.704620 | 0.710005 | 10.852 | <2 × 10⁻¹⁶ | ***
Discipline 23 | 10.611399 | 1.265151 | 8.387 | <2 × 10⁻¹⁶ | ***
Discipline 24 | 13.973116 | 1.841804 | 7.587 | 3.58 × 10⁻¹⁴ | ***
Discipline 25 | 11.471457 | 0.932967 | 12.296 | <2 × 10⁻¹⁶ | ***
Discipline 26 | 7.228607 | 1.662872 | 4.347 | 1.39 × 10⁻⁵ | ***
Discipline 27 | 12.389992 | 0.401181 | 30.884 | <2 × 10⁻¹⁶ | ***
Discipline 28 | 17.874425 | 1.949225 | 9.170 | <2 × 10⁻¹⁶ | ***
Discipline 29 | 8.537868 | 4.894441 | 1.744 | 0.081120
Discipline 30 | 8.369426 | 2.130282 | 3.929 | 8.60 × 10⁻⁵ | ***
Discipline 31 | 8.540913 | 0.809086 | 10.556 | <2 × 10⁻¹⁶ | ***
Discipline 32 | 13.547450 | 1.781005 | 7.607 | 3.07 × 10⁻¹⁴ | ***
Discipline 33 | 7.945013 | 1.041880 | 7.626 | 2.65 × 10⁻¹⁴ | ***
Discipline 34 | 7.873618 | 2.494781 | 3.156 | 0.001604 | **
Discipline 35 | 10.362004 | 2.414967 | 4.291 | 1.80 × 10⁻⁵ | ***
Table A8. Linear regression of Ngro on Nnet, Naut, Aaut, ∆H5, and disciplines Dj. ***, **, and * = significant at 99.9%, 99%, and 95%. Nnet = No. net citations, Naut = No. co-authors, Aaut = 1 for inexpert authors. Note that disciplines D12, D18, D26, and D29 cannot be distinguished statistically from other disciplines.
Term | Estimate | Std. Error | t Value | p-Value | Significance
Nnet | 1.0156053 | 0.0005899 | 1721.619 | <2 × 10⁻¹⁶ | ***
Naut | 0.0010315 | 0.0001243 | 8.295 | <2 × 10⁻¹⁶ | ***
Aaut | 0.1927100 | 0.0283680 | 6.793 | 1.19 × 10⁻¹¹ | ***
∆H5 = H1 − H5 | 0.1597733 | 0.0044277 | 36.085 | <2 × 10⁻¹⁶ | ***
Discipline 11 | −0.3794002 | 0.0509842 | −7.442 | 1.11 × 10⁻¹³ | ***
Discipline 12 | −0.2588000 | 0.2001352 | −1.293 | 0.196009
Discipline 13 | −0.4355088 | 0.0471928 | −9.228 | <2 × 10⁻¹⁶ | ***
Discipline 14 | −0.4084144 | 0.1377048 | −2.966 | 0.003028 | **
Discipline 15 | −0.3978104 | 0.1484945 | −2.679 | 0.007402 | **
Discipline 16 | −0.5628439 | 0.0509699 | −11.043 | <2 × 10⁻¹⁶ | ***
Discipline 17 | −0.4372683 | 0.0803038 | −5.445 | 5.35 × 10⁻⁸ | ***
Discipline 18 | 0.0037988 | 0.4895416 | 0.008 | 0.993809
Discipline 19 | −0.2416887 | 0.0723745 | −3.339 | 0.000844 | ***
Discipline 20 | −0.3420827 | 0.1457847 | −2.346 | 0.018978 | *
Discipline 21 | −0.3647070 | 0.1761518 | −2.070 | 0.038450 | *
Discipline 22 | −0.3409320 | 0.0521047 | −6.543 | 6.44 × 10⁻¹¹ | ***
Discipline 23 | −0.4393962 | 0.0855634 | −5.135 | 2.89 × 10⁻⁷ | ***
Discipline 24 | −0.4430399 | 0.1136581 | −3.898 | 9.79 × 10⁻⁵ | ***
Discipline 25 | −0.5087248 | 0.0650363 | −7.822 | 5.94 × 10⁻¹⁵ | ***
Discipline 26 | −0.2446781 | 0.1261528 | −1.940 | 0.052475
Discipline 27 | −0.4763461 | 0.0286499 | −16.626 | <2 × 10⁻¹⁶ | ***
Discipline 28 | −0.6404646 | 0.1211245 | −5.288 | 1.28 × 10⁻⁷ | ***
Discipline 29 | −0.3654419 | 0.4895745 | −0.746 | 0.455421
Discipline 30 | −0.3662411 | 0.1509943 | −2.426 | 0.015311 | *
Discipline 31 | −0.1807721 | 0.0541289 | −3.340 | 0.000843 | ***
Discipline 32 | −0.4443098 | 0.1151165 | −3.860 | 0.000115 | ***
Discipline 33 | −0.3106706 | 0.0743621 | −4.178 | 2.98 × 10⁻⁵ | ***
Discipline 34 | −0.3491000 | 0.1713866 | −2.037 | 0.041695 | *
Discipline 35 | −0.4474059 | 0.1623079 | −2.757 | 0.005857 | **
Table A9. Percentages of authors characterised by H5 larger than 1, 1.5, 2, 2.5, and 3 for disciplines Dj. Normal = middle value, Bold = high value, Italics = low value.
Discipline | H5 ≥ 1 (0) | H5 ≥ 1.5 (00) | H5 ≥ 2 (000) | H5 ≥ 2.5 (0000) | H5 ≥ 3 (000000)
Discipline 11 | 0.0214195000 | 0.0031949700 | 0.0004244090 | 0.0000526374 | 0.0000006237
Discipline 12 | 0.1132610000 | 0.0338257000 | 0.0089426600 | 0.0021933800 | 0.0005108530
Discipline 13 | 0.0212303000 | 0.0030915600 | 0.0003996500 | 0.0000481475 | 0.0000005535
Discipline 14 | 0.1064940000 | 0.0262324000 | 0.0055058300 | 0.0010480800 | 0.0001867700
Discipline 15 | 0.0026320400 | 0.0000941238 | 0.0000002803 | 0.0000000075 | 0.0000000001
Discipline 16 | 0.0537398000 | 0.0124615000 | 0.0025899100 | 0.0005038630 | 0.0000937419
Discipline 17 | 0.0481825000 | 0.0104430000 | 0.0020212300 | 0.0003654780 | 0.0000631270
Discipline 18 | 0.2570370000 | 0.0838338000 | 0.0223803000 | 0.0052672800 | 0.0011368700
Discipline 19 | 0.0252170000 | 0.0039244700 | 0.0005412270 | 0.0000694763 | 0.0000008500
Discipline 20 | 0.0994215000 | 0.0291084000 | 0.0075999500 | 0.0018496200 | 0.0004288500
Discipline 21 | 0.0624973000 | 0.0170231000 | 0.0042264200 | 0.0009918090 | 0.0002238670
Discipline 22 | 0.0341036000 | 0.0064675700 | 0.0010983700 | 0.0001746360 | 0.0000265688
Discipline 23 | 0.0722331000 | 0.0192680000 | 0.0046193400 | 0.0010378100 | 0.0002230200
Discipline 24 | 0.0024275000 | 0.0001010910 | 0.0000003611 | 0.0000000118 | 0.0000000003
Discipline 25 | 0.0257603000 | 0.0041443800 | 0.0005931350 | 0.0000791911 | 0.0000100960
Discipline 26 | 0.0451340000 | 0.0066194500 | 0.0008072040 | 0.0000881770 | 0.0000008954
Discipline 27 | 0.0099479400 | 0.0010265500 | 0.0000940477 | 0.0000008035 | 0.0000000655
Discipline 28 | 0.0317879000 | 0.0045656300 | 0.0005621200 | 0.0000631578 | 0.0000006678
Discipline 29 | 0.0898358000 | 0.0286542000 | 0.0083499600 | 0.0023013000 | 0.0006097310
Discipline 30 | 0.0005730750 | 0.0000106334 | 0.0000000167 | 0.0000000002 | 0.0000000000
Discipline 31 | 0.0426779000 | 0.0113684000 | 0.0028147400 | 0.0006666810 | 0.0001531040
Discipline 32 | 0.0576213000 | 0.0127117000 | 0.0024768200 | 0.0004478000 | 0.0000769826
Discipline 33 | 0.2019380000 | 0.0829548000 | 0.0307522000 | 0.0106768000 | 0.0035321700
Discipline 34 | 0.0032719200 | 0.0001645480 | 0.0000007143 | 0.0000000284 | 0.0000000010
Discipline 35 | 0.0258130000 | 0.0041717700 | 0.0006001260 | 0.0000805640 | 0.0000103297

Appendix D. Additional Results Based on a Quasi-Poisson Distribution

Table A10. ANOVA on Npub − Nart for disciplines. DF = degrees of freedom, *** = significant at 99.9%. Npub = No. publications, Nart = No. articles.
Term | DF | Deviance | % Tot | p-Value | Significance
∆H1 = H1 − H2
Disciplines | 24 | 171.05 | 6.09 | 3.52 × 10⁻⁸ | ***
Residual | 9972 | 2636.93 | 93.91
Table A11. ANOVA on Npub − Nart for subjects. DF = degrees of freedom, *** = significant at 99.9%. Npub = No. publications, Nart = No. articles.
Term | DF | Deviance | % Tot | p-Value | Significance
∆H1 = H1 − H2
Subjects | 3 | 93.90 | 3.34 | 1.71 × 10⁻⁷ | ***
Residual | 9993 | 2714.08 | 96.66
Table A12. ANOVA on Ngro − Nnet for disciplines. DF = degrees of freedom, *** = significant at 99.9%. Ngro = No. gross citations, Nnet = No. net citations.
Term | DF | Deviance | % Tot | p-Value | Significance
∆H2 = H2 − H3 | 1 | 103,349.22 | 62.60 | 2.20 × 10⁻¹⁶ | ***
Disciplines | 24 | 5414.82 | 3.28 | 2.20 × 10⁻¹⁶ | ***
Residuals | 9971 | 56,328.76 | 34.12
Table A13. ANOVA on Ngro—Nnet for subjects. DF = degree of freedom, *** and ** = significant at 99.9% and 99%. Ngro = No. gross citations, Nnet = No. net citations.
Term | DF | Deviance | % Tot | p-Value | Significance
∆H2 = H2 − H3 | 1 | 103,349.22 | 62.60 | 2.20 × 10⁻¹⁶ | ***
Subjects | 3 | 755.67 | 0.46 | 0.0059 | **
Residual | 9992 | 60,987.91 | 36.94
Table A14. ANOVA on Nnet for disciplines. DF = degree of freedom, *** = significant at 99.9%. Nnet = No. net citations, Naut = No. co-authors.
Term | DF | Deviance | % Tot | p-Value | Significance
Naut | 1 | 6934.91 | 1.03 | 1.57 × 10⁻⁹ | ***
∆H3 = H3 − H4 | 1 | 134,670.56 | 20.04 | 2.20 × 10⁻¹⁶ | ***
Disciplines | 24 | 29,260.17 | 4.35 | 2.20 × 10⁻¹⁶ | ***
Residual | 9970 | 501,264.00 | 74.58
Table A15. ANOVA on Nnet for subjects. DF = degree of freedom, *** = significant at 99.9%. Nnet = No. net citations, Naut = No. co-authors.
Term | DF | Deviance | % Tot | p-Value | Significance
Naut | 1 | 6934.91 | 1.03 | 1.57 × 10⁻⁹ | ***
∆H3 = H3 − H4 | 1 | 134,670.56 | 20.04 | 2.20 × 10⁻¹⁶ | ***
Subjects | 3 | 8877.90 | 1.32 | 2.20 × 10⁻¹⁶ | ***
Residual | 9991 | 521,646.30 | 77.61
Table A16. ANOVA on Nnet for disciplines. DF = degree of freedom, *** = significant at 99.9%. Nnet = No. net citations, Naut = No. co-authors, Aaut = 1 for inexpert authors.
Term | DF | Deviance | % Tot | p-Value | Significance
Naut | 1 | 6934.91 | 1.03 | 4.98 × 10⁻¹³ | ***
Aaut | 1 | 26,246.13 | 3.90 | 2.20 × 10⁻¹⁶ | ***
∆H4 = H4 − H5 | 1 | 44,367.96 | 6.60 | 2.20 × 10⁻¹⁶ | ***
Disciplines | 24 | 66,316.77 | 9.87 | 2.20 × 10⁻¹⁶ | ***
Residual | 9969 | 528,263.88 | 78.60
Table A17. ANOVA on Nnet for subjects. DF = degree of freedom, *** = significant at 99.9%. Nnet = No. net citations, Naut = No. co-authors, Aaut = 1 for inexpert authors.
Term | DF | Deviance | % Tot | p-Value | Significance
Naut | 1 | 6934.91 | 1.03 | 8.56 × 10⁻¹² | ***
Aaut | 1 | 26,246.13 | 3.90 | 2.20 × 10⁻¹⁶ | ***
∆H4 = H4 − H5 | 1 | 44,367.96 | 6.60 | 2.20 × 10⁻¹⁶ | ***
Subjects | 3 | 26,287.24 | 3.91 | 2.20 × 10⁻¹⁶ | ***
Residual | 9990 | 568,293.41 | 84.55
Table A18. ANOVA on Ngro for disciplines. DF = degree of freedom, *** = significant at 99.9%. Ngro = No. gross citations, Nnet = No. net citations, Naut = No. co-authors, Aaut = 1 for inexpert authors.
Term | DF | Deviance | % Tot | p-Value | Significance
Nnet | 1 | 173,188.83 | 25.56 | 2.2 × 10⁻¹⁶ | ***
Naut | 1 | 10,785.58 | 1.59 | 2.2 × 10⁻¹⁶ | ***
Aaut | 1 | 28,534.06 | 1.60 | 2.2 × 10⁻¹⁶ | ***
∆H5 = H1 − H5 | 1 | 116,157.72 | 19.76 | 2.2 × 10⁻¹⁶ | ***
Disciplines | 24 | 24,249.43 | 3.58 | 2.2 × 10⁻¹⁶ | ***
Residual | 7123 | 324,755.00 | 47.92
Table A19. ANOVA on Ngro for subjects. DF = degree of freedom, *** = significant at 99.9%. Ngro = No. gross citations, Nnet = No. net citations, Naut = No. co-authors, Aaut = 1 for inexpert authors.
Term | DF | Deviance | % Tot | p-Value | Significance
Nnet | 1 | 173,188.83 | 25.56 | 2.2 × 10⁻¹⁶ | ***
Naut | 1 | 10,785.58 | 1.59 | 2.2 × 10⁻¹⁶ | ***
Aaut | 1 | 28,534.06 | 1.60 | 2.2 × 10⁻¹⁶ | ***
∆H5 = H1 − H5 | 1 | 116,157.72 | 19.76 | 2.2 × 10⁻¹⁶ | ***
Subjects | 3 | 6713.68 | 0.99 | 2.2 × 10⁻¹⁶ | ***
Residual | 7144 | 342,290.80 | 50.51
The dependent variable in the model behind Tables A10 and A11 is the number of publications that are not articles (Npub − Nart). This count model was estimated as an over-dispersed Poisson (also known as quasi-Poisson) GLM with the canonical log link. Tables A10 and A11 report the resulting analysis-of-deviance tables, where the p-values are computed by likelihood ratio tests [20]. In contrast, the models behind Tables A12–A19 explain averages of different kinds of citations per article (i.e., averages of count data): Ngro − Nnet in Tables A12 and A13, Nnet in Tables A14–A17, and Ngro in Tables A18 and A19. We also modelled these dependent variables with an over-dispersed Poisson distribution: Tables A12–A19 report analyses of deviance for weighted GLM regressions under an over-dispersed Poisson assumption (also known as weighted quasi-Poisson regressions).
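As an illustration of what "quasi-Poisson GLM with log link" means computationally, here is a minimal, dependency-free sketch. It uses iteratively reweighted least squares (IRLS) for a Poisson log-link model, followed by the Pearson estimate of the dispersion φ. With φ in hand, quasi-Poisson standard errors are the Poisson ones inflated by √φ. The function names, the optional `weights` argument and the convergence settings are assumptions for illustration only. The `weights` argument stands in for the prior weights of the weighted regressions, whose exact definition is not given in this section. In practice one would use a statistics package, e.g. R's `glm()` with `family = quasipoisson`.

```python
import math


def solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def quasi_poisson_fit(X, y, weights=None, iters=50, tol=1e-10):
    """IRLS for a log-link Poisson GLM, then a Pearson estimate of the dispersion phi.

    X is a list of rows whose first column is assumed to be the intercept (all ones).
    """
    n, p = len(y), len(X[0])
    w_prior = weights or [1.0] * n
    beta = [0.0] * p
    beta[0] = math.log(max(sum(y) / n, 1e-8))  # start from the log of the overall mean
    for _ in range(iters):
        eta = [sum(xij * bj for xij, bj in zip(xi, beta)) for xi in X]
        mu = [math.exp(e) for e in eta]
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]  # working response
        w = [wp * m for wp, m in zip(w_prior, mu)]               # working weights (log link)
        xtwx = [[sum(w[k] * X[k][i] * X[k][j] for k in range(n)) for j in range(p)]
                for i in range(p)]
        xtwz = [sum(w[k] * X[k][i] * z[k] for k in range(n)) for i in range(p)]
        new_beta = solve(xtwx, xtwz)
        converged = max(abs(a - b) for a, b in zip(new_beta, beta)) < tol
        beta = new_beta
        if converged:
            break
    mu = [math.exp(sum(xij * bj for xij, bj in zip(xi, beta))) for xi in X]
    pearson = sum(wp * (yi - m) ** 2 / m for wp, yi, m in zip(w_prior, y, mu))
    phi = pearson / (n - p)  # phi > 1 signals over-dispersion relative to Poisson
    return beta, phi


# Intercept-only toy example: beta[0] = log(mean(y)) and phi = sample variance / mean
beta, phi = quasi_poisson_fit([[1.0]] * 5, [1, 2, 3, 4, 10])
print(beta, phi)
```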
Note that the sums of squares (variances) used in Tables 4, 5, 7, 8, 10, 11, 14, 15, 18 and 19 are replaced by deviances in Tables A10–A19. Deviance is not simply a sum or average of squared residuals. Nevertheless, the deviance attributed to each factor measures the information that the factor progressively explains.
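For a (quasi-)Poisson GLM, the deviance compares fitted means μ with observed counts y. Each factor's sequential deviance is the drop in this quantity when the factor is added, and the % Tot column divides each term by the total. Below is a short sketch using the standard Poisson deviance formula. The optional prior weights and the function names are illustrative assumptions. The second function reproduces the % Tot column of Table A11 from its printed deviances.

```python
import math


def poisson_deviance(y, mu, weights=None):
    """Poisson deviance 2 * sum w * [y log(y / mu) - (y - mu)], with y log y taken as 0 at y = 0."""
    weights = weights or [1.0] * len(y)
    total = 0.0
    for w, yi, mi in zip(weights, y, mu):
        term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += w * (term - (yi - mi))
    return 2.0 * total


def deviance_shares(deviances):
    """Percentage of the total deviance attributed to each term (the '% Tot' column)."""
    total = sum(deviances.values())
    return {term: round(100 * d / total, 2) for term, d in deviances.items()}


# Deviances printed in Table A11 (subjects model for Npub - Nart)
print(deviance_shares({"Subjects": 93.90, "Residual": 2714.08}))
# {'Subjects': 3.34, 'Residual': 96.66}
```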
Moreover, quasi-Poisson regression is robust to the specification of the distribution. It relies only on the assumption that the variance is proportional to the mean, rather than on a specific distribution shape. In particular, with a suitable parameterisation, the widely used negative binomial has the property Var[Y] = E[Y]/p, where Y is negative binomial and p is its "success probability" parameter [20,35]. Negative binomial data with a common p therefore satisfy the quasi-Poisson variance assumption, with dispersion parameter 1/p. Hurdle or zero-inflated models would have required specifying, in each model, how their additional parameter depends on the exogenous variables [35], so we chose to avoid this level of complexity in our statistical models and analyses. The very high significance levels obtained seem to support this distributional choice.
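The negative binomial property quoted above can be checked directly. The derivation below assumes the common parameterisation in which Y counts the failures before the r-th success, each trial succeeding with probability p; [20,35] may present an equivalent but differently written form. If p is held common across observations while r varies, the variance is proportional to the mean, which is exactly the quasi-Poisson variance function with φ = 1/p.

```latex
\Pr(Y = y) = \binom{y + r - 1}{y}\, p^{r} (1 - p)^{y}, \qquad y = 0, 1, 2, \dots \\
E[Y] = \frac{r(1 - p)}{p}, \qquad
\operatorname{Var}[Y] = \frac{r(1 - p)}{p^{2}} = \frac{E[Y]}{p} \\
\Longrightarrow\ \operatorname{Var}[Y] = \phi\, E[Y] \quad \text{with } \phi = \frac{1}{p}
```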
Finally, the variance progressively explained by each factor and the residual variance shown in Tables 4, 5, 7, 8, 10, 11, 14, 15, 18 and 19 have an interpretation similar to that of the deviances reported in Tables A10–A19, and a similar but weaker phenomenon is observed. For example, in Tables A18 and A19 on the overall bias, the deviance associated with ∆H5 is around 5.5 times the deviance associated with disciplines (i.e., 19.76%/3.58%). It is also around 20 times the deviance associated with subjects (i.e., 19.76%/0.99%). The analogous ratios computed on the sums of squares reported in Tables 18 and 19 are around 22 (i.e., 1562/71) and around 111 (i.e., 1562/14).
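The ratios compared in the paragraph above follow directly from the % Tot column of Tables A18 and A19 and the Sum Squares column of Tables 18 and 19. The short sketch below makes the arithmetic explicit, together with the direction of each ratio (∆H5 contribution divided by the factor's contribution). The function name is illustrative.

```python
def dominance_ratios():
    """Ratio of the Delta-H5 contribution to the discipline and subject contributions."""
    deviance_pct = {"dH5": 19.76, "Disciplines": 3.58, "Subjects": 0.99}  # % Tot, Tables A18/A19
    sum_squares = {"dH5": 1562, "Disciplines": 71, "Subjects": 14}       # Sum Squares, Tables 18/19
    return {
        factor: (round(deviance_pct["dH5"] / deviance_pct[factor], 1),
                 round(sum_squares["dH5"] / sum_squares[factor], 1))
        for factor in ("Disciplines", "Subjects")
    }


print(dominance_ratios())
# {'Disciplines': (5.5, 22.0), 'Subjects': (20.0, 111.6)}
```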

References

1. Zagonari, F. Scientific Production and Productivity for Characterizing an Author's Publication History: Simple and Nested Gini's and Hirsch's Indexes Combined. Publications 2019, 7, 32.
2. Abramo, G.; D'Angelo, C.A.; Zhang, L. A comparison of two approaches for measuring interdisciplinary research output: The disciplinary diversity of authors vs. the disciplinary diversity of the reference list. J. Informetr. 2018, 12, 1182–1193.
3. Brito, A.C.M.; Silva, F.N.; Amancio, D.R. Analyzing the influence of prolific collaborations on authors' productivity and visibility. Scientometrics 2023, 128, 2471–2487.
4. Ghani, R.; Qayyum, F.; Afzal, M.T.; Maurer, H. Comprehensive evaluation of h-index and its extensions in the domain of mathematics. Scientometrics 2019, 118, 809–822.
5. Fassin, Y. The HF-rating as a universal complement to the H-index. Scientometrics 2020, 125, 965–990.
6. Usman, M.; Mustafa, G.; Afzal, M.T. Ranking of author assessment parameters using Logistic Regression. Scientometrics 2021, 126, 335–353.
7. Mingers, J.; Meyer, M. Normalizing Google Scholar data for use in research evaluation. Scientometrics 2017, 112, 1111–1121.
8. Yuret, T. Author-weighted impact factor and reference return ratio: Can we attain more equality among fields? Scientometrics 2018, 116, 2097–2111.
9. Zagonari, F. Coping with the inequity and inefficiency of the H-index: A cross-disciplinary analytical model. Publ. Res. Q. 2019, 35, 285–300.
10. Fassin, Y. The ha-index: The average citation h-index. Quant. Sci. Stud. 2023, 4, 756–777.
11. Bradshaw, C.J.A.; Chalker, J.M.; Crabtree, S.A.; Eijkelkamp, B.A.; Long, J.A.; Smith, J.R.; Trinajstic, K.; Weisbecker, V. A fairer way to compare researchers at any career stage and in any discipline using open-access citation data. PLoS ONE 2021, 16, e0257141.
12. Halim, Z.; Khan, S. A data science-based framework to categorize academic journals. Scientometrics 2019, 119, 393–423.
13. Ibrahim, N.; Habacha Chaibi, A.; Ben Ahmed, M. New scientometric indicator for the qualitative evaluation of scientific production. New Libr. World 2015, 116, 661–676.
14. Brandão, L.C. A multi-criteria approach to the h-index. Eur. J. Oper. Res. 2019, 276, 357–363.
15. Bihari, A.; Tripathi, S.; Deepak, A. Iterative weighted EM and iterative weighted EM′-index for scientific assessment of scholars. Scientometrics 2021, 126, 5551–5568.
16. Kim, E.; Jeong, D.Y. Dominant Characteristics of Subject Categories in a Multiple-Category Hierarchical Scheme: A Case Study of Scopus. Publications 2023, 11, 51.
17. Draper, N.R.; Smith, H. Applied Regression Analysis; John Wiley & Sons: Hoboken, NJ, USA, 1998; Volume 326.
18. Konar, T. Author-suggested, weighted citation index: A novel approach for determining the contribution of individual researchers. Publications 2021, 9, 30.
19. Põder, E. What Is Wrong with the Current Evaluative Bibliometrics? Front. Res. Metr. Anal. 2021, 6, 824518.
20. McCullagh, P.; Nelder, J.A. Generalized Linear Models, 2nd ed.; Chapman & Hall: London, UK, 1989.
21. Alshdadi, A.A.; Usman, M.; Alassafi, M.O.; Afzal, M.T.; AlGhamdi, R. Formulation of rules for the scientific community using deep learning. Scientometrics 2023, 128, 1825–1852.
22. Andersen, J.P.; Nielsen, M.W. Google Scholar and Web of Science: Examining gender differences in citation coverage across five scientific disciplines. J. Informetr. 2018, 12, 950–959.
23. Wildgaard, L. A comparison of 17 author-level bibliometric indicators for researchers in Astronomy, Environmental Science, Philosophy and Public Health in Web of Science and Google Scholar. Scientometrics 2015, 104, 873–906.
24. Harzing, A.-W.; Alakangas, S.; Adams, D. hIa: An individual annual H-index to accommodate disciplinary and career length difference. Scientometrics 2014, 99, 811–821.
25. Loan, F.A.; Nasreen, N.; Bashir, B. Do authors play fair or manipulate Google Scholar H-index? Libr. High Tech 2022, 40, 676–684.
26. Tokmachev, A.M. Hidden scales in statistics of citation indicators. J. Informetr. 2023, 17, 101356.
27. De Stefano, D.; Zaccarin, S. Co-authorship networks and scientific performance: An empirical analysis using the generalized extreme value distribution. J. Appl. Stat. 2016, 43, 262–279.
28. Ryan, J.C. A validation of the individual annual h-index (hIa): Application of the hIa to a qualitatively and quantitatively different sample. Scientometrics 2016, 109, 577–590.
29. Moreira, J.A.G.; Zeng, X.H.T.; Nunes Amaral, L.A. The distribution of the asymptotic number of citations to sets of publications by a researcher or from an academic department are consistent with a discrete lognormal model. PLoS ONE 2015, 10, e0143108.
30. Kuppler, M. Predicting the future impact of Computer Science researchers: Is there a gender bias? Scientometrics 2022, 127, 6695–6732.
31. Jin, Y.; Yuan, S.; Shao, Z.; Hall, W.; Tang, J. Turing Award elites revisited: Patterns of productivity, collaboration, authorship and impact. Scientometrics 2021, 126, 2329–2348.
32. Koltun, V.; Hafner, D. The h-index is no longer an effective correlate of scientific reputation. PLoS ONE 2021, 16, e0253397.
33. Lykke, M.; Amstrup, L.; Hvidtfeldt, R.; Pedersen, D.B. Mapping research activities and societal impact by taxonomy of indicators: Uniformity and diversity across academic fields. J. Doc. 2023, 79, 1049–1070.
34. Todeschini, R.; Baccini, A. Handbook of Bibliometric Indicators: Quantitative Tools for Studying and Evaluating Research; Wiley-VCH: Weinheim, Germany, 2016.
35. Ketzler, R.; Zimmermann, K.F. A citation-analysis of economic research institutes. Scientometrics 2013, 95, 1095–1112.
Figure 1. Histograms of H5 for disciplines Dj, with one color per discipline. Disciplines are numbered 11 to 35, consistent with the subject categories used in the Scopus dataset.
Figure 2. Gamma Probability Density Functions of H5 for disciplines Dj. Colors match those used in Figure 1.
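The Gamma densities in Figures 2 and 4 come from a maximum-likelihood fit to H5. The sketch below shows one dependency-free way to perform such a fit: Newton's method on the shape k, solving log k − ψ(k) = log(mean) − mean(log x), started from a standard closed-form approximation, with scale = mean / k. The section does not state how authors with H5 = 0 were treated (Table 3 shows minima of 0), so the sketch simply assumes strictly positive data. All names here are illustrative. In practice, `scipy.stats.gamma.fit(data, floc=0)` performs the same estimation.

```python
import math
import random


def digamma(x):
    """psi(x) via the recurrence psi(x) = psi(x + 1) - 1/x and an asymptotic series for x >= 6."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))


def trigamma(x):
    """psi'(x) via the recurrence psi'(x) = psi'(x + 1) + 1/x^2 and an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    return result + inv + 0.5 * inv2 + inv * inv2 * (1 / 6 - inv2 * (1 / 30 - inv2 / 42))


def fit_gamma_mle(data, iters=100, tol=1e-12):
    """Maximum-likelihood (shape, scale) of a Gamma distribution; data must be strictly positive."""
    n = len(data)
    mean = sum(data) / n
    s = math.log(mean) - sum(math.log(v) for v in data) / n  # > 0 unless all values are equal
    k = (3 - s + math.sqrt((s - 3) ** 2 + 24 * s)) / (12 * s)  # closed-form starting value
    for _ in range(iters):
        step = (math.log(k) - digamma(k) - s) / (1 / k - trigamma(k))  # Newton step
        k -= step
        if abs(step) < tol * k:
            break
    return k, mean / k


# Synthetic check: data drawn from a Gamma with shape 2 and scale 0.5
random.seed(1)
sample = [random.gammavariate(2.0, 0.5) for _ in range(20_000)]
shape, scale = fit_gamma_mle(sample)
print(f"shape = {shape:.3f}, scale = {scale:.3f}")
```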
Figure 3. Histograms of H5 for subjects Sk. Pink = Health, Green = Life, Blue = Physical, Purple = Social.
Figure 4. Gamma Probability Density Functions of H5 for subjects Sk. Pink = Health, Green = Life, Blue = Physical, Purple = Social.
Table 1. Description of acronyms and variables.
Acronyms | Description
Ifa | Inefficiency a
Ifb | Inefficiency b
Iqa | Inequity a
Iqb | Inequity b
Variables
Npub | Number of publications
Nart | Number of articles
Ngro | Number of citations including co-authors' citations
Nnet | Number of citations excluding co-authors' citations
Naut | Mean number of co-authors
H1 | H-index based on publications
H2 | H-index based on articles
H3 | H-index based on net citations per article
H4 | Net per-capita H-index based on articles
H5 | Net per-capita per-year H-index based on articles
∆Hh = Hh − Hh+1 | H-index improvement, h = 1, 2, 3, 4; ∆H5 = H1 − H5 denotes the overall improvement
Dj | Dummy variable for discipline j (j = 1,…, 27), i.e., Dj takes value 1 for discipline j and 0 for disciplines other than j
Sk | Dummy variable for subject k (k = 1, 2, 3, 4), i.e., Sk takes value 1 for subject k and 0 for subjects other than k
Aaut | Dummy variable for age (Aaut = 1 if the author's first publication is after 2009, otherwise Aaut = 0)
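H1, H2 and H3 in Table 1 differ only in which publications and which citation counts enter the classic Hirsch computation. The sketch below shows that computation on a hypothetical author record; the record layout and variable names are assumptions for illustration. It does not reproduce the per-capita and per-year normalisations behind H4 and H5, because their exact formulas are not given in this section.

```python
from typing import Iterable


def h_index(citations: Iterable[float]) -> int:
    """Classic Hirsch index: largest h such that h items have at least h citations each."""
    h = 0
    for rank, count in enumerate(sorted(citations, reverse=True), start=1):
        if count < rank:
            break
        h = rank
    return h


# Hypothetical author record: (is_article, gross_citations, net_citations),
# where net citations exclude citations coming from co-authors.
records = [
    (True, 10, 8),
    (True, 5, 1),
    (False, 4, 4),  # e.g. a conference paper: counted in Npub but not in Nart
    (True, 2, 1),
    (True, 1, 0),
]

h1 = h_index(gross for _, gross, _ in records)                 # all publications
h2 = h_index(gross for is_art, gross, _ in records if is_art)  # articles only
h3 = h_index(net for is_art, _, net in records if is_art)      # articles, net citations
print(h1, h2, h3)  # 3 2 1
```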
Table 2. Summary statistics on independent variables for subjects Sk. Notations: mean (SD) is in the first row for each subject, median [min–max] is in the second row for each subject; in columns, Npub = No. publications, Nart = No. articles, Naut = No. co-authors, Ngro = No. gross citations, Nnet = No. net citations.
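Subject | Npub | Nart | Naut | Ngro | Nnet
Health, mean (SD) | 5.112 (14.302) | 5.057 (14.032) | 12.875 (74.372) | 8.091 (24.841) | 7.966 (24.302)
Health, median [min–max] | 1 [1–444] | 1 [1–435] | 6.400 [1–2060] | 2.333 [0–988] | 2.333 [0–965]
Life, mean (SD) | 5.203 (11.979) | 5.171 (11.901) | 7.904 (15.337) | 9.480 (21.171) | 9.289 (20.764)
Life, median [min–max] | 2 [1–206] | 2 [1–206] | 6 [1–461] | 3 [0–358] | 3 [0–358]
Physical, mean (SD) | 5.709 (17.865) | 5.687 (17.835) | 15.874 (135.262) | 5.646 (13.910) | 5.510 (13.720)
Physical, median [min–max] | 2 [1–468] | 2 [1–468] | 4.500 [1–2837.009] | 1.667 [0–256] | 1.600 [0–256]
Social, mean (SD) | 2.909 (4.246) | 2.906 (4.237) | 3.061 (9.622) | 4.775 (14.610) | 4.737 (14.590)
Social, median [min–max] | 1 [1–39] | 1 [1–39] | 2 [1–228] | 1 [0–212] | 1 [0–212]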
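The "mean (SD)" and "median [min–max]" cells of Tables 2 and 3 can be produced with Python's `statistics` module. The section does not say whether the authors used the sample (n − 1) or the population standard deviation, so the sample version used here is an assumption. The input list is hypothetical.

```python
import statistics


def summarise(values):
    """Return the two cells used in Tables 2 and 3: 'mean (SD)' and 'median [min–max]'."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator), an assumption
    med = statistics.median(values)
    first = f"{mean:.3f} ({sd:.3f})"
    second = f"{med:g} [{min(values):g}–{max(values):g}]"
    return first, second


# Hypothetical publication counts for five authors
print(summarise([1, 1, 2, 3, 8]))  # ('3.000 (2.915)', '2 [1–8]')
```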
Table 3. Summary statistics on all H-indexes for subjects Sk. Notations: mean (SD) is in the first row for each subject, median [min–max] is in the second row for each subject.
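Subject | H1 | H2 | H3 | H4 | H5
Health, mean (SD) | 1.916 (3.151) | 1.916 (3.151) | 1.889 (3.033) | 0.329 (0.559) | 0.173 (0.249)
Health, median [min–max] | 1 [0–43] | 1 [0–43] | 1 [0–39] | 0.167 [0–8.367] | 0.111 [0–4.111]
Life, mean (SD) | 2.188 (3.525) | 2.188 (3.525) | 2.147 (3.300) | 0.423 (0.681) | 0.216 (0.291)
Life, median [min–max] | 1 [0–72] | 1 [0–72] | 1 [0–60] | 0.200 [0–6.479] | 0.143 [0–3.167]
Physical, mean (SD) | 1.774 (2.913) | 1.774 (2.913) | 1.728 (2.747) | 0.430 (0.705) | 0.220 (0.320)
Physical, median [min–max] | 1 [0–42] | 1 [0–42] | 1 [0–37] | 0.225 [0–12.417] | 0.125 [0–5.333]
Social, mean (SD) | 1.209 (1.606) | 1.209 (1.606) | 1.198 (1.576) | 0.601 (0.839) | 0.313 (0.419)
Social, median [min–max] | 1 [0–12] | 1 [0–12] | 1 [0–12] | 0.333 [0–6] | 0.167 [0–2.667]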
Table 4. ANOVA on Npub—Nart for disciplines. DF = degree of freedom, ** = significant at 99%. Npub = No. publications, Nart = No. articles.
Term | DF | Sum Squares | % Tot | Mean Squares | F Value | p-Value | Significance
∆H1 = H1 − H2
Disciplines | 24 | 4.67 | 0.47 | 0.194473 | 1.9541 | 0.003504 | **
Residuals | 9972 | 992.43 | 99.53 | 0.099521
Table 5. ANOVA on Npub—Nart for subjects. DF = degree of freedom, *** = significant at 99.9%. Npub = No. publications, Nart = No. articles.
Term | DF | Sum Squares | % Tot | Mean Squares | F Value | p-Value | Significance
∆H1 = H1 − H2
Subjects | 3 | 2.86 | 0.29 | 0.95231 | 9.5716 | 2.621 × 10⁻⁶ | ***
Residuals | 9993 | 994.24 | 99.71 | 0.09949
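The derived columns of the ANOVA tables follow from DF and Sum Squares: Mean Squares = Sum Squares / DF, F = Mean Squares / residual Mean Squares, and % Tot = Sum Squares / total Sum Squares. The sketch below recomputes them from the rounded values printed in Table 5. For that reason F comes out near 9.58 rather than the reported 9.5716, and p-values, which require the F distribution, are not reproduced. The function name and row layout are illustrative.

```python
def anova_columns(rows):
    """Derive % Tot, Mean Squares and F from (term, DF, Sum Squares) rows.

    The last row is assumed to be the residuals.
    """
    total_ss = sum(ss for _, _, ss in rows)
    _, res_df, res_ss = rows[-1]
    ms_res = res_ss / res_df
    table = []
    for i, (term, df, ss) in enumerate(rows):
        ms = ss / df
        f_value = ms / ms_res if i < len(rows) - 1 else None  # no F for the residual row
        table.append((term, df, ss, round(100 * ss / total_ss, 2), ms, f_value))
    return table


# Rounded sums of squares as printed in Table 5
for row in anova_columns([("Subjects", 3, 2.86), ("Residuals", 9993, 994.24)]):
    print(row)
```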
Table 6. Npub and Nart in subjects Sk. Nobs = No. observations, Npub = No. publications, Nart = No. articles.
Subject | Nobs | Npub | Nart | Npub − Nart | %
Health | 3698 | 18,904 | 18,702 | 202 | 1.07
Life | 1834 | 9543 | 9483 | 60 | 0.63
Physical | 3733 | 21,311 | 21,230 | 81 | 0.38
Social | 733 | 2132 | 2130 | 2 | 0.09
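The % column of Table 6 matches the share of publications that are not articles, 100 × (Npub − Nart) / Npub. This reading is inferred from the numbers rather than stated in the caption, but it reproduces all four rows, as the short sketch below shows. The function name is illustrative.

```python
def non_article_share(npub: int, nart: int) -> float:
    """Percentage of publications that are not articles: 100 * (Npub - Nart) / Npub."""
    return 100 * (npub - nart) / npub


table6 = {"Health": (18_904, 18_702), "Life": (9_543, 9_483),
          "Physical": (21_311, 21_230), "Social": (2_132, 2_130)}
for subject, (npub, nart) in table6.items():
    print(f"{subject}: {npub - nart} non-articles, {non_article_share(npub, nart):.2f}%")
```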
Table 7. ANOVA on Ngro—Nnet for disciplines. DF = degree of freedom, *** and * = significant at 99.9% and 95%. Ngro = No. gross citations, Nnet = No. net citations.
Term | DF | Sum Squares | % Tot | Mean Squares | F Value | p-Value | Significance
∆H2 = H2 − H3 | 1 | 3140.6 | 26.79 | 3140.63 | 3663.7511 | <2 × 10⁻¹⁶ | ***
Disciplines | 24 | 34.0 | 0.29 | 1.42 | 1.6547 | 0.02316 | *
Residuals | 9971 | 8547.3 | 72.92 | 0.86
Table 8. ANOVA on Ngro—Nnet for subjects. DF = degree of freedom, *** and * = significant at 99.9% and 95%. Ngro = No. gross citations, Nnet = No. net citations.
Term | DF | Sum Squares | % Tot | Mean Squares | F Value | p-Value | Significance
∆H2 = H2 − H3 | 1 | 3140.6 | 26.79 | 3140.63 | 3659.968 | <2 × 10⁻¹⁶ | ***
Subjects | 3 | 7.2 | 0.06 | 2.40 | 2.792 | 0.0389 | *
Residuals | 9992 | 8574.2 | 73.15 | 0.86
Table 9. Linear regression of Ngro—Nnet on ∆H2 and subjects Sk. *** and ** = significant at 99.9% and 99%. Ngro = No. gross citations, Nnet = No. net citations.
Term | Estimate | Std. Error | t Value | p-Value | Significance
∆H2 = H2 − H3 | 2.00998 | 0.03325 | 60.442 | <2 × 10⁻¹⁶ | ***
Health | 0.07116 | 0.01526 | 4.663 | 3.16 × 10⁻⁶ | ***
Life | 0.10768 | 0.02167 | 4.968 | 6.88 × 10⁻⁷ | ***
Physical | 0.04189 | 0.01524 | 2.748 | 0.006 | **
Social | 0.01604 | 0.03422 | 0.469 | 0.639
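In the regression tables, t Value = Estimate / Std. Error. With roughly 10,000 observations the t distribution is practically normal, so a two-sided normal approximation recovers the reported p-values closely. This approximation is a simplification for illustration, not necessarily what the authors' software computed. The sketch below checks three rows of Table 9; the function name is illustrative.

```python
import math


def wald_test(estimate: float, std_error: float):
    """t = estimate / SE, with a two-sided p-value from the standard normal distribution."""
    t = estimate / std_error
    p = math.erfc(abs(t) / math.sqrt(2))  # equals 2 * (1 - Phi(|t|))
    return t, p


for name, est, se in [("Health", 0.07116, 0.01526),
                      ("Physical", 0.04189, 0.01524),
                      ("Social", 0.01604, 0.03422)]:
    t, p = wald_test(est, se)
    print(f"{name}: t = {t:.3f}, p = {p:.3g}")
```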
Table 10. ANOVA on Nnet for disciplines. DF = degree of freedom, *** = significant at 99.9%. Nnet = No. net citations, Naut = No. co-authors.
Term | DF | Sum Squares | % Tot | Mean Squares | F Value | p-Value | Significance
Naut | 1 | 25,395 | 0.66 | 25,395 | 70.0203 | <2.2 × 10⁻¹⁶ | ***
∆H3 = H3 − H4 | 1 | 168,777 | 4.38 | 168,777 | 465.3608 | <2.2 × 10⁻¹⁶ | ***
Disciplines | 24 | 46,907 | 1.22 | 1954 | 5.3889 | 2.728 × 10⁻¹⁶ | ***
Residuals | 9970 | 3,615,927 | 93.75 | 363
Table 11. ANOVA on Nnet for subjects. DF = degree of freedom, *** = significant at 99.9%. Nnet = No. net citations, Naut = No. co-authors.
Term | DF | Sum Squares | % Tot | Mean Squares | F Value | p-Value | Significance
Naut | 1 | 25,395 | 0.66 | 25,395 | 69.551 | <2.2 × 10⁻¹⁶ | ***
∆H3 = H3 − H4 | 1 | 168,777 | 4.38 | 168,777 | 462.241 | <2.2 × 10⁻¹⁶ | ***
Subjects | 3 | 14,837 | 0.38 | 4946 | 13.545 | 8.129 × 10⁻⁹ | ***
Residuals | 9991 | 3,647,997 | 94.58 | 365
Table 12. Linear regression of Nnet on Naut, ∆H3 and subjects Sk. *** = significant at 99.9%. Nnet = No. net citations, Naut = No. co-authors.
Term | Estimate | Std. Error | t Value | p-Value | Significance
Naut | 0.009705 | 0.002054 | 4.725 | 2.33 × 10⁻⁶ | ***
∆H3 = H3 − H4 | 1.680821 | 0.080859 | 20.787 | <2 × 10⁻¹⁶ | ***
Health | 5.220720 | 0.337960 | 15.448 | <2 × 10⁻¹⁶ | ***
Life | 6.315327 | 0.466899 | 13.526 | <2 × 10⁻¹⁶ | ***
Physical | 3.174912 | 0.329690 | 9.630 | <2 × 10⁻¹⁶ | ***
Social | 3.703891 | 0.707383 | 5.236 | 1.67 × 10⁻⁷ | ***
Table 13. Differences between subjects Sk (below diagonal) and related p-values (above diagonal).
Subject | Health | Life | Physical | Social
Health | - | 0.0450 | 4.1 × 10⁻⁶ | 0.0508
Life | 1.0946 | - | 9.3 × 10⁻⁹ | 0.0019
Physical | −2.0458 | −3.1404 | - | 0.4944
Social | −1.5168 | −2.6114 | 0.529 | -
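The below-diagonal entries of Table 13 are differences between the subject coefficients of Table 12, taken as row minus column. For example, Life − Health = 6.315327 − 5.220720 = 1.0946. The sketch below recomputes all six differences. The p-values above the diagonal would additionally require the covariance matrix of the coefficients, which is not reported, so they are not reproduced. The same function applies to Table 17 with the coefficients of Table 16.

```python
def pairwise_differences(coefficients):
    """Row-minus-column differences for every pair below the diagonal."""
    names = list(coefficients)
    return {(row, col): coefficients[row] - coefficients[col]
            for i, row in enumerate(names) for col in names[:i]}


# Subject coefficients from Table 12
coef = {"Health": 5.220720, "Life": 6.315327, "Physical": 3.174912, "Social": 3.703891}
for (row, col), diff in pairwise_differences(coef).items():
    print(f"{row} - {col}: {diff:.4f}")
```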
Table 14. ANOVA on Nnet for disciplines. DF = degree of freedom, *** = significant at 99.9%. Nnet = No. net citations, Naut = No. co-authors, Aaut = 1 for inexpert authors.
Term | DF | Sum Squares | % Tot | Mean Squares | F Value | p-Value | Significance
Naut | 1 | 25,395 | 0.66 | 25,395 | 70.9415 | <2.2 × 10⁻¹⁶ | ***
Aaut | 1 | 172,892 | 4.48 | 172,892 | 482.9768 | <2.2 × 10⁻¹⁶ | ***
∆H4 = H4 − H5 | 1 | 17,869 | 0.46 | 17,869 | 49.9175 | 1.711 × 10⁻¹² | ***
Disciplines | 24 | 72,237 | 1.87 | 3010 | 8.4082 | <2.2 × 10⁻¹⁶ | ***
Residuals | 9969 | 3,568,613 | 92.52 | 358
Table 15. ANOVA on Nnet for subjects. DF = degree of freedom, *** = significant at 99.9%. Nnet = No. net citations, Naut = No. co-authors, Aaut = 1 for inexpert authors.
Term | DF | Sum Squares | % Tot | Mean Squares | F Value | p-Value | Significance
Naut | 1 | 25,395 | 0.66 | 25,395 | 70.158 | <2.2 × 10⁻¹⁶ | ***
Aaut | 1 | 172,892 | 4.48 | 172,892 | 477.644 | <2.2 × 10⁻¹⁶ | ***
∆H4 = H4 − H5 | 1 | 17,869 | 0.46 | 17,869 | 49.366 | 2.262 × 10⁻¹² | ***
Subjects | 3 | 24,794 | 0.64 | 8265 | 22.833 | 1.002 × 10⁻¹⁴ | ***
Residuals | 9990 | 3,616,056 | 93.75 | 362
Table 16. Linear regression of Nnet on Naut, Aaut, ∆H4 and subjects Sk. *** = significant at 99.9%. Nnet = No. net citations, Naut = No. co-authors, Aaut = 1 for inexpert authors.
Term | Estimate | Std. Error | t Value | p-Value | Significance
Naut | 0.017634 | 0.002015 | 8.752 | <2 × 10⁻¹⁶ | ***
Aaut | −8.844221 | 0.408641 | −21.643 | <2 × 10⁻¹⁶ | ***
∆H4 = H4 − H5 | −0.458533 | 0.498655 | −0.920 | 0.358
Health | 12.266976 | 0.398495 | 30.783 | <2 × 10⁻¹⁶ | ***
Life | 13.541654 | 0.512477 | 26.424 | <2 × 10⁻¹⁶ | ***
Physical | 9.790119 | 0.409115 | 23.930 | <2 × 10⁻¹⁶ | ***
Social | 9.049664 | 0.757110 | 11.953 | <2 × 10⁻¹⁶ | ***
Table 17. Differences between subjects Sk (below diagonal) and related p-values (above diagonal).
Subject | Health | Life | Physical | Social
Health | - | 0.0191 | 2.2 × 10⁻⁸ | 3.1 × 10⁻⁵
Life | 1.2747 | - | 5.1 × 10⁻¹² | 6.9 × 10⁻⁸
Physical | −2.4769 | −3.7515 | - | 0.336
Social | −3.2173 | −4.4920 | −0.7405 | -
Table 18. ANOVA on Ngro for disciplines. DF = degree of freedom, *** = significant at 99.9%. Ngro = No. gross citations, Nnet = No. net citations, Naut = No. co-authors, Aaut = 1 for inexpert authors.
Term | DF | Sum Squares | % Tot | Mean Squares | F Value | p-Value | Significance
Nnet | 1 | 3,790,911 | 99.72 | 3,790,911 | 3.1661 × 10⁶ | <2.2 × 10⁻¹⁶ | ***
Naut | 1 | 302 | 0.01 | 302 | 2.5244 × 10² | <2.2 × 10⁻¹⁶ | ***
Aaut | 1 | 19 | 0.00 | 19 | 1.5815 × 10¹ | 7.053 × 10⁻⁵ | ***
∆H5 = H1 − H5 | 1 | 1562 | 0.04 | 1562 | 1.3042 × 10³ | <2.2 × 10⁻¹⁶ | ***
Disciplines | 24 | 71 | 0.00 | 3 | 2.4638 | 8.895 × 10⁻⁵ | ***
Residuals | 7123 | 8529 | 0.22 | 1
Table 19. ANOVA on Ngro for subjects. DF = degree of freedom, *** and * = significant at 99.9% and 95%. Ngro = No. gross citations, Nnet = No. net citations, Naut = No. co-authors, Aaut = 1 for inexpert authors.
Term | DF | Sum Squares | % Tot | Mean Squares | F Value | p-Value | Significance
Nnet | 1 | 3,790,911 | 99.72 | 3,790,911 | 3.1542 × 10⁶ | <2.2 × 10⁻¹⁶ | ***
Naut | 1 | 302 | 0.01 | 302 | 2.5149 × 10² | <2.2 × 10⁻¹⁶ | ***
Aaut | 1 | 19 | 0.00 | 19 | 1.5756 × 10¹ | 7.277 × 10⁻⁵ | ***
∆H5 = H1 − H5 | 1 | 1562 | 0.04 | 1562 | 1.2993 × 10³ | <2.2 × 10⁻¹⁶ | ***
Subjects | 3 | 14 | 0.00 | 5 | 3.7473 | 0.01053 | *
Residuals | 7144 | 8586 | 0.23 | 1
Table 20. Linear regression of Ngro on Nnet, Naut, Aaut, ∆H5, and subjects Sk. *** = significant at 99.9%. Ngro = No. gross citations, Nnet = No. net citations, Naut = No. co-authors, Aaut = 1 for inexpert authors.
Term | Estimate | Std. Error | t Value | p-Value | Significance
Nnet | 1.0153304 | 0.0005870 | 1729.830 | <2 × 10⁻¹⁶ | ***
Naut | 0.0011305 | 0.0001229 | 9.200 | <2 × 10⁻¹⁶ | ***
Aaut | 0.1808438 | 0.0282857 | 6.393 | 1.72 × 10⁻¹⁰ | ***
∆H5 = H1 − H5 | 0.1589654 | 0.0044038 | 36.097 | <2 × 10⁻¹⁶ | ***
Health | −0.4654963 | 0.0282006 | −16.507 | <2 × 10⁻¹⁶ | ***
Life | −0.4164853 | 0.0346145 | −12.032 | <2 × 10⁻¹⁶ | ***
Physical | −0.3747120 | 0.0275985 | −13.577 | <2 × 10⁻¹⁶ | ***
Social | −0.3403357 | 0.0523529 | −6.501 | 8.53 × 10⁻¹¹ | ***
Table 21. Percentages of authors characterised by H5 greater than or equal to 1, 1.5, 2, 2.5 and 3 for subjects Sk.
Subject | H5 ≥ 1 | H5 ≥ 1.5 | H5 ≥ 2 | H5 ≥ 2.5 | H5 ≥ 3
Health | 0.01012250 | 0.00105278 | 0.00009720 | 0.00000083 | 0.00000006
Life | 0.01899160 | 0.00260541 | 0.00031683 | 0.00003588 | 0.00000038
Physical | 0.04220660 | 0.00922697 | 0.00182136 | 0.00033814 | 0.00006023
Social | 0.14236600 | 0.04902840 | 0.01510380 | 0.00434077 | 0.00118783
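Under a fitted Gamma(shape k, scale θ), the entries of Table 21 are tail probabilities P(H5 ≥ t) = Q(k, t/θ), where Q is the regularized upper incomplete gamma function. The sketch below computes Q with the classic series / continued-fraction split, using only the standard library. The fitted shape and scale for each subject are not reported in this section, so the parameters in the usage lines are purely hypothetical and do not reproduce the table. With SciPy available, `scipy.stats.gamma.sf(t, k, scale=theta)` gives the same quantity.

```python
import math


def gamma_upper_regularized(a, x):
    """Q(a, x) = Gamma(a, x) / Gamma(a), via a series (x < a + 1) or a continued fraction."""
    if x <= 0:
        return 1.0
    log_prefactor = a * math.log(x) - x - math.lgamma(a)
    if x < a + 1:
        # Series for the lower regularized function P(a, x); then Q = 1 - P
        term = total = 1.0 / a
        ap = a
        for _ in range(1000):
            ap += 1
            term *= x / ap
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return 1.0 - total * math.exp(log_prefactor)
    # Modified Lentz evaluation of the continued fraction for Q(a, x)
    tiny = 1e-300
    b = x + 1 - a
    c = 1 / tiny
    d = 1 / b
    h = d
    for i in range(1, 1000):
        an = -i * (i - a)
        b += 2
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1 / d
        delta = d * c
        h *= delta
        if abs(delta - 1) < 1e-15:
            break
    return math.exp(log_prefactor) * h


def gamma_tail(threshold, shape, scale):
    """P(X >= threshold) for X ~ Gamma(shape, scale)."""
    return gamma_upper_regularized(shape, threshold / scale)


thresholds = [1, 1.5, 2, 2.5, 3]
shape, scale = 0.5, 0.4  # hypothetical parameters, not the paper's estimates
print([round(gamma_tail(t, shape, scale), 8) for t in thresholds])
```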

Share and Cite

MDPI and ACS Style

Zagonari, F.; Foschi, P. Coping with the Inequity and Inefficiency of the H-Index: A Cross-Disciplinary Empirical Analysis. Publications 2024, 12, 12. https://doi.org/10.3390/publications12020012

AMA Style

Zagonari F, Foschi P. Coping with the Inequity and Inefficiency of the H-Index: A Cross-Disciplinary Empirical Analysis. Publications. 2024; 12(2):12. https://doi.org/10.3390/publications12020012

Chicago/Turabian Style

Zagonari, Fabio, and Paolo Foschi. 2024. "Coping with the Inequity and Inefficiency of the H-Index: A Cross-Disciplinary Empirical Analysis" Publications 12, no. 2: 12. https://doi.org/10.3390/publications12020012

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
