**5. Comments**

Theorem 1 shows that for 0 ≤ *γ* < 1 the tail of the corresponding distribution is not heavy: the distribution has finite moments of all positive orders. However, the tail becomes heavier as *γ* grows within [0, 1). For *γ* ∈ [0, 1] the distribution is unimodal with mode equal to 1. For *γ* ∈ [1, ∞) the distribution has a power-type tail, which is heavier than the tails occurring for *γ* ∈ [0, 1). For *γ* ∈ [1, 2) the conditional distribution given *X* < ∞ does not have a finite mean. However, as *γ* grows within [1, ∞) the tails of the conditional distributions appear to become less heavy. For *γ* ∈ [1, ∞) the conditional distribution has its mode at 1.

#### **6. The Case of Growing** *p<sub>n</sub>*

Above, we considered the case in which the probability of the event *A* decreases as the experiment number increases. For completeness, consider the case in which this probability increases.

Namely, suppose that in (1) *p<sub>n</sub>* = 1 − *q*/*n<sup>γ</sup>* for *q* ∈ (0, 1) and *γ* > 0. Then

$$\mathbb{P}\{X=n\} = (1-q/n^{\gamma})\prod\_{k=1}^{n-1} \frac{q}{k^{\gamma}} = \frac{q^{n-1}}{((n-1)!)^{\gamma}} - \frac{q^n}{(n!)^{\gamma}}.\tag{24}$$

It is clear that ℙ{*X* = ∞} = 0, and the tail of the distribution

$$T\_m = \frac{q^{m-1}}{(\Gamma(m))^\gamma}$$

is a quickly decreasing function of *m*. Of course, the distribution of *X* has finite moments of all orders, and its mode need not be at 1.
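As a quick numerical illustration of (24), the following Python sketch evaluates the probabilities, checks that they sum to one, and compares a partial tail sum with the closed form of *T<sub>m</sub>*. The function names `pmf_growing` and `tail_growing` and the parameter values *q* = 0.9, *γ* = 0.1 are assumptions chosen here for illustration only; they do not come from the text.

```python
import math

def pmf_growing(n, q, gamma):
    """P{X = n} from (24): q^(n-1)/((n-1)!)^gamma - q^n/(n!)^gamma."""
    # Work on the log scale so that large factorials do not overflow.
    a_prev = math.exp((n - 1) * math.log(q) - gamma * math.lgamma(n))
    a_curr = math.exp(n * math.log(q) - gamma * math.lgamma(n + 1))
    return a_prev - a_curr

def tail_growing(m, q, gamma):
    """T_m = P{X >= m} = q^(m-1) / Gamma(m)^gamma."""
    return math.exp((m - 1) * math.log(q) - gamma * math.lgamma(m))

# Illustrative parameter values (not taken from the paper).
q, gamma = 0.9, 0.1
probs = [pmf_growing(n, q, gamma) for n in range(1, 200)]  # n = 1, ..., 199

total = sum(probs)                    # telescopes to 1 minus a negligible remainder
tail_from_sum = sum(probs[4:])        # P{X >= 5} summed term by term
mode = 1 + max(range(len(probs)), key=probs.__getitem__)

print(total, tail_from_sum, tail_growing(5, q, gamma), mode)
```

For these particular values the largest probability is not attained at *n* = 1, which is consistent with the remark that the mode need not be at 1.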

#### **7. Back to the Distribution of Citation Number of One Author**

We suppose now that the distribution of citation number of one paper has the form (5):

$$\mathbb{P}\{X=n\} = \frac{p}{n^{\gamma}} \cdot \prod\_{k=1}^{n-1} (1 - \frac{p}{k^{\gamma}}), \quad n = 1, 2, \dots$$

with *γ* > 0. The corresponding probability generating function is

$$\mathcal{P}(z) = \sum\_{n=1}^{\infty} z^n \mathbb{P}\{X = n\}. \tag{25}$$

As was mentioned above, the number of cited papers is distributed according to a geometric law with probability generating function (1):

$$Q(z) = \frac{q}{1 - (1 - q)z}, \quad q \in (0, 1).$$

The probability generating function of the citation number of one author equals the composition of 𝒫 and *Q*, i.e., it is 𝒫(*Q*(*z*)). It is clear that the tail of the corresponding distribution is not heavy for *γ* ∈ [0, 1), it is heavy for *γ* = 1, and the distribution is improper for *γ* > 1.

Although the case of an improper distribution seems unrealistic, we discuss it for some particular cases below, after considering the proper cases *γ* ∈ [0, 1].

Recall that the case *γ* ∈ (0, 1) leads to light-tailed distributions, while *γ* = 1 leads to laws with a heavy tail. The choice between models with light or heavy tails can only be made on the basis of real data. Below we analyze some data of this kind.

#### *7.1. Analyzing Data from Scholar Google "Mathematics"*

Let us give the data for the part "Mathematics" on 16 February 2020 (see Table 1). The data concern the first 10 authors by number of citations. We do not give the names of these scientists. The table shows:



**Table 1.** Citations "Mathematics".

Table 1 shows that the first scientist has 2.76 times more citations than the second. In other words, the maximum of the observations is substantially greater than the next largest one. This observation leads us to think that the corresponding distribution has a heavy tail (see [8,9]). As we have seen, this is possible only in the case *γ* = 1.

#### *7.2. Analyzing Data from Scholar Google "Biostatistics"*

Let us give the data for the part "Biostatistics" on 16 February 2020 (see Table 2). The structure of Table 2 is the same as that of Table 1.


**Table 2.** Citations "Biostatistics".

Table 2 shows that the first scientist has 1.59 times more citations than the second. Although this is less than in the case of Table 1, the number is large enough to support our hypothesis on the presence of a heavy tail.

We do not give the data on the part "Statistics", but mention that the situation is similar to that of Tables 1 and 2.

#### *7.3. Final Model for the Distribution of Citations*

From the considerations of the two previous subsections, it follows that the most natural way to describe the distribution of citations is to choose *γ* = 1. This means

$$\mathcal{P}(z) = 1 - (1 - z)^p, \quad \mathcal{Q}(z) = \frac{q}{1 - (1 - q)z},$$

and the probability generating function of the distribution of citations is given by

$$\mathcal{R}(z) = \mathcal{P}(\mathcal{Q}(z)) = 1 - \left(1 - \frac{q}{1 - (1 - q)z}\right)^p.$$

Denote by *Y* the number of citations of a given scientist. It is clear that ℙ{*Y* = *n*} may be found as the *n*-th coefficient of the power series expansion of ℛ(*z*). We have

$$\begin{aligned} \mathcal{R}(z) &= 1 - (1 - q)^p (1 - z)^p \left( 1 - (1 - q)z \right)^{-p} \\ &= 1 - (1 - q)^p \sum\_{s = 0}^{\infty} (-1)^s \left( \sum\_{m = 0}^{s} \binom{-p}{m} \binom{p}{s - m} (1 - q)^m \right) z^s \\ &= 1 - (1 - q)^p + (1 - q)^p \sum\_{s = 1}^{\infty} (-1)^{s + 1} \binom{p}{s} \,\_2F\_1(p, -s; 1 + p - s; 1 - q)\, z^s, \end{aligned}$$

where <sub>2</sub>*F*<sub>1</sub> is the Gauss hypergeometric function. Therefore,

$$\begin{aligned} \mathbb{P}\{Y=0\} &= 1 - (1-q)^p; \\ \mathbb{P}\{Y=s\} &= (-1)^{s+1} (1-q)^p \binom{p}{s} \,\_2F\_1(p, -s; 1+p-s; 1-q), \quad s = 1, 2, \dots \end{aligned} \tag{26}$$

It is possible to verify that ℙ{*Y* = 0} > ℙ{*Y* = 1} > ℙ{*Y* = *s*} for all integers *s* ≥ 2. Therefore, with maximal probability we meet a scientist without papers or without cited papers. If we limit ourselves to scientists having at least one citation, then the highest probability corresponds to authors with one citation.
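The ordering of the first probabilities can be checked numerically. The sketch below does not use the hypergeometric form; it assumes only the product representation ℛ(*z*) = 1 − (1 − *q*)<sup>*p*</sup>(1 − *z*)<sup>*p*</sup>(1 − (1 − *q*)*z*)<sup>−*p*</sup> and multiplies the two binomial series term by term. The function name `citation_pmf` and the values *p* = 0.5, *q* = 0.3 are illustrative assumptions.

```python
def citation_pmf(p, q, n_max):
    """Coefficients of R(z) = 1 - (1-q)^p (1-z)^p (1-(1-q)z)^(-p) up to z^n_max."""
    c = 1.0 - q
    a = [1.0]  # a[j]: coefficient of z^j in (1 - z)^p
    b = [1.0]  # b[m]: coefficient of z^m in (1 - c z)^(-p)
    for j in range(1, n_max + 1):
        a.append(a[-1] * (j - 1 - p) / j)        # binomial-series recurrence
        b.append(b[-1] * (p + j - 1) / j * c)
    scale = c ** p
    pmf = [1.0 - scale]                          # P{Y = 0}
    for s in range(1, n_max + 1):
        conv = sum(a[s - m] * b[m] for m in range(s + 1))
        pmf.append(-scale * conv)                # P{Y = s}, s >= 1
    return pmf

# Illustrative parameter values (not taken from the paper).
p, q = 0.5, 0.3
pmf = citation_pmf(p, q, 50)
print(pmf[:4])
```

The first coefficient can also be obtained by differentiating ℛ at *z* = 0, which gives ℙ{*Y* = 1} = *pq*(1 − *q*)<sup>*p*</sup>; this provides an independent check of the series computation.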

The Laplace transform of the distribution of *Y* has the form

$$\mathcal{R}(e^{-t}) = 1 - \left(1 - \frac{q}{1 - (1 - q)e^{-t}}\right)^p, \quad t \in [0, \infty).$$

Its asymptotic behaviour as *t* → 0 is

$$1 - \mathcal{R}(e^{-t}) \sim \left(\frac{1 - q}{q}\right)^p \cdot t^p, \quad \text{as} \quad t \to +0. \tag{27}$$

This relation shows that the random variable *Y* has moments of order less than *p* and does not have moments of higher order. Because *p* < 1, the variable *Y* has infinite mean. In practice, this means that some scholars have a very large number of citations, and these citations refer to publications by a relatively small number of scholars. The data in Tables 1 and 2 agree with these statements. It is important that the model is built on the assumption that all scientists have the same capabilities. Even so, we must observe great variability in the number of citations of their publications. Thus, the difference in the number of citations can be purely random and say nothing about the real contribution of a scientist to the corresponding field of science.
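The small-*t* behaviour stated in (27) can be checked numerically. The following sketch compares 1 − ℛ(*e*<sup>−*t*</sup>) with the right-hand side of (27) for decreasing *t*; the function names and the values *p* = 0.5, *q* = 0.3 are assumptions made only for this illustration.

```python
import math

def one_minus_R(t, p, q):
    """1 - R(e^{-t}) = ((1-q)(1-e^{-t}) / (1-(1-q)e^{-t}))^p."""
    u = -math.expm1(-t)                    # 1 - e^{-t}, accurate for small t
    # 1 - (1-q)e^{-t} = q + (1-q)(1 - e^{-t})
    return ((1 - q) * u / (q + (1 - q) * u)) ** p

def asymptotic_27(t, p, q):
    """Right-hand side of (27): ((1-q)/q)^p * t^p."""
    return ((1 - q) / q) ** p * t ** p

# Illustrative parameter values (not taken from the paper).
p, q = 0.5, 0.3
ratios = [one_minus_R(t, p, q) / asymptotic_27(t, p, q) for t in (1e-2, 1e-4, 1e-6)]
print(ratios)   # the ratios should approach 1 as t decreases
```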

Of course, the proposed model is very idealistic, since it does not take into account the real difference in the capabilities of scientists, as well as in their equipping with the necessary tools and equipment. Taking into account the noted differences is likely to lead to the need to consider mixtures of the proposed distributions with different parameters *p* and *q*. However, such a complication will not make it possible to distinguish scientists with a large contribution to science from those with a smaller impact.

Surely, the arguments presented for the choice of *γ* = 1 are rather crude, i.e., in reality it may happen that *γ* is only close to unity. In this case the distribution tail is not heavy, but over a very large (but finite) interval it is close to a heavy one. So, qualitatively, our conclusions will remain unchanged.

Based on the foregoing, we conclude that it is practically senseless to use the number of citations of a scientist's work to assess his contribution to science.

#### *7.4. Remarks on the Model with γ* > 1

In this subsection, we try to justify the possibility of using models with *γ* greater than one. As already noted, in this model the probability ℙ{*Y* = ∞} is not equal to zero. It is unlikely that this corresponds to the situation in which all scientists working in a given field of science are considered. However, a very long citation process (ideally, an endless one) is quite possible in the case of the most prominent scientists. For example, in the field of Mathematics, the works of Professor Andrei Nikolaevich Kolmogorov (1903–1987) continue to be cited. Over the past 15 years, they have been cited about 30,000 times, although more than 30 years have passed since the death of their author. It is highly probable that the citation process for these works will continue for a long time.

In addition, the concept of citation is somewhat arbitrary in our opinion. For example, in Mathematics, some theorems or other objects bear the names of scientists who were related to their preparation. Does the mention of these theorems and the corresponding names in some articles mean their citation? For example, many articles and books mention the Gaussian distribution without reference to the corresponding publication by Gauss. Is this mention a quotation? It seems to us that such kind of nominal results are not counted in determining the citation index. However, they certainly indicate the scientific significance of the result. It is very likely that for accounting for citations of this kind, models with a *γ* greater than 1 may be required.

#### **8. Hirsch Index**

Recall that the definition of the Hirsch index was given on Page 1. Hirsch states that the proposed index *h* is intended to rank authors of articles in the field of Physics. At the same time, it is noted that the index can be used in other fields of science. Since the number of citations is used in determining the index *h*, it seems plausible that *h* is associated with this number. Hirsch notes that the number of citations is given by *N* = *κh*<sup>2</sup>. He wrote: "I find empirically that *κ* ranges between 3 and 5" (we change Hirsch's notation: his *a* is our *κ*). Further, Hirsch wrote: "*κ* > 5 is very atypical value".
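To make the relation *N* = *κh*<sup>2</sup> concrete, here is a minimal Python sketch that computes *κ* from a citation count and a Hirsch index and tests whether it falls in the range from 3 to 5 quoted above. The helper names `kappa` and `in_hirsch_range`, as well as the numbers used in the example, are hypothetical and are not taken from Tables 1–3.

```python
def kappa(total_citations, h_index):
    """Ratio kappa = N / h^2 in the relation N = kappa * h^2."""
    if h_index <= 0:
        raise ValueError("h_index must be positive")
    return total_citations / h_index ** 2

def in_hirsch_range(k, low=3.0, high=5.0):
    """True if kappa lies in the range quoted from Hirsch (3 to 5)."""
    return low <= k <= high

# Hypothetical author: 40,000 citations and Hirsch index 80 (made-up numbers).
k = kappa(40_000, 80)
print(k, in_hirsch_range(k))
```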

Below we show that the Hirsch statements presented here are doubtful. In addition, the use of this index seems unreasonable.

Let us start by analyzing the data in Tables 1 and 2. Recall that column 5 gives the corresponding values of *κ*. Table 1 does not contain any *κ* ≤ 6, while Table 2 has only one value *κ* ≤ 5, namely *κ* = 4.69. The other values of *κ* are "very atypical", especially for Table 1. Table 2 contains two values of *κ* ∈ (5, 6). Therefore, at least for such fields as "Mathematics" and "Biostatistics", Hirsch's conclusion about the "typical" form of proportionality between the number of citations of an author and the square of the corresponding Hirsch index seems to be incorrect. However, was Hirsch right in the field of "Physics"?

#### *8.1. Data in "Physics"*

Now we give the data for the field "Physics", arranging them into a table in the same way as for Table 1.

Again, Table 3 has only one *κ* ≤ 5, namely *κ* = 4.88. However, there are six values *κ* ∈ (5, 6). The *κ* values for the "Physics" area look smaller than for the "Biostatistics" area and significantly smaller than for the "Mathematics" area. The value of the Hirsch index for Physics has much less variability than for Biostatistics and Mathematics. The differences in citation numbers are much greater for Mathematics than in the case of Physics.

So, we see that Hirsch's understanding of the situation in Physics is closer to reality than in the case of Biostatistics and, especially, Mathematics.

#### *8.2. Data Comparison*

We continue the analysis of the data in Tables 1–3.


**Table 3.** Citations "Physics".

The average value of the Hirsch index in the case of Table 1 is 99.3 with a standard deviation of 66.45. The same indicators for Table 2 are 153.8 and 47.97, and for Table 3 they are 198.2 and 21.73. We see that the standard deviation of the Hirsch index in the case of Mathematics is three times greater than in the case of Physics. On the contrary, the average value of the index is maximal in the case of Physics and minimal in the case of Mathematics. This shows that if the Hirsch index is useful in the field of Physics, then its usefulness in the field of Mathematics is doubtful. Probably, this is true for Biostatistics too.

Authors with a higher Hirsch index are often inferior to others in the number of citations of their most popular works. For example, in Table 1, Author 1, having the highest Hirsch index, is inferior to Authors 2, 4, 5, 6 and 7 in the number of citations of the most popular work. In this case, Author 1 wrote his most cited work with co-authors, while Author 2 did without co-authors.

It is clear that the Hirsch index does not exceed the number of cited publications of the author, which has a geometric distribution. Thus, the distribution of the Hirsch index has a light tail. Since the number of citations has a heavy tail, it is more variable than the Hirsch index. However, these two indicators are stochastically strongly related. Indeed, for the data in Table 1, the sample correlation coefficient between these indicators is *ρ*<sub>1</sub> = 0.94. On the other hand, the correlation coefficient between the Hirsch index and the number of citations of the most popular works is *ρ*<sub>2</sub> = −0.23. This coefficient indicates a weak relationship between the indicators, and it is negative. In other words, a large Hirsch index is most likely not found among authors with highly cited individual articles. For Table 2, the values of the correlation coefficients are *ρ*<sub>1</sub> = 0.702, *ρ*<sub>2</sub> = 0, and for Table 3 they are *ρ*<sub>1</sub> = 0.36, *ρ*<sub>2</sub> = −0.57.

The increase in the Hirsch index with a decrease in the number of citations of the most popular work may result from the division of the work into a series of publications. However, when assessing the quality of a scientist's contribution, one should take into account that the publication of a series of articles instead of one may be caused not by a desire to increase the number of publications, but, for example, by a gradual insight into the essence of the problem under consideration. Such insight often requires a very long time, i.e., publication of a series of articles is justified. It should be noted that the publication of a series of articles naturally leads to an increase in the number of self-citations. This increase cannot be considered a flaw of the author and does not mean an attempt to artificially increase the number of citations. At the same time, the presence of a series of publications (which increases the Hirsch index) cannot be considered preferable to one highly cited work.

The presence of higher values of the Hirsch index in Physics compared to Mathematics can be explained by the use in modern Physics of expensive equipment in experimental Physics and/or the results obtained on it in theoretical Physics. Often this equipment is used by some laboratory or scientific group, and then transferred to another or others. After some time, this equipment again becomes available to the first group. Thus, new experimental facts arrive intermittently, and during the break they are processed and published. A theoretical analysis of the observed facts is also taking place. Then comes new information related to new experiments. Therefore, the very flow of information (both experimental and theoretical) contributes to the publication of not a single article, but a series of articles. This circumstance leads to an increase in the Hirsch index with a relative decrease in the number of citations of popular works.

A similar situation is absent in Pure Mathematics. Therefore, there are far fewer reasons for a series of articles to appear there. Separate works appear, which often cover a substantial part of the problem under consideration. They cause a stream of citations of this particular work rather than of a series of works. Thus, the Hirsch index becomes smaller than it would be if a series of articles were published instead of this one, but the most popular work attracts more citations than each individual work in the series.

So, the use of the Hirsch index has some basis in the field of Physics, but it is not related to what is happening in Mathematics.

For some areas of Applied Mathematics, a situation may be observed that is intermediate between what is happening in Physics and in Pure Mathematics.

However, it is not clear to us why the Hirsch index should not be replaced by two indicators. The first of these could be the number of all citations, and the second the number of citations of the most popular work. The Hirsch index is stochastically quite closely linked to the number of all citations, so it and this number are "interchangeable". However, after a scientist stops working in a given field of science, the number of his publications does not increase and, therefore, the Hirsch index remains bounded, while the number of citations can continue to grow without limit. This is exactly what happens with the works of the most outstanding scientists of the past.

#### **9. Distribution of the Hirsch Index**

In this section, we obtain the probability distribution of the Hirsch index.

We introduce some notation. It is clear that the Hirsch index is a random variable. Let us denote it by *H*. We will denote the values of *H* by *h*. Our aim here is to determine the probabilities that *H* = *h*, i.e., ℙ{*H* = *h*}. In order for the event *H* = *h* to occur, it is necessary and sufficient that:


Suppose that *l* works are published, and *l* ≥ *h*. The probability of this event is *q*(1 − *q*)<sup>*l*</sup>. Recall that the probability that a published work will be cited *k* times equals (*p*/*k*) ∏<sub>*j*=1</sub><sup>*k*−1</sup> (1 − *p*/*j*). Therefore, the probability that a published work will be cited at least *h* times equals

$$\sum\_{k=h}^{\infty} \frac{p}{k} \cdot \prod\_{j=1}^{k-1} (1 - p/j) = \frac{\Gamma(h - p)}{\Gamma(h) \cdot \Gamma(1 - p)},$$

where Γ is the Euler gamma function.

The probability that a published work will be cited fewer than *h* times is

$$1 - \frac{\Gamma(h - p)}{\Gamma(h) \cdot \Gamma(1 - p)}.$$
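The gamma-function expression for the probability of at least *h* citations can be checked against the defining sum. The sketch below assumes the model with *γ* = 1 used in this section; the function names and the value *p* = 0.5 are illustrative choices and not part of the original derivation. It relies on the fact that a work is cited at least *h* times exactly when the first *h* − 1 trials fail, so the tail equals the product of the factors (1 − *p*/*j*) for *j* = 1, …, *h* − 1.

```python
import math

def tail_closed_form(h, p):
    """P{X >= h} = Gamma(h-p) / (Gamma(h) * Gamma(1-p)) for integer h >= 1."""
    return math.exp(math.lgamma(h - p) - math.lgamma(h) - math.lgamma(1.0 - p))

def tail_product(h, p):
    """The same tail written as prod_{j=1}^{h-1} (1 - p/j)."""
    out = 1.0
    for j in range(1, h):
        out *= 1.0 - p / j
    return out

def pmf(k, p):
    """P{X = k} = (p/k) * prod_{j=1}^{k-1} (1 - p/j)."""
    return p / k * tail_product(k, p)

p = 0.5   # illustrative value
# Partial sum of the series from k = 3 to k = 1000; it telescopes to
# P{X >= 3} - P{X >= 1001}.
partial = sum(pmf(k, p) for k in range(3, 1001))
print(partial, tail_closed_form(3, p) - tail_closed_form(1001, p))
```

The slow decay of the remainder ℙ{*X* ≥ 1001} reflects the heavy tail of the citation distribution of a single work.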

Thus, the probability that *l* papers are published, and the Hirsch index *H* has taken the value *h* is

$$q(1-q)^l \binom{l}{h} \cdot \left(\frac{\Gamma(h-p)}{\Gamma(h)\cdot\Gamma(1-p)}\right)^h \cdot \left(1 - \frac{\Gamma(h-p)}{\Gamma(h)\cdot\Gamma(1-p)}\right)^{l-h}.$$

Now we see that

$$\mathbb{P}\{H=h\} = \sum\_{l=h}^{\infty} q(1-q)^l \binom{l}{h} \cdot \left(\frac{\Gamma(h-p)}{\Gamma(h)\cdot\Gamma(1-p)}\right)^h \cdot \left(1 - \frac{\Gamma(h-p)}{\Gamma(h)\cdot\Gamma(1-p)}\right)^{l-h}$$

$$= \left(\frac{\Gamma(h-p)}{\Gamma(h)\cdot\Gamma(1-p) - \Gamma(h-p)}\right)^h \cdot q \cdot \frac{\mu^h}{(1-\mu)^{h+1}},$$

where

$$\mu = \left(1 - \frac{\Gamma(h - p)}{\Gamma(h) \cdot \Gamma(1 - p)}\right) \cdot (1 - q).$$

So, the random variable *H* has the following distribution

$$\mathbb{P}\{H=h\} = (1-\nu)\cdot\nu^h,$$

where

$$\nu = \frac{(1-q)\Gamma(h-p)}{q\Gamma(h)\Gamma(1-p) + (1-q)\Gamma(h-p)}.$$

Note that this distribution is not a geometric one, because the value of *ν* depends on *h*.
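The passage from the series over *l* to the form (1 − *ν*)*ν*<sup>*h*</sup> can be verified numerically. The following sketch compares the two expressions for several values of *h*; the function names, the truncation point `l_max` and the values *p* = 0.5, *q* = 0.3 are assumptions introduced here for illustration.

```python
import math

def tail(h, p):
    """P{a work is cited at least h times} = Gamma(h-p)/(Gamma(h)Gamma(1-p))."""
    return math.exp(math.lgamma(h - p) - math.lgamma(h) - math.lgamma(1.0 - p))

def nu(h, p, q):
    """nu(h) = (1-q) a / (q + (1-q) a), where a = tail(h, p)."""
    a = tail(h, p)
    return (1 - q) * a / (q + (1 - q) * a)

def hirsch_pmf_closed(h, p, q):
    """Closed form (1 - nu) * nu^h."""
    v = nu(h, p, q)
    return (1 - v) * v ** h

def hirsch_pmf_series(h, p, q, l_max=2000):
    """Truncated series over the number l of published works."""
    a = tail(h, p)
    return sum(q * (1 - q) ** l * math.comb(l, h) * a ** h * (1 - a) ** (l - h)
               for l in range(h, l_max + 1))

# Illustrative parameter values (not taken from the paper).
p, q = 0.5, 0.3
for h in range(1, 11):
    print(h, hirsch_pmf_closed(h, p, q), hirsch_pmf_series(h, p, q))
```

Printing `nu(h, p, q)` for several *h* also shows directly that *ν* decreases with *h*, which is why the law is not geometric.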

Next, we are interested in estimating the tail of the distribution of *H*. To do this, we estimate the asymptotic behavior of *ν*. An application of the Stirling formula allows one to easily obtain that

$$\nu = \nu(h) \sim \frac{1-q}{q\Gamma(1-p)} \cdot \frac{1}{h^p}.$$

This formula immediately leads us to an asymptotic expression for the logarithm of the probability ℙ{*H* = *h*} as *h* → ∞. Namely,

$$
\log \mathbb{P}\{H = h\} \sim -p \cdot h \cdot \log h, \quad h \to \infty.
$$

It follows that the probability of the event {*H* = *h*} decreases faster than the exponential function as *h* → ∞. Of course, the tail of the distribution of *H* also decreases faster than the exponential function. Therefore, this distribution has moments of all orders. Note that the distribution of the number of citations of articles by this author has an infinite mean value. So, if an author has a fairly large number of citations, then the ratio of the number of citations to the square of the Hirsch index can be arbitrarily large. This fact contradicts Hirsch's claim that *κ* is bounded.
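For reference, the leading term of log ℙ{*H* = *h*} can be traced directly from the representation ℙ{*H* = *h*} = (1 − *ν*)*ν*<sup>*h*</sup> and the asymptotics of *ν*(*h*). Writing *C* = (1 − *q*)/(*q*Γ(1 − *p*)), a shorthand introduced here only for this remark, we have *ν*(*h*) = *C* *h*<sup>−*p*</sup>(1 + *o*(1)), and hence

$$\log \mathbb{P}\{H = h\} = \log(1 - \nu(h)) + h \log \nu(h) = -p \cdot h \cdot \log h + h \log C + o(h), \quad h \to \infty,$$

so the term −*p* · *h* · log *h* dominates and the probabilities decay faster than any exponential function of *h*.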

**Author Contributions:** Conceptualization, L.B.K.; investigation, L.B.K., Y.V.K. and Z.E.V. The authors have equally contributed to the writing, editing and style of the paper. All authors have read and agreed to the published version of the manuscript.

**Funding:** The study was partially supported by grant GAČR 19-04412S (Lev Klebanov).

**Conflicts of Interest:** The authors declare no conflict of interest.
