**An Investigation on the Prime and Twin Prime Number Functions by Periodical Binary Sequences and Symmetrical Runs in a Modified Sieve Procedure**

## **Bruno Aiazzi \*, Stefano Baronti, Leonardo Santurri and Massimo Selva**

Institute of Applied Physics "Nello Carrara", IFAC-CNR, Research Area of Florence, 50019 Sesto Fiorentino, Italy; s.baronti@ifac.cnr.it (S.B.); l.santurri@ifac.cnr.it (L.S.); m.selva@ifac.cnr.it (M.S.) **\*** Correspondence: b.aiazzi@ifac.cnr.it; Tel.: +39-055-5226451

Received: 18 April 2019; Accepted: 4 June 2019; Published: 10 June 2019

**Abstract:** In this work, the Sieve of Eratosthenes procedure (in the following named Sieve procedure) is approached from a novel point of view, which is able to give a justification of the Prime Number Theorem (P.N.T.). Moreover, an extension of this procedure to the case of twin primes is formulated. The proposed investigation, which is named Limited INtervals into PEriodical Sequences (LINPES), relies on a set of binary periodical sequences that are evaluated in limited intervals of the prime characteristic function. These sequences are built by considering the ensemble of deleted (that is, 0) and undeleted (that is, 1) integers in a modified version of the Sieve procedure, in such a way that a symmetric succession of runs of zeroes is found in correspondence with the gaps between the undeleted integers in each period. Such a formulation is able to estimate the prime number function in a way equivalent to the logarithmic integral function Li(*x*). The present analysis is then extended to the twin primes, by taking into account only the runs whose size is two. In this case, the proposed procedure gives an estimation of the twin prime function that is equivalent to that of the logarithmic integral function Li<sub>2</sub>(*x*). As a consequence, the possibility of counting the twin primes in the same intervals found for the primes is investigated. Since the bounds of these intervals are given by squares of primes, if such an inference were actually proved, then the twin primes could be estimated up to infinity, thereby strengthening the conjecture that they never end.

**Keywords:** prime numbers; Prime Number Theorem (P.N.T.); modified Sieve procedure; binary periodical sequences; prime number function; prime characteristic function; limited intervals; logarithmic integral estimations; twin prime numbers

## **1. Introduction**

The Sieve procedure is able to achieve heuristic justifications of the Prime Number Theorem (P.N.T.) [1]. Such a theorem gives the asymptotic trend of the prime number function *π*(*x*), where *π*(*x*) denotes the quantity of prime numbers *p* less than or equal to *x* ∈ ℝ, that is,

$$
\pi(x) = \text{number of primes } p, \text{ } p \le x. \tag{1}
$$

Let log(*x*) be the natural logarithm of *x*. If the real functions *A*(*x*) and *B*(*x*) are asymptotically equal, that is, lim<sub>*x*→∞</sub> *A*(*x*)/*B*(*x*) = 1, then we say that *A*(*x*) and *B*(*x*) are equivalent as *x* → ∞, and we write *A*(*x*) ∼ *B*(*x*). Consequently, the P.N.T. can be written as

$$
\pi(x) \sim x / \log(x).\tag{2}
$$

The infinitude of primes has been known since ancient times, whereas the estimation (2) was conjectured by Gauss [2] and Legendre [3] at the end of the 18th century. Gauss himself improved Equation (2) by considering the logarithmic integral function Li(*x*), which is defined as

$$\operatorname{Li}(x) = \int\_2^{x} \, \frac{dt}{\log t}. \tag{3}$$

Again, the function (3) is such that

$$
\pi(x) \sim \text{Li}(x) \tag{4}
$$

but the approximation (4) is much more precise than (2). In fact, it can be demonstrated that the term *x*/log(*x*) is only the first term of the asymptotic series expansion of (3). The aim of this work is to introduce a novel heuristic procedure (LINPES, Limited INtervals into PEriodical Sequences) that is equivalent to the Li(*x*) approximation, in the sense of Equation (4), apart from a simple multiplicative constant, by exploiting some binary periodic sequences and the related symmetrical runs. Pieces of these sequences compose limited intervals of the prime characteristic function *ξ<sub>p</sub>*(*n*), which is defined as

$$\xi\_{p}(n) = \begin{cases} 1 & \text{if } n \text{ is prime} \\ 0 & \text{otherwise.} \end{cases} \tag{5}$$

As a matter of fact, a topic that is much discussed in the current literature concerns the possible discovery of regularities and periodicities in the distribution of the primes in certain intervals of the integer sequence [4]. In this work, the implications of the LINPES procedure are also investigated, in particular with an extension to the twin primes, whose distribution is given by a function known as the twin prime function *π*<sub>2</sub>(*x*), which is similar to (1), that is,

$$
\pi\_2(x) = \text{number of pairs of twin primes } (p, \, p+2), \,\, p \le x. \tag{6}
$$

Unlike the case of primes, the infinitude of twin primes is still unproved. However, analogously to the P.N.T., the density of the twin primes has been conjectured [5], by considering that the probability that an integer *n* is prime is equal to 1/log(*n*). Consequently, the probability that *n* and *n* + 2 are both prime can be computed, so that the strong twin prime conjecture [6] gives an equivalence between the twin prime function *π*<sub>2</sub>(*x*) and the logarithmic integral function Li<sub>2</sub>(*x*), that is,

$$
\pi\_2(x) \sim C \operatorname{Li}\_2(x)\tag{7}
$$

where Li<sub>2</sub>(*x*) is defined as

$$\operatorname{Li}\_2(x) = \int\_2^{x} \frac{dt}{\left(\log t\right)^2} \tag{8}$$

and *C* = 2Π<sub>2</sub> ≃ 1.3203 is a multiplicative constant that takes into account the statistical dependence between the primality of *n* and that of *n* + 2 [5]. The related constant Π<sub>2</sub> ≃ 0.6602 is named the twin prime constant, that is,

$$\Pi\_2 = \prod\_{p>2, p \text{ prime}} \left( 1 - \frac{1}{(p-1)^2} \right). \tag{9}$$
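As a rough numerical check of Equation (9), the infinite product can be truncated at a finite bound. The following Python sketch is only illustrative: the helper names and the truncation bound `limit` are assumptions introduced here and are not part of the original procedure.

```python
import math


def primes_up_to(limit):
    """Plain Sieve of Eratosthenes returning the list of primes <= limit."""
    flags = [True] * (limit + 1)
    flags[0] = False
    if limit >= 1:
        flags[1] = False
    for p in range(2, math.isqrt(limit) + 1):
        if flags[p]:
            for m in range(p * p, limit + 1, p):
                flags[m] = False
    return [n for n, is_prime in enumerate(flags) if is_prime]


def twin_prime_constant(limit):
    """Partial product of Equation (9) over the odd primes p <= limit (assumed cut-off)."""
    product = 1.0
    for p in primes_up_to(limit):
        if p > 2:
            product *= 1.0 - 1.0 / (p - 1) ** 2
    return product


pi_2 = twin_prime_constant(100_000)
print("Pi_2 ~", round(pi_2, 4), "  C = 2 * Pi_2 ~", round(2 * pi_2, 4))
```

Since every omitted factor is slightly smaller than 1, the truncated product approaches Π<sub>2</sub> from above as the bound grows.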

As will be shown later, the proposed LINPES procedure is able to estimate the twin prime function in a way equivalent to the Li<sub>2</sub>(*x*) function, apart from a multiplicative constant. However, this is achieved by assuming that a basic relation, which is true for the primes, is also valid for the twin primes. In this case, the contribution of the present work will be to make the infinitude of twin primes more plausible.

Before starting our discussion, we itemize the variables utilized in this paper


This paper is organized as follows: Section 2 reports a well-known heuristic method, which is able to estimate the prime number function *π*(*x*) in the sense of (2), apart from a multiplicative constant. Section 3 shows instead how the LINPES procedure is able to obtain an estimation of *π*(*x*) that is equivalent to the logarithmic integral function Li(*x*). Section 4 extends the proposed procedure to the case of twin primes. Finally, future research and concluding remarks are provided in Section 5.

## **2. A Heuristic Estimation of** *π***(***x***) Equivalent to the** *x***/ log(***x***) Function**

In this section, a well-known heuristic method to justify the P.N.T. in a probabilistic way is briefly summarized, starting from the Sieve procedure, which separates the primes from the composites in a list of integers up to a given number *N*. The Sieve procedure is the most common way to obtain the primes, and improving its efficiency is still a research topic [7]. Let *p*(*n*) be the arithmetic function whose *n*-th element is the *n*-th prime, with *n* ∈ ℕ [8,9]. The Sieve procedure can be summarized by the following steps:


In order to directly compute the characteristic function of primes *ξ<sub>p</sub>*(*n*), we can store the status of each integer in a binary vector indexed over the interval *I<sub>N</sub>* = [1, *N*]. In practice, we associate the value 0 with an integer that has been struck out by the procedure, and the value 1 otherwise. Such a vector is initialized with all 1 values, because no integer is deleted when the procedure starts. Then, in each iteration of the Sieve procedure, a 0 value is assigned to the cells that identify the deleted integers (that is, the composite integers). At the end of the procedure, only the cells related to the prime numbers will retain the initial 1 value.
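The binary-vector formulation described above can be sketched in a few lines of Python. This is a minimal illustration of the standard Sieve of Eratosthenes, not the authors' implementation; the function name `prime_characteristic` is an assumption, and the entry for the integer 1 is forced to 0 so that the result agrees with Equation (5).

```python
def prime_characteristic(N):
    """Return xi where xi[n] is 1 if n is prime and 0 otherwise, for 0 <= n <= N."""
    xi = [1] * (N + 1)          # every integer starts as undeleted
    xi[0] = 0                   # 0 lies outside the interval [1, N]
    if N >= 1:
        xi[1] = 0               # Equation (5): the integer 1 is not a prime
    p = 2
    while p * p <= N:
        if xi[p]:               # p survived all earlier steps, hence it is prime
            for m in range(p * p, N + 1, p):
                xi[m] = 0       # strike out the multiples of p, starting from p^2
        p += 1
    return xi


xi = prime_characteristic(30)
print([n for n, flag in enumerate(xi) if flag])
```

The loop stops as soon as *p*<sup>2</sup> exceeds *N*, which corresponds to ending the procedure at the greatest prime not exceeding *N*<sup>1/2</sup>.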

The Sieve procedure is able to obtain heuristic justifications of the relation (2) through purely probabilistic considerations [10]. To show this, let *N* be an integer large enough to allow sufficiently robust statistics. In the first step (*n* = 1), the multiples of *p*(1) = 2 are struck out, starting from *p*(1)<sup>2</sup> = 4, and the number of deleted integers is approximately given by

$$
\left\lfloor \frac{N}{2} \right\rfloor - 1 \simeq \frac{N}{2}.\tag{10}
$$

Therefore, the quantity of residual integers is about *R<sub>s</sub>*(1) ≃ *N*/2. In the following step (*n* = 2), the multiples of *p*(2) = 3 are struck out. Given the independence of the congruences modulo distinct primes *p*, about 1/3 of the residual integers will be deleted (by the Chinese Remainder Theorem [9]). The updated number of the residual integers *R<sub>s</sub>*(2) will be given by

$$R\_s(2) \simeq \left(1 - \frac{1}{2}\right) \times \left(1 - \frac{1}{3}\right) \times N. \tag{11}$$

In general, about 1/*p*(*k*) of the residual integers will be struck out in the *k*-th step of the Sieve procedure, so that a fraction 1 − 1/*p*(*k*) of them survives. The procedure ends when the greatest prime number not exceeding *N*<sup>1/2</sup> is reached, that is, *p*(*K*), where *K* is such that *p*(*K*)<sup>2</sup> is the greatest prime square not exceeding *N*. At this point, we obtain an estimation *π<sub>R</sub>*(*N*) of the number of residual integers *R<sub>s</sub>*(*K*), and consequently of the quantity of primes *π*(*N*), that is,

$$\pi\_{R}(N) = \left(1 - \frac{1}{2}\right) \times \left(1 - \frac{1}{3}\right) \times \cdots \times \left(1 - \frac{1}{p(K)}\right) \times N = N \times \prod\_{k=1}^{K} \left(1 - \frac{1}{p(k)}\right) = N \times \prod\_{k=1}^{K} \frac{p(k) - 1}{p(k)}.\tag{12}$$

Let us apply Mertens' Third Theorem [11] to the reciprocal of the product in (12), by taking the limit as *N* → ∞, that is, as *K* → ∞. We obtain

$$\lim\_{K \to \infty} \prod\_{k=1}^{K} \frac{p(k)}{p(k) - 1} \times \frac{1}{\log \left( p(K)^2 \right)} = \frac{1}{2} \times e^{\gamma} \simeq \frac{1}{2} \times 1.7811 \simeq 0.8905\tag{13}$$

where *γ* is the *Euler-Mascheroni constant*. Consequently, we can get the limit of *π<sub>R</sub>*(*N*) as *N* → ∞, that is, an approximation of the limit of *π*(*N*), by considering

$$\lim\_{N \to \infty} \pi\_{R}(N) = \lim\_{N \to \infty} N \times \prod\_{k=1}^{K} \frac{p(k) - 1}{p(k)} = \lim\_{N \to \infty} N \times \frac{c}{\log N} = \lim\_{N \to \infty} \frac{cN}{\log N} \tag{14}$$

that is, *π<sub>R</sub>*(*N*) ∼ *cN*/log *N*, with *c* = 2*e*<sup>−*γ*</sup> ≃ 1/0.8905 ≃ 1.1229, since lim<sub>*N*→∞</sub> *N* = lim<sub>*K*→∞</sub> *p*(*K*)<sup>2</sup>. Notably, from the relations (2) and (14), the real quantity of prime numbers in the interval *I<sub>N</sub>* = [1, *N*] is overestimated, as *N* → ∞, by a factor *c*, due to the previous approximations.

In conclusion, this heuristic procedure gives a justification of the P.N.T. that is equivalent to the relation (2), except for the constant *c* [10,12]. In Section 3, the proposed LINPES procedure will be described, which gives a justification of the P.N.T. that is instead equivalent to the more precise estimation (4), by means of a procedure that is not purely probabilistic, but that also relies on analytic considerations, which can be shared with other scientific sectors.
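The overestimation by the factor *c* can be observed numerically. The following Python sketch evaluates the product of Equation (12) and compares it with the true count *π*(*N*) and with *cN*/log *N*; the helper names and the choice *N* = 10<sup>6</sup> are assumptions made only for this illustration.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def primes_up_to(limit):
    """Plain Sieve of Eratosthenes returning the list of primes <= limit."""
    flags = [True] * (limit + 1)
    flags[0] = False
    if limit >= 1:
        flags[1] = False
    for p in range(2, math.isqrt(limit) + 1):
        if flags[p]:
            for m in range(p * p, limit + 1, p):
                flags[m] = False
    return [n for n, is_prime in enumerate(flags) if is_prime]


def sieve_product_estimate(N):
    """Equation (12): N times the product of (1 - 1/p) over the primes p <= sqrt(N)."""
    estimate = float(N)
    for p in primes_up_to(math.isqrt(N)):
        estimate *= 1.0 - 1.0 / p
    return estimate


N = 10**6
c = 2.0 * math.exp(-EULER_GAMMA)
print("pi(N)         =", len(primes_up_to(N)))
print("pi_R(N)       =", round(sieve_product_estimate(N)))
print("c * N / log N =", round(c * N / math.log(N)))
print("N / log N     =", round(N / math.log(N)))
```

In this sketch the ratio between the product estimate and *N*/log *N* should already be close to *c* ≃ 1.1229 for moderate *N*, in line with Equation (13).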

## **3. The LINPES Estimation of** *π***(***x***) Equivalent to the Li(***x***) Function**

In this section, the novel heuristic LINPES procedure is described, by showing that it can give an estimation of the prime number function *π*(*x*). To this end, an ensemble of periodic binary sequences will be considered in limited intervals of the prime characteristic function *ξ<sub>p</sub>*(*n*). Such a topic is of great interest because the distribution of primes in short intervals has been deeply investigated in the literature, up to the present [13,14]. The proposed procedure is also able to provide useful insights into the estimation of the trend of the twin prime number function *π*<sub>2</sub>(*x*). In this analysis, we set *p*(0) = 1 for convenience, even if the integer 1 is not considered to be a prime.

#### *3.1. Periodic Binary Sequences Inside the Prime Characteristic Function ξ<sub>p</sub>*(*n*)

The occurrence of pieces of periodic binary sequences inside the prime characteristic function *ξ<sub>p</sub>*(*n*) is discussed here. To this end, both the Sieve procedure and a modified version of it are investigated step-by-step, where each step is labelled with the progressive index *k*, with *k* = 0 denoting the beginning of the two procedures. The difference between the modified and the true Sieve procedure is simply that in the Sieve procedure, in each step *k* ≥ 1, only the multiples of the prime *p*(*k*) are struck out, but not the prime itself, whereas in the modified Sieve procedure the prime itself is also deleted. As previously stated, the status of each integer (0→deleted, 1→undeleted) is stored in an *N*-size vector, which is initialized with all 1 values. The outputs of the Sieve procedure and its modified version are denoted as *ξ*(*k*, *n*) and *ψ*(*k*, *n*), respectively, for each step *k* > 0. Consequently, the deletion of an integer in the true or the modified Sieve procedure simply means that a 0 value replaces a 1 value in the corresponding sequence. In the case of the Sieve procedure, the sequence *ξ*(*k*, *n*) is an approximation at the step *k* of the prime characteristic function *ξ<sub>p</sub>*(*n*).

At the beginning of the procedures (*k* = 0), we have two equal periodic sequences of all 1 values, that is, *ξ*(0, *n*) and *ψ*(0, *n*), whose period is *T*(0) = 1. In the first step of the modified Sieve procedure (*k* = 1), the multiples of *p*(1) = 2 are struck out, including *p*(1) itself. Consequently, we obtain a sequence *ψ*(1, *n*), which is still periodic, with alternating 1 and 0 symbols. The period of *ψ*(1, *n*) is given by the prime value *p*(1) itself, that is, *T*(1) = 2. In the following, *T*(*k*) will denote the period of the sequence *ψ*(*k*, *n*). Conversely, in the Sieve procedure, the prime *p*(1) is not deleted. In this case, the output sequence *ξ*(1, *n*) is not periodic, but includes a piece of the periodic sequence *ψ*(1, *n*), starting from the square *p*(1)<sup>2</sup> = 4. Before such a value, the previous sequence *ξ*(0, *n*) is preserved, which coincides with *ψ*(0, *n*). It follows that *ξ*(1, *n*) is a mixed sequence, being composed of pieces of both *ψ*(0, *n*) and *ψ*(1, *n*), that is,

$$\xi(1,n) = \begin{cases} \psi(0,n) & \text{if } p(0)^2 \le n < p(1)^2 \\ \psi(1,n) & \text{if } n \ge p(1)^2 . \end{cases} \tag{15}$$

Similarly, in the second step of the modified Sieve procedure (*k* = 2), every multiple of *p*(2) = 3 that has not yet been struck out is deleted, including the prime itself, to give the new sequence *ψ*(2, *n*). Therefore, this sequence comes from the deletion of all the multiples of the primes *p*(1) and *p*(2), including the primes themselves. It follows that the sequence *ψ*(2, *n*) is periodic, with a period equal to the product of *p*(1) and *p*(2), as will be demonstrated in Theorem 1. If we consider the second step of the Sieve procedure, where the primes *p*(1) and *p*(2) have not been deleted, we obtain the sequence *ξ*(2, *n*). This is again a mixed sequence, where a piece of the periodic sequence *ψ*(2, *n*) is introduced, starting from the square *p*(2)<sup>2</sup> = 9, whereas the previous binary values are preserved before this square. Consequently, we have

$$\xi(2,n) = \begin{cases} \psi(0,n) & \text{if } p(0)^2 \le n < p(1)^2 \\ \psi(1,n) & \text{if } p(1)^2 \le n < p(2)^2 \\ \psi(2,n) & \text{if } n \ge p(2)^2. \end{cases} \tag{16}$$

In general, the multiples of the prime *p*(*k*) that have not yet been struck out in the previous steps are deleted in the *k*-th step of the modified Sieve procedure, including the prime *p*(*k*) itself. Consequently, after performing all the first *k* steps, we obtain the periodic sequence *ψ*(*k*, *n*), as shown in Theorem 1. In the case of the original Sieve procedure, after the *k*-th step, we obtain the sequence *ξ*(*k*, *n*), which is an approximation of the prime characteristic function until the prime *p*(*k*). Such an approximation differs from the previous one, *ξ*(*k* − 1, *n*), only from the square *p*(*k*)<sup>2</sup> onwards. In fact, after this point, a piece of the periodic sequence *ψ*(*k*, *n*) is recognizable. It follows that *ξ*(*k*, *n*) can eventually be written as a mixed sequence, which is a generalization of Equations (15) and (16), that is,

$$\xi(k,n) = \begin{cases} \psi(0,n) & \text{if } p(0)^2 \le n < p(1)^2 \\ \psi(1,n) & \text{if } p(1)^2 \le n < p(2)^2 \\ \dots & \\ \psi(k-1,n) & \text{if } p(k-1)^2 \le n < p(k)^2 \\ \psi(k,n) & \text{if } n \ge p(k)^2. \end{cases} \tag{17}$$

By evaluating the expression (17), we can recognize that subsets of the periodic binary sequences *ψ*(*k*, *n*) are present, for each *k*, in the related intervals *I*(*k*) = [*p*(*k*)<sup>2</sup>, *p*(*k* + 1)<sup>2</sup>) of the prime characteristic function. This holds until the end of the Sieve procedure, because the *k*-th interval is not affected by the deletions performed in the following steps. We now show that the sequences *ψ*(*k*, *n*) are periodic and that their periods are given by the product of all the primes up to *p*(*k*).
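The decomposition (17) can be checked by brute force on a short range. The Python sketch below is only an illustration under stated assumptions: the function names `psi`, `xi` and `xi_from_pieces`, the hard-coded prime list, and the convention of keeping the entry for *n* = 1 equal to 1 (consistent with *p*(0) = 1 and *ψ*(0, *n*) = 1) are choices made here, not taken from the paper.

```python
# p(0) = 1 by the convention of this section, followed by p(1), p(2), ...
PRIMES = [1, 2, 3, 5, 7, 11, 13]


def psi(k, N):
    """Modified Sieve after k steps: multiples of p(1..k) deleted, primes included.
    Returns a list s with s[n] for 1 <= n <= N (index 0 is unused and left at 1)."""
    s = [1] * (N + 1)
    for i in range(1, k + 1):
        p = PRIMES[i]
        for m in range(p, N + 1, p):
            s[m] = 0
    return s


def xi(k, N):
    """True Sieve after k steps: multiples of p(i) deleted from p(i)^2 on, p(i) kept."""
    s = [1] * (N + 1)
    for i in range(1, k + 1):
        p = PRIMES[i]
        for m in range(p * p, N + 1, p):
            s[m] = 0
    return s


def xi_from_pieces(k, N):
    """Right-hand side of Equation (17), assembled from pieces of psi(j, n)."""
    s = [1] * (N + 1)
    for j in range(k + 1):
        lo = PRIMES[j] ** 2
        hi = PRIMES[j + 1] ** 2 if j < k else N + 1   # the last piece is unbounded
        piece = psi(j, N)
        for n in range(lo, min(hi, N + 1)):
            s[n] = piece[n]
    return s


for k in range(5):
    print(k, xi(k, 200) == xi_from_pieces(k, 200))
```

Each comparison should print `True`, since a multiple of *p*(*i*) lying at or beyond *p*(*j*)<sup>2</sup>, with *i* ≤ *j*, is always at or beyond *p*(*i*)<sup>2</sup>.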

**Theorem 1.** *Let ψ*(*k*, *n*) *be the binary sequences generated by the deletion of the multiples of all the primes up to p*(*k*)*, including the primes themselves. Then, the sequences ψ*(*k*, *n*) *are periodic, and their periods T*(*k*) *are given by the product of all the primes up to p*(*k*)*, that is,*

$$T(k) = \prod\_{i=1}^{k} p(i) \tag{18}$$

**Proof.** The deletion of the multiples of all the primes up to *p*(*k*) leaves, for each *k*, the integers belonging to the reduced residue systems modulo *T*(*k*), where *T*(*k*) is given by Equation (18). Each set is composed of all the positive integers relatively prime to *T*(*k*), that is, of all the numbers *n* such that gcd(*n*, *T*(*k*)) = 1. The quantity of integers in each set is given by the Euler phi function *φ*(*T*(*k*)), which counts the positive integers less than *T*(*k*) and relatively prime to *T*(*k*). Moreover, the reduced residue systems are abelian groups under multiplication modulo *T*(*k*), so that each of them is associated with a principal Dirichlet character. This is an arithmetical function *χ*<sub>1</sub>(*k*, *n*), which is nothing but *ψ*(*k*, *n*), being defined as

$$\chi\_1(k,n) = \begin{cases} 1 & \text{if } \gcd(n, T(k)) = 1 \\ 0 & \text{if } \gcd(n, T(k)) > 1. \end{cases} \tag{19}$$

In [8], it is proven that *χ*<sub>1</sub>(*k*, *n*) is a periodic sequence, and in particular that

$$
\chi\_1(k, n + T(k)) = \chi\_1(k, n) \qquad \forall n \tag{20}
$$

This completes the proof.
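Equation (20) can be verified numerically for the first few values of *k*. The Python sketch below uses the gcd characterization (19); the helper names `chi1`, `is_period` and `minimal_period` are assumptions introduced here. It also checks, by exhaustive search, that no shorter period exists for these small cases.

```python
from math import gcd, prod


def chi1(n, T):
    """Equation (19): 1 if n is relatively prime to T, 0 otherwise."""
    return 1 if gcd(n, T) == 1 else 0


def is_period(d, T):
    """True if chi1(n + d) == chi1(n) for every n in one full block 1..T."""
    return all(chi1(n + d, T) == chi1(n, T) for n in range(1, T + 1))


def minimal_period(T):
    """Smallest shift d >= 1 that leaves the sequence chi1(., T) unchanged."""
    return next(d for d in range(1, T + 1) if is_period(d, T))


primes = [2, 3, 5, 7]
for k in range(len(primes) + 1):
    T = prod(primes[:k])            # T(0) = 1 is the empty product
    print(k, T, is_period(T, T), minimal_period(T))
```

Checking one block of length *T*(*k*) is sufficient here, because the gcd with *T*(*k*) depends only on the residue of *n* modulo *T*(*k*).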

Table 1 reports the periods *T*(*k*) of the sequences *ψ*(*k*, *n*), *k* = 0, ... , 7, in comparison with the sizes *S*(*k*) = *p*(*k* + 1)<sup>2</sup> − *p*(*k*)<sup>2</sup> of the intervals *I*(*k*), where subsets of each *ψ*(*k*, *n*) are recognizable. The pseudo-prime *p*(0) = 1 is put in brackets.

**Table 1.** Periods *T*(*k*) of the sequences *ψ*(*k*, *n*), for primes *p*(*k*) ≤ *p*(7), in comparison with the sizes *S*(*k*) of the intervals *I*(*k*). The ratios *S*(*k*)/*T*(*k*) are rapidly decreasing as the prime *p*(*k*) grows.


By considering the ratios *S*(*k*)/*T*(*k*), it is evident that the periods *T*(*k*) increase much faster than the widths *S*(*k*) of the intervals. This explains why the periodicity of the sequences *ψ*(*k*, *n*) is hardly recognizable by simply investigating the subsets of each *ψ*(*k*, *n*) in the intervals *I*(*k*).
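The quantities compared above follow directly from Equation (18) and from the definition of *S*(*k*), so they can be recomputed with a short Python sketch. The values printed are derived here from the definitions and are not copied from the authors' table; the function names and the hard-coded prime list are assumptions.

```python
# p(0) = 1 by convention; p(8) = 19 is needed only to compute S(7)
PRIMES = [1, 2, 3, 5, 7, 11, 13, 17, 19]


def period(k):
    """T(k): product of the primes p(1), ..., p(k), with T(0) = 1 (Equation (18))."""
    T = 1
    for i in range(1, k + 1):
        T *= PRIMES[i]
    return T


def interval_size(k):
    """S(k) = p(k + 1)^2 - p(k)^2, the width of the interval I(k)."""
    return PRIMES[k + 1] ** 2 - PRIMES[k] ** 2


print(" k  p(k)    T(k)  S(k)  S(k)/T(k)")
for k in range(8):
    T, S = period(k), interval_size(k)
    print(f"{k:2d} {PRIMES[k]:5d} {T:7d} {S:5d}  {S / T:.6f}")
```

The last column makes the rapid decrease of *S*(*k*)/*T*(*k*) explicit.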

#### *3.2. The Symmetric Sequences of the Runs of Zeroes in the Periods T*(*k*)

In Section 3.1, the prime distribution has been represented as the intersection of an endless number of periodic binary sequences *ψ*(*k*, *n*), whose periods *T*(*k*) rapidly grow, and such that subsets of these sequences can be found in limited intervals *I*(*k*) of the prime characteristic function *ξ<sub>p</sub>*(*n*). In particular, each of these intervals ranges between the square of a prime *p*(*k*) and the square of the next prime *p*(*k* + 1). Consequently, the real primes in each interval *I*(*k*) are given by the 1 values of the corresponding sequence *ψ*(*k*, *n*). In order to complete this analysis, we now consider the gaps between these primes, following an established trend in the literature. In particular, we are interested in the distribution of the runs of zeroes in each period *T*(*k*), whose number is denoted by *R*(*k*), since the binary sequences *ψ*(*k*, *n*) are composed of isolated ones followed by strings of zeroes of varying length. It follows that the quantity *R*(*k*) also gives the number of undeleted integers (i.e., isolated ones) in each period *T*(*k*), because *T*(*k*), for *k* ≥ 1, is an even number, so that the last digit of each period is a zero.

Let us consider the Sieve procedure described step-by-step in Section 3.1 and the number of runs of zeroes *R*(*k*) in each period *T*(*k*) of the binary sequences *ψ*(*k*, *n*). For *k* = 0, 1, we have only one run (*R*(0) = *R*(1) = 1), whose sizes are *L*(1, 0) = 1 and *L*(1, 1) = 2, respectively. For *k* = 2, the deletion of the multiples of both *p*(1) and *p*(2) gives two runs (*R*(2) = 2) in the period *T*(2) = 6, whose sizes are *L*(1, 2) = 4 and *L*(2, 2) = 2, respectively, and so on. Note that, with these values, the size of a run coincides with the distance between the two consecutive undeleted integers that delimit it. Table 2 reports the number of runs *R*(*k*) and their sizes *L*(*m*, *k*), for *k* ≤ 4, where the index *m* identifies the specific run and *k* gives the step of the Sieve procedure. Notably, the runs of each period *T*(*k*) are symmetrical around a symmetry center given by a run of size 4, except for a final run of size 2. Such a trend is expected to hold also for the successive steps.


**Table 2.** Runs of zeroes in the periods *T*(*k*) of the sequences *ψ*(*k*, *n*), for primes *p*(*k*) ≤ *p*(4). For each *k*, the number of runs *R*(*k*) and their sizes *L*(*m*, *k*) are reported, with *m* = 1, ... , *R*(*k*). Note the symmetry of the runs in each period *T*(*k*). Starting from *k* = 2, the symmetry center is given by a run of length 4, whereas the final run of length 2 lies outside the symmetry.
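The run sizes and their symmetry can be reproduced with a short Python sketch. It assumes, as suggested by the values *L*(1, 1) = 2 and *L*(1, 2) = 4 quoted in the text, that the size of a run is the distance between the two consecutive undeleted integers delimiting it; the function names are hypothetical.

```python
from math import gcd


def run_sizes(T):
    """Sizes L(m, k) over one period T = T(k), measured as gaps between
    consecutive undeleted integers, starting from n = 1 and ending at T + 1."""
    units = [n for n in range(1, T + 2) if gcd(n, T) == 1]
    return [b - a for a, b in zip(units, units[1:])]


def symmetric_with_final_two(runs):
    """True if all runs but the last form a palindrome and the last run has size 2."""
    body = runs[:-1]
    return body == body[::-1] and runs[-1] == 2


for T in (1, 2, 6, 30, 210):
    runs = run_sizes(T)
    shown = runs if len(runs) <= 8 else runs[:8] + ["..."]
    print(T, len(runs), shown)

for T in (6, 30, 210):
    body = run_sizes(T)[:-1]
    print(T, symmetric_with_final_two(run_sizes(T)), "center run:", body[len(body) // 2])
```

The palindromic structure reflects the fact that *n* is relatively prime to *T*(*k*) exactly when *T*(*k*) − *n* is.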

#### *3.3. The Relation Between the Primes in an Interval I*(*k*) *and the Runs in a Period T*(*k*)

To highlight the relation between each period *T*(*k*) and the corresponding number of runs of zeroes *R*(*k*), we report in Table 3 the values of *R*(*k*) for *k* ≤ 7.

**Table 3.** Periods *T*(*k*) and related runs of zeroes *R*(*k*) for the primes *p*(*k*) ≤ *p*(7). The special prime *p*(0) = 1 is put in round brackets.


Such values also give the number of integers that have not been struck out by the modified Sieve procedure in the period *T*(*k*), which in turn can be related to the number of undeleted integers (and consequently of the primes) in the corresponding interval *I*(*k*). We will show in Theorem 2 that a relation exists between *T*(*k*) and *R*(*k*), so that the number of primes in each interval *I*(*k*) can be inferred. According to the theory of congruences, Theorem 2 gives the quantity of integers that have not been struck out (i.e., *R*(*k*)) in each period *T*(*k*), that is,

**Theorem 2.** *Let ψ*(*k*, *n*) *be the periodic binary sequences defined in Theorem 1, whose periods are T*(*k*) = ∏<sub>*i*=1</sub><sup>*k*</sup> *p*(*i*)*. Then, the number of undeleted integers, that is, the number of runs of zeroes R*(*k*)*, in a period T*(*k*)*, for k* ≥ 1*, is given by*

$$R(k) = \prod\_{i=1}^{k} \left( p(i) - 1 \right), \qquad k \ge 1 \tag{21}$$

**Proof.** The number of undeleted integers in each period *T*(*k*) is given by the number of integers in the reduced residue systems modulo *T*(*k*), that is, the number of positive integers less than *T*(*k*) and relatively prime to *T*(*k*). Such a value is given by the Euler phi function evaluated at *T*(*k*), that is [8],

$$\phi(T(k)) = T(k) \cdot \prod\_{p|T(k)} \left(1 - \frac{1}{p}\right) = T(k) \cdot \prod\_{p|T(k)} \left(\frac{p-1}{p}\right) = T(k) \cdot \frac{\prod\_{i=1}^{k} (p(i) - 1)}{\prod\_{i=1}^{k} p(i)} = \prod\_{i=1}^{k} \left(p(i) - 1\right) \tag{22}$$

where *p*(*i*), *i* = 1, . . . , *k*, are the primes dividing *T*(*k*).
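Equation (21) can be confirmed by direct counting for the first few periods. The Python sketch below compares a brute-force count of the integers relatively prime to *T*(*k*) with the product formula; the function names and the range of *k* are assumptions chosen to keep the computation small.

```python
from math import gcd

PRIMES = [2, 3, 5, 7, 11, 13]   # p(1), ..., p(6)


def period(k):
    """T(k) as in Equation (18)."""
    T = 1
    for p in PRIMES[:k]:
        T *= p
    return T


def count_undeleted(T):
    """Brute-force count of the integers in 1..T that are relatively prime to T."""
    return sum(1 for n in range(1, T + 1) if gcd(n, T) == 1)


def runs_formula(k):
    """R(k) as in Equation (21): product of p(i) - 1 for i = 1, ..., k."""
    R = 1
    for p in PRIMES[:k]:
        R *= p - 1
    return R


for k in range(1, 7):
    T = period(k)
    print(k, T, count_undeleted(T), runs_formula(k))
```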

Starting from *p*(4) = 7, Table 1 shows that the interval *I*(*k*) is included in the first period of the sequence *ψ*(*k*, *n*). Consequently, a subset of the *R*(*k*) undeleted integers in each period *T*(*k*) lies in the corresponding interval *I*(*k*), where they are exactly the primes. Therefore, we can infer the quantity of primes *P*(*k*) in each *I*(*k*), starting from the quantity *R*(*k*) in the corresponding period *T*(*k*). As a first approximation, a simple proportional relationship is investigated. Let us consider the local density *D*(*k*, *n*) of the undeleted integers in the period *T*(*k*), where *D*(*k*, *n*) is computed in sliding intervals *J*(*k*, *n*) whose size is the same as that of *I*(*k*) = [*p*(*k*)<sup>2</sup>, *p*(*k* + 1)<sup>2</sup>), that is, *p*(*k* + 1)<sup>2</sup> − *p*(*k*)<sup>2</sup>. In this context, the index *n* represents the starting point of each *J*(*k*, *n*). When such intervals slide over the whole period *T*(*k*), we assume that the density *D*(*k*, *n*) is not a function of *n*. In this case, it is equal to the average density *D̄*(*k*) over *T*(*k*), and we have

$$D(k, n) = \overline{D}(k) = \frac{R(k)}{T(k)} = \frac{\prod\_{i=1}^{k} \left(p(i) - 1\right)}{\prod\_{i=1}^{k} p(i)} = \prod\_{i=1}^{k} \frac{p(i) - 1}{p(i)}, \qquad k \ge 1 \tag{23}$$

It is noteworthy that the product structure in Equation (23) is the same as in Equation (12). Let us suppose that the previous assumption holds. Then, an estimation of the local density *D*(*k*, *n*) in each interval *I*(*k*) (that is, for *n* = *p*(*k*)<sup>2</sup>) will be just the average density *D̄*(*k*) over the period *T*(*k*). Consequently, we can write

$$D\left(k, p(k)^2\right) \simeq \overline{D}(k), \qquad k \ge 1. \tag{24}$$

Therefore, by starting from Equation (23), we can estimate the quantity of primes *P*(*k*) in each interval *I*(*k*), for *k* ≥ 1. To this end, the average density *D̄*(*k*) is multiplied by the size *S*(*k*) = *p*(*k* + 1)<sup>2</sup> − *p*(*k*)<sup>2</sup>, that is,

$$P(k) = \overline{D}(k) \cdot S(k) = (p(k+1)^2 - p(k)^2) \cdot \prod\_{i=1}^{k} \frac{p(i) - 1}{p(i)}, \qquad k \ge 1. \tag{25}$$

Evidently, Equation (25) is analogous to Equation (12), apart from the size *N* of the global interval *I<sub>N</sub>*, where *N* ∈ *I*(*K*) = [*p*(*K*)<sup>2</sup>, *p*(*K* + 1)<sup>2</sup>), which is replaced by the size *p*(*k* + 1)<sup>2</sup> − *p*(*k*)<sup>2</sup> of the local interval *I*(*k*).
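To get a feeling for Equation (25), the estimate *P*(*k*) can be compared with the actual number of primes in each interval *I*(*k*). The following Python sketch is a plain illustration; the function names and the way the list of primes is extended are assumptions.

```python
import math


def primes_up_to(limit):
    """Plain Sieve of Eratosthenes returning the list of primes <= limit."""
    flags = [True] * (limit + 1)
    flags[0] = False
    if limit >= 1:
        flags[1] = False
    for p in range(2, math.isqrt(limit) + 1):
        if flags[p]:
            for m in range(p * p, limit + 1, p):
                flags[m] = False
    return [n for n, is_prime in enumerate(flags) if is_prime]


def interval_estimates(K):
    """Return rows (k, p(k), P(k), actual count in I(k)) for k = 1, ..., K."""
    bound = 20
    ps = primes_up_to(bound)
    while len(ps) < K + 1:               # p(1), ..., p(K + 1) are needed
        bound *= 2
        ps = primes_up_to(bound)
    prime_set = set(primes_up_to(ps[K] ** 2))
    rows, density = [], 1.0
    for k in range(1, K + 1):
        p_k, p_next = ps[k - 1], ps[k]
        density *= (p_k - 1) / p_k       # average density of Equation (23)
        estimate = (p_next ** 2 - p_k ** 2) * density
        actual = sum(1 for n in range(p_k ** 2, p_next ** 2) if n in prime_set)
        rows.append((k, p_k, estimate, actual))
    return rows


for k, p_k, estimate, actual in interval_estimates(10):
    print(f"k={k:2d}  p(k)={p_k:3d}  P(k)={estimate:8.2f}  primes in I(k)={actual}")
```

In this sketch the estimates tend to lie somewhat above the true counts, which is consistent with the multiplicative constant discussed in Section 2.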

#### *3.4. The Novel LINPES Estimation of the Prime Number Function π*(*x*)

Equation (25) gives a succession of estimations *P*(*k*) of the real number of primes *π*(*k*) in each interval *I*(*k*) = [*p*(*k*)<sup>2</sup>, *p*(*k* + 1)<sup>2</sup>). Therefore, the next step will be to blend all these values to compute a global estimation *π<sub>P</sub>*(*N*) of the quantity of the primes up to *N*, where *N* ∈ *I*(*K*), analogously to Equation (12). In theory, *π<sub>P</sub>*(*N*) is simply computable by adding all the contributions *P*(*k*) of Equation (25), for *k* = 1, ... , *K*, where *p*(*K*) is the greatest prime number not exceeding *N*<sup>1/2</sup>. However, such a procedure includes the term *p*(*K* + 1), which is unknown. In order to overcome this issue, the computation of *π<sub>P</sub>*(*N*) involves only the terms up to *P*(*K* − 1), plus a final term *P*(*K*, *N*), where the interval *I*(*K*) is only partially considered. Consequently, we obtain

$$\pi\_P(N) = \sum\_{k=0}^{K-1} P(k) + P(K, N) = P(0) + \sum\_{k=1}^{K-1} \left[ \left( p(k+1)^2 - p(k)^2 \right) \cdot \prod\_{i=1}^{k} \frac{p(i) - 1}{p(i)} \right] + P(K, N) \tag{26}$$

where *P*(0) = *p*(1)<sup>2</sup> − *p*(0)<sup>2</sup>, and *P*(*K*, *N*) = (*N* − *p*(*K*)<sup>2</sup>) · ∏<sub>*i*=1</sub><sup>*K*</sup> (*p*(*i*) − 1)/*p*(*i*). Note that Equation (26) includes as many contributions as there are primes, where each term is given by a relation similar to Equation (12), with the global size *N* replaced by the size of the interval *I*(*k*). Each contribution includes an average density of primes given by ∏<sub>*i*=1</sub><sup>*k*</sup> (*p*(*i*) − 1)/*p*(*i*), so that the average distance *p*(*k* + 1) − *p*(*k*) between two consecutive primes is ∏<sub>*i*=1</sub><sup>*k*</sup> *p*(*i*)/(*p*(*i*) − 1), which is of the order of magnitude of log(*p*(*k*)). According to the Cramér conjecture [15], this distance is *p*(*k* + 1) − *p*(*k*) = O(log<sup>2</sup>(*p*(*k*))). Another estimate by Cramér, starting from the Riemann hypothesis, is *p*(*k* + 1) − *p*(*k*) = O(√*p*(*k*) log(*p*(*k*))) [12,16]. Consequently, the error given by neglecting the partial term *P*(*K*, *N*) is smaller than the leading term of the Cramér conjectures, so that the partial term *P*(*K*, *N*) could be omitted.
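Equation (26) translates into a short accumulation loop. The Python sketch below is a minimal illustration that assumes *N* ≥ 4 and uses *P*(0) = *p*(1)<sup>2</sup> − *p*(0)<sup>2</sup> = 3; the function name `linpes_estimate` is hypothetical, and the partial term *P*(*K*, *N*) is retained rather than omitted.

```python
import math


def primes_up_to(limit):
    """Plain Sieve of Eratosthenes returning the list of primes <= limit."""
    flags = [True] * (limit + 1)
    flags[0] = False
    if limit >= 1:
        flags[1] = False
    for p in range(2, math.isqrt(limit) + 1):
        if flags[p]:
            for m in range(p * p, limit + 1, p):
                flags[m] = False
    return [n for n, is_prime in enumerate(flags) if is_prime]


def linpes_estimate(N):
    """Equation (26) for N >= 4: P(0), the full terms P(1..K-1) and the partial P(K, N)."""
    ps = primes_up_to(math.isqrt(N))      # p(1), ..., p(K), i.e. primes with p^2 <= N
    total = 3.0                           # P(0) = p(1)^2 - p(0)^2 = 4 - 1
    density = 1.0
    for k, p_k in enumerate(ps, start=1):
        density *= (p_k - 1) / p_k
        # Full interval up to p(k + 1)^2 for k < K, partial interval up to N for k = K
        upper = ps[k] ** 2 if k < len(ps) else N
        total += (upper - p_k ** 2) * density
    return total


for N in (10**3, 10**4, 10**5, 10**6):
    actual = len(primes_up_to(N))
    estimate = linpes_estimate(N)
    print(N, actual, round(estimate, 1), round(estimate / actual, 4))
```

The printed ratio between the uncorrected estimate and the true count is expected to stay above 1, which motivates the correction introduced in Section 3.5.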

#### *3.5. The Corrected LINPES Estimation by Using the Equivalence with the Li*(*x*) *Function*

We now want to show that Equations (3) and (26) are related. To this end, we write the logarithmic integral function Li(*N*) as a summation of integrals, each computed in an interval *I*(*k*) = [*p*(*k*)<sup>2</sup>, *p*(*k* + 1)<sup>2</sup>), that is,

$$\operatorname{Li}(N) = \int\_{2}^{p(1)^2} \frac{dt}{\log t} + \sum\_{k=1}^{K-1} \int\_{p(k)^2}^{p(k+1)^2} \frac{dt}{\log t} + \int\_{p(K)^2}^{N} \frac{dt}{\log t},\tag{27}$$

where the first term starts from 2 to avoid the improper integral that would arise from the singularity of 1/log *t* at *t* = 1, and *p*(*K*)<sup>2</sup> is the greatest square of a prime not exceeding *N*. Consequently, the Li(*N*) function is expressed by Equation (27) as a sum of terms *L*(*k*), in a similar way to Equation (26), that is,

$$\operatorname{Li}(N) = L(0) + \sum\_{k=1}^{K-1} L(k) + L(K, N), \tag{28}$$

where *L*(0) = ∫<sub>2</sub><sup>*p*(1)<sup>2</sup></sup> *dt*/log *t*, *L*(*K*, *N*) = ∫<sub>*p*(*K*)<sup>2</sup></sub><sup>*N*</sup> *dt*/log *t*, and

$$L(k) = \int\_{p(k)^2}^{p(k+1)^2} \frac{dt}{\log t}. \tag{29}$$

We now apply the Mean Value Theorem for integrals to each interval *I*(*k*) in Equation (27), that is,

$$\operatorname{Li}(N) = \frac{p(1)^2 - 2}{\log \left(\varsigma\_0\right)} + \sum\_{k=1}^{K-1} \frac{p(k+1)^2 - p(k)^2}{\log \left(\varsigma(k)\right)} + \frac{N - p(K)^2}{\log \left(\varsigma\_K\right)},\tag{30}$$

where *ς*<sub>0</sub> ∈ *I*(0), *I*(0) = [*p*(0)<sup>2</sup>, *p*(1)<sup>2</sup>), *ς*(*k*) ∈ *I*(*k*), *k* = 1, . . . , *K* − 1, and *ς<sub>K</sub>* ∈ *I*(*K*, *N*), *I*(*K*, *N*) = [*p*(*K*)<sup>2</sup>, *N*). In order to show the equivalence between Equations (26) and (30), we also consider the lower bound *p*(*k*)<sup>2</sup> of the interval *I*(*k*). By taking, in the two summations, the ratio between the two terms multiplying the interval size *S*(*k*) = *p*(*k* + 1)<sup>2</sup> − *p*(*k*)<sup>2</sup>, we can write

$$\frac{\prod\_{i=1}^{k} \frac{p(i) - 1}{p(i)}}{\frac{1}{\log(\varsigma(k))}} = \left(\frac{\prod\_{i=1}^{k} \frac{p(i)}{p(i) - 1}}{\log\left(\varsigma(k)\right)}\right)^{-1} \tag{31}$$

From Equation (13), we have

$$\lim\_{k \to \infty} \frac{\prod\_{i=1}^{k} \frac{p(i)}{p(i) - 1}}{\log \left( \varsigma(k) \right)} = \lim\_{k \to \infty} \left[ \frac{\prod\_{i=1}^{k} \frac{p(i)}{p(i) - 1}}{\log \left( p(k)^2 \right)} \times \frac{\log \left( p(k)^2 \right)}{\log \left( \varsigma(k) \right)} \right] = \frac{1}{2} \times e^{\gamma} \times \lim\_{k \to \infty} \frac{\log \left( p(k)^2 \right)}{\log \left( \varsigma(k) \right)} \tag{32}$$

where *ς*(*k*) ∈ *I*(*k*) = [*p*(*k*)<sup>2</sup>, *p*(*k* + 1)<sup>2</sup>), so that its maximum distance from *p*(*k*)<sup>2</sup> is *p*(*k* + 1)<sup>2</sup> − *p*(*k*)<sup>2</sup>. However, the *k*-th prime *p*(*k*) is given asymptotically by *p*(*k*) ∼ *k* log(*k*) [9]. Therefore, *p*(*k*)<sup>2</sup> ∼ *k*<sup>2</sup> log<sup>2</sup>(*k*) and *p*(*k* + 1)<sup>2</sup> ∼ (*k* + 1)<sup>2</sup> log<sup>2</sup>(*k* + 1) ∼ *k*<sup>2</sup> log<sup>2</sup>(*k*), so that for each point *ς*(*k*) ∈ [*p*(*k*)<sup>2</sup>, *p*(*k* + 1)<sup>2</sup>) we have *ς*(*k*) ∼ *k*<sup>2</sup> log<sup>2</sup>(*k*). It follows that

$$\lim\_{k \to \infty} \frac{\prod\_{i=1}^{k} \frac{p(i)}{p(i) - 1}}{\log \left( \varsigma(k) \right)} = \frac{1}{2} \times e^{\gamma} \times \lim\_{k \to \infty} \frac{\log \left( p(k)^2 \right)}{\log \left( \varsigma(k) \right)} = \frac{1}{2} \times e^{\gamma} = \frac{1}{c} \simeq 0.8905 \tag{33}$$

and consequently Equation (31) gives, for each fixed *k*,

$$\frac{\prod\_{i=1}^{k} \frac{p(i) - 1}{p(i)}}{\frac{1}{\log(\varsigma(k))}} = c\_I(k) \qquad \text{where } \lim\_{k \to \infty} c\_I(k) = c = 2 \times e^{-\gamma} \simeq 1.1229. \tag{34}$$

It follows that the trends of the two estimations (26) and (30) are the same as *k* → ∞, apart from the constant coefficient *c*. Due to this multiplicative factor, the proposed estimation (26) overestimates the prime number function *π*(*N*) with respect to Equation (30), and in this sense it is similar to the heuristic procedure described in Section 2. However, the latter is completely probabilistic, whereas the proposed method is also based on an analytical procedure, that is, the recognition of an infinite number of binary periodical sequences and related intervals of the prime characteristic function. In order to correct this discrepancy, we relax the conjecture of Section 3.3, so that the local density *D*(*k*, *n*) becomes a function of *n*. Experimentally, the values of the local density *D*(*k*, *p*(*k*)<sup>2</sup>) in the interval *I*(*k*) are lower than those of the average density *D*(*k*). The following conjecture is then proposed, which links *D*(*k*, *p*(*k*)<sup>2</sup>) and *D*(*k*) by means of the constant *c* of the Third Mertens' Theorem [11].

**Conjecture 1.** *The local density D*(*k*, *n*) *of the undeleted integers in the period T*(*k*)*, if computed in sliding intervals whose size is the same as that of I*(*k*) = [*p*(*k*)<sup>2</sup>, *p*(*k* + 1)<sup>2</sup>)*, is a function of the starting point n of the sliding interval. In particular, the average density D*(*k*) *is greater than the local density D*(*k*, *p*(*k*)<sup>2</sup>) *in the interval I*(*k*)*, so that the succession c*<sub>*I*</sub>(*k*) *of their ratios exceeds unity. Moreover, the limit value of c*<sub>*I*</sub>(*k*) *as k* → ∞ *is equal to the constant c* = 2 · *e*<sup>−*γ*</sup> ≃ 1.1229 *of the Third Mertens' Theorem, that is,*

$$\lim\_{k \to \infty} \frac{\overline{D(k)}}{D\left(k, p\_k^2\right)} = c.\tag{35}$$
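The limit stated in Equations (34) and (35) can be explored numerically. The following is a minimal sketch, not part of the original procedure: it assumes that the point *ς*(*k*) is replaced by the lower bound *p*(*k*)<sup>2</sup> of the interval, so that the quantity computed is only a proxy for *c*<sub>*I*</sub>(*k*). The names `primes_up_to`, `c_lower` and `C_MERTENS` are hypothetical helpers introduced for illustration.

```python
import math

def primes_up_to(limit):
    """Return all primes <= limit with a plain Sieve of Eratosthenes."""
    flags = bytearray([1]) * (limit + 1)
    flags[0:2] = b"\x00\x00"
    for n in range(2, math.isqrt(limit) + 1):
        if flags[n]:
            flags[n * n :: n] = bytearray(len(range(n * n, limit + 1, n)))
    return [n for n, f in enumerate(flags) if f]

def c_lower(k, primes):
    """Left-hand side of Eq. (34), with varsigma(k) replaced by p(k)^2 (an assumption)."""
    density = 1.0
    for p in primes[:k]:
        density *= (p - 1) / p          # average density of the undeleted integers
    return density * math.log(primes[k - 1] ** 2)

# c = 2 * exp(-gamma), with gamma the Euler-Mascheroni constant
C_MERTENS = 2 * math.exp(-0.5772156649015329)

primes = primes_up_to(100_000)
for k in (16, 168, 1229, len(primes)):
    print(k, primes[k - 1], round(c_lower(k, primes), 4), round(C_MERTENS, 4))
```

Under this assumption the printed ratios can be compared directly with the constant *c*; choosing a point *ς*(*k*) inside the interval instead of its lower bound would give slightly larger values.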

The typical trend of *D*(*k*, *n*) = *D*(16, *n*) = *D*(*n*), for *k* = 16 and varying *n*, is plotted in Figure 1, together with the average density *D*(*k*) = *D*(16) = *D* in the period *T*(*k*) = *T*(16). Let us notice that, as it will be discussed in the following, such a trend is less appreciable for small values of the primes.

Figure 1 can be explained as follows. Let us consider the sequences *ψ*(*k*, *n*) defined in Section 3.1, where the multiples of the primes up to *p*(*k*) have been struck out, including the primes themselves. In each of these sequences, all the undeleted integers in the range [*p*(*k* + 1), *p*(*k* + 1)<sup>2</sup>] are primes, whereas the undeleted integers greater than *p*(*k* + 1)<sup>2</sup> can be either primes or composites, because the multiples of the primes greater than *p*(*k*) have not yet been struck out.

At the beginning of the modified Sieve procedure (*k* = 0), the local density *D*(*k*, *n*) of the undeleted integers is not a function of *n*, because no integer has yet been struck out. In the first step (*k* = 1), only the even integers (i.e., the multiples of *p*(1) = 2) have been struck out, so that *D*(*k*, *n*) is still constant up to infinity. Noticeably, the multipliers (i.e., the integers multiplying *p*(1) to give the deleted multiples) are equal to the undeleted integers when the procedure starts (i.e., all the integers). This rule also holds for the following steps, that is, the multipliers of the prime *p*(*k*) in the *k*-th step of the modified Sieve procedure are equal to the undeleted integers in the previous (*k* − 1)-th step. It follows that the multipliers of *p*(2) = 3 are all the odd integers, whose distribution is again uniform. Some of these multipliers (that is, 3, 5, 7) are just primes in the interval [*p*(2), *p*(2)<sup>2</sup>], but they can also be composites beyond *p*(2)<sup>2</sup>. In this case, the distribution of the composite multipliers exactly compensates the decreasing trend of the distribution of the multipliers that are also prime numbers. If the primes *p*(*k*) are sufficiently small, such a compensation happens quickly, because it starts from *p*(*k*)<sup>2</sup>. In these cases, the distribution of the local density *D*(*k*, *n*) is still approximately uniform. However, as *p*(*k*) grows, a transient state is noticeable, because, for such values of *k* and small values of *n*, the local density *D*(*k*, *n*) is greater than the average density *D*(*k*). In fact, for such *n* values, only a portion of the multiples of the primes *p*(*i*), *i* = 1, ..., *k*, have been struck out, because the deletion of the multiples of the prime *p*(*i*), *i* < *k*, starts only from *p*(*i*)<sup>2</sup>, apart from the prime *p*(*i*) itself. This means that the deletion of the multiples of *p*(*i*), *i* = 1, ..., *k*, is completed only at the lower bound of the interval *I*(*k*), that is, *p*(*k*)<sup>2</sup>. Consequently, after this point, the transient state ends and the stationary state begins, where the local density *D*(*k*, *n*) fluctuates around the average density *D*(*k*).

**Figure 1.** Typical trend (in black), with *k* = 16, *p*(16) = 53 and *p*(17) = 59, of the local density of the non-deleted integers *D*(*n*) by varying *n* in sliding intervals whose size is *S*(16) = 3481 − 2809 = 672. Only the initial part of the period *T*(16), whose order of magnitude is 10<sup>19</sup>, is shown, so that the symmetrical trend of the period falls outside the figure. The red line reports a polynomial fitting of the density *D*(*k*, *n*), whereas the blue line concerns the average density *D*(*k*) in the period *T*(*k*). The minimum value of the local density is reached at the lower bound of the interval *I*(*k*), that is, *p*(16)<sup>2</sup> = 2809.

Figure 1 shows the trend of the local density *D*(*k*, *n*) in the case of *k* = 16, that is, *p*(*k*) = 53. Starting approximately from this value of *k*, we can notice a minimum value *D*(*k*, *p*(*k*)<sup>2</sup>) for the distribution of *D*(*k*, *n*), which is located immediately after the transient state, that is, at the lower bound of the interval *I*(*k*). Such a minimum value is about 10 percent lower than the average density *D*(*k*). In fact, as previously explained, the multipliers of the prime *p*(*k*) are just primes up to *p*(*k*)<sup>2</sup>, whereupon they can also be composites. It follows that the distribution of the composite multipliers compensates the decreasing distribution of the multipliers that are prime numbers only starting from the multiple *p*(*k*)<sup>3</sup> = *p*(*k*)<sup>2</sup> · *p*(*k*). Therefore, as *k* → ∞, such a compensation is increasingly delayed, so that the ratio between *D*(*k*) and *D*(*k*, *n*) grows progressively toward the value *c* of Equation (35). As a matter of fact, if all the multipliers were primes, their distribution would decrease by following a logarithmic trend, so that *D*(*k*, *n*) would increase with the same trend, starting from the minimum value in the interval *I*(*k*). In the real case, however, the compensation given by the composite multipliers has the effect that the local density does not grow indefinitely, but tends to the limit value *c* · *D*(*k*, *p*(*k*)<sup>2</sup>). Let us notice that, if we stop the procedure at a finite value of *k*, this limit value becomes *c*<sub>*I*</sub>(*k*) · *D*(*k*, *p*(*k*)<sup>2</sup>), where the succession *c*<sub>*I*</sub>(*k*) is increasing and tends to the limit value *c* as *k* → ∞.
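The behaviour described for *k* = 16 can be reproduced on a small scale. The sketch below is only illustrative and assumes that *ψ*(*k*, *n*) equals 1 exactly when *n* has no prime factor up to *p*(*k*) (so that the primes themselves are struck out, as in the modified Sieve procedure). The function names `undeleted`, `local_density` and `average_density`, and the chosen starting points, are hypothetical.

```python
import math

# The first 16 primes, p(1) = 2, ..., p(16) = 53
PRIMES_16 = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53]

def undeleted(n, primes):
    """psi(k, n): 1 if n has no prime factor among `primes`, else 0."""
    return int(all(n % p for p in primes))

def local_density(start, size, primes):
    """Fraction of undeleted integers in the sliding interval [start, start + size)."""
    return sum(undeleted(n, primes) for n in range(start, start + size)) / size

def average_density(primes):
    """Average density of the undeleted integers over a whole period T(k)."""
    return math.prod((p - 1) / p for p in primes)

size = 59**2 - 53**2                     # S(16) = 672
avg = average_density(PRIMES_16)
for start in (1, 53**2, 10 * 53**2, 100 * 53**2):
    loc = local_density(start, size, PRIMES_16)
    print(start, round(loc, 4), round(avg, 4), round(avg / loc, 4))
```

The window starting at *p*(16)<sup>2</sup> = 2809 is the interval *I*(16) itself, where every undeleted integer is a prime; the other starting points only sample the transient and stationary states and are not the sliding scheme used for Figure 1.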

In order to evaluate the effect of the compensation delay for the small primes *p*(*k*), *k* = 1, ..., 7, in comparison with the case of *p*(16) = 53, Table 4 reports: (a) the multipliers *f*<sub>*I*</sub> such that the multiples *f*<sub>*I*</sub> · *p*(*k*) lie in the interval *I*(*k*) = [*p*(*k*)<sup>2</sup>, *p*(*k* + 1)<sup>2</sup>), and (b) the first multiplier that is a composite number, that is, *f*<sub>*c*</sub> = *p*(*k*)<sup>2</sup>, whose corresponding multiple is *p*(*k*)<sup>2</sup> · *p*(*k*) = *p*(*k*)<sup>3</sup>. Evidently, as *k* grows, the difference between the upper bound *p*(*k* + 1)<sup>2</sup> of *I*(*k*) and *p*(*k*)<sup>3</sup> becomes so large that the compensation effect of the composite multipliers is no longer noticeable in the interval itself.

**Table 4.** Prime numbers *p*(*k*), *k* = 1, ... , 7, and *k* = 16, and the related intervals *I*(*k*), together with: a) the multipliers *fI* such that the multiples *fI* · *p*(*k*) lie inside the intervals *I*(*k*); b) the first multiplier *fc* that is a composite number. Let us notice that the difference between *fc* and the multipliers *fI* rapidly grows, so that the distance between the multiple *fc* · *p*(*k*) and the upper bound of the interval *I*(*k*) becomes larger and larger.


Figure 2 shows the trend of the succession *c*<sub>*I*</sub>(*k*), as *k* approaches infinity. Evidently, such a succession tends to the constant value *c*. The x-axis is in a logarithmic scale, so that the values of *p*(*k*)<sup>2</sup> can be visualized up to 10<sup>15</sup>.

**Figure 2.** Trend of the succession *cI*(*k*) whose elements are the ratios between the average densities *D*(*k*) in the period *T*(*k*) and the local densities *D*(*k*, *n*) in the correspondent interval *I*(*k*). For *k* → ∞, such a succession asymptotically approximates the constant *c*. In the x-axis, a base-10 logarithmic scale has been chosen for a better visualization.

Finally, Table 5 highlights the equivalence between the proposed estimation (26) and the logarithmic-integral one (3). To this end, a number of linear regressions have been computed between the occurrences *P*(*k*) (25) in each interval *I*(*k*) of the proposed estimation and the corresponding ones *L*(*k*) (29) of the logarithmic-integral function. Each row of Table 5 refers to the prime squares *p*(*k*)<sup>2</sup> ranging from a power of ten to the following one, except the first row, which includes all the squares lower than 10<sup>6</sup>, in order to process a sufficient number of points. For each of these ranges, we report the coefficients *m*<sub>1</sub> and *q*<sub>1</sub> of the linear regressions *y*<sub>*i*</sub> = *m*<sub>1</sub> *x*<sub>*i*</sub> + *q*<sub>1</sub>, together with the coefficient of determination *R*<sub>1</sub><sup>2</sup>, which is a measure of the fitting between the two estimations. Evidently, the coefficient of determination tends very fast to its optimal value, that is, 1, even though the number of observations increases. Let us notice that the intercept *q*<sub>1</sub> is practically negligible with respect to the full-scale level, whereas the slope *m*<sub>1</sub> approaches the constant value 1/*c*.

For comparison, Table 5 also reports the parameters and the coefficient of determination in the case of the linear regressions *y*<sub>*i*</sub> = *m*<sub>2</sub> *x*<sub>*i*</sub> + *q*<sub>2</sub> concerning the occurrences *P*(*k*) versus the targets *π*(*k*). These scores are defined as the number of primes in each interval *I*(*k*). Even in this case, the fitting between *P*(*k*) and *π*(*k*) is impressive, as shown by the coefficient of determination *R*<sub>2</sub><sup>2</sup>. Noticeably, the slope *m*<sub>2</sub> still approaches the value 1/*c*, because the P.N.T. guarantees that the logarithmic-integral function and the prime number function go to infinity in the same way.

**Table 5.** Parameters and coefficients of determination of the linear regressions *y*<sub>*i*</sub> = *m*<sub>1</sub> *x*<sub>*i*</sub> + *q*<sub>1</sub> of the proposed estimations *P*(*k*) versus the logarithmic-integral ones *L*(*k*), together with the parameters and coefficients of determination of the linear regressions *y*<sub>*i*</sub> = *m*<sub>2</sub> *x*<sub>*i*</sub> + *q*<sub>2</sub> of *P*(*k*) versus the true number of primes *π*(*k*). Each point is computed in an interval *I*(*k*).


From the previous analysis, it follows that, for a given *N*, the proposed approximation *π*<sub>*P*</sub>(*N*) overestimates the prime number function *π*(*N*) by a factor *c*<sub>*N*</sub>, which results from the overestimation in each interval *I*(*k*), given by the corresponding factor in the finite set *c*<sub>*I*</sub>(*k*), *k* = 1, ..., *K*, where *p*(*K*)<sup>2</sup> is the greatest square of a prime less than *N* (see Equation (34)). If *N* → ∞, the overestimation factor *c*<sub>*N*</sub> tends to the constant *c*. Since *c*<sub>*N*</sub> is unknown, an adjusted version (36) of (26) can be defined by means of the correction factor 1/*c*, that is,

$$\begin{split} \bar{\pi}\_{P}(N) &= \frac{1}{c} \cdot \left( P\_{0} + \sum\_{k=1}^{K-1} P(k) + P\_{K,N} \right) \\ &= \frac{1}{c} \cdot \left( p(1)^2 - p(0)^2 \right) + \frac{1}{c} \cdot \sum\_{k=1}^{K-1} \left[ \left( p(k+1)^2 - p(k)^2 \right) \cdot \prod\_{i=1}^{k} \frac{p(i) - 1}{p(i)} \right] + \frac{1}{c} \cdot \left( N - p(K)^2 \right) \cdot \prod\_{i=1}^{K} \frac{p(i) - 1}{p(i)}. \end{split} \tag{36}$$

Clearly, the corrected version $\bar{\pi}\_{P}(N) = \frac{1}{c} \cdot \pi\_{P}(N)$ is able to give better estimations than *π*<sub>*P*</sub>(*N*) as *N* approaches infinity. In order to give a quantitative assessment, Table 6 reports the scores of *π*<sub>*P*</sub>(*N*) (26) and of its adjusted version $\bar{\pi}\_{P}(N)$ (36), in comparison with the logarithmic integral estimation Li(*N*) (27), and with the prime number function *π*(*N*). The range of each row of Table 6 starts from a power of ten and ends at the following one, up to 10<sup>15</sup>.

It can be noticed that the scores of $\bar{\pi}\_{P}(N)$ slightly underestimate both the true number of primes *π*(*N*) and the logarithmic integral function Li(*N*), which, in turn, is such that the sign of its difference with $\bar{\pi}\_{P}(N)$ changes infinitely many times [17,18], showing some irregularities in the distribution of the primes [19], which have been investigated by considering differences in some subsets of the primes themselves [20]. The previous underestimation is due to the fact that the limit value *c* is an upper bound for the succession *c*<sub>*I*</sub>(*k*). Evidently, $\bar{\pi}\_{P}(N)$ would be perfectly accurate if the terms *c*<sub>*I*</sub>(*k*) were available for the computation of (36), by considering the real number of primes in each interval *I*(*k*).
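The terms of Equation (36) can be evaluated directly for a moderate value of *N*. The following sketch assumes the special value *p*(0) = 1 for the first term and takes *N* = 10<sup>6</sup> as an arbitrary test point; `primes_up_to`, `pi_P` and `C_MERTENS` are hypothetical names, and the code is an illustration of the formula rather than the implementation used to produce Table 6.

```python
import math

def primes_up_to(limit):
    """Return all primes <= limit with a plain Sieve of Eratosthenes."""
    flags = bytearray([1]) * (limit + 1)
    flags[0:2] = b"\x00\x00"
    for n in range(2, math.isqrt(limit) + 1):
        if flags[n]:
            flags[n * n :: n] = bytearray(len(range(n * n, limit + 1, n)))
    return [n for n, f in enumerate(flags) if f]

def pi_P(N):
    """Bracketed sum of Eq. (36): interval sizes times average densities (needs N > 4)."""
    primes = [p for p in primes_up_to(math.isqrt(N)) if p * p < N]
    total = primes[0] ** 2 - 1           # first term p(1)^2 - p(0)^2, assuming p(0) = 1
    density = 1.0
    for k, p in enumerate(primes):
        density *= (p - 1) / p           # product over i = 1..k of (p(i) - 1) / p(i)
        upper = primes[k + 1] ** 2 if k + 1 < len(primes) else N
        total += (upper - p * p) * density
    return total

C_MERTENS = 2 * math.exp(-0.5772156649015329)   # c = 2 * exp(-gamma)

N = 10**6
true_count = len(primes_up_to(N))
raw = pi_P(N)
print(true_count, round(raw), round(raw / C_MERTENS), round(raw / true_count, 4))
```

The last printed value is the empirical overestimation factor *c*<sub>*N*</sub> for this *N*, which can be compared with the limit *c*.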


**Table 6.** The proposed estimation *π*<sub>*P*</sub>(*N*) and its adjusted version $\bar{\pi}\_{P}(N)$ in comparison with the logarithmic integral estimation Li(*N*), and the prime number function *π*(*N*). The scores of Li(*N*) have been computed by using the MATLAB® toolbox. The scores of *π*<sub>*P*</sub>(*N*) and $\bar{\pi}\_{P}(N)$ have been rounded to the nearest integer.

## **4. An Extension of the Procedure to the Twin Prime Numbers**

#### *4.1. Preliminary Concepts*

Two prime numbers *p* and *q* are twin primes if |*p* − *q*| = 2, which is the lowest possible distance between primes, apart from *p* = 2 and *q* = 3, where |*p* − *q*| = 1. Let us note that two consecutive pairs of twin primes never occur, apart from the case {3, 5} and {5, 7}. In fact, one number in the sequence {*n*, *n* + 2, *n* + 4} is certainly a multiple of 3. The gaps between consecutive primes have been extensively investigated in the literature [13,15,21]. However, differently from the primes, *it is presently unknown whether there are infinitely many pairs of twin primes*. In any case, a preliminary counting shows that the twin primes are relatively abundant in the sequence of primes, and, consequently, it is reasonable to infer the so-called twin prime conjecture, which states that *there are infinitely many pairs of twin primes*. This conjecture is strengthened by the fact that the distribution of the primes does not change abruptly. Recently, significant progress has been made by showing that $\liminf\_{k \to \infty} \left[ p(k+1) - p(k) \right] < \infty$, that is, a finite upper bound exists for the limit inferior of the difference between consecutive primes. In particular, Zhang found that this limit inferior is at most 7 · 10<sup>7</sup> [22], and this bound was subsequently improved by Maynard to 600 [23]. Finally, the Polymath project, whose aim is to collect the various efforts to lower the bound as much as possible, has reached the value of 246 [24]. Evidently, in order to demonstrate the twin prime conjecture, a bound of 2 should be obtained. In this work, we try to give a contribution to the discussion of this conjecture, by following a different strategy, that is, by exploiting the concepts previously introduced for the primes. Consequently, as for the primes, the approach is not merely probabilistic, but also analytic, thus constituting a possible significant step for further advancements, as in the case of approaches based on periodic functions [25].

The distribution of the twin primes is commonly characterized by using the twin prime function *π*<sub>2</sub>(*x*) (6). Such a distribution decays more rapidly than the distribution of the primes. In fact, Brun demonstrated in 1919 [26] that, if *S*<sub>*T*</sub> is the set of twin primes given by *S*<sub>*T*</sub> = {*p* : *p* prime and *p* + 2 prime}, the related series of the reciprocals converges to the finite limit *B* ≃ 1.9022 [1], that is,

$$\sum\_{p \in \mathcal{S}\_T} \left( \frac{1}{p} + \frac{1}{p+2} \right) = B \tag{37}$$

regardless of whether the number of summation terms is infinite or not, whereas the same summation diverges for the primes.
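The slow growth of the sum in Equation (37) can be observed with a short computation. This is only a sketch: the cut-off values are arbitrary, the partial sums remain well below the limit *B* for any range reachable by direct enumeration, and `prime_flags` and `brun_partial_sum` are hypothetical names.

```python
import math

def prime_flags(limit):
    """Byte flags, flags[n] == 1 iff n is prime, for 0 <= n <= limit."""
    flags = bytearray([1]) * (limit + 1)
    flags[0:2] = b"\x00\x00"
    for n in range(2, math.isqrt(limit) + 1):
        if flags[n]:
            flags[n * n :: n] = bytearray(len(range(n * n, limit + 1, n)))
    return flags

def brun_partial_sum(limit):
    """Sum of 1/p + 1/(p + 2) over the twin prime pairs with p <= limit."""
    flags = prime_flags(limit + 2)
    return sum(1 / p + 1 / (p + 2)
               for p in range(3, limit + 1)
               if flags[p] and flags[p + 2])

for limit in (10**3, 10**4, 10**5):
    print(limit, round(brun_partial_sum(limit), 4))
```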

Analogously to the P.N.T., a possible function for approximating the twin prime function *π*2(*x*) has been proposed [5] as the logarithmic integral function Li2(*x*) (8). As for the primes, we want to obtain an equivalent procedure and investigate possible consequences.

## *4.2. A Possible Relation Between the Twin Primes in the Intervals and the Undeleted Integers in the Periods*

In Section 3.2, the distribution of the runs in each period *T*(*k*) has been investigated. In the present analysis, the same investigation is made for the particular case in which the size of the runs is 2. Evidently, such an investigation can potentially give an estimation of the quantity of twin primes, similarly to the one given by Equation (26) for the primes. In fact, we will suggest that the number of the runs sized 2 in the interval *I*(*k*) is equal to the quantity of twin primes in the same interval. Such a number is equal to the number of {101} sequences, provided that the sequence {10} is completely included in the interval. However, such a sequence cannot occur across two intervals, because each interval, apart from the first one, ends with an even number (that is, a 0), since it is followed by the square of an odd prime, which is an odd number (and is itself a 0). For the sake of clarity, in the following we denote the runs sized 2 as runs 2. Let us notice that this procedure can be extended to run-lengths of any size, by following the *Hardy-Littlewood conjecture B* [6]. Such a topic will be the object of future explorations.

Table 7 reports the number *R*2(*k*) of the runs 2 in each period *T*(*k*) for *p*(*k*), *k* = 0, ... , 7. As for the total number of runs *R*(*k*) (21) in the same period, a correlation can be found between *R*2(*k*) and the prime number *p*(*k*). In particular, the scores of Table 7 suggest the following conjecture for *R*2(*k*)

$$R\_2(k) = \prod\_{i=2}^k (p(i) - 2), \qquad k \ge 2. \tag{38}$$

**Table 7.** Number of runs 2, denoted as *R*2(*k*), that are included in the periods *T*(*k*), for *p*(*k*), *k* = 0, ... , 7. These scores are compared with the total number of runs *R*(*k*). The special prime *p*(0) = 1 is put in round brackets.


Equation (38) can be investigated by means of the modified Sieve procedure. At the start of the procedure (*k* = 0), we have no run 2. In the first step (*k* = 1), the multiples of *p*(1) = 2 are struck out, so that the sequence *ψ*(1, *n*) is made of runs 2 only. In particular, a single run 2 is included in the period *T*(1) = 2, so that *R*<sub>2</sub>(1) = 1. For *k* = 2, we delete the multiples of *p*(2) = 3, so that the period *T*(2) = 6 becomes three times greater. This implies that the number of runs 2 could increase from 1 to 3, but the deletion at the point *n* = 3 eliminates two of these runs. Let us notice that the cancellation of one multiple eliminates two runs 2 only in this step, because all the runs 2 are consecutive; this does not happen in the following steps, where only one run 2, or even none, is deleted at a time. It follows that *R*<sub>2</sub>(2) = 1, as in the previous step. On the whole, the deleted runs 2 in the period *T*(2) are a fraction 2/3 = 2/*p*(2) of the total number of runs 2 that the same period would contain if no cancellations were made.

Similarly, for *k* = 3, the multiples of *p*(3) = 5 are struck out, so that the period *T*(3) becomes five times greater. It follows that the number of runs 2 would grow from 1 to 5, but two cancellations (for *n* = 5, 25) eliminate two of the five runs 2. Consequently, we obtain *R*<sub>2</sub>(3) = 3, and the fraction of the deleted runs 2 is 2/5 = 2/*p*(3) of the total runs 2 that this period would contain if no cancellation were made. In this step, every cancellation implies the deletion of one run 2, but this is not a rule for the following steps. In fact, for *k* = 4, we have eight cancellations in the period *T*(4), but only six of them strike out a run 2. However, the fraction of the deleted runs 2 in the period is still given by 6/21 = 2/7 = 2/*p*(4) of the pre-existing ones before the cancellations, since *R*<sub>2</sub>(4) = 3 · 7 − 6 = 15.

In the case of primes, it follows from the relation (21) that we strike out, in each step, a fraction 1/*p*(*k*) of the total number of runs that the period *T*(*k*) would contain if no cancellations were made, which is given by the product of the prime *p*(*k*) by the actual number of runs in the previous period *T*(*k* − 1). By considering the scores of Table 7, a similar relation can be conjectured for the runs 2 in the case of twin primes, in order to link the number of cancelled runs 2 and the total number of runs 2 that the period *T*(*k*) would contain if no cancellations were made. Unfortunately, in general, the actual number of the deleted runs 2 is not easily computable from the total number of cancellations in *T*(*k*). However, as for the primes, our conjecture is that the deletion of the multiples of *p*(*k*) cancels exactly a fraction 2/*p*(*k*) of the runs 2 in the period *T*(*k*).

If this conjecture holds, Equation (38) follows by induction. In fact, it is true for *p*(2) = 3. Let us suppose that Equation (38) holds for *p*(*k* − 1) and show that it is also true for *p*(*k*). By the induction hypothesis, the number of runs 2 in the period *T*(*k* − 1) is given by $R\_2(k-1) = \prod\_{i=2}^{k-1} (p(i) - 2)$. We must show that the number of runs 2 in the period *T*(*k*) is $R\_2(k) = \prod\_{i=2}^{k} (p(i) - 2)$. Given *R*<sub>2</sub>(*k* − 1), the number of runs 2 in the new period *T*(*k*), before the multiples of *p*(*k*) are deleted, is *p*(*k*) · *R*<sub>2</sub>(*k* − 1), because *T*(*k*) is *p*(*k*) times greater than *T*(*k* − 1). By the previous conjecture, a fraction 2/*p*(*k*) of these runs 2 is struck out, so that the number of residual runs 2 is $\frac{p(k) - 2}{p(k)} \cdot p(k) \cdot R\_2(k-1) = (p(k) - 2) \cdot \prod\_{i=2}^{k-1} (p(i) - 2) = \prod\_{i=2}^{k} (p(i) - 2) = R\_2(k)$.
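Equation (38) can be checked by brute force for the first few periods. The sketch below assumes that a run 2 in the period *T*(*k*) corresponds to a residue *n* such that both *n* and *n* + 2 are undeleted, that is, coprime to *T*(*k*), with the period treated cyclically. The helper names `runs_of_size_two` and `conjectured` are hypothetical.

```python
from math import gcd, prod

PRIMES = [2, 3, 5, 7, 11, 13]            # p(1), ..., p(6)

def runs_of_size_two(k):
    """Count the residues n in one period T(k) with n and n + 2 both undeleted."""
    T = prod(PRIMES[:k])                 # T(k) = p(1) * ... * p(k)
    return sum(1 for n in range(T) if gcd(n, T) == 1 and gcd(n + 2, T) == 1)

def conjectured(k):
    """Right-hand side of Eq. (38): product of p(i) - 2 for i = 2..k."""
    return prod(p - 2 for p in PRIMES[1:k])

for k in range(2, 7):
    counted, previous = runs_of_size_two(k), runs_of_size_two(k - 1)
    # Fraction of the p(k) * R2(k - 1) candidate runs 2 deleted at step k
    deleted_fraction = 1 - counted / (PRIMES[k - 1] * previous)
    print(k, PRIMES[k - 1], counted, conjectured(k), round(deleted_fraction, 4))
```

For these small values of *k*, the deleted fraction printed in the last column can be compared with 2/*p*(*k*).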

## *4.3. A Heuristic Estimation of π*2(*x*) *Equivalent to the Li*2(*x*) *Approximation*

From Equation (38), we can give an estimation *π*2*P*(*N*) of the twin prime function *π*2(*x*), which is equivalent to the approximation given by the Li2(*x*) function (8). Such an estimation can be viewed as a generalization of Equation (26) to the case of the twin primes. To this end, analogously to Equation (23) for the primes, we compute the average density *D*2(*k*) of the number of runs 2 in a period *T*(*k*). By starting from the total number of runs 2 *R*2(*k*) in the period *T*(*k*), the average density *D*2(*k*) is given by the relation

$$\overline{D\_2}(k) = \frac{R\_2(k)}{T(k)} = \frac{\prod\_{i=2}^k \left(p(i) - 2\right)}{\prod\_{i=1}^k p(i)} = \frac{1}{2} \times \prod\_{i=2}^k \frac{p(i) - 2}{p(i)}, \qquad k \ge 2. \tag{39}$$

As for the primes, we can initially approximate the local density *D*<sub>2</sub>(*k*, *n*) in the interval *I*(*k*) as the average density, that is, $D\_2\left(k, p(k)^2\right) \simeq \overline{D\_2}(k)$. In this case, the estimated number of twin primes *P*<sub>2</sub>(*k*) in *I*(*k*), for *k* ≥ 2, is given by

$$P\_2(k) = \overline{D\_2}(k) \times S(k) = \left(p(k+1)^2 - p(k)^2\right) \times \frac{1}{2} \times \prod\_{i=2}^{k} \frac{p(i) - 2}{p(i)}, \qquad k \ge 2 \tag{40}$$

The total estimation *π*2*P*(*N*) is then obtained by adding all the contributions *P*2(*k*), that is,

$$\pi\_{2P}(N) = \sum\_{k=0}^{K-1} P\_{2}(k) + P\_{2}(K, N) = P\_{2}(0) + P\_{2}(1) + \sum\_{k=2}^{K-1} \left[ \left( p(k+1)^{2} - p(k)^{2} \right) \cdot \frac{1}{2} \cdot \prod\_{i=2}^{k} \frac{p(i) - 2}{p(i)} \right] + P\_{2}(K, N) \tag{41}$$

where *P*<sub>2</sub>(0) = *p*(1)<sup>2</sup> − *p*(0)<sup>2</sup>, *P*<sub>2</sub>(1) = (1/2) · (*p*(2)<sup>2</sup> − *p*(1)<sup>2</sup>), $P\_2(K, N) = \left( N - p(K)^2 \right) \cdot \frac{1}{2} \cdot \prod\_{i=2}^{K} \frac{p(i) - 2}{p(i)}$, and *p*(*K*) is the greatest prime number not exceeding *N*<sup>1/2</sup>. As for the primes, Equation (41) overestimates the true *π*<sub>2</sub>(*N*) scores, because the local density *D*<sub>2</sub>(*k*, *n*) is not actually constant in the period *T*(*k*), but is a function of *n*. However, the offset of the local density in the interval *I*(*k*) with respect to the average density is greater than for the primes. Experimentally, each *P*<sub>2</sub>(*k*) value (40) exceeds the true quantity of twin primes in *I*(*k*) by about 20%, that is, roughly double the percentage previously found for the primes and reported in Figure 1, even if the trends of the local densities are similar. Quantitatively, the ratio between the average density $\overline{D\_2}(k)$ and the local density *D*<sub>2</sub>(*k*, *p*(*k*)<sup>2</sup>) seems to approach the constant *c*<sup>2</sup>, that is, the square of *c*, as *k* → ∞.
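Equation (41) can be evaluated in the same way as its counterpart for the primes. The sketch below assumes *p*(0) = 1, counts a twin pair (*p*, *p* + 2) when *p* + 2 ≤ *N*, and uses *N* = 10<sup>6</sup> as an arbitrary test point; the names `prime_flags`, `twin_count` and `pi_2P` are hypothetical, and the code is an illustration rather than the authors' implementation.

```python
import math

def prime_flags(limit):
    """Byte flags, flags[n] == 1 iff n is prime, for 0 <= n <= limit."""
    flags = bytearray([1]) * (limit + 1)
    flags[0:2] = b"\x00\x00"
    for n in range(2, math.isqrt(limit) + 1):
        if flags[n]:
            flags[n * n :: n] = bytearray(len(range(n * n, limit + 1, n)))
    return flags

def twin_count(N):
    """Number of twin prime pairs (p, p + 2) with p + 2 <= N."""
    flags = prime_flags(N)
    return sum(1 for p in range(3, N - 1) if flags[p] and flags[p + 2])

def pi_2P(N):
    """Estimate of Eq. (41), assuming p(0) = 1 (needs N > 9)."""
    root = math.isqrt(N)
    flags = prime_flags(root)
    primes = [p for p in range(2, root + 1) if flags[p] and p * p < N]
    total = primes[0] ** 2 - 1                           # P2(0) = p(1)^2 - p(0)^2
    total += 0.5 * (primes[1] ** 2 - primes[0] ** 2)     # P2(1)
    density = 0.5
    for k in range(1, len(primes)):                      # primes[k] is p(k + 1)
        p = primes[k]
        density *= (p - 2) / p                           # (1/2) * prod_{i>=2} (p(i) - 2) / p(i)
        upper = primes[k + 1] ** 2 if k + 1 < len(primes) else N
        total += (upper - p * p) * density
    return total

N = 10**6
true_twins = twin_count(N)
estimate = pi_2P(N)
print(true_twins, round(estimate), round(estimate / true_twins, 4))
```

The printed ratio is the empirical overestimation factor for this *N*, to be compared with the value *c*<sup>2</sup> discussed above.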

To support this statement, let us write the estimation given by the Li<sub>2</sub>(*x*) function for *x* = *N*, that is, *C* Li<sub>2</sub>(*N*) from Equation (8), as a sum of integrals, each computed over an interval *I*(*k*) = [*p*(*k*)<sup>2</sup>, *p*(*k* + 1)<sup>2</sup>), that is,

$$C \operatorname{Li}\_2(N) = C \int\_2^{p(1)^2} \frac{dt}{\log^2 t} + C \sum\_{k=1}^{K-1} \int\_{p(k)^2}^{p(k+1)^2} \frac{dt}{\log^2 t} + C \int\_{p(K)^2}^N \frac{dt}{\log^2 t} \tag{42}$$

where *p*(*K*)<sup>2</sup> is the greatest square of a prime less than *N*. Similarly to Equation (28), we can write Equation (42) as a succession of estimations *L*<sub>2</sub>(*k*) in each interval *I*(*k*), that is,

$$C \operatorname{Li}\_2(N) = C\, L\_2(0) + C \sum\_{k=1}^{K-1} L\_2(k) + C\, L\_2(K, N), \tag{43}$$

where $L\_2(0) = \int\_{2}^{p(1)^2} \frac{dt}{\log^2 t}$, $L\_2(K, N) = \int\_{p(K)^2}^{N} \frac{dt}{\log^2 t}$, and

$$L\_2(k) = \int\_{p(k)^2}^{p(k+1)^2} \frac{dt}{\log^2 t}. \tag{44}$$

Then, we apply the Mean Value Theorem for Integrals to Equation (42) in each interval *I*(*k*)

$$C \operatorname{Li}\_2(N) = C\, \frac{p(1)^2 - 2}{\log^2(\varsigma\_0)} + C \sum\_{k=1}^{K-1} \frac{p(k+1)^2 - p(k)^2}{\log^2(\varsigma(k))} + C\, \frac{N - p(K)^2}{\log^2(\varsigma\_K)},\tag{45}$$

where the point *ς*<sub>0</sub> belongs to the interval *I*(0) = [*p*(0)<sup>2</sup>, *p*(1)<sup>2</sup>), *ς*(*k*) belongs to the interval *I*(*k*), *k* = 1, ..., *K* − 1, and *ς*<sub>*K*</sub> belongs to the interval *I*(*K*, *N*) = [*p*(*K*)<sup>2</sup>, *N*). As for the primes, we have to consider the lower bound *p*(*k*)<sup>2</sup> of the interval *I*(*k*). Let us take the ratio between the two terms multiplying the size *S*(*k*) = *p*(*k* + 1)<sup>2</sup> − *p*(*k*)<sup>2</sup> in the summations of Equations (41) and (45), so that we obtain

$$\frac{\frac{1}{2} \cdot \prod\_{i=2}^{k} \frac{p(i) - 2}{p(i)}}{\frac{C}{\log^2(\varsigma(k))}} = \left(\frac{2C \cdot \prod\_{i=2}^{k} \frac{p(i)}{p(i) - 2}}{\log^2(\varsigma(k))}\right)^{-1} \tag{46}$$

If we consider the lower bound *p*(*k*)<sup>2</sup> of the interval *I*(*k*), we have

$$\frac{2C\cdot\prod\_{i=2}^{k}\frac{p(i)}{p(i)-2}}{\log^{2}\left(\varsigma(k)\right)} = \frac{2C\cdot\prod\_{i=2}^{k}\frac{p(i)}{p(i)-2}}{\log^{2}\left(p(k)^{2}\right)}\cdot\frac{\log^{2}\left(p(k)^{2}\right)}{\log^{2}\left(\varsigma(k)\right)}\tag{47}$$

Let us notice that the ratio (*p*(*i*) − 2)/*p*(*i*) can be split as

$$\frac{p(i) - 2}{p(i)} = \frac{p(i) - 2}{(p(i) - 1)^2} \times \frac{(p(i) - 1)^2}{p(i)} = \frac{p(i)^2 - 2 \cdot p(i)}{(p(i) - 1)^2} \times \frac{(p(i) - 1)^2}{p(i)^2} = \frac{(p(i) - 1)^2 - 1}{(p(i) - 1)^2} \times \frac{(p(i) - 1)^2}{p(i)^2} \tag{48}$$
 
$$\implies \frac{p(i) - 2}{p(i)} = \frac{p(i) - 1}{p(i)} \times \frac{p(i) - 1}{p(i)} \times \left(1 - \frac{1}{(p(i) - 1)^2}\right)$$
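The factorization in Equation (48) is an exact algebraic identity, which can be confirmed with rational arithmetic. The following short check is only a sanity test over the first odd primes; `lhs` and `rhs` are hypothetical names for the two sides of the last line of Equation (48).

```python
from fractions import Fraction

def lhs(p):
    """Left-hand side of Eq. (48): (p - 2) / p."""
    return Fraction(p - 2, p)

def rhs(p):
    """Right-hand side: ((p - 1) / p)^2 * (1 - 1 / (p - 1)^2)."""
    mertens_factor = Fraction(p - 1, p)
    return mertens_factor * mertens_factor * (1 - Fraction(1, (p - 1) ** 2))

ODD_PRIMES = [3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53]
for p in ODD_PRIMES[:4]:
    print(p, lhs(p), rhs(p))
```

Using `Fraction` avoids floating-point rounding, so equality of the two sides is tested exactly.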

*Symmetry* **2019**, *11*, 775

Consequently, we obtain

$$\prod\_{i=2}^{k} \frac{p(i)}{p(i) - 2} = \prod\_{i=2}^{k} \left[ \frac{p(i)}{p(i) - 1} \times \frac{p(i)}{p(i) - 1} \times \frac{1}{1 - \frac{1}{(p(i) - 1)^2}} \right] \tag{49}$$

Then, we define

$$\begin{cases} C(k) = 2 \times \prod\_{i=2}^{k} \left( 1 - \frac{1}{(p(i) - 1)^2} \right), \qquad k \ge 2\\ C(1) = C(0) = 1. \end{cases} \tag{50}$$
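
As a sanity check, the factorization (48) and the product identity obtained by combining (49) with (50) can be verified exactly with rational arithmetic. The following Python sketch is only illustrative: the helper names `primes_up_to` and `check_identities` are hypothetical and do not belong to the original work, and the indexing *p*(1) = 2 is assumed.

```python
from fractions import Fraction


def primes_up_to(n):
    """Return all primes <= n (plain Sieve of Eratosthenes)."""
    flags = [True] * (n + 1)
    flags[0:2] = [False, False]
    for i in range(2, int(n ** 0.5) + 1):
        if flags[i]:
            for j in range(i * i, n + 1, i):
                flags[j] = False
    return [i for i, is_prime in enumerate(flags) if is_prime]


def check_identities(limit=100):
    """Check (48) and (49)-(50) exactly for every odd prime <= limit."""
    odd_primes = primes_up_to(limit)[1:]  # p(2), p(3), ...; p(1) = 2 is skipped
    lhs = Fraction(1)       # running product of p(i) / (p(i) - 2)
    squares = Fraction(1)   # running product of [p(i) / (p(i) - 1)]^2
    c_k = Fraction(2)       # running value of C(k), Equation (50)
    for p in odd_primes:
        correction = 1 - Fraction(1, (p - 1) ** 2)
        # Equation (48) for this prime
        assert Fraction(p - 2, p) == Fraction(p - 1, p) ** 2 * correction
        lhs *= Fraction(p, p - 2)
        squares *= Fraction(p, p - 1) ** 2
        c_k *= correction
        # Equation (49) rewritten with (50):
        # prod p/(p-2) = (2 / C(k)) * prod [p/(p-1)]^2
        assert lhs == 2 / c_k * squares
    return len(odd_primes)


if __name__ == "__main__":
    print(check_identities(100), "odd primes checked")
```

Exact fractions are used instead of floating point so that the equalities hold without any tolerance.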

From Equation (49) and considering that lim<sub>*k*→∞</sub> log<sup>2</sup>(*p*(*k*)<sup>2</sup>)/log<sup>2</sup>(*ς*(*k*)) = 1 (see Section 3.5), the limit, as *k* → ∞, of the ratio (47) is given by

$$\lim\_{k \to \infty} \frac{2C \times \prod\_{i=2}^{k} \frac{p(i)}{p(i)-2}}{\log^2(p(k)^2)} \times \frac{\log^2(p(k)^2)}{\log^2(\varsigma(k))} = \lim\_{k \to \infty} \frac{2C \times \prod\_{i=2}^{k} \frac{p(i)}{p(i)-2}}{\log^2(p(k)^2)} = \lim\_{k \to \infty} \frac{2C \times \frac{2}{C(k)} \prod\_{i=2}^{k} \left[\frac{p(i)}{p(i)-1} \times \frac{p(i)}{p(i)-1}\right]}{\log^2(p(k)^2)}.\tag{51}$$

We noted in Equation (33) that

$$\lim\_{k \to \infty} \frac{\prod\_{i=1}^{k} \frac{p(i)}{p(i) - 1}}{\log \left( p(k)^2 \right)} = \frac{1}{2} \times e^{\gamma} \simeq \frac{1}{c} \simeq 0.8905. \tag{52}$$

Since the factor corresponding to *p*(1) = 2 equals 2, we evidently have

$$\lim\_{k \to \infty} \prod\_{i=2}^{k} \frac{p(i)}{p(i) - 1} = \frac{1}{2} \times \lim\_{k \to \infty} \prod\_{i=1}^{k} \frac{p(i)}{p(i) - 1} \tag{53}$$

and, consequently, from Equation (9),

$$\lim\_{k \to \infty} C(k) = C.\tag{54}$$
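
The convergence stated in (54) can be observed numerically. The sketch below evaluates the finite product of Equation (50) for increasing bounds. It assumes that the constant *C* of Equation (9) is twice the twin prime constant, that is, about 1.3203236, which is what (50) and (54) imply. The names `c_of_k` and `C_LIMIT` are hypothetical.

```python
def primes_up_to(n):
    """Return all primes <= n (plain Sieve of Eratosthenes)."""
    flags = [True] * (n + 1)
    flags[0:2] = [False, False]
    for i in range(2, int(n ** 0.5) + 1):
        if flags[i]:
            for j in range(i * i, n + 1, i):
                flags[j] = False
    return [i for i, is_prime in enumerate(flags) if is_prime]


def c_of_k(max_prime):
    """C(k) of Equation (50) for k >= 2, with p(k) the largest prime <= max_prime."""
    if max_prime < 3:
        raise ValueError("the product in (50) is used only for k >= 2, i.e. p(k) >= 3")
    value = 2.0
    for p in primes_up_to(max_prime)[1:]:  # odd primes p(2), ..., p(k)
        value *= 1.0 - 1.0 / (p - 1) ** 2
    return value


# Assumed limit C: twice the twin prime constant 0.6601618158...
C_LIMIT = 1.3203236316

if __name__ == "__main__":
    for bound in (10, 100, 1000, 100000):
        value = c_of_k(bound)
        # Print the bound, C(k), and its distance from the assumed limit
        print(bound, value, value - C_LIMIT)
```

Every factor of the product is smaller than 1, so the values decrease monotonically towards the limit.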

Finally, from Equation (51), we obtain the limit of the ratio (47)

$$\lim\_{k \to \infty} \frac{4 \times \prod\_{i=2}^{k} \left[ \frac{p(i)}{p(i) - 1} \times \frac{p(i)}{p(i) - 1} \right]}{\log^2 \left( p(k)^2 \right)} = 4 \times \left( \frac{1}{2c} \right)^2 = \frac{1}{c^2} \simeq 0.8905^2 = 0.7931 \tag{55}$$

and Equation (46) gives

$$\frac{\frac{1}{2} \times \prod\_{i=2}^{k} \frac{p(i) - 2}{p(i)}}{\frac{C}{\log^2(\varsigma(k))}} = c\_{2I}(k) \qquad \text{where } \lim\_{k \to \infty} c\_{2I}(k) = c^2 \simeq 1.2609. \tag{56}$$
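
The limits (52) and (55) can also be checked numerically for a finite *k*. The following sketch is only a plausibility check under stated assumptions: it takes *c* = 2*e*<sup>−*γ*</sup>, as suggested by (52), uses a hard-coded value of the Euler-Mascheroni constant *γ*, and introduces the hypothetical helper `mertens_ratios`.

```python
import math

EULER_GAMMA = 0.5772156649015329      # Euler-Mascheroni constant
INV_C = math.exp(EULER_GAMMA) / 2.0   # assumed 1/c = e^gamma / 2, about 0.8905


def primes_up_to(n):
    """Return all primes <= n (plain Sieve of Eratosthenes)."""
    flags = [True] * (n + 1)
    flags[0:2] = [False, False]
    for i in range(2, int(n ** 0.5) + 1):
        if flags[i]:
            for j in range(i * i, n + 1, i):
                flags[j] = False
    return [i for i, is_prime in enumerate(flags) if is_prime]


def mertens_ratios(max_prime):
    """Finite-k ratios of (52) and (55), with p(k) the largest prime <= max_prime."""
    primes = primes_up_to(max_prime)
    log_term = math.log(primes[-1] ** 2)   # log(p(k)^2)
    full_product = 1.0                     # prod_{i=1}^{k} p(i) / (p(i) - 1)
    for p in primes:
        full_product *= p / (p - 1)
    odd_product = full_product / 2.0       # prod_{i=2}^{k}: factor 2/(2-1) dropped, cf. (53)
    ratio_52 = full_product / log_term
    ratio_55 = 4.0 * odd_product ** 2 / log_term ** 2
    return ratio_52, ratio_55


if __name__ == "__main__":
    for bound in (100, 10000, 100000):
        r52, r55 = mertens_ratios(bound)
        # Compare with the limits 1/c and 1/c^2
        print(bound, r52, INV_C, r55, INV_C ** 2)
```

For finite *k* the second ratio is the square of the first one, which mirrors the step 4 × (1/(2*c*))<sup>2</sup> = 1/*c*<sup>2</sup> in (55).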

For a given *N*, the proposed approximation *π*<sub>2*P*</sub>(*N*) overestimates the twin prime number function *π*<sub>2</sub>(*N*) by a factor *c*<sub>2*N*</sub>. This factor results from the overestimation in each interval *I*(*k*), which is given by the corresponding element of the finite set *c*<sub>2*I*</sub>(*k*), *k* = 1, ... , *K*, where *K* is such that *N* ≥ *p*(*K*)<sup>2</sup>. Equations (55) and (56) show that the succession *c*<sub>2*I*</sub>(*k*) tends to the constant *c*<sup>2</sup> as *N* → ∞. Consequently, we can define a corrected version *π̃*<sub>2*P*</sub>(*N*) (57) of the proposed estimation *π*<sub>2*P*</sub>(*N*), by multiplying Equation (41) by the factor 1/*c*<sup>2</sup> ≃ 0.7931, that is,

$$\begin{split} \tilde{\pi}\_{2P}(N) &= \frac{1}{c^2} \times \left( P\_2(0) + P\_2(1) + \sum\_{k=2}^{K-1} P\_2(k) + P\_2(K,N) \right) = \frac{1}{c^2} \times \left( p(1)^2 - p(0)^2 \right) + \frac{1}{2c^2} \times \left( p(2)^2 - p(1)^2 \right) \\ &+ \frac{1}{2c^2} \times \sum\_{k=2}^{K-1} \left[ \left( p(k+1)^2 - p(k)^2 \right) \times \prod\_{i=2}^{k} \frac{p(i)-2}{p(i)} \right] + \frac{1}{2c^2} \times \left( N - p(K)^2 \right) \times \prod\_{i=2}^{K} \frac{p(i)-2}{p(i)}. \end{split} \tag{57}$$
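
A minimal Python sketch of how the estimation and its corrected version could be evaluated is given below. It is not the authors' implementation. It reads the four contributions *P*<sub>2</sub>(0), *P*<sub>2</sub>(1), the sum of the *P*<sub>2</sub>(*k*), and *P*<sub>2</sub>(*K*, *N*) off the right-hand side of (57), and it assumes *p*(0) = 1, *p*(1) = 2 and *c* = 2*e*<sup>−*γ*</sup>. The true count is taken here as the number of pairs (*q*, *q* + 2) of primes with *q* + 2 ≤ *N*, which is also an assumption. All function names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329
C_MERTENS = 2.0 * math.exp(-EULER_GAMMA)  # assumed c = 2 e^(-gamma), about 1.1229


def primes_up_to(n):
    """Return all primes <= n (plain Sieve of Eratosthenes)."""
    flags = [True] * (n + 1)
    flags[0:2] = [False, False]
    for i in range(2, int(n ** 0.5) + 1):
        if flags[i]:
            for j in range(i * i, n + 1, i):
                flags[j] = False
    return [i for i, is_prime in enumerate(flags) if is_prime]


def twin_prime_estimate(n):
    """Uncorrected estimation: right-hand side of (57) without the factor 1/c^2."""
    if n < 4:
        raise ValueError("this sketch assumes N >= p(1)^2 = 4")
    primes = primes_up_to(math.isqrt(n))   # all p(k) with p(k)^2 <= N
    total = float(primes[0] ** 2 - 1)      # P_2(0) = p(1)^2 - p(0)^2, p(0) = 1 assumed
    density = 0.5                          # (1/2) * prod_{i=2}^{k} (p(i)-2)/p(i)
    for idx, p in enumerate(primes):       # idx is 0-based: primes[idx] = p(idx + 1)
        if idx >= 1:
            density *= (p - 2) / p
        # Upper bound of the interval: next squared prime, or N for the last one
        upper = primes[idx + 1] ** 2 if idx + 1 < len(primes) else n
        total += (upper - p * p) * density
    return total


def corrected_estimate(n):
    """Corrected estimation of (57): the uncorrected value divided by c^2."""
    return twin_prime_estimate(n) / C_MERTENS ** 2


def twin_prime_count(n):
    """Number of prime pairs (q, q + 2) with q + 2 <= n (assumed counting rule)."""
    primes = set(primes_up_to(n))
    return sum(1 for q in primes if q + 2 in primes)


if __name__ == "__main__":
    for n in (10 ** 3, 10 ** 4, 10 ** 5):
        print(n, twin_prime_estimate(n), corrected_estimate(n), twin_prime_count(n))
```

The running variable `density` is updated incrementally, so each interval costs a single multiplication instead of a full product.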

As for the primes, Equation (57) is expected to improve the estimation of *π*<sub>2</sub>(*N*) as *N* approaches infinity. This is evidenced by the scores of Table 8, where the proposed estimation *π*<sub>2*P*</sub>(*N*) and its adjusted version *π̃*<sub>2*P*</sub>(*N*) are compared with the estimation *C* Li<sub>2</sub>(*N*) given by the logarithmic integral function (8) and with the twin prime number function *π*<sub>2</sub>(*N*). The ranges of *N* are the same as in Table 6.

**Table 8.** The proposed estimation *π*<sub>2*P*</sub>(*N*) and its adjusted version *π̃*<sub>2*P*</sub>(*N*) in comparison with the logarithmic integral estimation *C* Li<sub>2</sub>(*N*) and the twin prime number function *π*<sub>2</sub>(*N*). The scores of the logarithmic integral function have been computed by using the MATLAB® toolbox. The scores of *π*<sub>2*P*</sub>(*N*) and *π̃*<sub>2*P*</sub>(*N*) have been rounded to the nearest integer.


The connection between the *π*<sub>2*P*</sub>(*N*) estimation (41) and the *C* Li<sub>2</sub>(*N*) estimation (42) is investigated in Table 9, by considering the parameters and the coefficient of determination of the linear regressions *y*<sub>*i*</sub> = *m*<sub>1</sub> *x*<sub>*i*</sub> + *q*<sub>1</sub> between the occurrences of *P*<sub>2</sub>(*k*) (40) versus those of *C* *L*<sub>2</sub>(*k*), where *L*<sub>2</sub>(*k*) is given by (44), in each interval *I*(*k*). As for the primes, the linear relationship between *P*<sub>2</sub>(*k*) and *C* *L*<sub>2</sub>(*k*) gives an excellent fit. This is confirmed by the coefficient of determination *R*<sub>1</sub><sup>2</sup>, which rapidly tends to 1 as *k* grows. On the other hand, the intercept *q*<sub>1</sub> is negligible, whilst the slope *m*<sub>1</sub> approaches the limit value 1/*c*<sup>2</sup>.

The fitting of the linear regressions *y*<sub>*i*</sub> = *m*<sub>2</sub> *x*<sub>*i*</sub> + *q*<sub>2</sub> between the occurrences of *P*<sub>2</sub>(*k*) (40) versus those of the twin prime number function *π*<sub>2</sub>(*k*), computed in the same interval *I*(*k*), is also reported in Table 9. Even if less impressive than in the case of Table 5 for the primes, the goodness of the fitting is clearly shown by the coefficient of determination *R*<sub>2</sub><sup>2</sup>, which is very close to 1. As for *m*<sub>1</sub>, the slope *m*<sub>2</sub> seems to approximate the limit value 1/*c*<sup>2</sup>.
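
The second regression can be sketched in Python as follows. This is an illustration under several assumptions, not the procedure used for Table 9: *P*<sub>2</sub>(*k*) is taken as the summand of (57) without the factor 1/*c*<sup>2</sup>, the true count in *I*(*k*) is the number of prime pairs (*q*, *q* + 2) whose smaller member *q* lies in *I*(*k*), and the regression takes *P*<sub>2</sub>(*k*) as *x* and the true count as *y*, which is the orientation consistent with a slope close to 1/*c*<sup>2</sup>. The names `interval_data` and `linear_fit` are hypothetical.

```python
def primes_up_to(n):
    """Return all primes <= n (plain Sieve of Eratosthenes)."""
    flags = [True] * (n + 1)
    flags[0:2] = [False, False]
    for i in range(2, int(n ** 0.5) + 1):
        if flags[i]:
            for j in range(i * i, n + 1, i):
                flags[j] = False
    return [i for i, is_prime in enumerate(flags) if is_prime]


def interval_data(max_prime):
    """Return (xs, ys): P_2(k) and the twin pair count in I(k), for k >= 2."""
    primes = primes_up_to(max_prime)
    all_primes = set(primes_up_to(primes[-1] ** 2 + 2))
    xs, ys = [], []
    density = 0.5                           # (1/2) * prod_{i=2}^{k} (p(i)-2)/p(i)
    for idx in range(1, len(primes) - 1):   # idx is 0-based: primes[idx] = p(idx + 1)
        p, p_next = primes[idx], primes[idx + 1]
        density *= (p - 2) / p
        xs.append((p_next ** 2 - p ** 2) * density)
        # Assumed counting rule: pairs (q, q + 2) with q in [p(k)^2, p(k+1)^2)
        ys.append(sum(1 for q in range(p * p, p_next * p_next)
                      if q in all_primes and q + 2 in all_primes))
    return xs, ys


def linear_fit(xs, ys):
    """Ordinary least squares y = m x + q; returns (m, q, R^2)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    xs, ys = interval_data(200)
    print(linear_fit(xs, ys))
```

The range used here is much smaller than the one of Table 9, so the printed parameters should not be expected to reproduce the tabulated scores.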

**Table 9.** Parameters and coefficients of determination of the linear regressions *y*<sub>*i*</sub> = *m*<sub>1</sub> *x*<sub>*i*</sub> + *q*<sub>1</sub> of the proposed estimations for the twin primes *P*<sub>2</sub>(*k*) versus the logarithmic-integral ones *C* *L*<sub>2</sub>(*k*), together with the parameters and coefficients of determination of the linear regressions *y*<sub>*i*</sub> = *m*<sub>2</sub> *x*<sub>*i*</sub> + *q*<sub>2</sub> of *P*<sub>2</sub>(*k*) versus the true number of twin primes *π*<sub>2</sub>(*k*). Each point is computed in an interval *I*(*k*).


In summary, the proposed approach estimates the true number of twin primes by considering the number of runs of size 2 in each interval *I*(*k*) = [*p*(*k*)<sup>2</sup>, *p*(*k* + 1)<sup>2</sup>), in such a way that each estimation *P*<sub>2</sub>(*k*) fits the corresponding one given by *C* *L*<sub>2</sub>(*k*). Consequently, if the conjecture (38) holds, we can infer that the distribution of the twin primes follows the same trend in all the intervals *I*(*k*). Because these intervals are a function of the squares of both the prime *p*(*k*) and its successor, it follows that, *since the primes form a never-ending succession, the unproved hypothesis of the infinitude of the twin primes would be further strengthened.*

#### **5. Conclusions and Future Developments**

In this work, an original heuristic procedure for obtaining the distribution of the prime number function *π*(*x*) is proposed and investigated, which estimates the scores of *π*(*x*) equivalently to the logarithmic integral function Li(*x*). This approach is not fully probabilistic, since it is also based on analytical concepts: a set of infinitely many binary periodic sequences is found by means of a modified Sieve procedure, whose periods have a subset that is included in limited and disjoint intervals *I*(*k*) of the prime characteristic function. In each period *T*(*k*), these binary sequences define a succession of 1 values, which are separated by runs of consecutive zeroes. Starting from the number of runs of zeroes in a period *T*(*k*), an estimation of the total number of primes can be found, which is linked to the logarithmic integral estimation by the constant *c* of Mertens' Third Theorem. Notably, the succession of the runs of zeroes, whose elements are the gaps between two consecutive primes, is symmetric in each period *T*(*k*). As a result, the proposed LINPES procedure estimates the prime number function in each interval *I*(*k*), whose bounds are the squares of a prime number and of its successor. As a particular case, this procedure is also specialized to the twin primes, in such a way that only the runs of size 2 are considered in each period. Consequently, a heuristic relation for the number of these runs in a period *T*(*k*) is formulated, whose trend is linked to the relation previously found for the total number of runs in the case of primes. Therefore, such a relation gives an estimation of the twin prime number function *π*<sub>2</sub>(*x*) in each interval *I*(*k*), which is equivalent to the estimation of the logarithmic integral function Li<sub>2</sub>(*x*), by means of the square of the constant *c*. Since the bounds of these intervals are given by squares of primes, their number is infinite. As a consequence, the proposed procedure could contribute to supporting the presumed infinitude of the succession of the twin primes. Future developments will further investigate the relation for the number of runs of size 2 in a period *T*(*k*), together with the symmetry of the succession of the runs of zeroes.

**Author Contributions:** Conceptualization, B.A.; Methodology, B.A., S.B., L.S. and M.S.; Formal Analysis, L.S.; Investigation, B.A., S.B., L.S. and M.S.; Data Curation, B.A., L.S. and M.S.; Writing—Original Draft Preparation, B.A.; Writing—Review & Editing, B.A. and M.S.; Visualization, M.S.; Supervision, S.B.

**Funding:** This research received no external funding.

**Acknowledgments:** The authors would like to thank the site https://primes.utm.edu/lists/small/millions/ for providing the prime numbers that have been used for the computations in this work.

**Conflicts of Interest:** The authors declare no conflict of interest.


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
