1. Introduction and Main Results
The subject of this study is upper and lower bounds for tail probabilities of sums of independent identically distributed Bernoulli random variables; in other words, we estimate tail probabilities of the binomial distribution. To this end we use the Poisson approximation.
It should be noted that although the binomial distribution is very special from the formal point of view, it is of great importance in applications. Moreover, owing to its simplicity, sharper bounds are attainable for the binomial distribution than in the general case.
Let us start with the known Hoeffding inequality. Assuming that the independent random variables satisfy a boundedness condition, W. Hoeffding [1] deduced inequality (1). In the case of identically distributed random variables inequality (1) remains the same. Making a change of variable in (1), we obtain an equivalent bound, which in turn can be written in terms of the so-called relative entropy, or Kullback–Leibler distance, between two two-point distributions concentrated at the same pair of points.
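As a numerical illustration of the entropy form of Hoeffding's inequality, the following sketch compares the bound exp(−nH(p + t, p)) with the exact binomial tail. The function names and parameter values are ours, not the paper's; H here is the standard Kullback–Leibler distance between two Bernoulli laws.

```python
from math import comb, exp, log

def kl_bernoulli(a, p):
    # Kullback-Leibler distance between two two-point (Bernoulli)
    # distributions with success probabilities a and p
    return a * log(a / p) + (1 - a) * log((1 - a) / (1 - p))

def hoeffding_bound(n, p, t):
    # Entropy form of the Chernoff-Hoeffding bound on P(S_n >= n(p + t))
    return exp(-n * kl_bernoulli(p + t, p))

def binom_upper_tail(n, p, k):
    # Exact P(S_n >= k) for S_n ~ Binomial(n, p)
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

n, p, t = 100, 0.1, 0.05
exact = binom_upper_tail(n, p, 15)   # threshold n(p + t) = 15
bound = hoeffding_bound(n, p, t)
assert exact <= bound                # the entropy bound dominates the tail
```

For these parameters the entropy bound is several times larger than the exact tail; closing such gaps is precisely what the sharper Poisson-based estimates of this paper aim at.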
Apparently, I. Sanov [2] was the first to state probability inequalities in terms of a function of this type.
The starting point in proving (1) and many other probability inequalities for independent random variables is the following bound. Suppose there exists a value for which the corresponding exponential moments of the distribution functions of the summands are finite. Then for every admissible threshold the tail probability admits the bound (5). In the case of i.i.d. random variables inequality (5) takes a simpler form, where G denotes the common distribution. On the other hand, the identity (7) holds, in which the Esscher transform of the distribution function G appears (see [3]). Note that, starting with the classic work of Cramér [4], Esscher's transform has been repeatedly used in the theory of large deviations.
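For a Bernoulli law the Esscher transform can be written down explicitly. The following sketch (our notation; a standard fact about exponential tilting, not the paper's display) verifies that tilting by h keeps the law Bernoulli and merely shifts its success probability.

```python
from math import exp

def esscher_bernoulli(p, h):
    # Esscher transform dF_h(x) = e^{hx} dF(x) / E e^{hX} of a
    # Bernoulli(p) law; the result is Bernoulli with this parameter
    mgf = (1 - p) + p * exp(h)           # E e^{hX}
    return p * exp(h) / mgf

p = 0.1
assert esscher_bernoulli(p, 0.0) == p    # h = 0: the law is unchanged
assert esscher_bernoulli(p, 1.0) > p     # h > 0 tilts mass toward 1
assert esscher_bernoulli(p, -1.0) < p    # h < 0 tilts mass toward 0
```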
Let the tilting parameter be chosen so that the normalization (9) holds, and introduce the notation (10). It follows from (7) and (10) that relation (11) holds. Notice that although the method used in this work essentially coincides with the method of our previous article on estimates of large deviations in the case of the normal approximation [5], the function employed here differs from the corresponding function in [5] by the absence of one factor. The nuance is that in this work we deal with one-sided distributions, and direct copying of the previous approach would make the reasoning unnecessarily cumbersome.
Taking (9) into account, it is easily seen that this quantity satisfies an equality whose remainder is positive for any nondegenerate random variable. By estimating this remainder, we can sharpen the Hoeffding inequality.
Note that the asymptotics of the corresponding tail is found in [4] (p. 172) under an additional moment condition, namely (12), a further restriction being imposed (see also [6]); here R(x) = (1 − Φ(x))/φ(x) is the so-called Mills ratio (Φ and φ are the distribution function and the density function, respectively, of the standard normal law).
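The Mills ratio admits a direct numerical check of the classical asymptotics R(x) ~ 1/x as x grows. A minimal sketch, using only the standard definitions just recalled:

```python
from math import erf, exp, pi, sqrt

def Phi(x):
    # distribution function of the standard normal law
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def phi(x):
    # density of the standard normal law
    return exp(-x * x / 2.0) / sqrt(2.0 * pi)

def mills(x):
    # Mills ratio R(x) = (1 - Phi(x)) / phi(x)
    return (1.0 - Phi(x)) / phi(x)

# classical two-sided bounds: x/(1 + x^2) < R(x) < 1/x for x > 0
for x in (0.5, 1.0, 2.0, 5.0):
    assert x / (1.0 + x * x) < mills(x) < 1.0 / x
```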
Let λ be an arbitrary positive number, and consider the distribution function of the Poisson law with mean λ. Note that we take distribution functions to be left-continuous.
In connection with (12), note that in the present work we define and use an analogue of the Mills ratio for the Poisson distribution with an arbitrary parameter λ, defined for every integer argument. M. Talagrand [7] sharpened the Hoeffding inequalities in a range involving a constant K, of which it is known only that it exists; the bounds obtained in [7] are stated in terms of K as well.
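The Poisson tails entering all subsequent statements can be computed by summing the probability mass function. A small helper in our own notation (any Mills-ratio analogue for the Poisson law, such as the one the paper defines, would be built on top of such tails):

```python
from math import exp

def poisson_upper_tail(lam, k):
    # P(Y >= k) for Y ~ Poisson(lam): accumulate the pmf on 0..k-1
    # via the recursion p_{j+1} = p_j * lam / (j + 1), then complement
    pmf, below = exp(-lam), 0.0
    for j in range(k):
        below += pmf
        pmf *= lam / (j + 1)
    return 1.0 - below

assert poisson_upper_tail(3.0, 0) == 1.0
assert abs(poisson_upper_tail(1.0, 1) - (1.0 - exp(-1.0))) < 1e-12
```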
Remark that Talagrand, like Hoeffding, considers the case of non-identically distributed random variables. In the present work we obtain estimates in the case of Bernoulli trials with explicit values of the constants, without imposing any restriction on y.
In what follows we use the notation F for the distribution function of the Bernoulli random variable with parameter p, and consider the n-fold convolution of F. We will also assume throughout that x satisfies condition (15).
It is not hard to verify that the parameter satisfying (9) in the Bernoulli case has the explicit form (16) and is positive under condition (15). We get from (14) and (16) the relation (17). Denote by a separate symbol the distribution function of the Poisson law with a given parameter. If the variable x from (18) approaches 0, it is natural to take the Poisson law with parameter np as the approximating distribution for the binomial one; just this distribution is used in Theorem 2. However, first we need another approximating Poisson distribution whose mean depends not only on the parameters n and p, but also on the variable x from formula (15). We shall call this distribution the variable Poisson distribution.
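The explicit form of the tilting parameter in the Bernoulli case can be checked numerically: tilting Bernoulli(p) so that its mean becomes x requires h = log(x(1 − p)/((1 − x)p)). This is a standard saddlepoint computation; whether it matches the paper's display (16) in normalization is our assumption, and the sketch below uses our own names.

```python
from math import exp, log

def tilt_parameter(p, x):
    # h for which the Esscher-tilted Bernoulli(p) law has mean x, 0 < x < 1
    return log(x * (1 - p) / ((1 - x) * p))

def tilted_mean(p, h):
    # mean of the Esscher transform of Bernoulli(p)
    mgf = (1 - p) + p * exp(h)
    return p * exp(h) / mgf

p, x = 0.1, 0.3
h = tilt_parameter(p, x)
assert abs(tilted_mean(p, h) - x) < 1e-12
assert h > 0                         # x > p forces a positive tilt
```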
Let us formulate the first statement about the connection between the behaviors of the tails of the binomial and the variable Poisson distributions. First introduce the function defined below; it can be presented as a series, and this series converges by condition (15).
Proposition 1. If condition (15) is fulfilled, then equality (24) holds.
The following theorem gives one more form of the dependence of the tails of the binomial distribution on the tails of the variable Poisson distribution. It is a consequence of Proposition 1, but by no means trivial, and requires the proof of a number of additional statements, which are given in Section 3.
Theorem 1. If condition (15) is fulfilled, then equality (26) holds.
Example 1. Table 1 shows the corresponding values of the function for the chosen parameters. In accordance with Theorem 1, Table 1 shows that the approximation deteriorates as x increases.
Remark 1. It is known that the binomial distribution with parameters n, p is well approximated by the Poisson distribution with parameter np if p is small enough [8]. The Poisson distribution from equalities (24) and (26) has another parameter; however, the two parameters are close when x is close to 0. In the next claims we consider the Poisson approximation with parameter np. Note also that the variable Poisson distribution degenerates when x is close to 1. See also Table 2.
Remark 2. A necessary condition for good approximation in (26) is the smallness of x. This agrees with the result of Yu. V. Prokhorov [9], according to which, in the indicated range of p, the Poisson approximation to the binomial distribution is more precise than the normal one. However, when x is close to 0, λ can be both large and small; the same applies to the related quantities. Note also that the corresponding bound holds for all admissible values of the parameter; this is verified directly.
Theorem 2. If condition (15) is fulfilled, then the following equality holds, where the correction factor is the function from Theorem 1.
Remark 3. It follows from Remark 2 that if in representation (26) the variable Poisson parameter is replaced as in Theorem 2, then instead of the function from Theorem 1 it is necessary to insert another correction factor, which is smaller. The form of this factor is indicated in Theorem 2. In this connection, we note that the exponential function on the right-hand side of (28) has a negative exponent, in contrast to the exponential function in (26). The following table gives an idea of the relationship between the tails of the approximating distributions under consideration.
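The relationship between the binomial tail and the Poisson(np) tail discussed in Remarks 1 and 2 can also be probed numerically. The sketch below (our notation and parameter choices) checks the tail differences against Le Cam's total-variation bound np², a standard fact independent of the present paper's estimates.

```python
from math import comb, exp

def binom_upper_tail(n, p, k):
    # exact P(S_n >= k), S_n ~ Binomial(n, p)
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

def poisson_upper_tail(lam, k):
    # P(Y >= k), Y ~ Poisson(lam)
    pmf, below = exp(-lam), 0.0
    for j in range(k):
        below += pmf
        pmf *= lam / (j + 1)
    return 1.0 - below

n, p = 200, 0.02                  # small p: the regime of Remark 1
lam = n * p                       # approximating Poisson parameter np
for k in (4, 6, 8, 10):
    diff = abs(binom_upper_tail(n, p, k) - poisson_upper_tail(lam, k))
    assert diff <= n * p * p      # Le Cam's bound on total variation
```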
By a common symbol we will denote quantities, possibly different in different places, satisfying the stated bound. Rewrite (28) in the form (30). Let us give a table of values of the functions involved for the chosen parameters; the calculations yield Table 3. Taking into account that the correction factor is not much different from 1 (see Table 3), we apply an elementary identity with a suitable substitution and obtain equality (31). Note that equality (31) is another form of Theorem 2.
The following inequalities hold for the quantities just introduced. Hence, by (30), one of the bounds holds in each of the two ranges of the argument.
In the next theorem, an estimate of the error term is obtained.
Theorem 3. If condition (15) is fulfilled, then the estimate below holds.
Remark 4.
1. The closeness of the indicated quantities to 0 ensures the closeness of the error to zero. Moreover, as was said in Remark 2, the closeness of x to 0 agrees with [9].
2. Under the complementary condition the quantity may not tend to zero.
Remark 5. Let us discuss the connection between x, n, and p under which the function in question approaches zero. Obviously, it can tend to zero only if both indicated conditions hold. Let the parameters n and p be fixed, and consider the minimum of the function, written for brevity as a function of x alone. The minimum is attained at an interior point, the function decreasing to the left of this point and increasing to the right of it. As a result of calculations, we make sure that the stated simplification holds under a mild condition, which can be considered fulfilled. From here the required relation follows, and the function tends to zero if and only if the stated condition holds. Indeed, consider the left and right branches of the function with respect to the minimum point; their domains are the intervals to the left and to the right of this point. On the other hand, for each admissible value one can specify an interval containing the minimum point on which the required inequality holds. These branches are strictly monotone and therefore have inverses, and the required interval is expressed through these inverses. Note that the domain of both inverse functions is the same.
Example 2. Fix n and p as indicated. The graph of the function is shown in Figure 1. Taking a particular value of ε and finding the roots of the corresponding equation, we obtain the endpoints of the interval. Note that ε can be chosen arbitrarily small only if the minimal value of the function is sufficiently close to 0.
The following table (Table 4) shows the behavior of the interval as ε decreases. Note that near the minimum point both functions forming the sum make approximately the same contribution.
Corollary 1. Let condition (15) be fulfilled and the additional assumption hold. Then the stated relation follows.
Remark 6. Note that the behavior of the series is determined by its first summand, in contrast to the Cramér series in the case of the Gaussian approximation [4].
6. Supplement
In this section, we offer the reader some conjectures regarding the behavior of the quantities under study.
Due to the cumbersomeness of the table, we did not include columns corresponding to larger indices. Nevertheless, we made sure that for each of them the corresponding equality holds. Our conjecture is that this equality holds for every index.
Remark that the sequence in question decreases monotonically from some index on. This property is also true for the related sequences for every fixed value of the other index. According to the CLT, the standardized binomial distribution converges in the uniform metric to the normal law; on the other hand, the Poisson counterpart approaches the same limit. This yields the limiting relation below. Using formula (83), we get the elements of the last row of Table 5 and, hence, the elements of the last row of Table 6.
The next conjecture concerns the existence and the value of the limit of the quantity under study; calculations suggest the limit stated in (84). In connection with this, we remark that the behaviour of the related differences is investigated in [10].
Note that (84) is equivalent to the assumption that the corresponding extremum is realized at the indicated point. This fact is fairly easy to prove in the case considered in [10], using the results of that paper; in the remaining case a proof is more difficult to find, but it certainly exists. After that we can assert that formula (84) is valid for all k and, moreover, that the corresponding limit exists; indeed, this follows from [10]. According to Table 6, the constant in the corresponding inequality cannot be less than the tabulated value.
If we impose the constraint indicated above, then the lower bound for the constant is not less than the tabulated value (see Table 6). As for the upper bound, it is attained if in (85) the supremum with respect to p is taken over the indicated range of p. If we adhere to the principle of incomplete induction, then the available information is sufficient to assert the conjectured value. Note that in the complementary case it suffices to swap the roles of p and 1 − p.
Table 6 demonstrates a remarkable property, which makes the limiting relation, and moreover equality (86), highly plausible. Equality (86) is another of our conjectures. If this assumption is true, then instead of (39) we have the more precise estimate (87). If the hypothetical estimate (87) is correct, the main statements of the present work can be sharpened. In particular, on the right-hand side of inequality (25) in Proposition 1 the corresponding product can be replaced by a smaller one, and the constant appearing throughout, in particular in the formulations of Theorems 1–3, can likewise be replaced by a smaller one. Taking Table 6 into account, we arrive at the conclusion that in the case under consideration (see the corresponding row and column of the table) inequality (88) holds. If k grows, the coefficient of p in (88) decreases, but cannot be less than the limiting value.