1. Introduction
Research on classical inequalities, such as Jensen's and Hölder's, has experienced great expansion. These inequalities first appeared in discrete and integral forms, and many generalizations and improvements have since been proved (see, for instance, [1,2]). Lately, they have proven to be very useful in information theory (see, for instance, [3]).
Let $I$ be an interval in $\mathbb{R}$ and $f\colon I\to\mathbb{R}$ a convex function. If $\mathbf{x}=(x_1,\dots,x_n)$ is any $n$-tuple in $I^n$ and $\mathbf{p}=(p_1,\dots,p_n)$ a nonnegative $n$-tuple such that $P_n=\sum_{i=1}^{n}p_i>0$, then the well-known Jensen inequality
$$f\!\left(\frac{1}{P_n}\sum_{i=1}^{n}p_i x_i\right)\le\frac{1}{P_n}\sum_{i=1}^{n}p_i f(x_i) \qquad (1)$$
holds (see [4,5] or, for example, [6] (p. 43)). If $f$ is strictly convex, then (1) is strict unless all the $x_i$ with $p_i>0$ are equal.
Jensen’s inequality is one of the most famous inequalities in convex analysis, and many other well-known inequalities (such as Hölder’s inequality, the arithmetic–geometric–harmonic mean inequality, etc.) are its special cases. Besides mathematics, it has many applications in statistics, information theory, and engineering.
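For intuition (a small numerical sketch, not part of the paper's argument), inequality (1) can be checked directly for a convex function such as $e^t$; the function and variable names below are illustrative, not the paper's notation:

```python
import math

def jensen_sides(f, x, p):
    """Left and right sides of Jensen's inequality (1) for points x and weights p."""
    P = sum(p)                                   # P_n = p_1 + ... + p_n > 0
    mean = sum(pi * xi for pi, xi in zip(p, x)) / P
    return f(mean), sum(pi * f(xi) for pi, xi in zip(p, x)) / P

# exp is convex, so the left side never exceeds the right side.
lhs, rhs = jensen_sides(math.exp, [0.0, 1.0, 2.0], [1.0, 2.0, 3.0])
```

Equality holds when all points coincide, matching the strictness condition above.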
Strongly related to Jensen’s inequality is the Lah–Ribarič inequality (see [7]),
$$\frac{1}{P_n}\sum_{i=1}^{n}p_i f(x_i)\le\frac{b-\bar{x}}{b-a}\,f(a)+\frac{\bar{x}-a}{b-a}\,f(b),\qquad \bar{x}=\frac{1}{P_n}\sum_{i=1}^{n}p_i x_i, \qquad (2)$$
which holds when $f$ is a convex function on $[a,b]$, $\mathbf{p}$ is as in (1), and $\mathbf{x}$ is any $n$-tuple in $[a,b]^n$. If $f$ is strictly convex, then (2) is strict unless $x_i\in\{a,b\}$ for all $i$ with $p_i>0$.
The Lah–Ribarič inequality has been extensively investigated, and the interested reader can find many related results in the recent literature as well as in monographs such as [6,8,9]. It is interesting to find further refinements of the above inequality.
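As a numerical illustration (a sketch using the standard discrete form of (2): the right-hand side is the chord of $f$ over $[a,b]$ evaluated at the weighted mean), the inequality can be verified for $f=\exp$:

```python
import math

def lah_ribaric_sides(f, x, p, a, b):
    """Both sides of the discrete Lah-Ribaric inequality (2) for convex f on [a, b]."""
    P = sum(p)
    xbar = sum(pi * xi for pi, xi in zip(p, x)) / P      # weighted mean, lies in [a, b]
    lhs = sum(pi * f(xi) for pi, xi in zip(p, x)) / P
    rhs = (b - xbar) / (b - a) * f(a) + (xbar - a) / (b - a) * f(b)
    return lhs, rhs

lhs, rhs = lah_ribaric_sides(math.exp, [0.2, 0.5, 1.7], [1.0, 1.0, 2.0], 0.0, 2.0)
```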
Our main result will be a refinement of inequality (2). Using the same technique, we will give a refinement of inequality (1) (see [10]).
In addition, we deal with the notion of f-divergences, which measure the distance between two probability distributions. One of the most important is the Csiszár f-divergence, whose special cases include the Shannon entropy, Jeffrey’s distance, the Kullback–Leibler divergence, the Hellinger distance, and the Bhattacharyya distance. We deduce relations for the mentioned f-divergences.
Let us say a few words about the organization of the paper. In the following section we give a new refinement of the Lah–Ribarič inequality and state a known refinement of the Jensen inequality obtained by the same technique. Using the obtained results, we give a refinement of the famous Hölder inequality and some new refinements for the weighted power means and quasi-arithmetic means. In addition, we give a historical remark regarding the Jensen–Boas inequality. In Section 3, we give the results for various f-divergences. These are further examined for the Zipf–Mandelbrot law.
2. New Refinements
The starting point of this consideration is the following lemma (see [11]).
Lemma 1. Let f be a convex function on an interval I. If $a,b\in I$ are such that $a<b$, then the inequality
$$f(x)\le\frac{b-x}{b-a}\,f(a)+\frac{x-a}{b-a}\,f(b)$$
holds for any $x\in[a,b]$.

The main result is a refinement of the Lah–Ribarič inequality (2). As we will see, its proof is based on the idea from the proof of the Jensen–Boas inequality.
Theorem 1. Let be a convex function on , , be as in (1), be any n-tuple in and . Let where for , , , for and , , for . Then holds, where . If f is concave on I, then the inequalities in (3) are reversed.

Proof. Using the Lah–Ribarič inequality (2) for each of the subsets , we obtain
Using , and Lemma 1, we obtain
□
Remark 1. If , the corresponding term in the sum on the right-hand side of the first inequality in the proof of Theorem 1 remains unaltered (i.e., it is equal to ).
Using the same technique, we obtain the following refinement of the Jensen inequality (1).
Theorem 2. Let I be an interval in and a convex function. Let be any n-tuple in and a nonnegative n-tuple such that . Let where for , and . Then holds. If f is concave on I, then the inequalities in (4) are reversed.

Proof. Using Jensen’s inequality (1), we obtain
which is (4). □
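Numerically, the partitioning technique behind Theorem 2 produces a chain of the following shape (a sketch assuming the refining middle term is the weighted mixture of blockwise Jensen bounds; the function name and block structure are illustrative, not the paper's notation):

```python
import math

def jensen_refinement(f, x, p, blocks):
    """Chain f(global mean) <= sum_k (P_k/P) f(block mean) <= weighted mean of f(x_i),
    for a partition `blocks` of the index set {0, ..., n-1}."""
    P = sum(p)
    left = f(sum(pi * xi for pi, xi in zip(p, x)) / P)
    middle = 0.0
    for J in blocks:
        PJ = sum(p[i] for i in J)
        mJ = sum(p[i] * x[i] for i in J) / PJ
        middle += PJ / P * f(mJ)                 # blockwise Jensen contribution
    right = sum(pi * f(xi) for pi, xi in zip(p, x)) / P
    return left, middle, right

l, m, r = jensen_refinement(math.exp, [0.1, 0.9, 2.0, 3.0],
                            [1.0, 2.0, 1.0, 1.0], [[0, 1], [2, 3]])
```

The left inequality is Jensen applied to the block means; the right one is Jensen applied within each block.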
The idea behind the proof of our main result (and of the refinement of the Jensen inequality) can also be found in another well-known result (see [6] (pp. 55–60)).
In Jensen’s inequality there is the condition “ a nonnegative n-tuple such that ”. In 1919, Steffensen proved the same inequality (1) under slightly relaxed conditions (see [12]).
Theorem 3 (Jensen–Steffensen).
If is a convex function, is a real monotonic n-tuple such that , and is a real n-tuple such that , then (1) holds. If f is strictly convex, then inequality (1) is strict unless .

One of many generalizations of the Jensen inequality is the Riemann–Stieltjes integral form of the Jensen inequality.
Theorem 4 (the Riemann–Stieltjes form of Jensen’s inequality).
Let be a continuous convex function, where I is the range of the continuous function . The inequality holds provided that λ is increasing, bounded and . Analogously, the integral form of the Jensen–Steffensen inequality is given.
Theorem 5 (the Jensen–Steffensen inequality).
If f is continuous and monotonic (either increasing or decreasing) and λ is either continuous or of bounded variation satisfying , then (5) holds.

In 1970, Boas gave an integral analogue of the Jensen–Steffensen inequality under slightly different conditions.
Theorem 6 (the Jensen–Boas inequality).
If λ is continuous or of bounded variation satisfying for all , and , and if f is continuous and monotonic (either increasing or decreasing) in each of the intervals , then inequality (5) holds.

In 1982, J. Pečarić gave the following proof of the Jensen–Boas inequality.
Proof. If , with the notation
we have
Using Jensen’s inequality (1), we obtain
Using the Jensen–Steffensen inequality (5) on each subinterval , we obtain
If , for some j, then on , and we can easily prove that the Jensen–Boas inequality is valid. □
Looking at the previous proof, we see that the technique is the same as the one used for our main result and for the refinement of the Jensen inequality.
By using Theorem 2, we obtain the following refinement of the discrete Hölder inequality (see [13,14]).
Corollary 1. Let such that . Let , such that . Then:

Proof. We use Theorem 2 with . Then and from (4), we obtain
For the function , from (7), we obtain
Multiplying by , and raising to the power of , we obtain
which is (6). □
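A direct numerical check of the classical Hölder inequality that Corollary 1 refines (conjugate exponents with $1/p+1/q=1$; the helper name is illustrative):

```python
def holder_sides(a, b, p_exp, q_exp):
    """Both sides of Hölder's inequality: sum a_i b_i <= ||a||_p * ||b||_q, 1/p + 1/q = 1."""
    lhs = sum(ai * bi for ai, bi in zip(a, b))
    rhs = (sum(ai ** p_exp for ai in a) ** (1 / p_exp)
           * sum(bi ** q_exp for bi in b) ** (1 / q_exp))
    return lhs, rhs

# p = q = 2 is the Cauchy-Schwarz special case.
lhs, rhs = holder_sides([1.0, 2.0, 3.0], [0.5, 0.25, 4.0], 2, 2)
```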
Corollary 2. Under the same conditions as in the previous corollary, for , , , we obtain

Proof. First, let . We use Theorem 2 with . Then and from (4), we obtain
For the function , we obtain
Multiplying by , and then by , we obtain
which is (8).
If , then , and the same result follows from symmetry (see the comments in Corollary 1). □
It is interesting to show how the previously obtained results apply to the weighted discrete power means and the weighted discrete quasi-arithmetic means.
Let $\mathbf{x}=(x_1,\dots,x_n)$ be a positive $n$-tuple, $\mathbf{p}=(p_1,\dots,p_n)$ a nonnegative $n$-tuple with $P_n=\sum_{i=1}^{n}p_i>0$, and $r\in\mathbb{R}$. The weighted discrete power means of order $r$ are defined as
$$M_r(\mathbf{x};\mathbf{p})=\begin{cases}\left(\dfrac{1}{P_n}\displaystyle\sum_{i=1}^{n}p_i x_i^{\,r}\right)^{1/r}, & r\neq 0,\\[6pt] \left(\displaystyle\prod_{i=1}^{n}x_i^{\,p_i}\right)^{1/P_n}, & r=0.\end{cases}$$
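In code, the weighted power means can be sketched as follows (standard form, with the geometric mean as the $r=0$ case), together with their classical monotonicity in the order $r$:

```python
import math

def power_mean(x, p, r):
    """Weighted discrete power mean of order r; the r == 0 case is the geometric mean."""
    P = sum(p)
    if r == 0:
        return math.exp(sum(pi * math.log(xi) for pi, xi in zip(p, x)) / P)
    return (sum(pi * xi ** r for pi, xi in zip(p, x)) / P) ** (1 / r)

x, p = [1.0, 4.0, 9.0], [1.0, 1.0, 2.0]
# Power means are nondecreasing in r: harmonic <= geometric <= arithmetic <= quadratic.
means = [power_mean(x, p, r) for r in (-1, 0, 1, 2)]
```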
Using Theorem 2, we obtain the following inequalities for the weighted discrete power means. Let us note that the left-hand and right-hand sides of both inequalities are the same; only the mixed means in the middle, which provide the refinement, differ.
Corollary 3. Let , , , . Let such that . Then where , , , , for .

Proof. We use Theorem 2 with for , , , , . From (4), we obtain
Substituting with , and then raising to the power , we obtain
which is (9).
Similarly, we use Theorem 2 with for , , , . We obtain
Substituting with , and then raising to the power , inequality (10) easily follows. The other cases follow similarly. □
Let $I$ be an interval in $\mathbb{R}$. Let $\mathbf{x}=(x_1,\dots,x_n)\in I^n$ and let $\mathbf{p}=(p_1,\dots,p_n)$ be a nonnegative $n$-tuple with $P_n=\sum_{i=1}^{n}p_i>0$. Then, for a strictly monotone continuous function $g\colon I\to\mathbb{R}$, the discrete weighted quasi-arithmetic mean is defined as
$$M_g(\mathbf{x};\mathbf{p})=g^{-1}\!\left(\frac{1}{P_n}\sum_{i=1}^{n}p_i\,g(x_i)\right).$$
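A minimal sketch of the quasi-arithmetic mean in code (assuming the standard form $M_g=g^{-1}\big(\frac{1}{P_n}\sum_i p_i\,g(x_i)\big)$); choosing $g(t)=\log t$ recovers the geometric mean and $g(t)=t$ the arithmetic mean:

```python
import math

def quasi_arithmetic_mean(x, p, g, g_inv):
    """Weighted quasi-arithmetic mean: apply g, average with weights p, then invert g."""
    P = sum(p)
    return g_inv(sum(pi * g(xi) for pi, xi in zip(p, x)) / P)

x, p = [1.0, 2.0, 8.0], [1.0, 1.0, 1.0]
geometric = quasi_arithmetic_mean(x, p, math.log, math.exp)          # g(t) = log t
arithmetic = quasi_arithmetic_mean(x, p, lambda t: t, lambda t: t)   # g(t) = t
```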
Using Theorem 2, we obtain the following inequalities for quasi-arithmetic means.
Corollary 4. Let I be an interval in . Let , , , . Let be a strictly monotone continuous function such that is convex. Let where for , and . Then where , , , , for .

Proof. Theorem 2 with and gives
□
3. Applications in Information Theory
In this section we give basic results concerning the discrete Csiszár f-divergence. In addition, bounds for the divergence of the Zipf–Mandelbrot law are obtained.
Let us denote the set of all probability densities by , i.e., if for and .
In [15], Csiszár introduced the f-divergence functional
$$D_f(\mathbf{p},\mathbf{q})=\sum_{i=1}^{n}q_i\,f\!\left(\frac{p_i}{q_i}\right),$$
where $f\colon(0,\infty)\to\mathbb{R}$ is a convex function; it represents a “distance function” on the set of probability distributions. In order to allow nonnegative probability distributions in the f-divergence functional, we adopt, as usual, the conventions
$$f(0):=\lim_{t\to 0^+}f(t);\qquad 0\,f\!\left(\tfrac{0}{0}\right):=0;\qquad 0\,f\!\left(\tfrac{x}{0}\right):=\lim_{t\to 0^+}t\,f\!\left(\tfrac{x}{t}\right),\ x>0,$$
and the following definition of a generalized f-divergence functional is given.
Definition 1 (the Csiszár f-divergence functional).
Let be an interval, and let be a function. Let be an n-tuple of real numbers and be an n-tuple of nonnegative real numbers such that for every . The Csiszár f-divergence functional is defined as

Theorem 7. Let I be an interval in and a convex function. Let be an n-tuple of real numbers and be an n-tuple of nonnegative real numbers such that for every . Let where for , , and . Then holds.

Proof. Using Theorem 2 with and , we obtain
which is (13). □
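A hedged sketch of the Csiszár f-divergence under the common convention $D_f(\mathbf{p},\mathbf{q})=\sum_i q_i f(p_i/q_i)$ with all $q_i>0$ (the generalized functional above relaxes this positivity requirement); choosing $f(t)=t\log t$ recovers the Kullback–Leibler divergence:

```python
import math

def csiszar_f_divergence(f, p, q):
    """D_f(p, q) = sum_i q_i * f(p_i / q_i), assuming every q_i > 0."""
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))

p, q = [0.2, 0.5, 0.3], [0.4, 0.4, 0.2]
# f(t) = t log t is convex with f(1) = 0; this choice recovers the
# Kullback-Leibler divergence, which is nonnegative by Jensen's inequality.
kl = csiszar_f_divergence(lambda t: t * math.log(t), p, q)
```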
Corollary 5. If in the previous theorem we take and to be probability distributions, and , we directly obtain the following result:

Theorem 8. Let be a convex function on , . Let be an n-tuple of real numbers and be an n-tuple of nonnegative real numbers such that . Let where for , , , for and , , for . Then holds.

Proof. Using Theorem 1 with and , we obtain
which is (15). □
Corollary 6. If, in the previous theorem, we take and to be probability distributions, we directly obtain the following result:

If $\mathbf{p}=(p_1,\dots,p_n)$ and $\mathbf{q}=(q_1,\dots,q_n)$ are probability distributions, the Kullback–Leibler divergence, also called the relative entropy or KL divergence, is defined as
$$D_{KL}(\mathbf{p},\mathbf{q})=\sum_{i=1}^{n}p_i\log\frac{p_i}{q_i}.$$
The next corollary provides bounds for the Kullback–Leibler divergence of two probability distributions.
Corollary 7. Let where for , and . Let and be n-tuples of nonnegative real numbers. Then
Let and be probability distributions. Then

Proof. Let and be n-tuples of nonnegative real numbers. Since the function is convex, the first inequality follows from Theorem 7 by setting . The second inequality is a special case of the first inequality for probability distributions and . □
Corollary 8. Let where for , and , for . Let and be n-tuples of nonnegative real numbers. Let , , and , for . Then
Let and be probability distributions. Let , , and , for . Then

Proof. Let and be n-tuples of nonnegative real numbers. Since the function is convex, the first inequality follows from Theorem 8 by setting . The second inequality is a special case of the first inequality for probability distributions and . □
Now we deduce relations for some further special cases of the Csiszár f-divergence.
Definition 2 (the Shannon entropy).
For a probability distribution $\mathbf{p}=(p_1,\dots,p_n)$, the discrete Shannon entropy is defined as
$$H(\mathbf{p})=-\sum_{i=1}^{n}p_i\log p_i.$$
Corollary 9. Let . Let where for , and . Then

Proof. Using Theorem 7 with and , we obtain
For , inequality (17) follows. □
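For concreteness, a minimal sketch of the Shannon entropy (natural logarithm), together with the classical fact that it is maximized by the uniform distribution:

```python
import math

def shannon_entropy(p):
    """Discrete Shannon entropy H(p) = -sum_i p_i log p_i (natural logarithm)."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

n = 4
uniform = [1 / n] * n
skewed = [0.7, 0.1, 0.1, 0.1]
```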
Corollary 10. Let , , such that . Let where for , , , for and , , for . Then holds.

Proof. Using Theorem 8 with , and , we obtain
and (17) easily follows. □
Definition 3 (Jeffrey’s distance).
For probability distributions $\mathbf{p}$ and $\mathbf{q}$, the discrete Jeffrey distance is defined as
$$D_J(\mathbf{p},\mathbf{q})=\sum_{i=1}^{n}(p_i-q_i)\log\frac{p_i}{q_i}.$$
Corollary 11. Let . Let where for , and . Then

Proof. Using Corollary 5 with , we obtain
and (18) easily follows. □
Corollary 12. Let , , such that . Let where for , , , for and , , for . Then holds.

Proof. Using Corollary 6 with , we obtain
and (19) easily follows. □
Definition 4 (the Hellinger distance).
For , the discrete Hellinger distance is defined as

Corollary 13. Let . Let where for , and . Then

Proof. Using Corollary 5 with , (20) follows. □
Corollary 14. Let , , such that . Let where for , , , for and , , for . Then holds.

Proof. Using Corollary 6 with , (21) follows. □
Definition 5 (Bhattacharyya distance).
For , the discrete Bhattacharyya distance is defined as

Corollary 15. Let . Let where for , and . Then

Proof. Using Corollary 5 with , (22) follows. □
Corollary 16. Let , , such that . Let where for , , , for and , , for . Then holds.

Proof. Using Corollary 6 with , (23) follows. □
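The three distances above admit short numerical sketches under their common conventions (the paper's exact normalizations may differ; here the squared Hellinger distance carries the factor $1/2$ and the Bhattacharyya quantity is the coefficient $\sum_i\sqrt{p_i q_i}$):

```python
import math

def jeffrey_distance(p, q):
    # D_J(p, q) = sum_i (p_i - q_i) * log(p_i / q_i)  >= 0
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))

def hellinger_sq(p, q):
    # h^2(p, q) = (1/2) * sum_i (sqrt(p_i) - sqrt(q_i))^2, lies in [0, 1]
    return 0.5 * sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))

def bhattacharyya_coeff(p, q):
    # B(p, q) = sum_i sqrt(p_i * q_i); equals 1 iff p == q (for distributions)
    return sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))

p, q = [0.2, 0.5, 0.3], [0.4, 0.4, 0.2]
```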
Now we are going to derive results from Theorems 7 and 8 for the Zipf–Mandelbrot law.
The Zipf–Mandelbrot law is a discrete probability distribution defined by the probability mass function
$$f(i;N,q,s)=\frac{1}{(i+q)^s\,H_{N,q,s}},\qquad i=1,\dots,N,$$
where
$$H_{N,q,s}=\sum_{j=1}^{N}\frac{1}{(j+q)^s}$$
is a generalization of the harmonic number and $N\in\{1,2,\dots\}$, $q\ge 0$ and $s>0$ are parameters.
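The probability mass function above translates directly into code; a quick sketch that also checks it is a genuine, rank-decreasing probability distribution:

```python
def zipf_mandelbrot_pmf(N, q, s):
    """Zipf-Mandelbrot law: f(i; N, q, s) = (i + q)^(-s) / H_{N,q,s}, i = 1, ..., N,
    where H_{N,q,s} = sum_{j=1}^{N} (j + q)^(-s) generalizes the harmonic number."""
    H = sum((j + q) ** (-s) for j in range(1, N + 1))
    return [(i + q) ** (-s) / H for i in range(1, N + 1)]

pmf = zipf_mandelbrot_pmf(N=10, q=1.5, s=1.2)
```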
If we define as a Zipf–Mandelbrot law M-tuple, we have
where
and the Csiszár functional becomes
where , and the parameters are such that .
If and are both defined as Zipf–Mandelbrot law M-tuples, then the Csiszár functional becomes
where , and the parameters are such that .
Now, from Theorem 7, we have the following result.
Corollary 17. Let I be an interval in and a convex function. Let be an n-tuple of real numbers and be an n-tuple of nonnegative real numbers such that for every . Let where for , . Suppose are such that , . Then holds.

Proof. If we define as a Zipf–Mandelbrot law n-tuple with parameters , then from Theorem 7 it follows that
which is (24). □
From Theorem 8 we have the following result.
Corollary 18. Let be a convex function on , . Let be an n-tuple of real numbers. Suppose are such that . Let where for , , , and , , for . Then holds.

Proof. If we define as a Zipf–Mandelbrot law n-tuple with parameters , then from Theorem 8 it follows that
which is (25). □
Now, from Theorem 7, we also have the following result.
Corollary 19. Let I be an interval in and a convex function. Let where for , . Suppose are such that , . Then holds.

Proof. If we define and as Zipf–Mandelbrot law n-tuples with parameters , then from Theorem 7, we obtain (26). □
From Theorem 8, we have the following result.
Corollary 20. Let be a convex function on , . Suppose are such that . Let where for , , , and , , for . Then holds.

Proof. If we define and as Zipf–Mandelbrot law n-tuples with parameters , then from Theorem 8, we obtain (27). □
Since the minimal value of is and its maximal value is , from the right-hand side of (24) and the left-hand side of (25), we obtain the following result.
Corollary 21. Let be a convex function on , . Let be an n-tuple of real numbers. Suppose are such that . Let where for , , , and , , for . Then holds.

Proof. Using and from the right-hand side of (24) and the left-hand side of (25), we obtain
and (28) follows. □
4. Conclusions
In this paper we have obtained a refinement of the Lah–Ribarič inequality and a refinement of the Jensen inequality, both of which follow from applying the Lah–Ribarič inequality and the Jensen inequality on disjoint subsets of the index set $\{1,\dots,n\}$.
Using these results, we found a refinement of the discrete Hölder inequality and refinements of some inequalities for the discrete weighted power means and the discrete weighted quasi-arithmetic means. In addition, some interesting estimates for the discrete Csiszár divergence and for its important special cases were obtained.
It would be interesting to see whether this method can be used to give refinements of some other inequalities. One could also try to apply it to refine the Jensen inequality and the Lah–Ribarič inequality for operators.