Article

A Constrained Talagrand Transportation Inequality with Applications to Rate-Distortion-Perception Theory

Li Xie, Liangyan Li, Jun Chen, Lei Yu and Zhongshan Zhang

1. School of Information and Electronics, Beijing Institute of Technology, Beijing 100081, China
2. Department of Electrical and Computer Engineering, McMaster University, Hamilton, ON L8S 4K1, Canada
3. School of Statistics and Data Science, LPMC, KLMDASR, and LEBPS, Nankai University, Tianjin 300071, China
4. School of Cyberspace Science and Technology, Beijing Institute of Technology, Beijing 100081, China
* Author to whom correspondence should be addressed.
Entropy 2025, 27(4), 441; https://doi.org/10.3390/e27040441
Submission received: 13 March 2025 / Revised: 18 April 2025 / Accepted: 18 April 2025 / Published: 19 April 2025
(This article belongs to the Special Issue Advances in Information and Coding Theory, the Third Edition)

Abstract: A constrained version of Talagrand's transportation inequality is established, which reveals an intrinsic connection between the Gaussian distortion-rate-perception functions with limited common randomness under the Kullback–Leibler divergence-based and squared Wasserstein-2 distance-based perception measures. This connection provides an organizational framework for assessing existing bounds on these functions. In particular, we show that the best-known bounds of Xie et al. are nonredundant when examined through this connection.

1. Introduction

Traditional rate-distortion theory [1] seeks to determine the minimum rate required to encode a source while ensuring that the expected distortion remains below a given threshold. However, minimizing distortion alone does not always align with human perception, particularly in applications like image and audio compression, where perceptual quality plays a crucial role. Rate-distortion-perception theory addresses this by introducing a perception constraint [2], measured by a divergence between the source and reconstruction distributions, to ensure that the reconstructed signal remains perceptually similar to the original. This framework enables a more nuanced tradeoff between compression efficiency, signal fidelity, and perceptual quality, making it particularly relevant in modern machine learning and generative model-based compression techniques. The origin of rate-distortion-perception theory can be traced back to the foundational work of Klejsa et al. [3,4] and Saldi et al. [5,6] on distribution-preserving quantization. However, it was arguably the influential paper by Blau and Michaeli [7] (see also [8,9]) that brought the theory to the forefront of the research community’s attention. Since then, the field has developed rapidly, offering insights into architectural design principles [10,11,12,13], the role of randomness [14,15,16,17], and fundamental performance limits [18,19,20,21,22]. These advances have also catalyzed a variety of new research directions and applications [23,24,25,26,27,28,29,30,31,32].
Kullback–Leibler divergence and squared Wasserstein-2 distance are among the most widely adopted perception measures. When the source distribution is Gaussian, these two measures are intrinsically linked through Talagrand’s transportation inequality [33]. Exploring the implications of this connection in rate-distortion-perception theory is of significant interest. The availability of partial knowledge of the reconstruction distribution in perception-aware lossy source coding, in turn, motivates the study of constrained versions of Talagrand’s transportation inequality. Such inequalities will further strengthen the link between the information-theoretic performance limits under these two perception measures.
The main contributions of this paper are as follows:
1. We prove a variant of Talagrand's transportation inequality, where the reference distribution is Gaussian and the other distribution is subject to constraints on its first- and second-order statistics.
2. This inequality is then used to establish a connection between the Gaussian distortion-rate-perception functions with limited common randomness under the Kullback–Leibler divergence-based and squared Wasserstein-2 distance-based perception measures. We leverage this connection as an organizational framework to assess existing bounds on these functions. In particular, it is shown that the best-known bounds of Xie et al. [22] are nonredundant when examined through this connection.
The rest of this paper is organized as follows. Section 2 presents a constrained version of Talagrand's transportation inequality. Its application to rate-distortion-perception theory is explored in Section 3. We conclude this paper in Section 4.
We adopt standard notation for information measures, e.g., $h(\cdot)$ for differential entropy and $I(\cdot\,;\cdot)$ for mutual information. For a given random variable $X$, its distribution, mean, and variance are written as $p_X$, $\mu_X$, and $\sigma_X^2$, respectively. A Gaussian distribution with mean $\mu$ and variance $\sigma^2$ is denoted by $\mathcal{N}(\mu, \sigma^2)$. We use $\Pi(p_X, p_{\hat{X}})$ to represent the set of all possible joint distributions with marginals $p_X$ and $p_{\hat{X}}$. The cardinality of a set $\mathcal{S}$ is expressed as $|\mathcal{S}|$. For a real number $a$, define $(a)^+ := \max\{a, 0\}$. Throughout this paper, the logarithm function is assumed to have base $e$.

2. A Constrained Talagrand Transportation Inequality

For $p_X = \mathcal{N}(\mu_X, \sigma_X^2)$, Talagrand's transportation inequality [33] states that
$$W_2^2(p_X, p_{\hat{X}}) \le 2\sigma_X^2\,\phi_{\mathrm{KL}}(p_{\hat{X}} \,\|\, p_X), \quad (1)$$
where
$$\phi_{\mathrm{KL}}(p_{\hat{X}} \,\|\, p_X) := \mathbb{E}\left[\log \frac{p_{\hat{X}}(\hat{X})}{p_X(\hat{X})}\right] \quad (2)$$
is the Kullback–Leibler divergence and
$$W_2^2(p_X, p_{\hat{X}}) := \inf_{p_{X\hat{X}} \in \Pi(p_X, p_{\hat{X}})} \mathbb{E}[(X - \hat{X})^2] \quad (3)$$
is the squared Wasserstein-2 distance. Note that Talagrand's transportation inequality does not impose any assumptions on $p_{\hat{X}}$. In practice, however, we often have partial knowledge of $p_{\hat{X}}$, which can be exploited to strengthen the inequality. In this paper, we focus on the case where $p_{\hat{X}}$ satisfies $\mu_{\hat{X}} = \mu_X$ and $\sigma_{\hat{X}} \le \sigma_X$. Under these constraints on $p_{\hat{X}}$, we establish the following variant of Talagrand's transportation inequality:
Theorem 1.
For $p_X = \mathcal{N}(\mu_X, \sigma_X^2)$ and $p_{\hat{X}}$ with $\mu_{\hat{X}} = \mu_X$ and $\sigma_{\hat{X}} \le \sigma_X$,
$$W_2^2(p_X, p_{\hat{X}}) \le 2\sigma_X^2\left(1 - e^{-\phi_{\mathrm{KL}}(p_{\hat{X}} \,\|\, p_X)}\right). \quad (4)$$
It is clear that (4) is stronger than (1) since $1 + z \le e^z$ for $z \in \mathbb{R}$. To prove Theorem 1, we need the following well-known result (see, e.g., Propositions 1 and 2 of [22]) concerning the Gaussian extremal property of the Kullback–Leibler divergence and the squared Wasserstein-2 distance.
Lemma 1.
For $p_X = \mathcal{N}(\mu_X, \sigma_X^2)$ and $p_{\hat{X}}$ with $\mathbb{E}[\hat{X}^2] < \infty$,
$$\phi_{\mathrm{KL}}(p_{\hat{X}} \,\|\, p_X) \ge \phi_{\mathrm{KL}}(p_{\hat{X}}^G \,\|\, p_X) = \log\frac{\sigma_X}{\sigma_{\hat{X}}} + \frac{(\mu_X - \mu_{\hat{X}})^2 + \sigma_{\hat{X}}^2 - \sigma_X^2}{2\sigma_X^2} \quad (5)$$
and
$$W_2^2(p_X, p_{\hat{X}}) \ge W_2^2(p_X, p_{\hat{X}}^G) = (\mu_X - \mu_{\hat{X}})^2 + (\sigma_X - \sigma_{\hat{X}})^2, \quad (6)$$
where $p_{\hat{X}}^G := \mathcal{N}(\mu_{\hat{X}}, \sigma_{\hat{X}}^2)$.
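The following minimal numerical sketch (Python, with hypothetical parameter values) uses the closed-form Gaussian expressions (5) and (6) to illustrate the comparison just discussed: for Gaussian $p_{\hat{X}}$ with $\mu_{\hat{X}} = \mu_X$ and $\sigma_{\hat{X}} \le \sigma_X$, it checks numerically that $W_2^2 \le 2\sigma_X^2(1 - e^{-\phi_{\mathrm{KL}}}) \le 2\sigma_X^2\,\phi_{\mathrm{KL}}$, i.e., that the constrained bound (4) tightens (1).

```python
import numpy as np

def kl_gaussian(mu_hat, sig_hat, mu_x, sig_x):
    # phi_KL(p_xhat^G || p_x) between two Gaussians, cf. Eq. (5)
    return (np.log(sig_x / sig_hat)
            + ((mu_x - mu_hat)**2 + sig_hat**2 - sig_x**2) / (2 * sig_x**2))

def w2sq_gaussian(mu_x, sig_x, mu_hat, sig_hat):
    # W_2^2(p_x, p_xhat^G) between two Gaussians, cf. Eq. (6)
    return (mu_x - mu_hat)**2 + (sig_x - sig_hat)**2

mu_x, sig_x = 0.0, 1.0
for sig_hat in [0.2, 0.5, 0.8, 1.0]:     # matched mean, reduced standard deviation
    kl = kl_gaussian(mu_x, sig_hat, mu_x, sig_x)
    w2 = w2sq_gaussian(mu_x, sig_x, mu_x, sig_hat)
    bound_4 = 2 * sig_x**2 * (1 - np.exp(-kl))   # constrained bound, Eq. (4)
    bound_1 = 2 * sig_x**2 * kl                  # Talagrand's bound, Eq. (1)
    assert w2 <= bound_4 + 1e-12 <= bound_1 + 1e-12
    print(f"sigma_hat={sig_hat:.1f}: W2^2={w2:.4f} <= (4)={bound_4:.4f} <= (1)={bound_1:.4f}")
```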
Lemma 1 indicates that when the reference distribution is Gaussian, replacing the other distribution with its Gaussian counterpart leads to reductions in both the Kullback–Leibler divergence and the squared Wasserstein-2 distance. These reductions turn out to be quantitatively related, as shown by the next result.
Lemma 2.
For $p_X = \mathcal{N}(\mu_X, \sigma_X^2)$ and $p_{\hat{X}}$ with $\mathbb{E}[\hat{X}^2] < \infty$,
$$W_2^2(p_X, p_{\hat{X}}) - W_2^2(p_X, p_{\hat{X}}^G) \le 2\sigma_X\sigma_{\hat{X}}\left(1 - e^{-\left(\phi_{\mathrm{KL}}(p_{\hat{X}} \,\|\, p_X) - \phi_{\mathrm{KL}}(p_{\hat{X}}^G \,\|\, p_X)\right)}\right). \quad (7)$$
Proof of Lemma 2.
Note that
$$\begin{aligned} W_2^2(p_X, p_{\hat{X}}) &= (\mu_X - \mu_{\hat{X}})^2 + W_2^2\big(p_{X - \mu_X}, p_{\hat{X} - \mu_{\hat{X}}}\big) \\ &= (\mu_X - \mu_{\hat{X}})^2 + \sigma_X^2\, W_2^2\big(p_{\sigma_X^{-1}(X - \mu_X)}, p_{\sigma_X^{-1}(\hat{X} - \mu_{\hat{X}})}\big) \\ &\overset{(a)}{\le} (\mu_X - \mu_{\hat{X}})^2 + \sigma_X^2 + \sigma_{\hat{X}}^2 - 2\sigma_X^2\sqrt{\frac{1}{2\pi e}\, e^{2h(\sigma_X^{-1}\hat{X})}} \\ &\overset{(b)}{=} W_2^2(p_X, p_{\hat{X}}^G) + 2\sigma_X\sigma_{\hat{X}} - 2\sigma_X^2\sqrt{\frac{1}{2\pi e}\, e^{2h(\sigma_X^{-1}\hat{X})}}, \end{aligned} \quad (8)$$
where (a) is due to (Equation (8), [34]) and (b) is due to Lemma 1. Moreover,
$$h(\sigma_X^{-1}\hat{X}) = h(\hat{X}) - \log\sigma_X = \frac{1}{2}\log\frac{2\pi e\,\sigma_{\hat{X}}^2}{\sigma_X^2} - \phi_{\mathrm{KL}}(p_{\hat{X}} \,\|\, p_X) + \phi_{\mathrm{KL}}(p_{\hat{X}}^G \,\|\, p_X). \quad (9)$$
Substituting (9) into (8) proves Lemma 2. □
Proof of Theorem 1.
In view of Lemmas 1 and 2,
$$W_2^2(p_X, p_{\hat{X}}) \le \max_{\mu, \sigma}\ \eta(\mu, \sigma) \quad (10)$$
$$\text{subject to}\quad \mu = \mu_X, \quad (11)$$
$$\sigma \le \sigma_X, \quad (12)$$
$$\frac{(\mu_X - \mu)^2}{2\sigma_X^2} + \psi(\sigma) \le \phi_{\mathrm{KL}}(p_{\hat{X}} \,\|\, p_X), \quad (13)$$
where
$$\eta(\mu, \sigma) := -2\sigma_X^2\, e^{\frac{(\mu_X - \mu)^2 + \sigma^2 - \sigma_X^2}{2\sigma_X^2} - \phi_{\mathrm{KL}}(p_{\hat{X}} \,\|\, p_X)} + (\mu_X - \mu)^2 + \sigma_X^2 + \sigma^2 \quad (14)$$
and
$$\psi(\sigma) := \log\frac{\sigma_X}{\sigma} + \frac{\sigma^2 - \sigma_X^2}{2\sigma_X^2}. \quad (15)$$
Indeed, by Lemma 2 together with the Gaussian expressions in Lemma 1, $W_2^2(p_X, p_{\hat{X}}) \le \eta(\mu_{\hat{X}}, \sigma_{\hat{X}})$, while the assumptions $\mu_{\hat{X}} = \mu_X$ and $\sigma_{\hat{X}} \le \sigma_X$ together with the lower bound in (5) yield Constraints (11)–(13).
Since $\psi(\sigma)$ decreases monotonically from $\infty$ to $0$ as $\sigma$ varies from $0$ to $\sigma_X$ and increases monotonically from $0$ to $\infty$ as $\sigma$ varies from $\sigma_X$ to $\infty$, there must exist $\underline{\sigma} \le \sigma_X$ and $\overline{\sigma} \ge \sigma_X$ satisfying
$$\psi(\underline{\sigma}) = \psi(\overline{\sigma}) = \phi_{\mathrm{KL}}(p_{\hat{X}} \,\|\, p_X). \quad (16)$$
Note that (10)–(13) can be written compactly as
$$W_2^2(p_X, p_{\hat{X}}) \le \max_{\sigma \in [\underline{\sigma}, \sigma_X]}\ \eta(\mu_X, \sigma). \quad (17)$$
For $\sigma \in [\underline{\sigma}, \sigma_X]$, we have
$$\frac{\sigma^2 - \sigma_X^2}{2\sigma_X^2} - \phi_{\mathrm{KL}}(p_{\hat{X}} \,\|\, p_X) \le 0 \quad (18)$$
and, consequently,
$$\frac{\partial}{\partial\sigma}\,\eta(\mu_X, \sigma) = -2\sigma\, e^{\frac{\sigma^2 - \sigma_X^2}{2\sigma_X^2} - \phi_{\mathrm{KL}}(p_{\hat{X}} \,\|\, p_X)} + 2\sigma \ge 0, \quad (19)$$
which implies that the maximum in (17) is attained at $\sigma = \sigma_X$. Thus,
$$W_2^2(p_X, p_{\hat{X}}) \le \eta(\mu_X, \sigma_X) = 2\sigma_X^2\left(1 - e^{-\phi_{\mathrm{KL}}(p_{\hat{X}} \,\|\, p_X)}\right). \quad (20)$$
This proves Theorem 1. □
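The monotonicity argument in this proof can also be checked numerically. The sketch below (Python with SciPy root-finding; $\sigma_X$ and the KL budget $\phi$ are hypothetical values of our choosing) evaluates $\eta(\mu_X, \cdot)$ from (14) over $[\underline{\sigma}, \sigma_X]$ and confirms that it is nondecreasing with maximum $2\sigma_X^2(1 - e^{-\phi})$, as in (19) and (20).

```python
import numpy as np
from scipy.optimize import brentq

sig_x, phi = 1.0, 0.4      # hypothetical sigma_X and KL budget phi_KL(p_xhat || p_x)

def psi(s):                # Eq. (15)
    return np.log(sig_x / s) + (s**2 - sig_x**2) / (2 * sig_x**2)

def eta(s):                # Eq. (14) evaluated at mu = mu_X
    return (-2 * sig_x**2 * np.exp((s**2 - sig_x**2) / (2 * sig_x**2) - phi)
            + sig_x**2 + s**2)

sig_lo = brentq(lambda s: psi(s) - phi, 1e-9, sig_x)   # sigma_underline from Eq. (16)
grid = np.linspace(sig_lo, sig_x, 1000)
vals = eta(grid)
assert np.all(np.diff(vals) >= -1e-12)                 # eta(mu_X, .) nondecreasing, cf. (19)
assert np.isclose(vals[-1], 2 * sig_x**2 * (1 - np.exp(-phi)))   # maximum equals (20)
print(f"max eta = {vals[-1]:.6f}, bound (4) = {2 * sig_x**2 * (1 - np.exp(-phi)):.6f}")
```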
The following result shows that Talagrand’s transportation inequality (1) corresponds to a relaxed version of (10), obtained by removing Constraints (11) and (12).
Theorem 2.
For $p_X = \mathcal{N}(\mu_X, \sigma_X^2)$ and $p_{\hat{X}}$ with $\mathbb{E}[\hat{X}^2] < \infty$,
$$2\sigma_X^2\,\phi_{\mathrm{KL}}(p_{\hat{X}} \,\|\, p_X) = \max_{\mu, \sigma}\ \eta(\mu, \sigma) \quad (21)$$
$$\text{subject to}\quad \frac{(\mu_X - \mu)^2}{2\sigma_X^2} + \psi(\sigma) \le \phi_{\mathrm{KL}}(p_{\hat{X}} \,\|\, p_X). \quad (22)$$
Proof of Theorem 2.
First, recall the definitions of $\underline{\sigma}$ and $\overline{\sigma}$ from (16). It can be verified that
$$\frac{\partial}{\partial (\mu_X - \mu)^2}\,\eta(\mu, \sigma) = -e^{\frac{(\mu_X - \mu)^2 + \sigma^2 - \sigma_X^2}{2\sigma_X^2} - \phi_{\mathrm{KL}}(p_{\hat{X}} \,\|\, p_X)} + 1. \quad (23)$$
Given $\sigma < \underline{\sigma}$, there is no $\mu$ satisfying (22). Given $\sigma \in [\underline{\sigma}, \sigma_X]$, for $\mu$ satisfying (22), we have
$$\frac{(\mu_X - \mu)^2 + \sigma^2 - \sigma_X^2}{2\sigma_X^2} - \phi_{\mathrm{KL}}(p_{\hat{X}} \,\|\, p_X) = \frac{(\mu_X - \mu)^2}{2\sigma_X^2} + \psi(\sigma) - \phi_{\mathrm{KL}}(p_{\hat{X}} \,\|\, p_X) - \log\frac{\sigma_X}{\sigma} \le 0 \quad (24)$$
and, consequently,
$$\frac{\partial}{\partial (\mu_X - \mu)^2}\,\eta(\mu, \sigma) \ge 0, \quad (25)$$
which implies that the maximum value of $\eta(\mu, \sigma)$ over $\mu$ satisfying (22) is attained when
$$\log\frac{\sigma_X}{\sigma} + \frac{(\mu_X - \mu)^2 + \sigma^2 - \sigma_X^2}{2\sigma_X^2} = \phi_{\mathrm{KL}}(p_{\hat{X}} \,\|\, p_X). \quad (26)$$
Therefore, for $\sigma \in [\underline{\sigma}, \sigma_X]$,
$$\max_{\mu:\, (22)}\ \eta(\mu, \sigma) = \kappa(\sigma), \quad (27)$$
where
$$\kappa(\sigma) := 2\sigma_X^2\left(\phi_{\mathrm{KL}}(p_{\hat{X}} \,\|\, p_X) - \log\frac{\sigma_X}{\sigma} + 1\right) - 2\sigma_X\sigma. \quad (28)$$
Since the maximum value of $\kappa(\sigma)$ over $\sigma \in [\underline{\sigma}, \sigma_X]$ is attained at $\sigma = \sigma_X$, it follows that
$$\max_{\sigma \in [\underline{\sigma}, \sigma_X]}\ \max_{\mu:\, (22)}\ \eta(\mu, \sigma) = 2\sigma_X^2\,\phi_{\mathrm{KL}}(p_{\hat{X}} \,\|\, p_X). \quad (29)$$
Given $\sigma \in \big(\sigma_X, \sqrt{2\sigma_X^2\,\phi_{\mathrm{KL}}(p_{\hat{X}} \,\|\, p_X) + \sigma_X^2}\,\big)$, for $\mu$ satisfying (22), we have
$$\frac{\partial}{\partial (\mu_X - \mu)^2}\,\eta(\mu, \sigma)\ \begin{cases} \ge 0 & \text{if}\ \ \frac{(\mu_X - \mu)^2 + \sigma^2 - \sigma_X^2}{2\sigma_X^2} \le \phi_{\mathrm{KL}}(p_{\hat{X}} \,\|\, p_X), \\[4pt] < 0 & \text{if}\ \ \frac{(\mu_X - \mu)^2 + \sigma^2 - \sigma_X^2}{2\sigma_X^2} > \phi_{\mathrm{KL}}(p_{\hat{X}} \,\|\, p_X), \end{cases} \quad (30)$$
which implies that the maximum value of $\eta(\mu, \sigma)$ over $\mu$ satisfying (22) is attained when
$$\frac{(\mu_X - \mu)^2 + \sigma^2 - \sigma_X^2}{2\sigma_X^2} = \phi_{\mathrm{KL}}(p_{\hat{X}} \,\|\, p_X). \quad (31)$$
Therefore, for $\sigma \in \big(\sigma_X, \sqrt{2\sigma_X^2\,\phi_{\mathrm{KL}}(p_{\hat{X}} \,\|\, p_X) + \sigma_X^2}\,\big)$,
$$\max_{\mu:\, (22)}\ \eta(\mu, \sigma) = 2\sigma_X^2\,\phi_{\mathrm{KL}}(p_{\hat{X}} \,\|\, p_X). \quad (32)$$
As a consequence,
$$\max_{\sigma \in (\sigma_X, \sqrt{2\sigma_X^2\,\phi_{\mathrm{KL}}(p_{\hat{X}} \,\|\, p_X) + \sigma_X^2})}\ \max_{\mu:\, (22)}\ \eta(\mu, \sigma) = 2\sigma_X^2\,\phi_{\mathrm{KL}}(p_{\hat{X}} \,\|\, p_X). \quad (33)$$
Given $\sigma \in \big[\sqrt{2\sigma_X^2\,\phi_{\mathrm{KL}}(p_{\hat{X}} \,\|\, p_X) + \sigma_X^2},\ \overline{\sigma}\,\big]$, for $\mu$ satisfying (22), we have
$$\frac{\partial}{\partial (\mu_X - \mu)^2}\,\eta(\mu, \sigma) \le 0, \quad (34)$$
which implies that the maximum value of $\eta(\mu, \sigma)$ over $\mu$ satisfying (22) is attained when
$$(\mu_X - \mu)^2 = 0,\quad \text{i.e.,}\ \mu = \mu_X. \quad (35)$$
Therefore, for $\sigma \in \big[\sqrt{2\sigma_X^2\,\phi_{\mathrm{KL}}(p_{\hat{X}} \,\|\, p_X) + \sigma_X^2},\ \overline{\sigma}\,\big]$,
$$\max_{\mu:\, (22)}\ \eta(\mu, \sigma) = \tilde{\kappa}(\sigma), \quad (36)$$
where
$$\tilde{\kappa}(\sigma) := -2\sigma_X^2\, e^{\frac{\sigma^2 - \sigma_X^2}{2\sigma_X^2} - \phi_{\mathrm{KL}}(p_{\hat{X}} \,\|\, p_X)} + \sigma_X^2 + \sigma^2. \quad (37)$$
Since the maximum value of $\tilde{\kappa}(\sigma)$ over $\sigma \in \big[\sqrt{2\sigma_X^2\,\phi_{\mathrm{KL}}(p_{\hat{X}} \,\|\, p_X) + \sigma_X^2},\ \overline{\sigma}\,\big]$ is attained at $\sigma = \sqrt{2\sigma_X^2\,\phi_{\mathrm{KL}}(p_{\hat{X}} \,\|\, p_X) + \sigma_X^2}$, it follows that
$$\max_{\sigma \in [\sqrt{2\sigma_X^2\,\phi_{\mathrm{KL}}(p_{\hat{X}} \,\|\, p_X) + \sigma_X^2},\ \overline{\sigma}]}\ \max_{\mu:\, (22)}\ \eta(\mu, \sigma) = 2\sigma_X^2\,\phi_{\mathrm{KL}}(p_{\hat{X}} \,\|\, p_X). \quad (38)$$
Given $\sigma > \overline{\sigma}$, there is no $\mu$ satisfying (22). Combining (29), (33), and (38) proves Theorem 2. □

3. Application to Rate-Distortion-Perception Theory

A length-$n$ perception-aware lossy source coding system consists of an encoder $f^{(n)}: \mathbb{R}^n \times \mathcal{K} \to \mathcal{J}$, a decoder $g^{(n)}: \mathcal{J} \times \mathcal{K} \to \mathbb{R}^n$, and a random seed $K$. It takes an i.i.d. source sequence $X^n$ as input and produces an i.i.d. reconstruction sequence $\hat{X}^n$. Specifically, the encoder maps $X^n$ and $K$ to a codeword $J$ in codebook $\mathcal{J}$ according to some conditional distribution $p_{J|X^nK}$, while the decoder generates $\hat{X}^n$ based on $J$ and $K$ according to some conditional distribution $p_{\hat{X}^n|JK}$. Here, $K$ is assumed to be uniformly distributed over the alphabet $\mathcal{K}$ and independent of $X^n$. End-to-end distortion is quantified by $\frac{1}{n}\sum_{t=1}^n \mathbb{E}[(X_t - \hat{X}_t)^2]$ and perceptual quality by $\frac{1}{n}\sum_{t=1}^n \phi(p_{X_t}, p_{\hat{X}_t})$ for some divergence $\phi$. Since $X^n$ and $\hat{X}^n$ are i.i.d., it is clear that $\frac{1}{n}\sum_{t=1}^n \phi(p_{X_t}, p_{\hat{X}_t}) = \phi(p_X, p_{\hat{X}})$, where $p_X$ and $p_{\hat{X}}$ are the marginal distributions of $X^n$ and $\hat{X}^n$, respectively.
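As a toy illustration of these operational quantities (our own construction, not the coding schemes of [6] or [22]), the following Python sketch implements a length-1 dithered scalar quantizer in which the shared dither plays the role of the common randomness $K$, and then estimates the distortion and a Gaussian-fit proxy for the Kullback–Leibler perception loss; all parameter values are hypothetical.

```python
import numpy as np

# Toy length-1 illustration of the quantities in Definition 1 below: a dithered
# scalar quantizer whose shared dither acts as the common randomness K.
rng = np.random.default_rng(0)
n_samples, step = 100_000, 0.5
x = rng.normal(0.0, 1.0, n_samples)               # i.i.d. source, p_X = N(0, 1)
k = rng.uniform(-step / 2, step / 2, n_samples)   # seed K shared by encoder and decoder
j = np.round((x + k) / step)                      # encoder output: codeword index J
xhat = j * step - k                               # decoder output: Xhat = X + uniform noise
distortion = np.mean((x - xhat)**2)               # empirical distortion, approx step^2 / 12
# Gaussian-fit proxy for the perception loss: phi_KL(N(mu, var) || N(0, 1)), cf. Eq. (5);
# by Lemma 1 this lower-bounds the true Kullback-Leibler divergence.
mu_h, var_h = xhat.mean(), xhat.var()
kl_proxy = np.log(1.0 / np.sqrt(var_h)) + (mu_h**2 + var_h - 1.0) / 2.0
print(f"distortion = {distortion:.4f} (step^2/12 = {step**2 / 12:.4f}), "
      f"KL proxy = {kl_proxy:.4f}")
```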
Definition 1.
For an i.i.d. source $\{X_t\}_{t=1}^\infty$, distortion level $D$ is said to be achievable subject to the compression rate constraint $R$, the common randomness rate constraint $R_c$, and the perception constraint $P$ if there exists a length-$n$ perception-aware lossy source coding system such that
$$\frac{1}{n}\log|\mathcal{J}| \le R, \quad (39)$$
$$\frac{1}{n}\log|\mathcal{K}| \le R_c, \quad (40)$$
$$\frac{1}{n}\sum_{t=1}^n \mathbb{E}[(X_t - \hat{X}_t)^2] \le D, \quad (41)$$
$$\frac{1}{n}\sum_{t=1}^n \phi(p_{X_t}, p_{\hat{X}_t}) \le P. \quad (42)$$
Moreover, the reconstruction sequence $\hat{X}^n$ is required to be i.i.d. The infimum of such achievable distortion levels $D$ is denoted by $D(R, R_c, P \,|\, \phi)$.
The following result, which is built upon (Theorem 1, [6]) (see also (Theorem 2, [15])), provides a single-letter characterization of $D(R, R_c, P \,|\, \phi)$.
Theorem 3
(Theorem 1, [22]). For $p_X$ with $\mathbb{E}[X^2] < \infty$,
$$D(R, R_c, P \,|\, \phi) = \inf_{p_{U\hat{X}|X}}\ \mathbb{E}[(X - \hat{X})^2] \quad (43)$$
$$\text{subject to}\quad X \leftrightarrow U \leftrightarrow \hat{X}\ \text{form a Markov chain}, \quad (44)$$
$$I(X; U) \le R, \quad (45)$$
$$I(\hat{X}; U) \le R + R_c, \quad (46)$$
$$\phi(p_X, p_{\hat{X}}) \le P. \quad (47)$$
According to (Lemmas 1 and 3, [22]), for $p_X = \mathcal{N}(\mu_X, \sigma_X^2)$, there is no loss of generality in focusing on $p_{\hat{X}}$ with $\mu_{\hat{X}} = \mu_X$ and $\sigma_{\hat{X}} \le \sigma_X$ as far as $D(R, R_c, P \,|\, \phi_{\mathrm{KL}})$ and $D(R, R_c, P \,|\, W_2^2)$ are concerned; therefore, it follows from Theorem 1 that
$$D\big(R, R_c, 2\sigma_X^2(1 - e^{-P}) \,\big|\, W_2^2\big) \le D(R, R_c, P \,|\, \phi_{\mathrm{KL}}). \quad (48)$$
This reveals an intrinsic connection between the Gaussian distortion-rate-perception functions with limited common randomness under the Kullback–Leibler divergence-based and squared Wasserstein-2 distance-based perception measures. Since $D(R, R_c, P \,|\, \phi_{\mathrm{KL}})$ and $D(R, R_c, P \,|\, W_2^2)$ do not appear to admit explicit expressions, recent research efforts have been devoted to deriving bounds on these functions. Note that via (48), every lower bound on $D(R, R_c, \cdot \,|\, W_2^2)$ induces a corresponding lower bound on $D(R, R_c, \cdot \,|\, \phi_{\mathrm{KL}})$, and every upper bound on $D(R, R_c, \cdot \,|\, \phi_{\mathrm{KL}})$ induces a corresponding upper bound on $D(R, R_c, \cdot \,|\, W_2^2)$. As a consequence, a lower bound on $D(R, R_c, \cdot \,|\, \phi_{\mathrm{KL}})$ (or an upper bound on $D(R, R_c, \cdot \,|\, W_2^2)$) can be considered redundant if it is implied by a lower bound on $D(R, R_c, \cdot \,|\, W_2^2)$ (or an upper bound on $D(R, R_c, \cdot \,|\, \phi_{\mathrm{KL}})$) through this connection. This provides an organizational framework for assessing existing bounds on these functions. As an illustrative example, we examine from this perspective the best-known bounds due to Xie et al. [22], summarized in the following two theorems.
Let
$$\xi(R, R_c) := \sqrt{(1 - e^{-2R})(1 - e^{-2(R + R_c)})}. \quad (49)$$
Moreover, let $\sigma(P)$ be the unique number $\sigma \in [0, \sigma_X]$ satisfying $\psi(\sigma) = P$, where $\psi(\sigma)$ is defined in (15).
Theorem 4
(Theorem 3, [22]). For $p_X = \mathcal{N}(\mu_X, \sigma_X^2)$,
$$\underline{D}(R, R_c, P \,|\, \phi_{\mathrm{KL}}) \le D(R, R_c, P \,|\, \phi_{\mathrm{KL}}) \le \overline{D}(R, R_c, P \,|\, \phi_{\mathrm{KL}}), \quad (50)$$
where
$$\underline{D}(R, R_c, P \,|\, \phi_{\mathrm{KL}}) := \min_{\sigma_{\hat{X}} \in [\sigma(P), \sigma_X]}\ \sigma_X^2 + \sigma_{\hat{X}}^2 - 2\sigma_X\sigma_{\hat{X}}\sqrt{(1 - e^{-2R})\big(1 - e^{-2(R + R_c + P - \psi(\sigma_{\hat{X}}))}\big)} \quad (51)$$
and
$$\overline{D}(R, R_c, P \,|\, \phi_{\mathrm{KL}}) := \sigma_X^2 - \sigma_X^2\,\xi^2(R, R_c) + \Big(\big(\sigma(P) - \sigma_X\,\xi(R, R_c)\big)^+\Big)^2. \quad (52)$$
Theorem 5
(Theorem 4, [22]). For $p_X = \mathcal{N}(\mu_X, \sigma_X^2)$,
$$\underline{D}(R, R_c, P \,|\, W_2^2) \le D(R, R_c, P \,|\, W_2^2) \le \overline{D}(R, R_c, P \,|\, W_2^2), \quad (53)$$
where
$$\underline{D}(R, R_c, P \,|\, W_2^2) := \min_{\sigma_{\hat{X}} \in [(\sigma_X - \sqrt{P})^+, \sigma_X]}\ \sigma_X^2 + \sigma_{\hat{X}}^2 - 2\sigma_X\sqrt{(1 - e^{-2R})\Big(\sigma_{\hat{X}}^2 - \big(\big(\sigma_X e^{-(R + R_c)} - \sqrt{P}\big)^+\big)^2\Big)} \quad (54)$$
and
$$\overline{D}(R, R_c, P \,|\, W_2^2) := \sigma_X^2 - \sigma_X^2\,\xi^2(R, R_c) + \Big(\big(\sigma_X - \sqrt{P} - \sigma_X\,\xi(R, R_c)\big)^+\Big)^2. \quad (55)$$
In view of (48), Theorems 4 and 5 imply that
$$D(R, R_c, P \,|\, \phi_{\mathrm{KL}}) \ge \underline{D}\big(R, R_c, 2\sigma_X^2(1 - e^{-P}) \,\big|\, W_2^2\big) \quad (56)$$
and
$$D(R, R_c, P \,|\, W_2^2) \le \overline{D}\big(R, R_c, \nu(P) \,\big|\, \phi_{\mathrm{KL}}\big), \quad (57)$$
where
$$\nu(P) := \log\frac{2\sigma_X^2}{(2\sigma_X^2 - P)^+}. \quad (58)$$
It is thus of considerable interest to see how these induced bounds compare to their counterparts in Theorems 4 and 5, namely
$$D(R, R_c, P \,|\, \phi_{\mathrm{KL}}) \ge \underline{D}(R, R_c, P \,|\, \phi_{\mathrm{KL}}) \quad (59)$$
and
$$D(R, R_c, P \,|\, W_2^2) \le \overline{D}(R, R_c, P \,|\, W_2^2). \quad (60)$$
The following result indicates that (56) and (57) are, in general, looser. In this sense, (59) and (60) are nonredundant.
Theorem 6.
For $p_X = \mathcal{N}(\mu_X, \sigma_X^2)$,
$$\underline{D}(R, R_c, P \,|\, \phi_{\mathrm{KL}}) \ge \underline{D}\big(R, R_c, 2\sigma_X^2(1 - e^{-P}) \,\big|\, W_2^2\big) \quad (61)$$
and
$$\overline{D}(R, R_c, P \,|\, W_2^2) \le \overline{D}\big(R, R_c, \nu(P) \,\big|\, \phi_{\mathrm{KL}}\big). \quad (62)$$
Proof of Theorem 6.
In view of the definitions of $\underline{D}(R, R_c, P \,|\, \phi_{\mathrm{KL}})$ and $\underline{D}(R, R_c, 2\sigma_X^2(1 - e^{-P}) \,|\, W_2^2)$, for the purpose of proving (61), it suffices to show
$$[\sigma(P), \sigma_X] \subseteq \Big[\Big(\sigma_X - \sqrt{2\sigma_X^2(1 - e^{-P})}\Big)^+,\ \sigma_X\Big] \quad (63)$$
and
$$\sigma_{\hat{X}}^2 - \Big(\Big(\sigma_X e^{-(R + R_c)} - \sqrt{2\sigma_X^2(1 - e^{-P})}\Big)^+\Big)^2 \ge \sigma_{\hat{X}}^2 - \sigma_{\hat{X}}^2\, e^{-2(R + R_c + P - \psi(\sigma_{\hat{X}}))} \quad (64)$$
for $\sigma_{\hat{X}} \in [\sigma(P), \sigma_X]$. Invoking (4) with $p_{\hat{X}} = \mathcal{N}(\mu_X, \sigma^2(P))$ (see also Lemma 1 for the expressions of the Kullback–Leibler divergence and the squared Wasserstein-2 distance between two Gaussian distributions) yields
$$(\sigma_X - \sigma(P))^2 \le 2\sigma_X^2(1 - e^{-P}), \quad (65)$$
from which (63) follows immediately. Note that (64) is trivially true when $e^{-(R + R_c)} \le \sqrt{2(1 - e^{-P})}$. When $e^{-(R + R_c)} > \sqrt{2(1 - e^{-P})}$, it can be written equivalently as
$$\sqrt{2(1 - e^{-P})} \ge e^{-(R + R_c)}\Big(1 - e^{-\big(P + \frac{\sigma_X^2 - \sigma_{\hat{X}}^2}{2\sigma_X^2}\big)}\Big). \quad (66)$$
Since $e^{-(R + R_c)} \le 1$ and
$$1 - e^{-\big(P + \frac{\sigma_X^2 - \sigma_{\hat{X}}^2}{2\sigma_X^2}\big)} \le 1 - e^{-\big(P + \frac{\sigma_X^2 - \sigma^2(P)}{2\sigma_X^2}\big)} \quad (67)$$
for $\sigma_{\hat{X}} \in [\sigma(P), \sigma_X]$, it suffices to show
$$\sqrt{2(1 - e^{-P})} \ge 1 - e^{-\big(P + \frac{\sigma_X^2 - \sigma^2(P)}{2\sigma_X^2}\big)}. \quad (68)$$
According to the definition of $\sigma(P)$,
$$P = \log\frac{\sigma_X}{\sigma(P)} + \frac{\sigma^2(P) - \sigma_X^2}{2\sigma_X^2}. \quad (69)$$
Substituting (69) into (68) gives
$$\sqrt{2\Big(1 - e^{\log\frac{\sigma(P)}{\sigma_X} - \frac{\sigma^2(P)}{2\sigma_X^2} + \frac{1}{2}}\Big)} \ge 1 - \frac{\sigma(P)}{\sigma_X}. \quad (70)$$
We can rewrite (70) as
$$\tau(\beta) \ge 0, \quad (71)$$
where
$$\tau(\beta) := 1 - 2\beta\, e^{-\frac{\beta^2}{2} + \frac{1}{2}} + 2\beta - \beta^2 \quad (72)$$
with $\beta := \sigma(P)/\sigma_X$. Note that $\beta \in [0, 1]$. We have
$$\frac{d\tau(\beta)}{d\beta} = -2e^{-\frac{\beta^2}{2} + \frac{1}{2}} + 2\beta^2 e^{-\frac{\beta^2}{2} + \frac{1}{2}} + 2 - 2\beta \le -2(1 - \beta^2) + 2 - 2\beta = -2(1 - \beta)\beta \le 0. \quad (73)$$
Since $\tau(\beta)$ is nonincreasing on $[0, 1]$ and $\tau(1) = 0$, it follows that $\tau(\beta) \ge 0$ for $\beta \in [0, 1]$, which verifies (71) and consequently proves (64).
Now, we proceed to prove (62), which is equivalent to
$$\overline{D}\big(R, R_c, 2\sigma_X^2(1 - e^{-P}) \,\big|\, W_2^2\big) \le \overline{D}(R, R_c, P \,|\, \phi_{\mathrm{KL}}). \quad (74)$$
Since $\overline{D}(R, R_c, P \,|\, \phi_{\mathrm{KL}}) = \overline{D}(R, R_c, (\sigma_X - \sigma(P))^2 \,|\, W_2^2)$ and $\overline{D}(R, R_c, \cdot \,|\, W_2^2)$ is nonincreasing in the perception level, it suffices to show
$$(\sigma_X - \sigma(P))^2 \le 2\sigma_X^2(1 - e^{-P}), \quad (75)$$
i.e.,
$$P \ge \log\frac{2\sigma_X^2}{\sigma_X^2 - \sigma^2(P) + 2\sigma_X\sigma(P)}. \quad (76)$$
Substituting (69) into (76) and rearranging the inequality yields
$$\log\frac{\sigma_X^2 - \sigma^2(P) + 2\sigma_X\sigma(P)}{2\sigma_X\sigma(P)} \ge \frac{\sigma_X^2 - \sigma^2(P)}{2\sigma_X^2}, \quad (77)$$
which is indeed true since
$$\log\frac{\sigma_X^2 - \sigma^2(P) + 2\sigma_X\sigma(P)}{2\sigma_X\sigma(P)} \overset{(a)}{\ge} 1 - \frac{2\sigma_X\sigma(P)}{\sigma_X^2 - \sigma^2(P) + 2\sigma_X\sigma(P)} = \frac{\sigma_X^2 - \sigma^2(P)}{\sigma_X^2 - \sigma^2(P) + 2\sigma_X\sigma(P)} \ge \frac{\sigma_X^2 - \sigma^2(P)}{2\sigma_X^2}, \quad (78)$$
where (a) is due to $\log z \ge 1 - \frac{1}{z}$ for $z > 0$. This completes the proof of (62). □
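As a quick sanity check of the key step (71)–(73), the following short sketch (Python; the grid resolution and tolerances are hypothetical choices of ours) confirms numerically that $\tau$ is nonincreasing on $[0, 1]$ with $\tau(1) = 0$, and hence nonnegative throughout.

```python
import numpy as np

# Numerical sanity check of Eqs. (71)-(73): tau is nonincreasing on [0, 1] and
# tau(1) = 0, hence tau(beta) >= 0 (small tolerances absorb floating-point error).
beta = np.linspace(0.0, 1.0, 10001)
tau = 1 - 2 * beta * np.exp(-beta**2 / 2 + 0.5) + 2 * beta - beta**2   # Eq. (72)
assert np.all(np.diff(tau) <= 1e-12), "tau should be nonincreasing"
assert np.isclose(tau[-1], 0.0) and np.all(tau >= -1e-12)
```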
It can be seen from Figure 1 that $\underline{D}(R, R_c, 2\sigma_X^2(1 - e^{-P}) \,|\, W_2^2)$ is indeed a looser lower bound on $D(R, R_c, P \,|\, \phi_{\mathrm{KL}})$ as compared to $\underline{D}(R, R_c, P \,|\, \phi_{\mathrm{KL}})$, and the latter almost meets the upper bound $\overline{D}(R, R_c, P \,|\, \phi_{\mathrm{KL}})$. Similarly, Figure 2 shows that $\overline{D}(R, R_c, \nu(P) \,|\, \phi_{\mathrm{KL}})$ is indeed a looser upper bound on $D(R, R_c, P \,|\, W_2^2)$ as compared to $\overline{D}(R, R_c, P \,|\, W_2^2)$, especially in the low-rate regime, where the latter has a diminishing gap from the lower bound $\underline{D}(R, R_c, P \,|\, W_2^2)$.
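For readers who wish to reproduce this comparison, the following self-contained sketch (Python with NumPy/SciPy) evaluates the bounds of Theorems 4 and 5 on a grid and checks the inequalities of Theorem 6; the helper names are our own, and the parameter values match the setting of the figures ($p_X = \mathcal{N}(0, 1)$, $R_c = 0$, $P = 0.1$).

```python
import numpy as np
from scipy.optimize import brentq

sig_x, Rc, P = 1.0, 0.0, 0.1             # setting of Figures 1 and 2

def psi(s):                              # Eq. (15)
    return np.log(sig_x / s) + (s**2 - sig_x**2) / (2 * sig_x**2)

def sigma_of(p):                         # sigma(P): root of psi(sigma) = p in (0, sigma_X]
    return brentq(lambda s: psi(s) - p, 1e-12, sig_x)

def xi(R):                               # Eq. (49)
    return np.sqrt((1 - np.exp(-2 * R)) * (1 - np.exp(-2 * (R + Rc))))

def D_lo_kl(R, p):                       # lower bound (51), minimized on a grid
    sg = np.linspace(sigma_of(p), sig_x, 2001)
    return np.min(sig_x**2 + sg**2 - 2 * sig_x * sg * np.sqrt(
        (1 - np.exp(-2 * R)) * (1 - np.exp(-2 * (R + Rc + p - psi(sg))))))

def D_lo_w2(R, p):                       # lower bound (54), minimized on a grid
    sg = np.linspace(max(sig_x - np.sqrt(p), 0.0), sig_x, 2001)
    inner = sg**2 - np.maximum(sig_x * np.exp(-(R + Rc)) - np.sqrt(p), 0.0)**2
    return np.min(sig_x**2 + sg**2 - 2 * sig_x * np.sqrt((1 - np.exp(-2 * R)) * inner))

def D_hi_kl(R, p):                       # upper bound (52)
    return sig_x**2 * (1 - xi(R)**2) + max(sigma_of(p) - sig_x * xi(R), 0.0)**2

def D_hi_w2(R, p):                       # upper bound (55)
    return sig_x**2 * (1 - xi(R)**2) + max(sig_x - np.sqrt(p) - sig_x * xi(R), 0.0)**2

nu = lambda p: np.log(2 * sig_x**2 / max(2 * sig_x**2 - p, 0.0))    # Eq. (58)

for R in [0.25, 0.5, 1.0, 2.0]:
    assert D_lo_kl(R, P) >= D_lo_w2(R, 2 * sig_x**2 * (1 - np.exp(-P))) - 1e-6  # Eq. (61)
    assert D_hi_w2(R, P) <= D_hi_kl(R, nu(P)) + 1e-9                            # Eq. (62)
    print(f"R={R}: lower bounds ({D_lo_kl(R, P):.4f} vs "
          f"{D_lo_w2(R, 2 * sig_x**2 * (1 - np.exp(-P))):.4f}), "
          f"upper bounds ({D_hi_w2(R, P):.4f} vs {D_hi_kl(R, nu(P)):.4f})")
```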
The fact that the bounds in Theorems 4 and 5 are nonredundant when examined through the connection in (48) serves as evidence of their non-triviality. Consequently, further improvements will likely require exploring deeper properties of the Kullback–Leibler divergence and squared Wasserstein-2 distance.

4. Conclusions

In this work, we have established a constrained variant of Talagrand’s transportation inequality. This result reveals a fundamental link between the information-theoretic performance limits of perception-aware lossy source coding under the Kullback–Leibler divergence-based and squared Wasserstein-2 distance-based perception measures. Moreover, it provides an organizational framework for assessing existing bounds in this setting. We believe that similar approaches could be applied to other perception measures. More broadly, the interplay between transportation inequalities and rate-distortion-perception theory presents a rich avenue for further exploration, with promising implications for both theoretical advancements and practical applications.

Author Contributions

Conceptualization, L.X. and J.C.; methodology, J.C. and L.Y.; validation, L.L.; formal analysis, L.X. and J.C.; writing—original draft preparation, L.X. and J.C.; writing—review and editing, J.C.; visualization, L.L.; supervision, Z.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Cover, T.M.; Thomas, J.A. Elements of Information Theory; Wiley: New York, NY, USA, 1991. [Google Scholar]
  2. Blau, Y.; Michaeli, T. The perception-distortion tradeoff. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 6228–6237. [Google Scholar]
  3. Li, M.; Klejsa, J.; Kleijn, W.B. Distribution preserving quantization with dithering and transformation. IEEE Signal Process. Lett. 2010, 17, 1014–1017. [Google Scholar] [CrossRef]
  4. Klejsa, J.; Zhang, G.; Li, M.; Kleijn, W.B. Multiple description distribution preserving quantization. IEEE Trans. Signal Process. 2013, 61, 6410–6422. [Google Scholar] [CrossRef]
  5. Saldi, N.; Linder, T.; Yüksel, S. Randomized quantization and source coding with constrained output distribution. IEEE Trans. Inf. Theory 2015, 61, 91–106. [Google Scholar] [CrossRef]
  6. Saldi, N.; Linder, T.; Yüksel, S. Output constrained lossy source coding with limited common randomness. IEEE Trans. Inf. Theory 2015, 61, 4984–4998. [Google Scholar] [CrossRef]
  7. Blau, Y.; Michaeli, T. Rethinking lossy compression: The rate-distortion-perception tradeoff. Proc. Mach. Learn. Res. 2019, 97, 675–685. [Google Scholar]
  8. Matsumoto, R. Introducing the perception-distortion tradeoff into the rate-distortion theory of general information sources. IEICE Comm. Express 2018, 7, 427–431. [Google Scholar] [CrossRef]
  9. Matsumoto, R. Rate-distortion-perception tradeoff of variable-length source coding for general information sources. IEICE Comm. Express 2019, 8, 38–42. [Google Scholar] [CrossRef]
  10. Yan, Z.; Wen, F.; Ying, R.; Ma, C.; Liu, P. On perceptual lossy compression: The cost of perceptual reconstruction and an optimal training framework. Proc. Mach. Learn. Res. 2021, 139, 11682–11692. [Google Scholar]
  11. Zhang, G.; Qian, J.; Chen, J.; Khisti, A. Universal rate-distortion-perception representations for lossy compression. In Proceedings of the 35th Conference on Neural Information Processing Systems (NeurIPS 2021), Online, 6–14 December 2021; pp. 11517–11529. [Google Scholar]
  12. Freirich, D.; Michaeli, T.; Meir, R. A theory of the distortion-perception tradeoff in Wasserstein space. In Proceedings of the 35th Conference on Neural Information Processing Systems (NeurIPS 2021), Online, 6–14 December 2021; pp. 25661–25672. [Google Scholar]
  13. Yan, Z.; Wen, F.; Liu, P. Optimally controllable perceptual lossy compression. In Proceedings of the ICMLC 2022: 2022 14th International Conference on Machine Learning and Computing, Guangzhou, China, 18–21 February 2022; pp. 24911–24928. [Google Scholar]
  14. Theis, L.; Agustsson, E. On the advantages of stochastic encoders. In Proceedings of the 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, 3–7 May 2021; pp. 1–8. [Google Scholar]
  15. Wagner, A.B. The rate-distortion-perception tradeoff: The role of common randomness. arXiv 2022, arXiv:2202.04147. [Google Scholar]
  16. Chen, J.; Yu, L.; Wang, J.; Shi, W.; Ge, Y.; Tong, W. On the rate-distortion-perception function. IEEE J. Sel. Areas Inf. Theory 2022, 3, 664–673. [Google Scholar] [CrossRef]
  17. Hamdi, Y.; Wagner, A.B.; Gündüz, D. The rate-distortion-perception trade-off: The role of private randomness. In Proceedings of the 2024 IEEE International Symposium on Information Theory (ISIT 2024), Athens, Greece, 7–12 July 2024; pp. 1083–1088. [Google Scholar]
  18. Theis, L.; Wagner, A.B. A coding theorem for the rate-distortion-perception function. In Proceedings of the 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, 3–7 May 2021; pp. 1–5. [Google Scholar]
  19. Freirich, D.; Weinberger, N.; Meir, R. Characterization of the distortion-perception tradeoff for finite channels with arbitrary metrics. In Proceedings of the 2024 IEEE International Symposium on Information Theory (ISIT 2024), Athens, Greece, 7–12 July 2024; pp. 238–243. [Google Scholar]
  20. Serra, G.; Stavrou, P.A.; Kountouris, M. On the computation of the Gaussian rate–distortion–perception function. IEEE J. Sel. Areas Inf. Theory 2024, 5, 314–330. [Google Scholar] [CrossRef]
  21. Qian, J.; Salehkalaibar, S.; Chen, J.; Khisti, A.; Yu, W.; Shi, W.; Ge, Y.; Tong, W. Rate-distortion-perception tradeoff for vector Gaussian sources. IEEE J. Sel. Areas Inf. Theory 2025, 6, 1–17. [Google Scholar] [CrossRef]
  22. Xie, L.; Li, L.; Chen, J.; Zhang, Z. Output-constrained lossy source coding with application to rate-distortion-perception theory. IEEE Trans. Commun. 2025, 73, 1801–1815. [Google Scholar] [CrossRef]
  23. Xu, T.; Zhang, Q.; Li, Y.; He, D.; Wang, Z.; Wang, Y.; Qin, H.; Wang, Y.; Liu, J.; Zhang, Y.-Q. Conditional perceptual quality preserving image compression. arXiv 2023, arXiv:2308.08154. [Google Scholar]
  24. Niu, X.; Gündüz, D.; Bai, B.; Han, W. Conditional rate-distortion-perception trade-off. In Proceedings of the 2023 IEEE International Symposium on Information Theory (ISIT), Taipei, Taiwan, 25–30 June 2023; pp. 1068–1073. [Google Scholar]
  25. Qiu, Y.; Wagner, A.B.; Ballé, J.; Theis, L. Wasserstein distortion: Unifying fidelity and realism. In Proceedings of the 2024 58th Annual Conference on Information Sciences and Systems (CISS), Princeton, NJ, USA, 11–13 March 2024; pp. 1–6. [Google Scholar]
  26. Qiu, Y.; Wagner, A.B. Low-rate, low-distortion compression with Wasserstein distortion. In Proceedings of the 2024 IEEE International Symposium on Information Theory (ISIT 2024), Athens, Greece, 7–12 July 2024; pp. 855–860. [Google Scholar]
  27. Salehkalaibar, S.; Chen, J.; Khisti, A.; Yu, W. Rate-distortion-perception tradeoff based on the conditional-distribution perception measure. IEEE Trans. Inf. Theory 2024, 70, 8432–8454. [Google Scholar] [CrossRef]
  28. Zhou, C.; Lu, G.; Li, J.; Chen, X.; Cheng, Z.; Song, L.; Zhang, W. Controllable distortion-perception tradeoff through latent diffusion for neural image compression. In Proceedings of the 2025 AAAI Conference on Artificial Intelligence, Dubai, United Arab Emirates, 20–22 May 2025; pp. 10725–10733. [Google Scholar]
  29. Niu, X.; Bai, B.; Guo, N.; Zhang, W.; Han, W. Rate–distortion–perception trade-off in information theory, generative models, and intelligent communications. Entropy 2025, 27, 373. [Google Scholar] [CrossRef]
  30. Gunlu, O.; Skorski, M.; Poor, H.V. Low-latency rate-distortion-perception trade-off: A randomized distributed function computation application. Cryptology ePrint Archive, Paper 2025/613, 2025. Available online: https://eprint.iacr.org/2025/613 (accessed on 12 March 2025).
  31. Tan, K.; Dai, J.; Liu, Z.; Wang, S.; Qin, X.; Xu, W.; Niu, K.; Zhang, P. Rate-distortion-perception controllable joint source-channel coding for high-fidelity generative semantic communications. IEEE Trans. Cogn. Commun. Netw. 2025, 11, 672–686. [Google Scholar] [CrossRef]
  32. Lei, E.; Hassani, H.; Bidokhti, S.S. Optimal neural compressors for the rate-distortion-perception tradeoff. arXiv 2025, arXiv:2503.17558. [Google Scholar]
  33. Talagrand, M. Transportation cost for Gaussian and other product measures. Geom. Funct. Anal. 1996, 6, 587–600. [Google Scholar] [CrossRef]
  34. Bai, Y.; Wu, X.; Özgür, A. Information constrained optimal transport: From Talagrand, to Marton, to Cover. IEEE Trans. Inf. Theory 2023, 69, 2059–2073. [Google Scholar] [CrossRef]
Figure 1. Illustrations of $\overline{D}(R, R_c, P \,|\, \phi_{\mathrm{KL}})$, $\underline{D}(R, R_c, P \,|\, \phi_{\mathrm{KL}})$, and $\underline{D}(R, R_c, 2\sigma_X^2(1 - e^{-P}) \,|\, W_2^2)$ for $p_X = \mathcal{N}(0, 1)$, $R_c = 0$, and $P = 0.1$.
Figure 2. Illustrations of $\overline{D}(R, R_c, \nu(P) \,|\, \phi_{\mathrm{KL}})$, $\overline{D}(R, R_c, P \,|\, W_2^2)$, and $\underline{D}(R, R_c, P \,|\, W_2^2)$ for $p_X = \mathcal{N}(0, 1)$, $R_c = 0$, and $P = 0.1$.
