Abstract
Motivated by task-oriented semantic communication and distributed learning systems, this paper studies a distributed indirect source coding problem where M correlated sources are independently encoded for a central decoder. The decoder has access to correlated side information in addition to the messages received from the encoders and aims to recover a latent random variable under a given distortion constraint rather than recovering the sources themselves. We characterize the exact rate-distortion function for the case where the sources are conditionally independent given the side information. Furthermore, we develop a distributed Blahut–Arimoto (BA) algorithm to numerically compute the rate-distortion function. Numerical examples are provided to demonstrate the effectiveness of the proposed approach in calculating the rate-distortion region.
1. Introduction
Consider the multiterminal source coding setup as shown in Figure 1. Let be a discrete memoryless source (DMS) taking values in the finite alphabets according to a fixed and known probability distribution . In this setup, each encoder has local observations . The encoders independently encode their observations into binary sequences at rates bits per input symbol, respectively. The decoder with side information aims to recover some task-oriented latent information which is correlated with but not observed directly by any of the encoders. We are interested in the lossy reconstruction of with the average distortion measured by for some prescribed single-letter distortion measure . A formal rate-distortion code for this setup consists of the following:
Figure 1.
Distributed remote compression of a latent variable with M correlated sources at distributed transmitters and side information at the receiver.
- M independent encoders, where encoder assigns an index to each sequence ;
- A decoder that produces the estimate from each index tuple and the side information .
A rate tuple is said to be achievable with the distortion measure and the distortion value D if there exists a sequence of codes that satisfy
The rate-distortion region for this distributed source coding problem is the closure of the set of all achievable rate tuples that permit the reconstruction of the latent variable within the average distortion constraint D.
The problem as illustrated in Figure 1 is motivated by semantic/task-oriented communication and distributed learning problems. In semantic/task-oriented communication [1], the decoder only needs to reconstruct some task-oriented information implied by the sources. For instance, it might extract hidden features from a scene captured by multiple cameras positioned at various angles. Here, may also be a deterministic function of the source samples , which then reduces to the problem of lossy distributed function computation [2,3,4]. A similar problem also arises in distributed training. Consider as the global model available at the server at an iteration of a federated learning process and as the independent correlated versions of this model after downlink transmission and local training. The server aims to recover the updated global model, , based on the messages received from all M clients. It is often assumed that the global model is transmitted to the clients intact, but in practical scenarios where downlink communication is limited, the clients may receive noisy or compressed versions of the global model [5,6,7].
For the case of M = 1, the problem reduces to remote compression in a point-to-point scenario with side information available at the decoder. In [8,9], the authors studied this problem without correlated side information at the receiver, motivated by the context of semantic communication. This problem is known in the literature as the remote rate-distortion problem [10,11], and the rate-distortion trade-off is fully characterized in the general case. The authors studied this trade-off in detail for specific source distributions in [8]. Similarly, the authors of [12] characterized the remote rate-distortion trade-off when correlated side information is available at both the encoder and the decoder. Our problem for M = 1 can be solved by combining the remote rate-distortion problem with the classical Wyner–Ziv rate-distortion function [13,14].
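For reference, the Wyner–Ziv rate-distortion function from [13], which enters this combination, can be stated in its standard single-letter form (our notation):

```latex
R_{\mathrm{WZ}}(D) \;=\; \min \; \bigl[ I(X;U) - I(Y;U) \bigr] \;=\; \min \; I(X;U \mid Y),
```

where the minimum is over all conditional distributions $p(u\mid x)$ forming the Markov chain $U - X - Y$ and all decoders $g:\mathcal{U}\times\mathcal{Y}\to\hat{\mathcal{X}}$ satisfying $\mathbb{E}[d(X, g(U,Y))] \le D$; the second equality uses $I(Y;U\mid X)=0$, which follows from the Markov chain.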
The rate-distortion region for the multiterminal version of the remote rate-distortion problem considered here remains open. Wagner et al. developed an improved outer bound for a general multiterminal source model [15]. Lim et al. proposed an achievable rate region for the distributed lossy computation problem without giving a conclusive rate-distortion function [16]. Ku et al. considered a special case in which the sources are independent and derived a single-letter expression for the rate-distortion region [20]. Gastpar [18] considered the lossy compression of the source sequences in the presence of side information at the receiver. He characterized the rate-distortion region for the special case in which the sources are conditionally independent given the side information.
To provide a performance reference for practical coding schemes, it is necessary not only to characterize the exact theoretical expression for the rate-distortion region but also to compute the rate-distortion region for a given distribution and a specific distortion metric. In the traditional single-source direct scenario, determining the rate-distortion function involves solving a convex optimization problem, which can be addressed using the globally convergent iterative Blahut–Arimoto algorithm, as discussed in [17]. In this paper, we are interested in computing the rate-distortion region for the general distributed coding problem. We pay particular attention to the special case in which the sources are conditionally independent given the side information, which is motivated by the aforementioned examples. For the sake of brevity of the presentation, we set M = 3 in this paper, with the understanding that the results can be readily extended to an arbitrary number of sources. To numerically compute the rate-distortion region, we introduce a distributed iterative optimization framework that generalizes the classical Blahut–Arimoto (BA) algorithm. While the standard BA algorithm is designed for single-source point-to-point settings, our framework extends its alternating minimization structure to a distributed scenario with multiple encoders, indirect source reconstruction, and decoder-side side information. This extension enables the computation of rate-distortion regions in settings that are significantly more general than those considered in the existing literature.
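As background, the classical single-source BA recursion referenced above alternates between the optimal test channel for a fixed output marginal and the optimal output marginal for a fixed test channel. The following is a minimal sketch (no side information, direct observation of the source; all function and variable names are ours, not the paper's notation):

```python
import numpy as np

def blahut_arimoto_rd(p_x, d, beta, n_iter=200):
    """Classical Blahut-Arimoto iteration for the rate-distortion
    function R(D) of a single DMS (direct observation, no side
    information).  beta is the Lagrange multiplier trading rate
    against distortion; returns one (rate, distortion) point.

    p_x : source distribution, shape (|X|,)
    d   : distortion matrix d[x, x_hat], shape (|X|, |X_hat|)
    """
    n_x, n_xh = d.shape
    q = np.full(n_xh, 1.0 / n_xh)            # output marginal q(x_hat)
    for _ in range(n_iter):
        # Optimal test channel p(x_hat | x) for the fixed marginal q
        w = q[None, :] * np.exp(-beta * d)
        w /= w.sum(axis=1, keepdims=True)
        # Optimal output marginal for the fixed test channel
        q = p_x @ w
    # Rate = I(X; X_hat), distortion = E[d(X, X_hat)]
    # (assumes the resulting w and q are strictly positive)
    joint = p_x[:, None] * w
    rate = np.sum(joint * np.log2(w / q[None, :]))
    dist = np.sum(joint * d)
    return rate, dist
```

For a uniform binary source with Hamming distortion this recovers points on the curve R(D) = 1 - h(D), which can serve as a sanity check before moving to the distributed updates of Section 5.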
In Section 2, we derive an achievable region . In Section 3, we determine a general outer bound . In Section 4, we show that the two regions coincide and the region is optimal when the sources are conditionally independent given the side information Y. In Section 5, we develop an alternating minimization framework to calculate the rate-distortion region by generalizing the Blahut–Arimoto (BA) algorithm. In Section 6, we demonstrate the effectiveness of the proposed framework through numerical examples.
2. An Achievable Rate Region
In this section, we introduce an achievable rate region , which is contained within the target rate-distortion region .
Theorem 1.
, where is the set of all rate tuples such that there exists a tuple of discrete random variables with , for which the following conditions are satisfied
and there exists a decoder such that
The auxiliary random variables , and serve as intermediate variables in the encoding process, introduced to improve compression efficiency. Ideally, , and should carry information about the sources , and , respectively, but this information should be as independent as possible of the side information Y in order to avoid redundancy and fully exploit the decoder-side knowledge. This helps minimize the required transmission rates. The rigorous proof of Theorem 1 is provided in Appendix A.
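This intuition can be made precise for a single description: under the Markov chain $Y - X - U$ (the side information is conditionally independent of the description given the source),

```latex
I(X;U) - I(Y;U) \;=\; I(X;U\mid Y) - I(Y;U\mid X) \;=\; I(X;U\mid Y),
```

where the first equality expands $I(X,Y;U)$ by the chain rule in two ways and the second uses $I(Y;U\mid X)=0$ from the Markov chain. Hence, reducing the redundancy between a description and $Y$ directly reduces the rate that must be transmitted.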
Corollary 1.
The conditions (2) of Theorem 1 can be expressed equivalently as
Proof.
First, we prove . The bound of (4a) can be written as
where because is conditionally independent of for given . For the first term on the right-hand side of (5), we have
where because is conditionally independent of given . Then, we have
This completes the proof of (4a). (4b) and (4c) can be proved in the same way.
Next, we prove (4d). The sum-rate bound can be written as
where (8a) and (8b) are obtained by the chain rule of mutual information, , because is conditionally independent of given . For the last term in (8b), we have
where because is conditionally independent of given , and because is conditionally independent of given . Thus, the last term in (8b) can be written as
For the last term on the right-hand side, we have
thus, the last term in (10) can be written as
By combining (8), (9), (10) and (12), we have
The remaining sum-rate bounds in (4) can be proved in the same way and are omitted here. □
3. A General Outer Bound
In this section, we derive a region which contains the target rate-distortion region .
Theorem 2.
, where is the set of all rate triples such that there exists a triple of discrete random variables with , and , for which the following conditions are satisfied
and there exists a decoding function such that
The rigorous proof of Theorem 2 is provided in Appendix B.
While the expressions of the inner bound (4) and the outer bound (14) are the same, the two regions do not coincide in general, because the marginal constraints in Theorem 1 limit the degrees of freedom for choosing the auxiliary random variables compared with the marginal constraints in Theorem 2. In the next section, we will demonstrate that the additional degrees of freedom in choosing the auxiliary random variables in Theorem 2 cannot lower the value of the rate-distortion functions.
4. Conclusive Rate-Distortion Results
Corollary 2.
If are conditionally independent given the side information Y, where is the set of all rate triples such that there exists a triple of random variables with , for which the following conditions are satisfied
and there exist decoding functions such that
Proof.
Since the joint distribution can be written as
the terms in the sum rate bound (2d) are 0 because is conditionally independent of . Similarly, the terms and in the sum rate bound (2e)–(2g) are all 0. Therefore, the sum rate bound can be expressed as the combination of the side bounds and hence can be omitted. Meanwhile, the term in the side bound in (2a) can be written as
Similarly, we have
This completes the proof of Corollary 2. □
Corollary 3.
If are conditionally independent given the side information Y, , and hence where is the set of all rate triples such that there exists a triple of discrete random variables with , and , for which the following conditions are satisfied
and there exist decoding functions such that
Proof.
First, we can enlarge the region by omitting the sum rate bound in (14). Then, the side rate bounds in (14) can be relaxed as
According to the conditional independence relations, we have , and then we have
where (24a) is obtained by the condition that are conditionally independent given the side information Y, (24b) follows from the chain rule of mutual information and (24c) is derived by the Markov chain . The same derivation can be applied to and ; this proves Corollary 3. □
Theorem 3.
If are conditionally independent given the side information Y,
Proof.
We note that the only difference between and is the degrees of freedom when choosing the auxiliary random variables , and all of the mutual information functions in (16) and (21) depend only on the marginal distributions , and . Choose any rate triple with an auxiliary random variable triple meeting the conditions of Corollary 3, where the corresponding joint distribution is given in (18). Then, we construct the auxiliary random variables such that
The joint distribution
has the same marginal distributions on , and . Therefore, the additional degrees of freedom for choosing the auxiliary random variables in Corollary 3 cannot lower the value of the rate-distortion functions. This proves Theorem 3. The arguments leading to Theorem 3 indicate that the result extends to the scenario with M sources.
□
5. Iterative Optimization Framework Based on BA Algorithm
In this section, we present the iterative optimization framework for calculating the rate-distortion region. Starting from the standard Lagrange multiplier method, calculating the rate-distortion region is equivalent to minimizing
By the definition of mutual information, we can rewrite (29) as
where represent the distributions that need to be iteratively updated, and the vectorized notation represents the conditional distribution of the auxiliary variables given Y, i.e., , represents the conditional distribution of the auxiliary variables given sources , and represents the conditional distribution of the indirect variable T given Y and auxiliary variables, .
Lemma 1
(Optimization of ). For a fixed , the Lagrangian is minimized by
where .
Proof.
For any
where (a) follows from the fact that , and the equality is achieved if . This completes the proof of Lemma 1. □
Lemma 2
(Optimization of ). For a fixed , the Lagrangian is minimized by
and the minimum is given by
Proof.
For a fixed , the Lagrangian is minimized by if and only if the following Kuhn–Tucker (KT) conditions are satisfied
and
Since
the first KT condition (35) becomes
where
and we have
Then, (33) can be obtained after normalizing . □
Lemma 3
(Optimization of ). For a fixed , the Lagrangian is minimized by the Bayes detector.
where is selected to guarantee
and denotes
Proof.
We note that the only term in the Lagrangian that depends on is the expected distortion, which is minimized by a Bayes detector.
Based on Lemmas 1–3, we obtain an iterative algorithm for computing the rate-distortion region of the distributed indirect source coding problem with decoder side information. For a given , we iteratively apply (44) and (45) to alternately update and
until the Lagrangian converges, and the associated is
Then, we update according to
with and if and and if , and where is selected to guarantee
and
□
Next, we repeat the process for the next user, i.e., .
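The alternating update loop described above — update one distribution block via its closed form while holding the others fixed, move to the next user, and repeat until the Lagrangian converges — has a simple generic skeleton. The sketch below is illustrative only: the block names are ours, and the closed-form updates stand in for those given in Lemmas 1–3:

```python
import numpy as np

def alternating_minimization(update_fns, objective, params,
                             tol=1e-9, max_iter=500):
    """Generic alternating-minimization skeleton: each block of
    variables is replaced by its closed-form minimizer in turn
    (in the paper, the closed forms of Lemmas 1-3, cycled over the
    users), and iteration stops when the objective (the Lagrangian)
    decreases by less than tol in a full sweep."""
    prev = np.inf
    cur = prev
    for _ in range(max_iter):
        for name, fn in update_fns.items():
            params[name] = fn(params)        # closed-form block update
        cur = objective(params)
        if prev - cur < tol:                 # monotone, bounded below
            break
        prev = cur
    return params, cur
```

Each sweep can only decrease the objective, which is the monotonicity property used in the convergence discussion below.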
Convergence analysis:
The algorithm employs an alternating minimization approach that produces Lagrangian values that are monotonically non-increasing and bounded below, thereby generating a convergent sequence of Lagrangian values. Since the rate component of the Lagrangian is convex, and the sum of convex functions remains convex, the Lagrangian is convex whenever the expected distortion is a convex function of the optimization variables. In that case, the proposed iterative optimization framework achieves the global minimum [19].
We note that the expected distortion (50) involves a product of the optimization variables; it is therefore not linear in them, and the Lagrangian may be non-convex. Even when the problem is non-convex, the authors in [20] demonstrate that a BA-based iterative algorithm, initialized randomly and followed by selecting the minimum Lagrangian among all converged values, can still provide highly effective information-theoretic inner bounds for the rate-distortion region, serving as a benchmark for practical quantization schemes.
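The random-restart procedure described above can be wrapped generically as follows (a sketch; `run_once` stands for one full BA-style run from a random initialization and is an assumed interface, not the paper's code):

```python
import numpy as np

def best_of_restarts(run_once, n_restarts=20, seed=0):
    """Run the iterative algorithm from several random
    initializations and keep the smallest converged Lagrangian,
    as suggested for non-convex instances.  run_once(rng) is
    assumed to return (params, lagrangian_value)."""
    rng = np.random.default_rng(seed)
    best_params, best_val = None, np.inf
    for _ in range(n_restarts):
        params, val = run_once(rng)
        if val < best_val:
            best_params, best_val = params, val
    return best_params, best_val
```

Because the selected value is a minimum over converged Lagrangians, it yields an inner bound on the achievable region rather than a certified optimum.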
6. Numerical Examples
In this section, we provide an example to illustrate the proposed iterative algorithm for computing the rate-distortion region of a distributed source coding problem with decoder-side information. In the problem considered in this paper, distributed edge devices compress their observations and transmit them to a central server (CEO). The central server then aims to recover the indirect information T from the received data, utilizing the side information Y. For the convenience of demonstration, we consider a simple case where M = 2 and the sources are binary, i.e., . The joint distributions, denoted by and , are given by
where the Kronecker delta function equals 1 when x = y and 0 otherwise. We can consider Y as the input to two different binary symmetric channels (BSCs) with crossover probabilities , respectively, where . The corresponding outputs of these channels are and . In this example, we set .
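Under this model, the joint distribution of the example can be assembled directly. The sketch below is illustrative: the crossover probabilities p1 = 0.1 and p2 = 0.2 and the uniform prior on Y are placeholder assumptions, since the paper's exact values are not restated here:

```python
import numpy as np

p1, p2 = 0.1, 0.2                  # illustrative crossover probabilities
p_y = np.array([0.5, 0.5])         # assumed uniform prior on Y

def bsc(p):
    # Transition matrix of a binary symmetric channel: rows index y,
    # columns index the channel output x.
    return np.array([[1 - p, p], [p, 1 - p]])

W1, W2 = bsc(p1), bsc(p2)
# p(y, x1, x2) = p(y) p(x1|y) p(x2|y): X1 and X2 are conditionally
# independent given Y by construction, matching the model above.
joint = p_y[:, None, None] * W1[:, :, None] * W2[:, None, :]
```

This joint array is the input distribution over which the iterative updates of Section 5 would run.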
We also assume that the information of interest is directly the combination of the two distributed sources, i.e., . The distortion measure is given by , where
By applying the proposed iterative optimization framework, we can obtain the optimal transition probability distribution and that meets a given distortion constraint D on , and the corresponding minimum rate can be calculated by
The contour plot of the rate-distortion region for this scenario is presented in Figure 2, while Figure 3 displays a surface plot depicting the rate-distortion region. We note that when , the considered problem reduces to the traditional point-to-point Wyner–Ziv problem. In Figure 4, we compare the rate-distortion results computed using the proposed approach with the theoretical analysis by Wyner and Ziv [13]. We observe that the two rate-distortion curves coincide, demonstrating the effectiveness of the proposed iterative approach for calculating the rate-distortion function.
Figure 2.
Contour plot of the rate-distortion region with two distributed binary sources , where the labels on the contours represent the distortion values D on .
Figure 3.
Surface plot of the rate-distortion region.
Figure 4.
The rate-distortion function for the case when , i.e., the Wyner–Ziv problem.
7. Conclusions
This paper explored a variant of the rate-distortion problem motivated by semantic communication and distributed learning systems, where correlated sources are independently encoded for a central decoder to reconstruct the indirect source of interest. In addition to receiving messages from the encoders, the decoder has access to correlated side information and aims to reconstruct the indirect source under a specified distortion constraint. We derived the exact rate-distortion function for the case where the sources are conditionally independent given the side information. Furthermore, we introduced a distributed iterative optimization framework based on the Blahut–Arimoto (BA) algorithm to numerically compute the rate-distortion function. A numerical example has been provided to demonstrate the effectiveness of the proposed approach.
Author Contributions
Conceptualization, Q.Y.; Methodology, J.T.; Validation, J.T.; Formal analysis, J.T.; Investigation, Q.Y.; Resources, Q.Y.; Writing—original draft, J.T.; Writing—review & editing, Q.Y.; Visualization, J.T.; Project administration, Q.Y.; Funding acquisition, Q.Y. All authors have read and agreed to the published version of the manuscript.
Funding
This work was supported in part by NSFC under Grants No. 62293481 and No. 62201505, and in part by the SUTD-ZJU IDEA Grant (SUTD-ZJU (VP) 202102).
Data Availability Statement
The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.
Conflicts of Interest
The authors declare no conflicts of interest.
Appendix A
Here, we provide the rigorous proof of Theorem 1.
Lemma A1
(Extended Markov Lemma). Let
For a fixed , , and are drawn from , and , respectively. Then
Proof.
According to (A1), we have several Markov chains , , . We use the Markov lemma three times: first to derive that are jointly typical and are jointly typical. Then, using the Markov chain , we can demonstrate that are jointly typical. For a fixed , is drawn from , which implies that for sufficiently large n. is drawn from , which implies that for sufficiently large n; is drawn from , which implies that for sufficiently large n. Then, with high probability, . □
Proof of Theorem 1.
For , fix and such that the distortion constraint is satisfied. Calculate .
Generation of codebooks: Generate i.i.d. codewords , and index them by . Provide random bins with indices . Randomly assign each codeword to one of the bins using a uniform distribution. Let denote the set of codeword indices assigned to bin index .
Encoding: Given a source sequence , the encoder looks for a codeword such that . The encoder sends the index of the bin to which belongs.
Decoding: The decoder looks for a pair such that and . If the decoder finds a unique triple , it then computes , where .
Analysis of the probability of error:
1. The encoders cannot find the codewords such that . The probability of this event is small if
2. The pair of sequences , and but the codewords are not jointly typical with the side information sequences , i.e., . We have assumed that
Hence, by the Markov lemma, the probability of this event goes to zero if n is large enough.
3. There exists another with the same bin index that is jointly typical with the side information sequences. The correct codeword indices are denoted by , and . We first consider the situation where the codeword index is in error. The probability that a randomly chosen is jointly typical with can be bounded as
The probability of this error event is bounded by the number of codewords in the bin times the probability of joint typicality
Similarly, the probability that the codeword index or is in error can be bounded by
We then consider the case that two of the three codeword indices are in error. The probability that the randomly chosen and are jointly typical with can be bounded as
Hence, the error probability can be bounded as
Similarly, we can obtain the probability that the codeword indices or are in error, which we omit here.
For the case where all the codeword indices , and are in error, the probability that the randomly chosen , and are jointly typical with can be bounded as
The probability of the above error events goes to 0 when
If are correctly decoded, we have . Therefore, the empirical joint distribution is close to the distribution that achieves distortion D. □
Appendix B
Here, we provide the rigorous proof of Theorem 2. Define a series of encoders and decoders that achieve the given distortion D. We can derive the following inequalities
where (A12a) follows from the fact that conditioning reduces entropy, (A12b) and (A12d) are obtained by the definition of conditional mutual information and (A12c) is the chain rule of mutual information. In (A12e), we let , and . Similarly, we have
For the sum rate part, we have
where (A14a) follows from the fact that conditioning reduces entropy, (A14b) and (A14d) are obtained by the definition of conditional mutual information, (A14c) is the chain rule of mutual information and (A14e) follows from the fact that is conditionally independent of the past and future of given . In (A14f), the definition of is used again.
References
- Han, T.; Yang, Q.; Shi, Z.; He, S.; Zhang, Z. Semantic-preserved communication system for highly efficient speech transmission. IEEE J. Sel. Areas Commun. 2022, 41, 245–259.
- Adikari, T.; Draper, S. Two-terminal source coding with common sum reconstruction. In Proceedings of the 2022 IEEE International Symposium on Information Theory (ISIT), Espoo, Finland, 26 June–1 July 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1420–1424.
- Korner, J.; Marton, K. How to encode the modulo-two sum of binary sources (corresp.). IEEE Trans. Inf. Theory 1979, 25, 219–221.
- Pastore, A.; Lim, S.H.; Feng, C.; Nazer, B.; Gastpar, M. Distributed Lossy Computation with Structured Codes: From Discrete to Continuous Sources. In Proceedings of the 2023 IEEE International Symposium on Information Theory (ISIT), Taipei, Taiwan, 25–30 June 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1681–1686.
- Amiri, M.M.; Gunduz, D.; Kulkarni, S.R.; Poor, H.V. Federated Learning with Quantized Global Model Updates. arXiv 2020, arXiv:2006.10672.
- Gruntkowska, K.; Tyurin, A.; Richtárik, P. Improving the Worst-Case Bidirectional Communication Complexity for Nonconvex Distributed Optimization under Function Similarity. arXiv 2024, arXiv:2402.06412.
- Amiri, M.M.; Gündüz, D.; Kulkarni, S.R.; Poor, H.V. Convergence of Federated Learning Over a Noisy Downlink. IEEE Trans. Wirel. Commun. 2022, 21, 1422–1437.
- Stavrou, P.A.; Kountouris, M. The Role of Fidelity in Goal-Oriented Semantic Communication: A Rate Distortion Approach. IEEE Trans. Commun. 2023, 71, 3918–3931.
- Liu, J.; Zhang, W.; Poor, H.V. A rate-distortion framework for characterizing semantic information. In Proceedings of the 2021 IEEE International Symposium on Information Theory (ISIT), Virtual, 12–20 July 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 2894–2899.
- Dobrushin, R.; Tsybakov, B. Information transmission with additional noise. IRE Trans. Inf. Theory 1962, 8, 293–304.
- Wolf, J.; Ziv, J. Transmission of noisy information to a noisy receiver with minimum distortion. IEEE Trans. Inf. Theory 1970, 16, 406–411.
- Guo, T.; Wang, Y.; Han, J.; Wu, H.; Bai, B.; Han, W. Semantic compression with side information: A rate-distortion perspective. arXiv 2022, arXiv:2208.06094.
- Wyner, A.; Ziv, J. The rate-distortion function for source coding with side information at the decoder. IEEE Trans. Inf. Theory 1976, 22, 1–10.
- Slepian, D.; Wolf, J. Noiseless coding of correlated information sources. IEEE Trans. Inf. Theory 1973, 19, 471–480.
- Wagner, A.B.; Anantharam, V. An improved outer bound for multiterminal source coding. IEEE Trans. Inf. Theory 2008, 54, 1919–1937.
- Lim, S.H.; Feng, C.; Pastore, A.; Nazer, B.; Gastpar, M. Towards an algebraic network information theory: Distributed lossy computation of linear functions. In Proceedings of the 2019 IEEE International Symposium on Information Theory (ISIT), Paris, France, 7–12 July 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1827–1831.
- Cheng, S.; Stankovic, V.; Xiong, Z. Computing the channel capacity and rate-distortion function with two-sided state information. IEEE Trans. Inf. Theory 2005, 51, 4418–4425.
- Gastpar, M. The Wyner-Ziv problem with multiple sources. IEEE Trans. Inf. Theory 2004, 50, 2762–2768.
- Cover, T.M.; Thomas, J.A. Elements of Information Theory; Wiley: New York, NY, USA, 2006.
- Ku, G.; Ren, J.; Walsh, J.M. Computing the rate distortion region for the CEO problem with independent sources. IEEE Trans. Signal Process. 2014, 63, 567–575.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).