*4.4. Related Work*

In Equation (19), we derived a variational formulation of Predictive Rate–Distortion. This is formally related to a variational formulation of the Information Bottleneck introduced by [38], who applied it to neural-network-based image recognition. Unlike our approach, they used a fixed diagonal Gaussian rather than a flexibly parameterized distribution for *q*. Some recent work has applied similar approaches to sequence modeling, employing models corresponding to the objective in Equation (19) with *λ* = 1 [39–42].

In the neural networks literature, the most widely used method based on variational bounds similar to Equation (19) is the Variational Autoencoder [36,43], which corresponds to the setting where *λ* = 1 and the predicted output equals the observed input. The *β*-VAE [44], a variant of the Variational Autoencoder, uses *λ* > 1 (whereas the Predictive Rate–Distortion objective (7) uses *λ* ∈ (0, 1)), and has been linked to the Information Bottleneck by [45].
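To make the role of *λ* concrete, the following is a minimal sketch of a *λ*-weighted variational objective of the VAE family, with a diagonal-Gaussian latent code and a squared-error reconstruction term. The function names and the use of squared error are illustrative assumptions, not the exact objectives of the cited works; the point is only how *λ* = 1 (standard VAE), *λ* > 1 (*β*-VAE), and *λ* ∈ (0, 1) (the Predictive Rate–Distortion regime) differ in how strongly the KL "rate" term is penalized.

```python
import numpy as np

def gaussian_kl(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dims.

    Standard closed form for a diagonal Gaussian against a unit Gaussian.
    """
    return 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar)

def weighted_variational_loss(x, x_recon, mu, logvar, lam):
    """Illustrative lambda-weighted objective: distortion + lam * rate.

    lam = 1      -> standard VAE objective (negative ELBO, up to constants)
    lam > 1      -> beta-VAE-style objective
    0 < lam < 1  -> the regime used by the Predictive Rate-Distortion
                    objective discussed in the text
    (Squared-error distortion is an assumption made for this sketch.)
    """
    distortion = np.sum((x - x_recon) ** 2)
    return distortion + lam * gaussian_kl(mu, logvar)
```

At a perfect reconstruction and a standard-normal code, every choice of *λ* gives zero loss; away from that point, larger *λ* trades reconstruction accuracy for a more compressed (lower-KL) latent code.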
