**7. Conclusions**

We introduced Neural Predictive Rate–Distortion (NPRD), a method for estimating Predictive Rate–Distortion when only sample trajectories are given. Unlike OCF, the most general prior method, NPRD scales to long sequences and large state spaces. On analytically tractable processes, we showed that it closely fits the analytical rate–distortion curve and recovers the causal states of the process. On part-of-speech-level modeling of natural language, it agrees with OCF in the regime of low rates and short sequences; outside this regime, OCF fails due to combinatorial explosion and overfitting, while NPRD continues to provide estimates. Finally, we used NPRD to provide the first estimates of Predictive Rate–Distortion for modeling natural language in five different languages, finding qualitatively very similar curves across all languages.

All code for reproducing the results in this work is available at https://github.com/m-hahn/predictive-rate--distortion.

**Author Contributions:** Methodology, M.H.; Writing—original draft, M.H. and R.F.

**Funding:** This research received no external funding.

**Acknowledgments:** We thank Dan Jurafsky for helpful discussion, and the anonymous reviewers for helpful comments.

**Conflicts of Interest:** The authors declare no conflict of interest.
