Minimum Description Length Codes Are Critical
Abstract
1. Introduction
1.1. Resolution, Relevance and Maximally Informative Samples
1.2. Minimum Description Length and the Normalized Maximum Likelihood
2. Results
2.1. NML Codes Provide Efficient Representations
2.1.1. Dirichlet Model
2.1.2. A Model of Independent Spins
2.1.3. Sherrington-Kirkpatrick Model
2.1.4. Restricted Boltzmann Machines
2.2. Large Deviations of the Universal Codes Exhibit Phase Transitions
3. Discussion
Author Contributions
Funding
Conflicts of Interest
Abbreviations
MDL | minimum description length
NML | normalized maximum likelihood
MIS | maximally informative sample
SK | Sherrington-Kirkpatrick
RBM | restricted Boltzmann machine
CD | contrastive divergence
PCD | persistent contrastive divergence
MCMC | Markov chain Monte Carlo
Appendix A. Derivation for the Parametric Complexity
Appendix B. Calculating the Parametric Complexity
Appendix B.1. Dirichlet Model
Appendix B.2. Paramagnet Model
Appendix C. Simulation Details
Appendix C.1. Sampling Universal Codes through Markov Chain Monte Carlo
1. Starting from the sample, we calculate the maximum likelihood estimates of the parameters of the model, either by solving Equation (41) for the SK model or by contrastive divergence (CD) [24,25] for the RBM (see Appendix C.2).
2. We generate a new sample from the current one by flipping one spin in each of r randomly selected points of the sample. The number r of flipped spins must be chosen carefully: large enough to ensure fast mixing, but small enough that the newly inferred model is not too far from the starting model.
3. The maximum likelihood estimates for the new sample are calculated as in Step 1.
4. Compute
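The steps above can be illustrated with a minimal Python sketch. It rests on two assumptions that are not in the text. First, it uses the model of independent spins (paramagnet), whose maximum likelihood estimates are simply the empirical frequencies of each spin, instead of the SK model or the RBM, which would require Equation (41) or CD. Second, because Step 4 is truncated, it assumes a standard Metropolis acceptance rule whose target is the NML distribution, which is proportional to the maximized likelihood P(s | θ̂(s)); the r-spin-flip proposal of Step 2 is symmetric, so no Hastings correction is needed. The function names `max_log_likelihood` and `sample_nml` are hypothetical.

```python
import numpy as np


def _xlogx(x):
    """Elementwise x * log(x), using the convention 0 * log(0) = 0."""
    out = np.zeros_like(x, dtype=float)
    mask = x > 0
    out[mask] = x[mask] * np.log(x[mask])
    return out


def max_log_likelihood(sample):
    """log P(s | theta_hat(s)) for a (toy) model of independent +/-1 spins.

    `sample` is an (N, n) array: N sampled points, each with n spins.
    Steps 1 and 3: the MLE of each spin's probability of being +1 is its
    empirical frequency p_i, so the maximized log-likelihood is
    N * sum_i [p_i log p_i + (1 - p_i) log(1 - p_i)].
    """
    N = sample.shape[0]
    p = (sample == 1).mean(axis=0)
    return N * float(np.sum(_xlogx(p) + _xlogx(1.0 - p)))


def sample_nml(sample, r, n_steps, rng):
    """Metropolis chain whose target is assumed proportional to
    P(s | theta_hat(s)), i.e. the NML distribution of this toy model.
    Returns the final sample and the acceptance rate."""
    current = sample.copy()
    ll_current = max_log_likelihood(current)
    N, n = current.shape
    accepted = 0
    for _ in range(n_steps):
        # Step 2: flip one spin in each of r distinct, randomly chosen points.
        proposal = current.copy()
        rows = rng.choice(N, size=r, replace=False)
        cols = rng.integers(0, n, size=r)
        proposal[rows, cols] *= -1
        # Step 3: re-estimate the parameters on the proposed sample.
        ll_proposal = max_log_likelihood(proposal)
        # Step 4 (assumed): accept with probability min(1, exp(ll_new - ll_old)).
        if rng.random() < np.exp(min(0.0, ll_proposal - ll_current)):
            current, ll_current = proposal, ll_proposal
            accepted += 1
    return current, accepted / n_steps
```

The returned acceptance rate is one practical handle on the choice of r discussed in Step 2: if almost no proposals are accepted, r is probably too large for the chain to stay close to the starting model; if nearly all are accepted, each move is likely too small for fast mixing.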
Appendix C.2. Estimating RBM Parameters through Contrastive Divergence
Appendix C.3. Source Codes
References
1. Muñoz, M.A. Colloquium: Criticality and dynamical scaling in living systems. Rev. Mod. Phys. 2018, 90, 031001.
2. Newman, M.E.J. Power laws, Pareto distributions and Zipf’s law. Contemp. Phys. 2005, 46, 323–351.
3. Bak, P. How Nature Works: The Science of Self-Organized Criticality; Copernicus: Göttingen, Germany, 1996.
4. Mora, T.; Bialek, W. Are biological systems poised at criticality? J. Stat. Phys. 2011, 144, 268–302.
5. Simini, F.; González, M.C.; Maritan, A.; Barabási, A.L. A universal model for mobility and migration patterns. Nature 2012, 484, 96.
6. Schwab, D.J.; Nemenman, I.; Mehta, P. Zipf’s law and criticality in multivariate data without fine-tuning. Phys. Rev. Lett. 2014, 113, 068102.
7. Marsili, M.; Mastromatteo, I.; Roudi, Y. On sampling and modeling complex systems. J. Stat. Mech. Theory Exp. 2013, 9, 1267–1279.
8. Haimovici, A.; Marsili, M. Criticality of mostly informative samples: A Bayesian model selection approach. J. Stat. Mech. Theory Exp. 2015, 10, P10013.
9. Cubero, R.J.; Jo, J.; Marsili, M.; Roudi, Y.; Song, J. Minimally sufficient representations, maximally informative samples and Zipf’s law. arXiv, 2018; arXiv:1808.00249.
10. Song, J.; Marsili, M.; Jo, J. Resolution and relevance trade-offs in deep learning. arXiv, 2017; arXiv:1710.11324.
11. Grünwald, P.D. The Minimum Description Length Principle; MIT Press: Cambridge, MA, USA, 2007.
12. Ter Steege, H.; Pitman, N.C.A.; Sabatier, D.; Baraloto, C.; Salomão, R.P.; Guevara, J.E.; Phillips, O.L.; Castilho, C.V.; Magnusson, W.E.; Molino, J.F.; et al. Hyperdominance in the Amazonian tree flora. Science 2013, 342, 1243092.
13. Condit, R.; Lao, S.; Pérez, R.; Dolins, S.B.; Foster, R.; Hubbell, S. Barro Colorado Forest Census Plot Data (Version 2012). Available online: https://repository.si.edu/handle/10088/20925 (accessed on 1 October 2018).
14. Combine Your Old LEGO® to Build New Creations. Available online: https://rebrickable.com/ (accessed on 1 October 2018).
15. Mazzolini, A.; Gherardi, M.; Caselle, M.; Lagomarsino, M.C.; Osella, M. Statistics of shared components in complex component systems. Phys. Rev. X 2018, 8, 021023.
16. Gama-Castro, S.; Salgado, H.; Santos-Zavaleta, A.; Ledezma-Tejeida, D.; Muñiz-Rascado, L.; García-Sotelo, J.S.; Alquicira-Hernández, K.; Martínez-Flores, I.; Pannier, L.; Castro-Mondragón, J.A.; et al. RegulonDB version 9.0: High-level integration of gene regulation, coexpression, motif clustering and beyond. Nucleic Acids Res. 2015, 44, 133–143.
17. Balakrishnan, R.; Park, J.; Karra, K.; Hitz, B.C.; Binkley, G.; Hong, E.L.; Sullivan, J.; Micklem, G.; Cherry, J.M. YeastMine—An integrated data warehouse for Saccharomyces cerevisiae data as a multipurpose tool-kit. Database 2012, 2012, bar062.
18. Cover, T.M.; Thomas, J.A. Elements of Information Theory; John Wiley & Sons: New York, NY, USA, 2012.
19. Grünwald, P.D. A tutorial introduction to the minimum description length principle. arXiv, 2004; arXiv:math/0406077.
20. Shtarkov, Y.M. Universal sequential coding of single messages. Transl. Prob. Inf. Transm. 1987, 23, 175–186.
21. Rissanen, J.J. Fisher information and stochastic complexity. IEEE Trans. Inf. Theory 1996, 42, 40–47.
22. Balasubramanian, V. MDL, Bayesian inference, and the geometry of the space of probability distributions. In Advances in Minimum Description Length: Theory and Applications; Grünwald, P.D., Myung, I.J., Pitt, M.A., Eds.; The MIT Press: Cambridge, MA, USA, 2005.
23. Beretta, A.; Battistin, C.; de Mulatier, C.; Mastromatteo, I.; Marsili, M. The stochastic complexity of spin models: How simple are simple spin models? arXiv, 2017; arXiv:1702.07549.
24. Hinton, G.E. Training products of experts by minimizing contrastive divergence. Neural Comput. 2002, 14, 1771–1800.
25. Hinton, G.E.; Salakhutdinov, R.R. Reducing the dimensionality of data with neural networks. Science 2006, 313, 504–507.
26. Mezard, M.; Montanari, A. Information, Physics, and Computation; Oxford University Press: Oxford, UK, 2009.
27. Filiasi, M.; Livan, G.; Marsili, M.; Peressi, M.; Vesselli, E.; Zarinelli, E. On the concentration of large deviations for fat tailed distributions, with application to financial data. J. Stat. Mech. Theory Exp. 2014, 9, P09030.
© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Cubero, R.J.; Marsili, M.; Roudi, Y. Minimum Description Length Codes Are Critical. Entropy 2018, 20, 755. https://doi.org/10.3390/e20100755