
The Information Bottleneck in Deep Learning

A special issue of Entropy (ISSN 1099-4300). This special issue belongs to the section "Information Theory, Probability and Statistics".

Deadline for manuscript submissions: closed (30 November 2019) | Viewed by 16650

Special Issue Editors


Prof. Dr. Naftali Tishby
Guest Editor
The School of Computer Science and Engineering and the Interdisciplinary Center for Neural Computation, The Hebrew University of Jerusalem, Jerusalem 9190401, Israel
Interests: information theory in machine learning; dynamical systems and control; statistical physics of neural systems; computational neuroscience

Prof. Dr. Stefano Soatto
Guest Editor
Professor of Computer Science, UCLA, Boelter Hall 3531d, 405 Hilgard Ave, Los Angeles, CA 90095-1596, USA
Interests: computer vision; nonlinear estimation; control theory

Special Issue Information

The Information Bottleneck is a principle for trading off complexity and fidelity in statistical modeling and inference. It was introduced in the 1990s and has since been applied to domains such as clustering and system identification. More recently, it has been shown to play a role in the analysis of deep neural networks. This Special Issue focuses on the role of the Information Bottleneck and related principles in the analysis and design of representation learning and optimization algorithms for training deep neural networks. For instance, connections have been established between the Information Bottleneck and Bayesian inference, PAC-Bayes theory, Kolmogorov complexity, and Minimum Description Length, each with different algorithmic instantiations.

Contributions are solicited that explore the modeling, optimization, and empirical analysis aspects of deep learning using tools from information theory and statistical theory. Application papers that use the Information Bottleneck as a regularization method for training deep learning models are also welcome, as is work that explores connections to biological systems. Manuscripts will be peer-reviewed, and published work will be available through open access. Expanded versions of manuscripts published at conferences are welcome, provided they meaningfully extend the scope of the original papers. Simultaneous submission of a short conference version and a longer journal version is acceptable, provided the editors are notified and a copy of the conference submission is enclosed with the journal submission.
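
For readers new to the formalism, the trade-off can be written as the classical IB Lagrangian of Tishby, Pereira and Bialek; it is quoted here only as background and is not specific to any contribution in this issue:

    \min_{p(t \mid x)} \; I(X;T) \;-\; \beta \, I(T;Y)

where T is a compressed representation of the input X, Y is the relevance variable, I(\cdot\,;\cdot) denotes mutual information, and the multiplier \beta \geq 0 sets the balance between compression (complexity) and preserved relevant information (fidelity).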

Prof. Dr. Naftali Tishby
Prof. Dr. Stefano Soatto
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website. Once registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Entropy is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • information theory
  • deep learning
  • representation learning
  • information bottleneck

Published Papers (4 papers)


Research

18 pages, 2198 KiB  
Article
Information Bottleneck for Estimating Treatment Effects with Systematically Missing Covariates
by Sonali Parbhoo, Mario Wieser, Aleksander Wieczorek and Volker Roth
Entropy 2020, 22(4), 389; https://doi.org/10.3390/e22040389 - 29 Mar 2020
Cited by 11 | Viewed by 3157
Abstract
Estimating the effects of an intervention from high-dimensional observational data is a challenging problem due to the existence of confounding. The task is often further complicated in healthcare applications, where a set of observations may be entirely missing for certain patients at test time, thereby prohibiting accurate inference. In this paper, we address this issue using an approach based on the information bottleneck to reason about the effects of interventions. To this end, we first train an information bottleneck to perform a low-dimensional compression of covariates by explicitly considering the relevance of information for treatment effects. As a second step, we use the compressed covariates to transfer relevant information to cases where data are missing during testing. In doing so, we can reliably and accurately estimate treatment effects even in the absence of a full set of covariate information at test time. Our results on two causal inference benchmarks and a real application for treating sepsis show that our method achieves state-of-the-art performance without compromising interpretability.
(This article belongs to the Special Issue The Information Bottleneck in Deep Learning)
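
To make the compression step of the two-step approach in the abstract above concrete, a minimal sketch follows of how an IB-style encoder could compress covariates while retaining information relevant to outcomes. The architecture, variable names, and the binary-treatment assumption are illustrative only; this is not the authors' implementation, and the transfer step for missing covariates is not shown.

    # Hypothetical sketch: a stochastic encoder q(z|x) compresses covariates x into a
    # low-dimensional z (KL term = compression), while z must predict the outcome under
    # the observed treatment (relevance term). Binary treatment is assumed for brevity.
    import torch
    import torch.nn as nn

    class IBTreatmentModel(nn.Module):
        def __init__(self, x_dim, z_dim=8):
            super().__init__()
            # encoder outputs mean and log-variance of q(z|x)
            self.encoder = nn.Sequential(
                nn.Linear(x_dim, 64), nn.ReLU(), nn.Linear(64, 2 * z_dim))
            # one outcome head per treatment arm
            self.outcome = nn.ModuleList(
                [nn.Sequential(nn.Linear(z_dim, 32), nn.ReLU(), nn.Linear(32, 1))
                 for _ in range(2)])

        def forward(self, x):
            mu, log_var = self.encoder(x).chunk(2, dim=-1)
            z = mu + torch.randn_like(mu) * (0.5 * log_var).exp()  # reparameterised sample
            return z, mu, log_var

    def ib_loss(model, x, t, y, beta=1e-2):
        """x: covariates, t: observed treatment (0/1, long), y: observed outcome."""
        z, mu, log_var = model(x)
        preds = torch.stack([head(z).squeeze(-1) for head in model.outcome], dim=1)
        y_hat = preds[torch.arange(len(t)), t]            # prediction under observed treatment
        relevance = ((y_hat - y) ** 2).mean()             # keep information relevant to outcomes
        kl = 0.5 * (mu ** 2 + log_var.exp() - log_var - 1).sum(-1).mean()  # compression term
        return relevance + beta * kl

    model = IBTreatmentModel(x_dim=25)
    loss = ib_loss(model, torch.randn(16, 25), torch.randint(0, 2, (16,)), torch.randn(16))

In the paper, the learned low-dimensional representation is then used to transfer information to patients whose covariates are systematically missing at test time; the sketch above covers only the compression step.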

11 pages, 281 KiB  
Article
On the Difference between the Information Bottleneck and the Deep Information Bottleneck
by Aleksander Wieczorek and Volker Roth
Entropy 2020, 22(2), 131; https://doi.org/10.3390/e22020131 - 22 Jan 2020
Cited by 8 | Viewed by 4194
Abstract
Combining the information bottleneck model with deep learning by replacing mutual information terms with deep neural nets has proven successful in areas ranging from generative modelling to interpreting deep neural networks. In this paper, we revisit the deep variational information bottleneck and the assumptions needed for its derivation. The two assumed properties of the data, X and Y, and their latent representation T take the form of two Markov chains, T-X-Y and X-T-Y. Requiring both to hold during the optimisation process can be limiting for the set of potential joint distributions P(X,Y,T). We, therefore, show how to circumvent this limitation by optimising a lower bound for the mutual information between T and Y, I(T;Y), for which only the latter Markov chain has to be satisfied. The mutual information I(T;Y) can be split into two non-negative parts. The first part is the lower bound for I(T;Y), which is optimised in the deep variational information bottleneck (DVIB) and cognate models in practice. The second part consists of two terms that measure how much the former requirement, T-X-Y, is violated. Finally, we propose interpreting the family of information bottleneck models as directed graphical models, and show that in this framework, the original and deep information bottlenecks are special cases of a fundamental IB model.
(This article belongs to the Special Issue The Information Bottleneck in Deep Learning)
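
As background for the decomposition discussed in the abstract above, the generic variational lower bound on I(T;Y) can be derived in one line; this is the standard DVIB-style bound, not necessarily the paper's exact split. For any variational decoder q(y | t),

    I(T;Y) \;=\; H(Y) - H(Y \mid T)
           \;=\; H(Y) + \mathbb{E}_{p(t)}\,\mathbb{E}_{p(y \mid t)}\big[\log p(y \mid t)\big]
           \;\geq\; H(Y) + \mathbb{E}_{p(y,t)}\big[\log q(y \mid t)\big],

since the gap is the expected Kullback-Leibler divergence between p(y | t) and q(y | t), which is non-negative. In the spirit of the abstract, the paper relates this slack, under the deep IB modelling assumptions, to terms quantifying how strongly the Markov requirement T-X-Y is violated.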

7 pages, 276 KiB  
Article
A Geometric Interpretation of Stochastic Gradient Descent Using Diffusion Metrics
by Rita Fioresi, Pratik Chaudhari and Stefano Soatto
Entropy 2020, 22(1), 101; https://doi.org/10.3390/e22010101 - 15 Jan 2020
Cited by 3 | Viewed by 4715
Abstract
This paper is a step towards developing a geometric understanding of a popular algorithm for training deep neural networks, namely stochastic gradient descent (SGD). We build upon a recent result which observed that the noise in SGD while training typical networks is highly non-isotropic. This motivates a deterministic model in which the trajectories of the dynamical system are described via geodesics of a family of metrics arising from a certain diffusion matrix, namely the covariance of the stochastic gradients in SGD. Our model is analogous to models in general relativity: the role of the electromagnetic field in the latter is played by the gradient of the loss function of a deep network in the former.
(This article belongs to the Special Issue The Information Bottleneck in Deep Learning)
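
As an illustration of the central object in the abstract above, the following sketch estimates the diffusion matrix, i.e. the covariance of per-example gradients around the mini-batch mean, for a toy model. It is a didactic reconstruction under simplifying assumptions, not the authors' code; the model and data are placeholders.

    # Estimate the gradient covariance ("diffusion matrix") that, in the paper,
    # defines the family of metrics whose geodesics describe SGD trajectories.
    import torch
    import torch.nn as nn

    def per_example_gradients(model, loss_fn, xs, ys):
        grads = []
        for x, y in zip(xs, ys):
            model.zero_grad()
            loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0)).backward()
            grads.append(torch.cat([p.grad.flatten() for p in model.parameters()]))
        return torch.stack(grads)                      # shape: (n_examples, n_params)

    def diffusion_matrix(model, loss_fn, xs, ys):
        g = per_example_gradients(model, loss_fn, xs, ys)
        g_centered = g - g.mean(dim=0, keepdim=True)
        return g_centered.T @ g_centered / len(xs)     # empirical gradient covariance

    model = nn.Linear(5, 1)
    xs, ys = torch.randn(32, 5), torch.randn(32, 1)
    D = diffusion_matrix(model, nn.MSELoss(), xs, ys)
    print(D.shape)                                     # (6, 6): 5 weights + 1 bias

The strong anisotropy reported in the paper corresponds to D being far from a multiple of the identity, with most of its mass concentrated in a few directions.
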
15 pages, 1074 KiB  
Article
Multistructure-Based Collaborative Online Distillation
by Liang Gao, Xu Lan, Haibo Mi, Dawei Feng, Kele Xu and Yuxing Peng
Entropy 2019, 21(4), 357; https://doi.org/10.3390/e21040357 - 2 Apr 2019
Cited by 9 | Viewed by 3596
Abstract
Recently, deep learning has achieved state-of-the-art performance in more areas than traditional shallow-architecture-based machine-learning methods. However, in order to achieve higher accuracy, it is usually necessary to extend the network depth or ensemble the results of different neural networks. Increasing network depth or ensembling different networks increases the demand for memory and computing resources. This leads to difficulties in deploying deep-learning models in resource-constrained scenarios such as drones, mobile phones, and autonomous driving. Improving network performance without expanding the network scale has become a hot topic for research. In this paper, we propose a cross-architecture online-distillation approach to solve this problem by transmitting supplementary information among different networks. We use the ensemble method to aggregate networks of different structures, thus forming better teachers than traditional distillation methods. In addition, discontinuous distillation with progressively enhanced constraints is used to replace fixed distillation in order to reduce the loss of information diversity in the distillation process. Our training method improves the distillation effect and achieves strong network-performance improvement. We used several popular models to validate the results. On the CIFAR100 dataset, AlexNet's accuracy was improved by 5.94%, VGG by 2.88%, ResNet by 5.07%, and DenseNet by 1.28%. Extensive experiments were conducted to demonstrate the effectiveness of the proposed method. On the CIFAR10, CIFAR100, and ImageNet datasets, we observed significant improvements over traditional knowledge distillation.
(This article belongs to the Special Issue The Information Bottleneck in Deep Learning)
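
A minimal sketch of the ensemble-teacher idea in the abstract above: each network receives the usual supervised loss plus a distillation term towards the averaged softened predictions of its peers. The temperature, weighting, and the choice to detach the teacher are illustrative assumptions, not the configuration used in the paper.

    # Hedged sketch of collaborative online distillation with an ensemble teacher.
    import torch
    import torch.nn.functional as F

    def ensemble_distillation_loss(logits_list, targets, temperature=3.0, alpha=0.5):
        """logits_list: one (batch, classes) logits tensor per co-trained network."""
        losses = []
        for i, logits in enumerate(logits_list):
            ce = F.cross_entropy(logits, targets)       # supervised term
            # ensemble of the *other* networks acts as the teacher for network i
            teacher_probs = torch.stack(
                [F.softmax(l / temperature, dim=-1)
                 for j, l in enumerate(logits_list) if j != i]).mean(0).detach()
            kd = F.kl_div(F.log_softmax(logits / temperature, dim=-1),
                          teacher_probs, reduction="batchmean") * temperature ** 2
            losses.append((1 - alpha) * ce + alpha * kd)
        return torch.stack(losses).sum()

    logits = [torch.randn(8, 10, requires_grad=True) for _ in range(3)]
    loss = ensemble_distillation_loss(logits, torch.randint(0, 10, (8,)))

The paper additionally replaces this fixed, every-step distillation with discontinuous distillation under progressively enhanced constraints, which the sketch does not attempt to reproduce.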
