Peer-Review Record

On Sequential Bayesian Inference for Continual Learning

Entropy 2023, 25(6), 884; https://doi.org/10.3390/e25060884
by Samuel Kessler 1,*, Adam Cobb 2, Tim G. J. Rudner 3, Stefan Zohren 1 and Stephen J. Roberts 1
Reviewer 1: Anonymous
Reviewer 2:
Submission received: 1 May 2023 / Revised: 24 May 2023 / Accepted: 28 May 2023 / Published: 31 May 2023
(This article belongs to the Special Issue Information Theory for Data Science)

Round 1

Reviewer 1 Report

As its main contribution, this paper studies sequential Bayesian inference on the parameters of a neural network as an approach to continual learning. First, it is shown that variational continual learning (VCL) does not perform well with a single-headed output layer. The authors then show that even when they use better, gold-standard approximations to the posterior distributions (i.e., Hamiltonian Monte Carlo followed by density estimation using GMMs), they are unable to substantially improve upon the performance of VCL.
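
For context, the sequential Bayesian recursion at issue treats the posterior after the previous tasks as the prior for the next task:

    p(θ | D_{1:t}) ∝ p(D_t | θ) · p(θ | D_{1:t-1}).

VCL approximates each posterior in this chain variationally; in the gold-standard experiments described above, the variational approximation is replaced by HMC samples whose density is then estimated with a GMM to serve as the next prior.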

 

Another contribution of this paper is the proposal of Prototypical Bayesian Continual Learning (ProtoCL), which learns a generative classifier by modeling each class as a Gaussian in the embedding space of an underlying neural network. To prevent drift in the underlying neural network, data from previous tasks are stored and replayed. The proposed ProtoCL method outperforms a number of Bayesian continual learning methods on several class-incremental learning benchmarks.
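
For reference, the following is a minimal, hypothetical sketch of a Gaussian-per-class generative classifier head with a small replay buffer, in the spirit of the description above. It is not the authors' implementation: the encoder is replaced by an identity map, the class Gaussians are reduced to running means with a shared isotropic variance, and all names are illustrative only.

```python
# Illustrative sketch (not the authors' ProtoCL code): one Gaussian prototype
# per class over a fixed embedding, plus a small replay buffer.
import numpy as np

class GaussianClassHead:
    def __init__(self, dim, var=1.0):
        self.dim, self.var = dim, var
        self.means = {}    # class label -> prototype mean in embedding space
        self.counts = {}   # class label -> number of embeddings seen

    def update(self, z, y):
        """Incrementally update per-class prototype means from embeddings z with labels y."""
        for zi, yi in zip(z, y):
            n = self.counts.get(yi, 0)
            mu = self.means.get(yi, np.zeros(self.dim))
            self.means[yi] = (n * mu + zi) / (n + 1)
            self.counts[yi] = n + 1

    def predict(self, z):
        """Assign the class with the highest Gaussian log-density (equal priors, shared variance)."""
        labels = sorted(self.means)
        logps = np.stack([-np.sum((z - self.means[c]) ** 2, axis=1) / (2 * self.var)
                          for c in labels], axis=1)
        return np.array(labels)[np.argmax(logps, axis=1)]

# Toy class-incremental usage with a stand-in identity "encoder" and replay.
rng = np.random.default_rng(0)
head = GaussianClassHead(dim=2)
buffer_z, buffer_y = [], []
for classes in [(0, 1), (2, 3)]:                      # two tasks, two new classes each
    z = np.concatenate([rng.normal(c, 0.1, size=(50, 2)) for c in classes])
    y = np.repeat(classes, 50)
    if buffer_z:                                      # replay stored examples from old tasks
        z = np.concatenate([z, np.concatenate(buffer_z)])
        y = np.concatenate([y, np.concatenate(buffer_y)])
    head.update(z, y)
    buffer_z.append(z[:20]); buffer_y.append(y[:20])
print(head.predict(np.array([[0.0, 0.0], [3.0, 3.0]])))   # -> [0 3]
```

In the actual method the replay buffer serves mainly to keep the shared encoder from drifting while the per-class Gaussians are updated; in this encoder-free toy it simply revisits old examples.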

 

I think this is an interesting paper with a valuable contribution. Sequential Bayesian inference on the parameters of a deep neural network is an important, often-used approach in continual learning that underlies many existing methods. Demonstrating fundamental problems with this approach is an important contribution. I think the authors succeed in making a convincing case that sequential Bayesian inference on the parameters of a neural network is very challenging in practice. I also consider the proposed ProtoCL method a useful, albeit incremental, contribution to the continual learning literature.

 

I support acceptance of this paper, provided the issue described below is satisfactorily addressed.

 

I think the claim in Section 4 that a misspecified model can forget despite performing exact inference is somewhat problematic and potentially misleading. The bias observed in this experiment towards the second task is due to the imbalance in the data, not to the temporal order in which the data are presented. If the order of the data were changed, the final learning outcome would stay the same. This makes the use of the term “forgetting” in this case questionable. I think the authors should at least discuss that the biased performance towards the second task observed in this experiment is not due to the temporal order of the data.
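
To state the point precisely: for data that are conditionally independent given the parameters, exact inference gives

    p(θ | D_1, D_2) ∝ p(D_2 | θ) p(D_1 | θ) p(θ) = p(D_1 | θ) p(D_2 | θ) p(θ) ∝ p(θ | D_2, D_1),

so the final posterior is identical whichever task is presented first, and any bias towards one task reflects the relative amounts of data rather than their temporal order.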

 

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Summary

This work studies the Bayesian learning approach to continual learning. To this end, the authors investigate two cases where the popular approach of using the previous task's posterior as the prior for a new task may cause catastrophic forgetting: (i) approximate inference and (ii) model misspecification. Then, the authors show how data imbalance can affect continual learning and argue for the need to model the underlying data-generating process. Lastly, the authors propose a simple approach called "Prototypical Bayesian Continual Learning" and demonstrate its encouraging results compared to other Bayesian continual learning strategies.

Strengths

Bayesian learning is a natural and promising approach to continual learning. This work investigates the potential limitations of the conventional Bayesian inference approach and discusses several directions to address these issues. Overall, this work provides useful insights to the Bayesian learning community.

The empirical investigations (Secs. 3 and 4) are well designed and clearly presented.

The proposed ProtoCL method is simple yet achieves encouraging results with low complexity.

Weaknesses

My biggest concern with this work is that the contributions are rather incoherent, which leaves several questions unanswered. In particular, it is unclear to me how the proposed ProtoCL could address the challenges arising from approximate inference, model misspecification, and data imbalance, and how it relates to the argument for modeling the underlying data-generating process for CL. Overall, I think this work presents several rather minor contributions but fails to connect them into a big picture of the Bayesian approach to CL.

The experiments in Sec. 7 can be further improved. First, it would be helpful to explore empirically how ProtoCL could address the limitations presented in Secs. 3, 4, and 5. Second, there are several recent, state-of-the-art Bayesian continual learning methods that should be discussed, e.g., [A, B, C].

[A] Adel, Tameem, Han Zhao, and Richard E. Turner. "Continual Learning with Adaptive Weights (CLAW)." International Conference on Learning Representations.

[B] Ebrahimi, Sayna, et al. "Uncertainty-guided Continual Learning with Bayesian Neural Networks." International Conference on Learning Representations.   

[C] Loo, Noel, Siddharth Swaroop, and Richard E. Turner. "Generalized Variational Continual Learning." International Conference on Learning Representations.

 

Overall, the writing quality is high and only requires minor editing.

For example, two consecutive sentences in L8 and L11 both start with "Furthermore", and Theorem 2 in L239 is undefined.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

I'm happy with the response of the authors and support acceptance.

Reviewer 2 Report

After the revision, the manuscript is greatly improved and I support its publication.
