On Sequential Bayesian Inference for Continual Learning
Round 1
Reviewer 1 Report
As its main contribution, this paper studies sequential Bayesian inference on the parameters of a neural network as an approach to continual learning. First, it is shown that variational continual learning (VCL) does not perform well when performed with a single-headed output layer. The authors then show that even when they use better, gold-standard approximations to the posterior distributions (i.e., Hamiltonian Monte Carlo followed by density estimation using GMMs), they are unable to substantially improve upon the performance of VCL.
Another contribution of this paper is proposing Prototypical Bayesian Continual Learning (ProtoCL), which learns a generative classifier by modeling each class as a Gaussian in an embedding space of an underlying neural network. To prevent drift in the underlying neural network, data from previous tasks are stored and replayed. The proposed ProtoCL method outperforms a number of Bayesian continual learning methods on several class-incremental learning benchmarks.
I think this is an interesting paper with a valuable contribution. Sequential Bayesian inference on the parameters of a deep neural network is an important, often-used approach in continual learning that underlies many existing methods. Demonstrating fundamental problems with this approach is an important contribution. I think the authors succeed in making a convincing case that sequential Bayesian inference on the parameters of a neural network is very challenging in practice. I also consider the proposed ProtoCL method a useful, albeit incremental, contribution to the continual learning literature.
I support acceptance of this paper, provided the issue described below is satisfactorily addressed.
I think the claim in Section 4 that a misspecified model can forget despite performing exact inference is somewhat problematic and potentially misleading. The bias observed in this experiment towards the second task is due to the imbalance in the data, not to the temporal order in which the data are presented. If the order of the data were changed, the final learning outcome would stay the same. I think this makes the use of the term "forgetting" in this case questionable. The authors should at least discuss that the biased performance towards the second task observed in this experiment is not due to the temporal order of the data.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 2 Report
Summary
This work studies the Bayesian learning approach to continual learning. To this end, the authors investigate two cases in which the popular approach of using the previous task's posterior as a prior for the new task may cause catastrophic forgetting: (i) approximate inference and (ii) model misspecification. The authors then show how data imbalance can affect continual learning and argue for the need to model the underlying data-generating process. Lastly, the authors propose a simple approach called "Prototypical Bayesian Continual Learning" (ProtoCL) and demonstrate its encouraging results compared to other Bayesian continual learning strategies.
Strengths
Bayesian learning is a natural and promising approach to continual learning. This work investigates the potential limitations of the conventional sequential Bayesian inference approach and discusses several directions to address these issues. Overall, this work provides useful insights to the Bayesian learning community.
The empirical investigations (Sec. 3 and 4) are well designed and presented clearly.
The proposed ProtoCL method is simple yet achieves encouraging results with low complexity.
Weaknesses
My biggest concern with this work is that the contributions are rather incoherent, which leaves several questions unanswered. In particular, it is unclear to me how the proposed ProtoCL could address the challenges arising from approximate inference, model misspecification, and data imbalance, and how it relates to the argument for modeling the underlying data-generating process in CL. Overall, I think this work presents several rather minor contributions but fails to connect them into a big picture of the Bayesian approach to CL.
The experiments in Sec. 7 can be further improved. First, it would be helpful to explore empirically how ProtoCL addresses the limitations presented in Sec. 3, 4, and 5. Second, there are several recent, state-of-the-art Bayesian continual learning methods that should be discussed, e.g., [A,B,C].
[A] Adel, Tameem, Han Zhao, and Richard E. Turner. "Continual Learning with Adaptive Weights (CLAW)." International Conference on Learning Representations.
[B] Ebrahimi, Sayna, et al. "Uncertainty-guided Continual Learning with Bayesian Neural Networks." International Conference on Learning Representations.
[C] Loo, Noel, Siddharth Swaroop, and Richard E. Turner. "Generalized Variational Continual Learning." International Conference on Learning Representations.
Overall, the writing quality is high and only requires minor editing.
For example, two consecutive sentences in L8 and L11 of the manuscript both start with "Furthermore". Theorem 2, referenced in L239, is undefined.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
I am happy with the authors' response and support acceptance.
Reviewer 2 Report
After the revision, the manuscript is greatly improved and I support its publication.