Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

Humans have neurophysiological mechanisms that enable efficient continual learning.
Brain encodes information in non-overlapping sparse codes to facilitate learning of new associations.
DNNs use activation sparsity and dropout to mimic sparse coding.
Multiple-memory replay mechanism maintains a long-term semantic memory.

Paper Content

Introduction

Standard DNNs are not designed for lifelong learning and exhibit catastrophic forgetting of previously learned knowledge.
The core challenge in continual learning (CL) in DNNs is to maintain an optimal balance between plasticity and the stability of the model.
Rehearsal-based methods have proven to be an effective approach in challenging CL tasks.
The human brain provides an existence proof for successful CL in complex dynamic environments.
The human brain maintains a delicate balance between stability and plasticity through a complex set of neurophysiological mechanisms.
Sparse Coding and Complementary Learning Systems (CLS) theory can provide insight into the design principles and mechanisms that can enable CL in DNNs.
SCoMMER employs sparse coding in a multi-memory experience replay mechanism to encourage sparse coding in DNNs and mimic the interplay of multiple memory systems.

Three categories of approaches to address catastrophic forgetting in CL: regularization-based, dynamic architecture, and rehearsal-based
Rehearsal-based methods have proven to be effective in challenging CL scenarios
Different aspects of rehearsal have been studied, such as memory sample selection, sample retrieval, and what information to extract and replay
CLS theory has inspired approaches that utilize multiple memory systems

Methodology

Overview of motivation from biological systems
Introduction to components of proposed approach

Continual learning in the biological system

Brain uses Sparse Coding to represent information
Sparse codes provide advantages for learning new associations
Brain has specialized regions and multiple memory systems
Complementary Learning Systems theory states two systems are needed for efficient learning
Method aims to incorporate components of brain in ANNs

Sparse coding in dnns

Brain has sparse neural codes, while DNNs have dense connections and overlapping representations
Sparse representations can reduce interference between tasks and lead to less forgetting
Activation sparsity can lead to natural emergence of modules
To mimic sparse coding in DNNs, activation sparsity and semantic dropout are employed
K-winner-take-all activation function is used
Activation score is assigned to each filter in the layer
Sparsity ratio is set for each layer
Semantic dropout encourages similar units for semantically similar samples
Probability of retention of a unit is proportional to the number of times it has been activated for that class
Activation sparsity and semantic dropout provide efficient mechanism for balancing reusability and interference of features

Multiple memory systems

Episodic Memory: Fixed-size buffer to store information
Long-Term Memory: Aggregates weights of working memory over time
Long-Term Memory: Builds structural representations for generalization

Overall formulation

CL task is to learn joint distribution of observed tasks without task labels at test time
SCoMMER involves training a working memory and maintaining an additional long-term memory and episodic memory
Long-term memory is initialized with same parameters as working memory and has same sparsity constraints
Heterogeneous dropout probabilities are randomly initialized
Overall loss is combination of cross-entropy and knowledge retrieval loss
Long-term memory is used for inference

Evaluation protocol

SCoMMER is a model designed to tackle the challenges of lifelong learning
Class-IL presents a challenging CL scenario where the model must learn to distinguish between all the classes seen so far without task labels
GCIL extends the Class-IL setting to more realistic scenarios with class imbalance and varying number of classes
Task-IL setting uses task labels to select the subset of output logits

Empirical evaluation

SCoMMER is compared to state-of-the-art rehearsal-based methods
SCoMMER provides performance gains in the majority of cases
SCoMMER provides considerable improvement under low buffer size settings
SCoMMER enables effective utilization of a single semantic memory
SCoMMER provides advantages in GCIL setting
SCoMMER enables reuse of sparse code associated with recurring objects
SCoMMER leads to emergence of subnetworks that provide modularity and protection

Ablation study

SCoMMER contributes to performance gains
Semantic dropout enforces sparse coding and reduces interference between tasks
Multiple memory systems provide considerable performance improvement
Sparsity is critical for enabling CL in DNNs
Sparse activations improve ER performance and enable efficient utilization of semantic memory

Characteristics analysis

Analyzed models trained on S-CIFAR100
Buffer size of 200 used

Stability-plasticity dilemma

SCoMMER is able to maintain a better balance between stability and plasticity than other methods.
SCoMMER provides more uniform performance on tasks compared to baselines.
SCoMMER retains performance on earlier tasks and provides good performance on recent tasks.
Long-term memory aggregates learned knowledge into synaptic weights and generalizes well across tasks.

Emergence of subnetworks

Average activity of units in penultimate layer is tracked during training
High correlation between test set activation counts and semantic dropout probabilities
Activation counts hint at emergence of semantically conditioned subnets
Semantically similar classes have higher degree of correlation between their activation patterns

Task recency bias

Recency bias leads to forgetting of earlier tasks
SCoMMER provides more uniform probabilities to predict each task
CLS-ER mitigates bias towards the last task but reduces probability of predicting the last task
SCoMMER effectively mitigates recency bias

Conclusion

Proposed a novel approach to employ sparse coding in multiple memory systems
Enforces activation sparsity and a complementary semantic dropout mechanism
Maintains long-term memory to consolidate learned knowledge
Evaluation shows effectiveness of approach in mitigating forgetting
Sparse coding enables efficient consolidation of knowledge in long-term memory
Reduces bias towards recent tasks and leads to emergence of semantically conditioned subnetworks
Use same network, optimizer, batch size, data augmentations, and number of epochs for all experiments
Hyperparameter tuning uses small held-out validation set and grid search
Consider Class-IL and Generalized Class-IL settings for empirical evaluation
Implementation of k-WTA activation for CNNs and semantic dropout mechanism
Initialize heterogeneous dropout probabilities to ensure learning of first task does not utilize all filters
Semantic dropout probabilities updated at end of each epoch
Dropout applied only to working memory
Long-term memory provides better generalization across tasks
Table S2 provides performance of both working memory and long-term memory
Table S3 shows effect of different sets of hyperparameters
Not highly sensitive to particular set of parameters

Link to paper#

Abstract#

Paper Content#

Introduction#

Related work#

Methodology#

Continual learning in the biological system#

Sparse coding in dnns#

Multiple memory systems#

Overall formulation#

Evaluation protocol#

Empirical evaluation#

Ablation study#

Characteristics analysis#

Stability-plasticity dilemma#

Emergence of subnetworks#

Task recency bias#

Conclusion#

Link to paper

Abstract

Paper Content

Introduction

Related work

Methodology

Continual learning in the biological system

Sparse coding in dnns

Multiple memory systems

Overall formulation

Evaluation protocol

Empirical evaluation

Ablation study

Characteristics analysis

Stability-plasticity dilemma

Emergence of subnetworks

Task recency bias

Conclusion