Link to paper
The full paper is available here.
You can also find the paper on PapersWithCode here.
Abstract
- Unsupervised learning of discrete representations from continuous ones in neural networks is used in several applications.
- Vector Quantisation (VQ) is a popular method to achieve such representations.
- EMA-VQ is often used, but here we study an alternative VQ algorithm based on the learning rule of Kohonen Self-Organising Maps.
- KSOM is known to offer two potential benefits over EMA-VQ: faster VQ and discrete representations that form a topological structure.
- Experiments show that KSOM is more robust than EMA-VQ.
Paper Content
Introduction
(online) algorithm
- Teuvo Kohonen developed the Self-Organising Map (SOM) algorithm
- SOM is an unsupervised learning/clustering algorithm
- SOM achieves vector quantisation (VQ) and topological mapping
- SOM requires a distance function and a neighbourhood matrix
- Input vectors are clustered into K clusters
- Weight vectors are randomly initialised and updated iteratively
- Distance function is typically Euclidean distance
- Grid of codebook indices is defined
- Neighbourhood matrix is defined with a pre-defined threshold distance
- Neighbourhood matrix is updated with a learning rate
- Hard and Gaussian variants of the neighbourhood matrix are used
- SOM can be interpreted as a variant of Hebbian learning
Batch algorithm & relation to k-means
- Algorithm is online and updates weights after every input
- Batch version takes into account all data points for single update
- At each iteration step, best matching unit for each input is computed
- Results summarised for each cluster
- In case of zero neighbourhood, algorithm reduces to K-means
- Algorithm needs to be both online and mini-batch
Topographical maps in the brain as motivation
- KSOM performs clustering and topological mapping
- Topographical maps in the brain specialise to different types of sensory inputs
- KSOM design inspired by topographical maps
- KSOM introduces concept of neighbourhoods between output neurons
- Clusters whose indices are spatially close on the grid are encouraged to store inputs that are close to each other in the feature space
- Topological ordering has limited practical benefits
Alternative vq in vq-vaes
- Replacing VQ algorithm with Kohonen’s algorithm
- Focusing on image processing as an example
Background: vq-vaes
- A VQ-VAE consists of an encoder, decoder, and codebook of size K.
- The encoder transforms an input to a sequence of embedding vectors.
- Each embedding is quantised to yield a quantised embedding.
- The decoder transforms the quantised embeddings to a reconstruction of the original input.
- The parameters of the encoder and decoder are trained to minimise a loss which includes a reconstruction loss and a commitment loss.
- The codebook weights are trained by an EMA-VQ algorithm.
Initialisation & updates of emas
- Initialize two EMAs
- Standard implementations apply updates to all clusters
- Smoothing over counts is applied to avoid division by zero
- Standard implementations update all EMAs
- Ablation study in Sec. 4.1 shows it is crucial to update all EMAs
Experiments
- Goal of experiments: revisit properties of KSOM when integrated into VQ-VAEs
- Demonstrate robustness and analyze learned representations
- Show sensitivity of standard EMA-VQ to configuration details
Sensitivity of the baseline ema-vq
- Baseline EMA-VQ is sensitive to initialisation and update schemes.
- Performance of N=1 remains above that of N=0 even after plateau.
- Updating all clusters, including those with no members, is crucial for good performance.
Reconstruction performance and speed
- Evaluated reconstruction performance and convergence speed of VQ-VAEs trained with KSOM
- Experiments conducted on 3 datasets
- KSOM achieves similar validation reconstruction loss as baseline
- KSOM is faster than baseline at beginning of training
- KSOM is more robust than EMA-VQ
- KSOM achieves same best validation loss as optimal configuration
- KSOM improves codebook utilisation
Conclusion
- KSOM is a generalization of EMA-VQ
- KSOM is more robust than EMA-VQ
- KSOM can be integrated into existing VQ-VAE code
- KSOM can develop topological structures
- KSOM requires an extra hyperparameter (τ)
- KSOM is recommended for VQ implementations
- KSOM is more robust than EMA-VQ for CIFAR-10