Abstract

  • Unsupervised learning of discrete representations from continuous ones in neural networks is used in several applications.
  • Vector Quantisation (VQ) is a popular method to achieve such representations.
  • VQ based on exponential moving averages (EMA-VQ) is often used, but here we study an alternative VQ algorithm based on the learning rule of Kohonen Self-Organising Maps (KSOM).
  • KSOM is known to offer two potential benefits over EMA-VQ: faster VQ and discrete representations that form a topological structure.
  • Experiments show that KSOM is more robust than EMA-VQ, for example with respect to the initialisation scheme.

Paper Content

Introduction

(Online) algorithm

  • Teuvo Kohonen developed the Self-Organising Map (SOM) algorithm
  • SOM is an unsupervised learning/clustering algorithm
  • SOM achieves vector quantisation (VQ) and topological mapping
  • SOM requires a distance function and a neighbourhood matrix
  • Input vectors are clustered into K clusters
  • Weight vectors are randomly initialised and updated iteratively
  • Distance function is typically Euclidean distance
  • Grid of codebook indices is defined
  • Neighbourhood matrix is defined on the grid using a pre-defined threshold distance
  • Weights are updated with a learning rate, scaled by the neighbourhood matrix entry between each unit and the best matching unit
  • Hard and Gaussian variants of the neighbourhood matrix are used
  • SOM can be interpreted as a variant of Hebbian learning
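
The online update described above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names `grid_neighbourhood` and `som_online_step`, the square 2D grid, and the parameters `threshold`, `sigma` and `lr` are all hypothetical choices made for this example.

```python
import numpy as np

def grid_neighbourhood(side, sigma=None, threshold=1.0):
    """Neighbourhood matrix A of shape (K, K) for a side x side grid, K = side**2.

    Hard variant (sigma is None): A[i, j] = 1 if the grid distance between
    indices i and j is at most `threshold`, else 0.
    Gaussian variant: A[i, j] = exp(-d(i, j)**2 / (2 * sigma**2)).
    """
    coords = np.array([(r, c) for r in range(side) for c in range(side)], dtype=float)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    if sigma is None:
        return (d <= threshold).astype(float)
    return np.exp(-d**2 / (2.0 * sigma**2))

def som_online_step(weights, x, neighbourhood, lr):
    """One online update for a single input x of shape (D,)."""
    # Best matching unit: the weight vector closest to x in Euclidean distance.
    bmu = int(np.argmin(np.linalg.norm(weights - x, axis=1)))
    # Pull every unit towards x, scaled by its neighbourhood value w.r.t. the BMU.
    weights = weights + lr * neighbourhood[bmu][:, None] * (x - weights)
    return weights, bmu
```

With the hard variant and a threshold of 1, only the best matching unit and its direct grid neighbours move; units further away on the grid are left unchanged.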

Batch algorithm & relation to k-means

  • The algorithm above is online: it updates the weights after every input
  • The batch version takes all data points into account for a single update
  • At each iteration step, the best matching unit for each input is computed
  • The results are summarised for each cluster
  • With a zero neighbourhood, the algorithm reduces to K-means
  • In practice, the algorithm needs to be both online and mini-batch
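
The batch update and its reduction to K-means can be sketched as follows. This is a minimal illustration of the classic batch formulation, assuming that the per-cluster summaries are the counts and sums of assigned inputs; the function name `som_batch_step` and the rule of keeping a weight unchanged when it receives no mass are assumptions made for this example.

```python
import numpy as np

def som_batch_step(weights, inputs, neighbourhood):
    """One batch update: each weight becomes a neighbourhood-weighted mean of the inputs."""
    # Best matching unit for each input (squared Euclidean distance).
    d2 = ((inputs[:, None, :] - weights[None, :, :]) ** 2).sum(axis=-1)
    bmu = d2.argmin(axis=1)
    one_hot = np.eye(weights.shape[0])[bmu]      # (N, K) assignment matrix
    counts = one_hot.sum(axis=0)                 # members per cluster
    sums = one_hot.T @ inputs                    # summed inputs per cluster
    # Spread the per-cluster summaries to grid neighbours.
    num = neighbourhood.T @ sums
    den = neighbourhood.T @ counts
    # Units that receive no mass keep their previous weights.
    return np.where(den[:, None] > 0, num / np.maximum(den, 1e-12)[:, None], weights)
```

Passing the identity matrix as `neighbourhood` (zero neighbourhood) makes each weight the mean of its own members, which is one K-means step.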

Topographical maps in the brain as motivation

  • KSOM performs clustering and topological mapping
  • Topographical maps in the brain specialise to different types of sensory inputs
  • KSOM design inspired by topographical maps
  • KSOM introduces concept of neighbourhoods between output neurons
  • Clusters whose indices are spatially close on the grid are encouraged to store inputs that are close to each other in the feature space
  • Topological ordering has limited practical benefits

Alternative VQ in VQ-VAEs

  • Replacing VQ algorithm with Kohonen’s algorithm
  • Focusing on image processing as an example

Background: VQ-VAEs

  • A VQ-VAE consists of an encoder, decoder, and codebook of size K.
  • The encoder transforms an input to a sequence of embedding vectors.
  • Each embedding is quantised to yield a quantised embedding.
  • The decoder transforms the quantised embeddings to a reconstruction of the original input.
  • The parameters of the encoder and decoder are trained to minimise a loss which includes a reconstruction loss and a commitment loss.
  • The codebook weights are trained by an EMA-VQ algorithm.
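
The quantisation step and the commitment loss can be sketched in NumPy (forward values only). The function names `quantise` and `commitment_loss` are hypothetical, and the default `beta=0.25` is an assumed value for illustration, not one taken from this summary.

```python
import numpy as np

def quantise(embeddings, codebook):
    """Map each embedding (N, D) to its nearest codebook vector (K, D)."""
    d2 = ((embeddings[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = d2.argmin(axis=1)
    return codebook[indices], indices

def commitment_loss(embeddings, quantised, beta=0.25):
    """Penalise encoder outputs that drift away from their assigned codebook vector.

    In an autodiff framework the quantised term would be wrapped in a
    stop-gradient so that only the encoder receives this gradient.
    """
    return beta * np.mean(np.sum((embeddings - quantised) ** 2, axis=-1))
```

For example, with `codebook = np.array([[0.0, 0.0], [1.0, 1.0]])`, the embedding `[0.1, 0.0]` is assigned index 0 and `[0.9, 1.2]` is assigned index 1.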

Initialisation & updates of EMAs

  • Two EMAs are initialised for each cluster: one over member counts and one over summed embeddings
  • Standard implementations update the EMAs of all clusters, including those with no members
  • Smoothing over counts is applied to avoid division by zero
  • Ablation study in Sec. 4.1 shows it is crucial to update all EMAs
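
A common form of the EMA-VQ update can be sketched as follows. This is a hedged illustration, not the paper's code: the function name `ema_vq_step`, the defaults `decay=0.99` and `eps=1e-5`, and the specific additive smoothing formula are assumptions made for this example. The initial EMA values are left to the caller.

```python
import numpy as np

def ema_vq_step(counts_ema, sums_ema, embeddings, indices, decay=0.99, eps=1e-5):
    """One EMA-VQ update from a mini-batch of embeddings and their code indices."""
    num_codes = counts_ema.shape[0]
    one_hot = np.eye(num_codes)[indices]                 # (N, K)
    # Every cluster is updated: clusters without members simply decay.
    counts_ema = decay * counts_ema + (1.0 - decay) * one_hot.sum(axis=0)
    sums_ema = decay * sums_ema + (1.0 - decay) * (one_hot.T @ embeddings)
    # Additive smoothing keeps every count strictly positive before dividing.
    total = counts_ema.sum()
    smoothed = (counts_ema + eps) / (total + num_codes * eps) * total
    codebook = sums_ema / smoothed[:, None]
    return counts_ema, sums_ema, codebook
```

Because of the smoothing, a cluster with a zero count yields a finite codebook vector instead of a division by zero.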

Experiments

  • Goal of experiments: revisit properties of KSOM when integrated into VQ-VAEs
  • Demonstrate robustness and analyse learned representations
  • Show sensitivity of standard EMA-VQ to configuration details

Sensitivity of the baseline EMA-VQ

  • Baseline EMA-VQ is sensitive to initialisation and update schemes.
  • Performance of N=1 remains above that of N=0 even after plateau.
  • Updating all clusters, including those with no members, is crucial for good performance.

Reconstruction performance and speed

  • Evaluated reconstruction performance and convergence speed of VQ-VAEs trained with KSOM
  • Experiments conducted on three datasets
  • KSOM achieves a validation reconstruction loss similar to that of the baseline
  • KSOM is faster than the baseline at the beginning of training
  • KSOM is more robust than EMA-VQ
  • KSOM achieves the same best validation loss as the optimally configured baseline
  • KSOM improves codebook utilisation
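
Codebook utilisation can be measured in several ways, and this summary does not say which metric was used. As a sketch under that caveat, two common choices are the fraction of codes selected at least once and the perplexity of the code distribution. The function name `codebook_usage` is hypothetical.

```python
import numpy as np

def codebook_usage(indices, codebook_size):
    """Return (fraction of codes used, perplexity) for an array of selected code indices."""
    counts = np.bincount(indices, minlength=codebook_size)
    probs = counts / counts.sum()
    used_fraction = float((counts > 0).mean())
    # Perplexity = exp(entropy); zero-probability codes contribute nothing.
    nonzero = probs[probs > 0]
    perplexity = float(np.exp(-(nonzero * np.log(nonzero)).sum()))
    return used_fraction, perplexity
```

A perplexity equal to the codebook size means all codes are used equally often; values close to 1 indicate that a single code dominates.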

Conclusion

  • KSOM is a generalisation of EMA-VQ
  • KSOM is more robust than EMA-VQ, as observed on CIFAR-10
  • KSOM can be integrated into existing VQ-VAE code
  • KSOM can develop topological structures
  • KSOM requires an extra hyperparameter (τ)
  • KSOM is recommended for VQ implementations
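
One way to read the statement that KSOM generalises EMA-VQ is that the per-cluster batch statistics are spread over grid neighbours before the moving averages are updated. The sketch below illustrates that reading only; it is not taken from the paper. The function name `ksom_ema_step`, the defaults `decay=0.99` and `eps=1e-5`, and the smoothing formula are assumptions, and the schedule governed by τ is not modelled.

```python
import numpy as np

def ksom_ema_step(counts_ema, sums_ema, embeddings, indices, neighbourhood,
                  decay=0.99, eps=1e-5):
    """EMA update in which each cluster also receives statistics from its grid neighbours."""
    num_codes = counts_ema.shape[0]
    one_hot = np.eye(num_codes)[indices]                  # (N, K)
    batch_counts = one_hot.sum(axis=0)
    batch_sums = one_hot.T @ embeddings
    # Spread the statistics of each best matching unit to its neighbours.
    spread_counts = neighbourhood.T @ batch_counts
    spread_sums = neighbourhood.T @ batch_sums
    counts_ema = decay * counts_ema + (1.0 - decay) * spread_counts
    sums_ema = decay * sums_ema + (1.0 - decay) * spread_sums
    # Additive smoothing keeps every count strictly positive before dividing.
    total = counts_ema.sum()
    smoothed = (counts_ema + eps) / (total + num_codes * eps) * total
    return counts_ema, sums_ema, sums_ema / smoothed[:, None]
```

With the identity matrix as `neighbourhood`, this sketch performs a plain EMA update with no sharing between clusters. With a non-zero neighbourhood, clusters that have no members in the batch still move towards the inputs of their neighbours.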