Abstract

  • Propose Equiangular Basis Vectors (EBVs) for classification tasks.
  • For contrast, metric learning approaches learn a transformation function that maps training data points from the original space to a new space where similar points are closer and dissimilar points are farther apart.
  • EBVs generate normalized vector embeddings as “predefined classifiers”, which are required both to have equal status with one another and to be as orthogonal as possible.
  • Experiments on the ImageNet-1K dataset and other downstream tasks show that the method outperforms the general fully connected classifier.
  • Won first place in the 2022 DIGIX Global AI Challenge.

Paper Content

Introduction

  • The field of pattern classification aims to assign input signals to two or more classes
  • Deep learning models used to process image, video, audio, text, and other data
  • Deep learning methods used to solve classification problems in various scenarios and settings
  • Typical classification paradigms illustrated in Figure 1
  • Deep learning methods use a trainable fully connected layer with softmax as the classifier
  • Classical metric learning methods require a significant amount of extra computation for large-scale datasets
  • Equiangular Basis Vectors (EBVs) proposed to replace the fully connected layer associated with softmax
  • EBVs predefine fixed normalized vector embeddings for different categories
  • The number of trainable parameters does not change as the number of categories grows
  • EBVs do not need to measure the similarity among different training samples or to constrain the distance between categories
  • Training minimizes the spherical distance between each learned representation and the predefined basis vector of its category
  • EBVs evaluated on diverse computer vision tasks with large-scale datasets

Learning objectives

  • Traditional classifiers and machine/deep learning methods can be used for classification tasks
  • Two prominent deep learning paradigms discussed: k-way classification layers and classical deep metric learning
  • K-way classification layer maps deep representations to semantic categories and minimizes losses between predictions and ground truth
  • EBVs predefine equiangular basis vectors that are required to have equal status and to be as orthogonal to each other as possible
  • The learning objective of EBVs is to make the embedding of an input as close as possible to the equiangular basis vector of its category
  • Deep metric learning defines its objective over relations among many training samples, whereas EBVs define it between each training sample and the fixed embedding of its category
  • EBVs also differ from specific metric learning approaches such as center loss, prototypical networks, and the nearest class mean approach

Methodology

Preliminaries

  • EBVs are based on Equiangular Lines and the Tammes Problem
  • Equiangular Lines are pairwise separated by the same angle
  • For a fixed angle, the maximum number of Equiangular Lines grows linearly with the dimension d as d becomes large
  • The Tammes Problem asks how to place a given number of points on a sphere so that the minimum distance between any two points is maximized

Definition of equiangular basis vectors

  • EBVs are used to predefine fixed d-dimensional embeddings for categories.
  • EBVs lie on a unit hypersphere, and every pair is kept at least a given angular distance apart, as controlled by α.
  • The problem is to calculate the coordinates of each vector in the vector set W when given fixed α, d and N.
  • EBVs produce a distribution over classes for a query point v.
  • A set W of N basis vectors can represent at most N categories.
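
The definition can be illustrated with a small check. This is a minimal sketch, assuming the definition bounds the absolute cosine similarity of every pair of distinct vectors by α (an assumed reading, in line with the mutual-coherence view used for generation). The function name `satisfies_ebv_definition` is hypothetical.

```python
import numpy as np

def satisfies_ebv_definition(W, alpha, tol=1e-8):
    """Check that the rows of W are unit vectors with pairwise |cosine| <= alpha.

    Assumed reading of the definition; W has shape (N, d).
    """
    W = np.asarray(W, dtype=float)
    norms = np.linalg.norm(W, axis=1)
    if not np.allclose(norms, 1.0, atol=tol):
        return False
    gram = W @ W.T  # cosine similarities, because the rows have unit length
    off_diag = gram[~np.eye(len(W), dtype=bool)]
    return bool(np.all(np.abs(off_diag) <= alpha + tol))

# The standard basis is the N = d case: every pair is exactly orthogonal.
print(satisfies_ebv_definition(np.eye(4), alpha=0.0))

# Two unit vectors with cosine 0.6 violate alpha = 0.5.
close_pair = np.array([[1.0, 0.0], [0.6, 0.8]])
print(satisfies_ebv_definition(close_pair, alpha=0.5))
```

A smaller α demands vectors closer to orthogonal, which becomes harder to satisfy as N grows relative to d.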

How to generate EBVs?

  • The basic idea of Equiangular Basis Vectors (EBVs) is to generate fixed normalized vector embeddings with equal angles as “predefined classifiers”.
  • To calculate the EBVs, a Grassmannian Matrix is constructed with the vectors in W.
  • The mutual coherence of W is defined, and a lower bound on α is derived from it.
  • When N = d, W can be taken from a unitary matrix; in other cases a direct construction is difficult.
  • Stochastic Gradient Descent is used to search for a set W that satisfies the definition of EBVs.
  • Algorithm 1 provides the code for a simple generation method of the proposed EBVs.
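
The search described above can be sketched in NumPy. This is not the paper's Algorithm 1: it swaps in projected gradient descent on a hinge penalty as a simple stand-in for the SGD search, and it assumes the lower bound on α is the classical Welch bound. The names `generate_ebvs`, `mutual_coherence`, and `welch_bound`, as well as the step size and sizes used, are illustrative assumptions.

```python
import numpy as np

def welch_bound(n, d):
    """Classical lower bound on the mutual coherence of n unit vectors in R^d."""
    return 0.0 if n <= d else float(np.sqrt((n - d) / (d * (n - 1))))

def mutual_coherence(W):
    """Largest absolute cosine similarity between two distinct rows of W."""
    gram = np.abs(W @ W.T)
    np.fill_diagonal(gram, 0.0)
    return float(gram.max())

def generate_ebvs(n, d, alpha, lr=0.01, max_steps=5000, seed=0):
    """Search for n unit vectors in R^d with pairwise |cosine| <= alpha.

    Projected gradient descent on a hinge penalty; an illustrative
    stand-in for the SGD search, not the paper's Algorithm 1.
    """
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((n, d))
    W /= np.linalg.norm(W, axis=1, keepdims=True)
    for _ in range(max_steps):
        gram = W @ W.T
        np.fill_diagonal(gram, 0.0)
        violating = np.abs(gram) > alpha
        if not violating.any():
            break  # every pair already satisfies the bound
        # Gradient of sum_{i<j} max(0, |w_i . w_j| - alpha) with respect to W.
        grad = (np.sign(gram) * violating) @ W
        W -= lr * grad
        W /= np.linalg.norm(W, axis=1, keepdims=True)  # project back onto the sphere
    return W

W = generate_ebvs(n=20, d=16, alpha=0.4)
print(W.shape, round(mutual_coherence(W), 3), round(welch_bound(20, 16), 3))
```

Choosing α well above the Welch bound, as here, leaves the search plenty of room; values close to the bound are much harder to reach.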

How to achieve the learning objective of EBVs?

  • Equiangular Basis Vectors (EBVs) provide fixed learning targets for each independent optimization objective, i.e., semantic categories.
  • Conventionally, a deep network extracts high-dimensional features and a fully connected classification layer maps them to semantic categories; with EBVs, the network instead outputs a d-dimensional embedding.
  • Each category is bound to a unique normalized d-dimensional basis vector in W.
  • Cosine similarity is used to measure how close an embedding is to each basis vector.
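
A minimal sketch of how such an objective might be computed, assuming a softmax cross-entropy over cosine similarities scaled by a temperature `tau`. The temperature, its default value, and the function names are assumptions made for illustration, not details confirmed by this summary. The example uses the N = d case, where the basis vectors are simply orthonormal.

```python
import numpy as np

def ebv_logits(embeddings, W, tau=0.07):
    """Cosine similarity between each embedding and each unit basis vector, divided by tau."""
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return (z @ W.T) / tau

def ebv_loss(embeddings, labels, W, tau=0.07):
    """Mean cross-entropy over softmax(cosine / tau); tau is an assumed temperature."""
    logits = ebv_logits(embeddings, W, tau)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())

def ebv_predict(embeddings, W):
    """Predict the category whose basis vector has the highest cosine similarity."""
    return ebv_logits(embeddings, W, tau=1.0).argmax(axis=1)

W = np.eye(4)                           # N = d: orthonormal basis vectors
emb = np.array([[3.0, 0.1, 0.0, 0.0],   # points roughly along category 0
                [0.0, 0.2, 0.0, 5.0]])  # points roughly along category 3
print(ebv_predict(emb, W))
print(ebv_loss(emb, np.array([0, 3]), W) < ebv_loss(emb, np.array([1, 2]), W))
```

Because W is fixed, only the network producing `embeddings` would receive gradients during training; the basis vectors act purely as targets.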

Merits of our EBVs

  • The embedding dimension d can be chosen manually, so the number of trainable parameters in the classifier does not grow linearly with the number of categories.
  • EBVs are generated before training, and the fixed d-dimensional embedding of each category does not change during optimization, so they add little computation to the training stage.
  • EBVs are not sensitive to the choice of optimizer or to established training tricks, and they still achieve state-of-the-art performance.
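
The first merit can be made concrete with a quick parameter count. This sketch assumes the EBV head is a single linear projection (with bias) from backbone features to the d-dimensional embedding. That head design and all the sizes below are hypothetical.

```python
def fc_classifier_params(feature_dim, num_classes):
    """Weights plus biases of a k-way fully connected classifier."""
    return feature_dim * num_classes + num_classes

def ebv_head_params(feature_dim, embed_dim):
    """Weights plus biases of an assumed linear projection to the embedding space."""
    return feature_dim * embed_dim + embed_dim

feature_dim, embed_dim = 2048, 512  # hypothetical sizes
for num_classes in (1_000, 10_000, 100_000):
    print(num_classes,
          fc_classifier_params(feature_dim, num_classes),
          ebv_head_params(feature_dim, embed_dim))
```

Under these assumptions the fully connected classifier grows in proportion to the number of categories, while the projection to a fixed d stays the same size.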

Experiments

  • Quantitative and qualitative experiments compare models ending in a k-way fully connected layer with softmax against models using the proposed Equiangular Basis Vectors (EBVs)
  • Experiments conducted on ImageNet-1K dataset with 1.28M training images and 50K validation images from 1,000 different object classes
  • ImageNet-1K top-1 accuracy reported on validation set under single crop setting
  • Followed state-of-the-art training methods provided by TorchVision and timm
  • Three training settings: A0, A1, A2
  • EBVs outperform the fully connected (FC) classifier in all three settings
  • Ablation studies conducted on dimension d of each basis vector
  • Object detection and instance segmentation experiments conducted on COCO 2017 benchmark
  • Mask R-CNN and UperNet in MMDetection used as detection frameworks
  • ResNet and Swin Transformer used as backbones
  • With a ResNet backbone, EBVs surpass the framework that ends with fully connected layers
  • Semantic segmentation experiments conducted on ADE20K
  • FPN and UperNet in MMSegmentation used as segmentation frameworks
  • EBVs achieve a higher mIoU than the general framework when trained for 160,000 steps
  • Analyzed features in hidden layers with Centered Kernel Alignment (CKA)
  • Shallow layers share higher similarity when using EBVs
  • Future work: exploring the relations between basis vector pairs and how to embed hierarchies, the performance of EBVs with a large number of categories, and EBVs in other related tasks
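
The CKA comparison of hidden-layer features mentioned above can be sketched as follows. This uses the linear variant of CKA; which variant the paper uses is not stated in this summary, and the random feature matrices are placeholders rather than real activations.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two feature matrices with the same number of rows (samples)."""
    X = X - X.mean(axis=0, keepdims=True)  # center each feature dimension
    Y = Y - Y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(Y.T @ X, "fro") ** 2
    return float(cross / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")))

rng = np.random.default_rng(0)
feats_a = rng.standard_normal((64, 10))  # placeholder activations of one layer
feats_b = rng.standard_normal((64, 12))  # placeholder activations of another layer
print(linear_cka(feats_a, 2.0 * feats_a))  # identical up to scale
print(linear_cka(feats_a, feats_b))        # unrelated random features
```

Linear CKA is invariant to isotropic scaling and orthogonal transformations of the features, which is why it suits comparing layers of different networks.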