Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

Proposes Equiangular Basis Vectors (EBVs) for classification tasks.
Metric learning, by contrast, learns a transformation function that maps training data points from the original space to a new space where similar points are closer while dissimilar points become farther apart.
EBVs generate normalized vector embeddings as “predefined classifiers” that are required not only to have equal status with one another but also to be as orthogonal as possible.
Experiments on the ImageNet-1K dataset and other downstream tasks demonstrate that the method outperforms the general fully connected classifier.
EBVs won first place in the 2022 DIGIX Global AI Challenge.

Pattern classification field developed to assign input signals to two or more classes
Deep learning models used to process image, video, audio, text, and other data
Deep learning methods used to solve classification problems in various scenarios and settings
Typical classification paradigms illustrated in Figure 1
Deep learning methods use a trainable fully connected layer with softmax as the classifier
Classical metric learning methods require a significant amount of extra computation for large-scale datasets
Equiangular Basis Vectors (EBVs) proposed to replace the fully connected layer associated with softmax
EBVs predefine fixed normalized vector embeddings for different categories
Trainable parameters of the network will not be changed with the growth of the number of categories
EBVs do not need to measure the similarity among different training samples or to constrain the distances between categories
Training with EBVs minimizes the spherical distance between each learned representation and the predefined basis vector of its category
EBVs evaluated on diverse computer vision tasks with large-scale datasets

Traditional classifiers and machine/deep learning methods can be used for classification tasks
Two prominent deep learning paradigms discussed: k-way classification layers and classical deep metric learning
A k-way classification layer maps deep representations to semantic categories, and training minimizes a loss between its predictions and the ground truth
EBVs predefine equiangular basis vectors that are forced to have equal status and to be as orthogonal to each other as possible
The learning objective of EBVs is to make the vectorized embedding of an input as close as possible to its category's equiangular basis vector
The deep metric learning objective is defined over relations among massive numbers of training samples, whereas the EBV objective involves only each training sample and its fixed categorical vector embedding
EBVs also differ from specific metric learning approaches such as center loss, prototypical networks, and the nearest class mean approach
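To make the cost contrast above concrete, the sketch below counts how many loss terms a single mini-batch can produce. It assumes, hypothetically, a pair- or triplet-based metric loss that considers every candidate pair or triplet in the batch while ignoring labels (real methods usually mine a subset), so the metric-learning counts are upper bounds. The function names are illustrative and do not come from the paper.

```python
from math import comb


def pairwise_terms(batch_size: int) -> int:
    """Distinct sample pairs a pair-based metric loss could compare in one batch."""
    return comb(batch_size, 2)


def triplet_terms(batch_size: int) -> int:
    """Upper bound on ordered (anchor, positive, negative) triplets of distinct samples."""
    return batch_size * (batch_size - 1) * (batch_size - 2)


def ebv_terms(batch_size: int) -> int:
    """EBV-style objective: one term per sample, against its fixed class vector."""
    return batch_size


for b in (32, 256):
    print(b, pairwise_terms(b), triplet_terms(b), ebv_terms(b))
```

With a batch of 256, the pair count alone is already 32,640, against 256 sample-to-vector terms for the EBV objective.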

EBVs are based on Equiangular Lines and the Tammes Problem
Equiangular Lines are pairwise separated by the same angle
For a fixed angle, the maximum number of Equiangular Lines grows only linearly with dimension d
The Tammes Problem asks for the maximal number of points on a unit hypersphere whose pairwise spherical distances are at least a given value
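A worked example of equiangular lines, using a standard construction chosen for illustration (it is not taken from the paper): the d + 1 vertices of a regular simplex centered at the origin are unit vectors with pairwise cosine -1/d, so the lines through them are equiangular with angle arccos(1/d). The vectors are written in R^(d+1) coordinates but span only a d-dimensional subspace. The helper names are hypothetical.

```python
import math


def regular_simplex(d: int) -> list:
    """Return d + 1 unit vectors (in R^(d+1) coordinates, spanning a d-dim subspace)
    with pairwise cosine -1/d: the vertices of a centered regular simplex."""
    n = d + 1
    c = 1.0 / n
    norm = math.sqrt(1.0 - c)  # ||e_i - c * ones|| = sqrt(1 - 1/n)
    return [[((1.0 if k == i else 0.0) - c) / norm for k in range(n)] for i in range(n)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def is_equiangular(vectors, tol=1e-9):
    """Lines through unit vectors are equiangular when all pairwise |cos| values agree.
    Returns (flag, common |cos| of the first pair)."""
    cosines = [abs(dot(u, v)) for i, u in enumerate(vectors) for v in vectors[i + 1:]]
    return max(cosines) - min(cosines) < tol, cosines[0]


ok, common_cos = is_equiangular(regular_simplex(3))
print(ok, round(common_cos, 6), round(math.degrees(math.acos(common_cos)), 2))
```

For d = 3 this gives four lines separated by arccos(1/3), roughly 70.53 degrees.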

EBVs are used to predefine fixed d-dimensional embeddings for categories.
EBVs lie on a unit hypersphere and are kept apart: for any two distinct vectors, the cosine similarity must lie between -α and α.
The problem is to calculate the coordinates of each vector in the vector set W when given fixed α, d and N.
For a query point v, EBVs produce a distribution over classes based on how close the embedding of v is to each basis vector.
The maximum number of categories EBVs can handle is N.
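The definition above can be checked mechanically. This is a minimal sketch, assuming the EBV condition exactly as described: unit-norm vectors whose pairwise cosine similarities stay within [-α, α], with α assumed to lie in [0, 1). The function name is hypothetical.

```python
import math


def is_ebv_set(W, alpha, tol=1e-9):
    """Check the EBV conditions as summarized above: every vector has unit norm,
    and every pair of distinct vectors has cosine similarity in [-alpha, alpha]."""
    if not 0.0 <= alpha < 1.0:  # assumed valid range for alpha
        raise ValueError("alpha must lie in [0, 1)")
    for w in W:
        if abs(math.sqrt(sum(x * x for x in w)) - 1.0) > tol:
            return False
    for i in range(len(W)):
        for j in range(i + 1, len(W)):
            cos_ij = sum(a * b for a, b in zip(W[i], W[j]))  # unit vectors: dot = cosine
            if abs(cos_ij) > alpha + tol:
                return False
    return True


# N = 4 unit vectors in d = 2 at angles 0, 45, 90 and 135 degrees.
W = [[math.cos(math.radians(t)), math.sin(math.radians(t))] for t in (0, 45, 90, 135)]
print(is_ebv_set(W, alpha=0.75))  # max |cos| is cos(45 deg), about 0.707
print(is_ebv_set(W, alpha=0.5))
```

Four directions in the plane cannot all be orthogonal (N > d), so the smallest α this particular set satisfies is cos 45°, about 0.707; the N vectors could serve at most N categories.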

The basic idea of Equiangular Basis Vectors (EBVs) is to generate fixed normalized vector embeddings with equal angles as “predefined classifiers”.
To calculate the EBVs, a Grassmannian Matrix is constructed with the vectors in W.
The mutual coherence of W, i.e. the largest absolute inner product between two distinct vectors in W, is defined, and from it a lower bound on α is derived.
When N = d, W can simply be a unitary (orthogonal) matrix with α = 0, but when N > d an exact construction is difficult.
Stochastic gradient descent is therefore used to search for a set W that satisfies the definition of EBVs.
Algorithm 1 provides the code for a simple generation method of the proposed EBVs.
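Algorithm 1 itself is not reproduced here. Instead, this is a stdlib-only sketch of the same idea under stated assumptions: the Welch bound, one standard lower bound on mutual coherence, is assumed to be the bound meant above, and a hinge penalty on pairwise |cos| values above α is minimized by full-batch projected gradient descent rather than the paper's SGD setup. The generator and its hyperparameters (lr, steps, seed) are hypothetical.

```python
import math
import random


def welch_bound(n: int, d: int) -> float:
    """Welch lower bound on the mutual coherence of n unit vectors in R^d.
    For n <= d it is 0, since orthonormal vectors exist."""
    if n <= d:
        return 0.0
    return math.sqrt((n - d) / (d * (n - 1)))


def normalize(w):
    norm = math.sqrt(sum(x * x for x in w))
    return [x / norm for x in w]


def mutual_coherence(W):
    """Largest |<w_i, w_j>| over distinct pairs of unit vectors."""
    return max(
        abs(sum(a * b for a, b in zip(W[i], W[j])))
        for i in range(len(W))
        for j in range(i + 1, len(W))
    )


def generate_ebvs(n, d, alpha, lr=0.05, steps=3000, seed=0):
    """Hypothetical generator: projected subgradient descent on the hinge loss
    sum_{i<j} max(0, |<w_i, w_j>| - alpha), renormalizing after each step.
    Returns the lowest-coherence set seen and its coherence."""
    rng = random.Random(seed)
    W = [normalize([rng.gauss(0.0, 1.0) for _ in range(d)]) for _ in range(n)]
    best, best_mu = W, mutual_coherence(W)
    for _ in range(steps):
        if best_mu <= alpha:
            break  # every pair already satisfies the EBV condition
        grads = [[0.0] * d for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                c = sum(a * b for a, b in zip(W[i], W[j]))
                if abs(c) > alpha:  # hinge active: push this pair apart
                    s = 1.0 if c > 0 else -1.0
                    for k in range(d):
                        grads[i][k] += s * W[j][k]
                        grads[j][k] += s * W[i][k]
        # Gradient step, then project back onto the unit hypersphere.
        W = [normalize([wk - lr * gk for wk, gk in zip(w, g)]) for w, g in zip(W, grads)]
        mu = mutual_coherence(W)
        if mu < best_mu:
            best, best_mu = W, mu
    return best, best_mu


n, d, alpha = 10, 8, 0.4
print(f"Welch bound for N={n}, d={d}: {welch_bound(n, d):.4f}")
W, mu = generate_ebvs(n, d, alpha)
print(f"achieved mutual coherence: {mu:.4f} (target alpha = {alpha})")
```

For N = 10 and d = 8 the Welch bound is exactly 1/6, so no W can reach an α below about 0.167; the sketch only targets α = 0.4. Renormalizing after every step keeps each vector on the unit hypersphere, so the constraint in the definition never has to be penalized separately.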

Equiangular Basis Vectors (EBVs) provide fixed learning targets for each independent optimization objective, i.e., semantic categories.
In the usual paradigm, a deep network extracts high-dimensional features and a fully connected classification layer maps those features to semantic categories; EBVs replace that layer.
Each category is bound to a unique normalized d-dimensional basis vector in W.
Cosine distance is used as the distance metric to measure the similarity between two vectors, such as an input's embedding and a basis vector.
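A minimal sketch of how fixed basis vectors can act as a classifier head. It assumes (this is not stated in the summary above) a temperature-scaled softmax over cosine similarities for the class distribution and a cross-entropy training loss, with the temperature tau = 0.1 chosen arbitrarily. Prediction picks the basis vector at the smallest cosine distance, which is the same as the largest cosine similarity. All names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def cosine_distance(u, v):
    return 1.0 - cosine_similarity(u, v)


def class_distribution(embedding, W, tau=0.1):
    """Assumed form: softmax over temperature-scaled cosine similarities to each fixed vector."""
    logits = [cosine_similarity(embedding, w) / tau for w in W]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ebv_loss(embedding, W, label, tau=0.1):
    """Cross-entropy that pulls the embedding toward its category's basis vector."""
    return -math.log(class_distribution(embedding, W, tau)[label])


def predict(embedding, W):
    """Inference: the category whose basis vector is nearest in cosine distance."""
    return min(range(len(W)), key=lambda k: cosine_distance(embedding, W[k]))


# Toy example with N = d = 3, where the standard basis is a valid EBV set (alpha = 0).
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
v = [0.2, 0.9, 0.1]  # hypothetical embedding produced by a backbone
print(predict(v, W))
print(ebv_loss(v, W, label=1))
```

Because W is fixed, the only trainable part in a real model would be the network producing the embedding; the head itself is just a matrix of constants.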

The embedding dimension d of EBVs can be set manually, independently of the number of categories, so the trainable parameters of the classifier do not grow linearly as the number of categories increases.
EBVs are generated before the training step and the fixed d-dimensional embedding of each category will not be changed throughout the optimization process, so they will not introduce a large amount of computation during the training stage.
EBVs are not sensitive to the choice of optimizer or to existing training tricks, and they still achieve state-of-the-art performance.
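The parameter argument can be checked with simple arithmetic. This sketch assumes, hypothetically, that the EBV head is a trainable linear projection (with bias) from the backbone's feature dimension to d, while the standard head is a k-way linear layer with bias. 2048 is the pooled feature size of a standard ResNet-50; d = 1000 is an arbitrary choice.

```python
def fc_classifier_params(feature_dim: int, num_classes: int) -> int:
    """k-way fully connected layer: weight (feature_dim x num_classes) plus bias."""
    return feature_dim * num_classes + num_classes


def ebv_head_params(feature_dim: int, ebv_dim: int) -> int:
    """Hypothetical EBV head: a trainable projection to ebv_dim plus bias.
    The basis vectors are fixed, so they add no trainable parameters."""
    return feature_dim * ebv_dim + ebv_dim


feature_dim, ebv_dim = 2048, 1000  # 2048 = ResNet-50 pooled features; ebv_dim is assumed
for num_classes in (1_000, 10_000, 100_000):
    print(num_classes,
          fc_classifier_params(feature_dim, num_classes),
          ebv_head_params(feature_dim, ebv_dim))
```

Under these assumptions the FC head grows from 2,049,000 to 204,900,000 trainable parameters between 1,000 and 100,000 classes, while the EBV head stays at 2,049,000. The fixed N × d matrix of basis vectors still has to be stored, but it is not trained.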

Quantitative and qualitative experiments compare models ending with a k-way fully connected layer with softmax against models using the proposed Equiangular Basis Vectors (EBVs)
Experiments conducted on ImageNet-1K dataset with 1.28M training images and 50K validation images from 1,000 different object classes
ImageNet-1K top-1 accuracy reported on validation set under single crop setting
Followed state-of-the-art training methods provided by TorchVision and timm
Three training settings: A0, A1, A2
EBVs outperform the FC classifier in all settings
Ablation studies conducted on dimension d of each basis vector
Object detection and instance segmentation experiments conducted on the COCO 2017 benchmark
Mask R-CNN and UperNet in MMDetection used as detection frameworks
ResNet and Swin Transformer used as backbones
With ResNet backbones, EBVs surpass the framework ending with fully connected layers
Semantic segmentation experiments conducted on ADE20K
FPN and UperNet in MMSEG used as segmentation frameworks
EBVs achieve a higher mIoU than the general framework under a 160,000-step training schedule
Analyzed features in hidden layers with Centered Kernel Alignment (CKA)
Shallow layers share higher similarity when using EBVs
Future work: exploring the relations between pairs of basis vectors to embed hierarchies, the performance of EBVs with a large number of categories, and EBVs in other related tasks
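The CKA analysis listed above compares hidden-layer representations across models. This is a minimal stdlib sketch of linear CKA, assuming the linear-kernel variant computed on full feature matrices; the summary does not state which CKA variant or batching scheme the paper uses. The activations below are random placeholders, not data from the paper.

```python
import math
import random


def center(X):
    """Subtract each feature's (column's) mean from an n x p matrix stored as rows."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in X]


def cross_frobenius_sq(A, B):
    """||A^T B||_F^2 for A (n x p) and B (n x q), both stored as lists of rows."""
    total = 0.0
    for i in range(len(A[0])):
        for j in range(len(B[0])):
            s = sum(a_row[i] * b_row[j] for a_row, b_row in zip(A, B))
            total += s * s
    return total


def linear_cka(X, Y):
    """Linear CKA between two feature matrices computed on the same n inputs:
    ||Yc^T Xc||_F^2 / (||Xc^T Xc||_F * ||Yc^T Yc||_F), with column-centered Xc, Yc."""
    Xc, Yc = center(X), center(Y)
    num = cross_frobenius_sq(Xc, Yc)
    den = math.sqrt(cross_frobenius_sq(Xc, Xc) * cross_frobenius_sq(Yc, Yc))
    return num / den


# Hypothetical activations of two layers on the same 50 inputs.
rng = random.Random(0)
X = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(50)]
Y_scaled = [[3.0 * x for x in row] for row in X]  # same features, rescaled
Y_unrelated = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(50)]
print(round(linear_cka(X, Y_scaled), 6))     # identical up to scale
print(round(linear_cka(X, Y_unrelated), 3))  # independent features: much lower
```

Linear CKA is invariant to isotropic scaling and orthogonal transformations of the features, which is what makes it usable for comparing layers of two separately trained networks, such as an FC-headed model and an EBV-headed model.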