Link to paper
The full paper is available here.
You can also find the paper on PapersWithCode here.
Abstract
- Propose Equiangular Basis Vectors (EBVs) for classification tasks.
- Learn a transformation function that maps training data points from the original space to a new space where similar points are closer while dissimilar points become farther apart.
- Generate normalized vector embeddings as “predefined classifiers” which are required to not only be with the equal status between each other, but also be as orthogonal as possible.
- Experiments on ImageNet-1K dataset and other downstream tasks demonstrate that our method outperforms the general fully connected classifier.
- Won first place in 2022 DIGIX Global AI Challenge.
Paper Content
Introduction
- Pattern classification field developed to assign input signals to two or more classes
- Deep learning models used to process image, video, audio, text, and other data
- Deep learning methods used to solve classification problems in various scenarios and settings
- Typical classification paradigms illustrated in Figure 1
- Deep learning methods use a trainable fully connected layer with softmax as the classifier
- Classical metric learning methods require a significant amount of extra computation for large-scale datasets
- Equiangular Basis Vectors (EBVs) proposed to replace the fully connected layer associated with softmax
- EBVs predefine fixed normalized vector embeddings for different categories
- Trainable parameters of the network will not be changed with the growth of the number of categories
- EBVs do not need to measure the similarity among different training samples and constrain distance between each category
- EBVs used to minimize the spherical distance of the learned representations with different predefined basis vectors
- EBVs evaluated on diverse computer vision tasks with large-scale datasets
Learning objectives
- Traditional classifiers and machine/deep learning methods can be used for classification tasks
- Two prominent deep learning paradigms discussed: k-way classification layers and classical deep metric learning
- K-way classification layer maps deep representations to semantic categories and minimizes losses between predictions and ground truth
- EBVs predefine equiangular basis vectors that are forced to be with equal status and orthogonal to each other
- Learning objective of EBVs is to make vectorized embedding of input as close as possible to its categorical equiangular basis vector
- Learning objective of deep metric learning is upon massive training samples, while EBVs is for training sample and its fixed categorical vectorized embedding
- Different from specific metric learning approaches such as center loss, prototypical network, and nearest class mean approach
Methodology
Preliminaries
- EBVs are based on Equiangular Lines and the Tammes Problem
- Equiangular Lines are pairwise separated by the same angle
- Maximum number of Equiangular Lines is linearly correlated with dimension d
- Tammes Problem is finding the maximal number of points on a unit hypersphere with a given spherical distance
Definition of equiangular basis vectors
- EBVs are used to predefine fixed d-dimensional embeddings for categories.
- EBVs are kept at a distance from each other on a unit hypersphere.
- The problem is to calculate the coordinates of each vector in the vector set W when given fixed α, d and N.
- EBVs produce a distribution over classes for a query point v.
- The maximum number of categories EBVs can handle is N.
How to generate ebvs?
- The basic idea of Equiangular Basis Vectors (EBVs) is to generate fixed normalized vector embeddings with equal angles as “predefined classifiers”.
- To calculate the EBVs, a Grassmannian Matrix is constructed with the vectors in W.
- The mutual-coherence of W is defined and the lower bound for α is calculated.
- When N = d, a unitary matrix can be constructed, but in other cases it is difficult.
- Stochastic Gradient Descent is used to search the set W that satisfies the definition of EBVs.
- Algorithm 1 provides the code for a simple generation method of the proposed EBVs.
How to achieve the learning objective of ebvs?
- Equiangular Basis Vectors (EBVs) provide fixed learning targets for each independent optimization objective, i.e., semantic categories.
- Deep network is used to extract high-dimensional features and a fully connected classification layer is used to map the features to semantic categories.
- Each category is bound to a unique normalized d-dimensional basis vector in W.
- Cosine distance is used as the distance metric to measure similarity between two inputs.
Merits of our ebvs
- Embedding dimension of EBVs can be manually altered and the trainable parameters of the classifier will not grow linearly when the number of categories increases.
- EBVs are generated before the training step and the fixed d-dimensional embedding of each category will not be changed throughout the optimization process, so they will not introduce a large amount of computation during the training stage.
- EBVs are not sensitive to the optimizers and previous training tricks while they can still achieve state-of-the-art performance.
Experiments
- Quantitative and qualitative experiments conducted on models with k-way fully connected layer with softmax and proposed Equiangular Basis Vectors (EBVs)
- Experiments conducted on ImageNet-1K dataset with 1.28M training images and 50K validation images from 1,000 different object classes
- ImageNet-1K top-1 accuracy reported on validation set under single crop setting
- Followed state-of-the-art training methods provided by TorchVision and timm
- Three training settings: A0, A1, A2
- EBVs outperform FC among all settings
- Ablation studies conducted on dimension d of each basis vector
- Object detection and instance segmentation experiments conducted on COCO 2017 benchmark
- Mask R-CNN and UperNet in MMdetection used as detection framework
- ResNet and Swin Transformer used as backbones
- EBVs surpass framework ending with fully connected layers under ResNet backbone
- Semantic segmentation experiments conducted on ADE20K
- FPN and UperNet in MMSEG used as segmentation framework
- EBVs gain higher mIoU score than general framework trained for 160,000 steps
- Analyzed features in hidden layers with Centered Kernel Alignment (CKA)
- Shallow layers share higher similarity when using EBVs
- Future exploration of relations between basis vector pairs and embed hierarchies, performance of EBVs with large number of categories, and EBVs in other related tasks