Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

  • Many works have focused on reducing the number of FLOPs to design fast neural networks.
  • Reducing FLOPs does not necessarily lead to a similar level of reduction in latency.
  • Low FLOPS (floating-point operations per second, as opposed to the FLOPs operation count) is mainly caused by frequent memory access of operators, especially the depthwise convolution.
  • Proposed a novel partial convolution (PConv) to extract spatial features more efficiently.
  • Proposed FasterNet, a new family of neural networks, which is faster and more accurate than existing networks.

Paper Content

Introduction

  • Neural networks have been developed for computer vision tasks
  • Researchers and practitioners prefer to design cost-effective fast neural networks
  • Networks are designed to reduce computational complexity, measured in FLOPs
  • Depthwise convolution (DWConv) and group convolution (GConv) are used to extract spatial features
  • MicroNet further decomposes and sparsifies the network to reduce FLOPs
  • ViTs and MLPs are also being made smaller and faster
  • FLOPS is a measure of effective computational speed
  • Many existing neural networks suffer from low FLOPS
  • Discrepancy between FLOPs and latency is noticed
  • PConv is proposed as a competitive alternative to reduce computational redundancy and memory access
  • FasterNet is introduced as a new family of networks with low latency and high throughput
  • PConv and FasterNet are validated with extensive experiments
  • CNNs are the mainstream architecture in computer vision
  • Numerous studies have been done to increase efficiency
  • Group convolution and depthwise separable convolution are popular
  • Growing interest in ViT and variants
  • Studies have attempted to improve ViT in terms of training and model design
  • Attention-based mechanisms generally run slower than convolutional counterparts
  • Focus on analyzing convolution operations, particularly DWConv
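
The gap between FLOPs and latency noted above follows from the relation latency = FLOPs / FLOPS. The sketch below is only an illustration of that relation; the function name and all numbers are hypothetical and not taken from the paper.

```python
def latency_seconds(flops: float, flops_per_second: float) -> float:
    """Latency = FLOPs / FLOPS (operation count over achieved throughput)."""
    if flops_per_second <= 0:
        raise ValueError("flops_per_second must be positive")
    return flops / flops_per_second


# Hypothetical numbers, chosen only to illustrate the effect.
# "dense" runs more operations but keeps the hardware busy.
dense = latency_seconds(flops=1.0e9, flops_per_second=2.0e11)
# "sparse" has half the FLOPs but reaches only a quarter of the FLOPS.
sparse = latency_seconds(flops=0.5e9, flops_per_second=5.0e10)
print(dense, sparse)
```

In this made-up case, halving FLOPs while FLOPS falls to a quarter doubles the latency, which is the kind of discrepancy the paper sets out to address.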

Design of PConv and FasterNet

  • DWConv has an issue with frequent memory access
  • PConv is an alternative operator to resolve the issue
  • FasterNet is introduced and its details are explained

Preliminary

  • DWConv is a variant of Conv and is used in many neural networks.
  • DWConv has low FLOPs compared to regular Conv.
  • DWConv is typically followed by a pointwise convolution.
  • DWConv requires a higher channel number to compensate for accuracy drop.
  • This results in higher memory access and can slow down computation.
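
The trade-off above can be made concrete with a simple cost model. This is a rough sketch, assuming an h x w feature map, a k x k kernel, equal input and output channel counts, and memory access counted as one read of the input, one write of the output, and one read of the weights. The widening factor of 6 is an illustrative assumption, as are the function names.

```python
def conv_flops(h, w, k, c):
    """Regular Conv: every output channel looks at every input channel."""
    return h * w * k * k * c * c


def dwconv_flops(h, w, k, c):
    """DWConv: each channel is filtered independently."""
    return h * w * k * k * c


def conv_memory_access(h, w, k, c):
    """Input read + output write + weight read for a regular Conv."""
    return h * w * 2 * c + k * k * c * c


def dwconv_memory_access(h, w, k, c):
    """Input read + output write + weight read for a DWConv."""
    return h * w * 2 * c + k * k * c


h = w = 56
k = 3
c = 64
c_wide = 6 * c  # assumed widening to compensate for the accuracy drop

print("Conv   FLOPs:", conv_flops(h, w, k, c))
print("DWConv FLOPs:", dwconv_flops(h, w, k, c_wide))
print("Conv   memory access:", conv_memory_access(h, w, k, c))
print("DWConv memory access:", dwconv_memory_access(h, w, k, c_wide))
```

Under these assumptions the widened DWConv still needs far fewer FLOPs than the regular Conv, but it touches several times more memory, which is the slowdown the bullets describe.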

Partial convolution as a basic operator

  • Feature maps share high similarities among different channels
  • Proposed a simple PConv to reduce computational redundancy and memory access
  • PConv has fewer FLOPs and memory access than a regular Conv
  • Remaining channels are kept untouched instead of removed
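
A minimal pure-Python sketch of the idea follows: a regular convolution is applied to the first c_p channels and the remaining channels pass through unchanged. It is not the paper's implementation; the function name, the nested-list data layout, and the choice of zero padding with stride 1 are assumptions made for illustration.

```python
def pconv_forward(x, weight, c_p):
    """Convolve the first c_p channels of x; keep the rest untouched.

    x:      list of c channels, each an h x w nested list of floats
    weight: weight[o][i][dy][dx] with shape c_p x c_p x k x k (k odd)
    Zero padding and stride 1 are assumed, so h and w are preserved.
    """
    h, w = len(x[0]), len(x[0][0])
    k = len(weight[0][0])
    pad = k // 2
    out = []
    for o in range(c_p):
        channel = [[0.0] * w for _ in range(h)]
        for i in range(c_p):
            for row in range(h):
                for col in range(w):
                    acc = 0.0
                    for dy in range(k):
                        for dx in range(k):
                            sy, sx = row + dy - pad, col + dx - pad
                            # Positions outside the map count as zero.
                            if 0 <= sy < h and 0 <= sx < w:
                                acc += weight[o][i][dy][dx] * x[i][sy][sx]
                    channel[row][col] += acc
        out.append(channel)
    # The remaining c - c_p channels are passed through, not removed.
    out.extend(x[c_p:])
    return out


# Four 2x2 channels; only the first one is convolved (c_p = 1).
x = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[5.0, 6.0], [7.0, 8.0]],
    [[9.0, 10.0], [11.0, 12.0]],
    [[13.0, 14.0], [15.0, 16.0]],
]
# A 3x3 kernel that only doubles the center pixel.
weight = [[[[0.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 0.0]]]]
y = pconv_forward(x, weight, c_p=1)
```

Keeping the untouched channels in the output matters because a later pointwise convolution can still mix information across all channels.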

PConv followed by PWConv

  • PConv followed by PWConv has an effective receptive field that resembles a T-shaped Conv.
  • The center position of the T-shaped Conv is more important than its surrounding neighbors.
  • Decomposing the T-shaped Conv into a PConv and a PWConv saves FLOPs.
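
The saving can be checked with a per-position operation count. The sketch assumes c input and output channels, of which c_p receive the k x k spatial filter. A T-shaped Conv gives every output channel a k x k window over the c_p channels plus a 1 x 1 tap on the other c - c_p channels. The decomposed form runs a PConv on c_p channels and then a PWConv over all c channels. Function names are illustrative.

```python
def t_shaped_flops(h, w, k, c, c_p):
    """One fused T-shaped Conv producing c output channels."""
    return h * w * (k * k * c_p * c + c * (c - c_p))


def pconv_pwconv_flops(h, w, k, c, c_p):
    """PConv on c_p channels followed by a PWConv over all c channels."""
    return h * w * (k * k * c_p * c_p + c * c)


h = w = 56
k, c, c_p = 3, 64, 16  # c_p = c / 4
fused = t_shaped_flops(h, w, k, c, c_p)
decomposed = pconv_pwconv_flops(h, w, k, c, c_p)
print(fused, decomposed, decomposed / fused)
```

The decomposition is cheaper whenever k * k * (c - c_p) exceeds c, which holds here since 9 * 48 is larger than 64.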

FasterNet as a general backbone

  • Proposed FasterNet, a new family of neural networks
  • Architecture has four hierarchical stages
  • Each stage has a stack of FasterNet blocks
  • Blocks in the last two stages incur less memory access and tend to have higher FLOPS, so more blocks are placed in those stages
  • Each FasterNet block has a PConv layer followed by two PWConv layers
  • Normalization and activation layers used after each middle PWConv
  • Batch normalization and GELU/ReLU used
  • Global average pooling, Conv 1x1, and fully-connected layer used for feature transformation and classification
  • Tiny (T0/T1/T2), Small, Medium, and Large variants of FasterNet provided
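
To get a feel for where the computation of one block goes, the sketch below counts FLOPs for a PConv followed by two PWConv layers. The expansion factor of the middle layer is a hypothetical parameter (set to 2 here purely for illustration), normalization and activation costs are ignored, and the function name is not from the paper.

```python
def fasternet_block_flops(h, w, c, c_p, k=3, expansion=2):
    """Per-layer FLOPs of a PConv + PWConv + PWConv block.

    expansion is an assumed widening factor for the middle layer.
    """
    hidden = expansion * c
    return {
        "pconv": h * w * k * k * c_p * c_p,
        "pwconv_expand": h * w * c * hidden,
        "pwconv_project": h * w * hidden * c,
    }


costs = fasternet_block_flops(h=56, w=56, c=64, c_p=16)
total = sum(costs.values())
for name, flops in costs.items():
    print(f"{name}: {flops} ({flops / total:.1%})")
```

Under these assumptions the two PWConv layers dominate the block's FLOPs, while the PConv contributes a small share.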

Experimental results

  • Examined computational speed and effectiveness of PConv and PWConv
  • Evaluated performance of FasterNet for classification, detection, and segmentation tasks
  • Conducted ablation study
  • Benchmarked latency and throughput using GPU, CPU, and ARM processors

PConv is fast with high FLOPS

  • PConv is fast and exploits on-device computational capacity
  • With a partial ratio of r = 1/4, PConv has 1/16 of the FLOPs of a regular Conv and achieves higher FLOPS than other convolutional variants
  • Regular Conv has the highest FLOPS, but its large FLOPs count makes its latency and throughput unaffordable
  • GConv and DWConv significantly reduce FLOPs but suffer a drop in FLOPS, and widening their channels to recover accuracy increases latency
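
The 1/16 figure can be reproduced by treating PConv as a regular Conv restricted to c_p = c / 4 channels. The sketch below uses the same simple cost model as elsewhere (input read, output write, weight read for memory access); the function name and feature-map size are illustrative assumptions.

```python
from fractions import Fraction


def conv_cost(h, w, k, c):
    """Return (FLOPs, memory access) of a k x k Conv over c channels."""
    flops = h * w * k * k * c * c
    memory_access = h * w * 2 * c + k * k * c * c
    return flops, memory_access


h = w = 56
k, c = 3, 64
c_p = c // 4  # partial ratio r = 1/4

conv_f, conv_m = conv_cost(h, w, k, c)
pconv_f, pconv_m = conv_cost(h, w, k, c_p)  # PConv only touches c_p channels

flops_ratio = Fraction(pconv_f, conv_f)
memory_ratio = pconv_m / conv_m
print(flops_ratio, round(memory_ratio, 3))
```

FLOPs scale with r squared, while memory access scales roughly with r, so the memory saving at r = 1/4 is close to a factor of four.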

PConv is effective together with PWConv

  • PConv followed by PWConv is effective in approximating a regular Conv to transform feature maps
  • 4 datasets created by feeding ImageNet-1k val split images into a pre-trained ResNet50 and extracting feature maps before and after the first Conv 3x3 in each of the 4 stages
  • Simple network consisting of PConv followed by PWConv trained on feature map datasets with mean squared error loss
  • PConv + PWConv achieved lowest test loss, meaning they better approximate a regular Conv in feature transformation
  • PConv shows potential to be new go-to choice in designing fast and effective neural networks

FasterNet on ImageNet-1k classification

  • Conducted experiments on ImageNet-1k classification dataset
  • Trained models for 300 epochs using AdamW optimizer
  • Used regularization and augmentation techniques
  • Used 192x192 resolution for first 280 epochs and 224x224 for remaining 20 epochs
  • Reported top-1 accuracy on validation set with center crop at 224x224 resolution
  • Achieved state-of-the-art in balancing accuracy and latency/throughput
  • FasterNet runs faster than various CNN, ViT and MLP models on a wide range of devices
  • FasterNet-L achieved 83.5% top-1 accuracy, comparable to Swin-B and ConvNeXt-B
  • FasterNet is much simpler than many other models in terms of architectural design
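
The two-phase resolution schedule described above can be written as a small helper. This is a sketch only: the function name is hypothetical and zero-based epoch indexing is an assumption.

```python
def train_resolution(epoch, total_epochs=300, low_res_epochs=280,
                     low_res=192, high_res=224):
    """Return the training image side length for a zero-based epoch index."""
    if not 0 <= epoch < total_epochs:
        raise ValueError("epoch out of range")
    # First 280 epochs use 192x192, the final 20 use 224x224.
    return low_res if epoch < low_res_epochs else high_res


schedule = [train_resolution(e) for e in range(300)]
print(schedule.count(192), schedule.count(224))
```

Training mostly at the lower resolution reduces cost, and the short final phase at 224x224 matches the resolution used for evaluation.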

FasterNet on downstream tasks

  • Experiments conducted on COCO dataset for object detection and instance segmentation
  • ImageNet pre-trained FasterNet used as backbone
  • Mask R-CNN detector used
  • FasterNet outperforms ResNet and ResNeXt in terms of latency and average precision
  • FasterNet-S saves 36% compute time and yields higher box and mask AP compared to ResNet50
  • FasterNet is competitive against ViT variants

Ablation study

  • Partial ratio r affects accuracy, throughput, and latency
  • Batch-Norm is used instead of LayerNorm for faster inference
  • GELU is more efficient than ReLU for FasterNet-T0/T1, but not for FasterNet-T2/S/M/L

Conclusion

  • Investigated issue of low FLOPS in established neural networks
  • Identified bottleneck operator DWConv and cause of slowdown - frequent memory access
  • Proposed PConv operator to overcome issue and achieve faster neural networks
  • Introduced FasterNet built upon PConv to achieve state-of-the-art speed and accuracy trade-off
  • Provided details on experimental settings, comparison plots, architectural configurations, PConv implementations, comparisons with related work, limitations, and future work
  • Used ImageNet-1k pre-trained weights to initialize FasterNet backbone for object detection and instance segmentation
  • Compared PConv to GConv and FasterNet to ConvNeXt
  • Explored other paradigms for efficient inference
  • Provided two implementations of the forward pass for PConv
  • Demonstrated that PConv and FasterNet are fast and effective
  • Visualized feature maps in an intermediate layer of a pre-trained ResNet50
  • Showed histogram of salient position distribution for regular Conv 3x3 filters in a pre-trained ResNet18
  • Compared FasterNet with state-of-the-art networks
  • Explained benefit of BN and how it can be merged into adjacent Conv layers for faster inference
  • Compared models with similar top-1 accuracy on ImageNet-1k benchmark
  • Evaluated FasterNet on COCO object detection and instance segmentation benchmarks
  • Conducted ablation on partial ratio, normalization, and activation of FasterNet