Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

Presents a fast and accurate object detection method called DAMO-YOLO
Uses Neural Architecture Search, Reparameterized Generalized-FPN, AlignedOTA label assignment, and distillation enhancement
Searches for a detection backbone with low latency and high performance
Follows the rule of “large neck, small head”
Investigates how detector head size affects detection performance
Achieves 43.0/46.8/50.0 mAPs on COCO with the latency of 2.78/3.83/5.62 ms on T4 GPUs

Paper Content

Introduction

Researchers have developed object detection methods with great progress
Network structure plays a critical role in object detection, and NAS techs have been used to find efficient network structures
Dynamic label assignment and Knowledge Distillation have been used to improve performance of object detection

Damo-yolo

Introduces each module of DAMO-YOLO
Includes Neural Architecture Search (NAS) backbones, efficient Reparameterized Generalized-FPN (RepGFPN) neck, ZeroHead, AlignedOTA label assignment and distillation enhancement
Whole framework of DAMO-YOLO is displayed in Fig. 2

Mae-nas backbone

MAE-NAS is used to obtain optimal networks under different computational budgets.
Search process only takes a few hours.
Backbones designed in the vanilla convolution network space with a new search block “k1kx”.
GPU inference latency, not FLOPs, used as the target budget.
SPP, Focus and CSP modules applied to the final backbones.

Efficient repgfpn

Feature Pyramid Network (FPN) is an effective part of object detection
Conventional FPN has a top-down pathway to fuse multi-scale features
PAFPN adds a bottom-up path aggregation network, but with higher computational costs
BiFPN removes nodes with one input edge and adds skiplink from the original input
GFPN is proposed to serve as neck and achieves SOTA performance by exchanging high-level semantic and low-level spatial information

Zerohead and alignota

Decoupled head is widely used in object detection
Label assignment is a crucial component during detector training
Dynamic label assignment methods assign labels according to the assignment cost between prediction and ground truth
Misalignment of classification and regression is a common issue in static assignment methods
Focal loss is introduced into the classification cost and IoU of prediction and ground truth box is used as the soft label
AlignOTA outperforms all other label assignment methods

Distillation enhancement

Knowledge Distillation (KD) is an effective method to boost the performance of pocket-size models
KD can be difficult to optimize and features can carry too much noise
DAMO-YOLO uses feature-based distillation to transfer dark knowledge
Experiments show CWD is more suitable for DAMO-YOLO
Distillation is split into two stages
Align Module and Channel-wise Dynamic Temperature are used to enhance distillation
Balance between distillation and task loss is necessary
Shallow head of detector is beneficial to feature distillation

Implementation details

Trained 300 epochs with SGD optimizer
Weight decay and SGD momentum of 5e-4 and 0.9
Initial learning rate of 0.4 with batch size of 256
Learning rate decays according to cosine schedule
Utilized Mosaic and Mixup for image-level augmentation and SADA for box-level augmentation

Comparison with the sota

DAMO-YOLO family outperforms all YOLO series in accuracy and speed
Method can detect objects effectively and efficiently

DAMO-YOLO : A Report on Real-Time Object Detection Design

Link to paper

Abstract

Paper Content

Introduction

Damo-yolo

Mae-nas backbone

Efficient repgfpn

Zerohead and alignota

Distillation enhancement

Implementation details

Comparison with the sota

Conclusion

Link to paper#

Abstract#

Paper Content#

Introduction#

Damo-yolo#

Mae-nas backbone#

Efficient repgfpn#

Zerohead and alignota#

Distillation enhancement#

Implementation details#

Comparison with the sota#

Conclusion#

Link to paper

Abstract

Paper Content

Introduction

Damo-yolo

Mae-nas backbone

Efficient repgfpn

Zerohead and alignota

Distillation enhancement

Implementation details

Comparison with the sota

Conclusion