Link to paper
The full paper is available here.
You can also find the paper on PapersWithCode here.
Abstract
- Task of open-world class-agnostic object detection is addressed
- RGB-based models suffer from overfitting and fail to detect novel-looking objects
- Geometric cues such as depth and normals are incorporated to train an object proposal network
- GOOD significantly improves detection recall for novel object categories and performs well with few training classes
- Using a single “person” class for training on the COCO dataset, GOOD surpasses SOTA methods by 5.0% AR@100
Paper Content
Abstract
- Task of open-world class-agnostic object detection
- RGB-based models suffer from overfitting and fail at detecting novel-looking objects
- Incorporating geometric cues such as depth and normals to train an object proposal network
- Geometry-guided Open-world Object Detector (GOOD) improves detection recall for novel object categories
Introduction
- The standard object detection task is to detect objects from a predefined class list.
- In the open-world setup, object detectors are required to detect all the objects in the scene even though they have only been trained on objects from a limited number of classes.
- Current state-of-the-art object detectors typically struggle in the open-world setup.
- Estimating geometric cues such as depth and normals from a single RGB image has been an active research area for a long time.
- Geometric cues possess built-in invariance to many changes and are more class-agnostic than RGB signals.
- Monocular estimators for mid-level representations have significantly advanced in terms of prediction quality and generalization to novel scenes.
- We propose to use a pseudo-labeling method for incorporating geometric cues into open-world object detector training.
- Incorporating the geometry cues can significantly improve the detection recall of unseen objects.
- Geometry cues help discover novel-looking objects that RGB-based detectors cannot detect.
- RGB-based detectors can make use of more annotations with their strong representation learning ability to generalize to novel, unseen categories.
Exploiting geometric cues
- Models trained on RGB images tend to over-rely on appearance cues for object detection.
- Involving novel objects into training can mitigate the appearance bias.
- Geometric cues such as depth and normals are used to detect unannotated novel objects.
Pseudo labeling method
- Train object proposal network on depth or normal input
- Filter out detected bounding boxes that overlap with base class annotations
- Add remaining top-k boxes to ground truth annotations
- Train class-agnostic object detector using extended bounding box annotation pool
- Use strong data augmentation during Phase-II training
Experiments
- Target two major challenges of open-world class-agnostic object detection: cross-category and cross-dataset generalization
- Split 80 classes in COCO dataset into two parts: one for training, one for testing
- First benchmark: single “person” class and 79 non-person classes
- Second benchmark: 20 PASCAL-VOC classes for training, 60 for testing
- Cross-dataset evaluation: ADE20K dataset for testing
Advantages of geometric cues
- Geometric cues are less sensitive to appearance shifts across classes.
- Geometric cues can achieve higher per-novel-class AR@5 than RGB in many categories.
- Geometric cues are better than edges and pairwise affinities.
- Geometric cues are better than adding shape bias and multi-scale augmentation.
- Depth and normals outperform 2D edge and PA on detecting novel objects.
- Data augmentation is helpful to counteract the noise in pseudo labels.
- More base classes in training allow object detectors to better detect novel objects.
- Ensembling pseudo labels is slightly better than using stacked inputs for Phase-I training.
- Stacking RGB with geometric cues leads to inferior performance.
- Adding pseudo boxes from RGB leads to no performance gains or even worsens the performance.
- Object proposal networks trained on RGB favor smaller detection boxes.