Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

The goal of the paper is to detect objects by exploiting their interrelationships.
Infer a graph prior from object co-occurrence statistics.
Model object relations as a function of initial class predictions and co-occurrence priors.
Learn the joint object-relation distribution via energy-based modeling.
Experiments show the method is detector-agnostic, end-to-end trainable, and beneficial for rare object classes.
The method establishes a consistent improvement over object detectors and state-of-the-art methods.

Paper Content

Introduction

Object detection is a classical task in computer vision
Modern detectors build on deep convolutional networks or self-attention-based transformer networks
These architectures focus on the image space for feature representation
Jiang et al. hypothesize that incorporating prior knowledge into detector training should improve detection
Xu et al. propose computing object relations solely from object features and their spatial orientation
Xu et al. also introduce a different way of leveraging prior information
We propose a novel way of representing object relations in an image
We propose a way to jointly model object-relation distributions
We create an end-to-end framework in which relation edges are computed from initial class predictions
We demonstrate the potential of our graph priors for object detection
We present an energy-based framework for learning the object-relation distribution in an image
Our method is detector-agnostic, end-to-end trainable, and beneficial for rare object classes
We establish a consistent improvement over object detectors and state-of-the-art methods

Related works

Object detection models can be divided into convolutional and transformer-based networks
ObjectBox treats objects as center points in a shape- and size-agnostic way
Object detectors use spatial context information in the image space
Early works used object relations as a post-processing step
Recent works model objects and relations jointly in a graph structure
Some works exploit priors such as co-occurrence or attributes of object classes to obtain a graph representation
Energy-based models bring simplicity and generality to likelihood modeling

Method

Graph priors for object detection

Visual features are extracted from an input image by a base detector
Prior knowledge is used to create an edge connectivity matrix
The Visual Genome dataset is used as the source of prior knowledge
A graph representation of the input image is obtained from the feature vectors (nodes) and the edge values
The edge connectivity matrix is calculated from the class predictions and the prior matrix
A joint task loss function is used to optimize for the final classification task
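The pipeline above (co-occurrence prior, edges from class predictions, feature aggregation over the graph) can be sketched in plain Python. This is a hypothetical illustration under stated assumptions, not the authors' code: the prior normalization (fraction of images containing class a that also contain class b), the bilinear edge function e_ij = p_i^T C p_j, and the single mean-aggregation message-passing round are all assumptions, and `cooccurrence_prior`, `edge_matrix`, and `message_pass` are made-up names.

```python
from itertools import combinations
from typing import List, Sequence

Matrix = List[List[float]]


def cooccurrence_prior(image_labels: Sequence[Sequence[int]], num_classes: int) -> Matrix:
    """Hypothetical prior: C[a][b] = fraction of images containing class a
    that also contain class b (an assumed normalization)."""
    pair_counts = [[0.0] * num_classes for _ in range(num_classes)]
    class_counts = [0] * num_classes
    for labels in image_labels:
        present = sorted(set(labels))
        for a in present:
            class_counts[a] += 1
        for a, b in combinations(present, 2):
            pair_counts[a][b] += 1
            pair_counts[b][a] += 1
    return [
        [pair_counts[a][b] / class_counts[a] if class_counts[a] else 0.0
         for b in range(num_classes)]
        for a in range(num_classes)
    ]


def edge_matrix(probs: Matrix, prior: Matrix) -> Matrix:
    """Assumed bilinear edge function e_ij = p_i^T C p_j, with no self-edges.
    `probs` holds one class-probability row per detected object."""
    n, k = len(probs), len(prior)
    edges = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                edges[i][j] = sum(
                    probs[i][a] * prior[a][b] * probs[j][b]
                    for a in range(k) for b in range(k)
                )
    return edges


def message_pass(features: Matrix, edges: Matrix, eps: float = 1e-8) -> Matrix:
    """One assumed message-passing round: each node adds the edge-weighted
    mean of the other nodes' features to its own feature vector."""
    out = []
    for i, h_i in enumerate(features):
        total = sum(edges[i]) + eps
        agg = [
            sum(edges[i][j] * features[j][d] for j in range(len(features))) / total
            for d in range(len(h_i))
        ]
        out.append([h + m for h, m in zip(h_i, agg)])
    return out


# Toy usage: 3 classes, 3 detections with one-hot class probabilities.
prior = cooccurrence_prior([[0, 1], [0, 1], [0, 2]], num_classes=3)
probs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
edges = edge_matrix(probs, prior)
refined = message_pass([[1.0], [2.0], [4.0]], edges)
```

With soft (non-one-hot) class probabilities, each edge becomes a probability-weighted average over all class pairs, which is one way the edge matrix can stay differentiable with respect to the initial predictions.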

Potential of graph priors

An oracle experiment tests the proposed graph formulation under optimal conditions
The experiment replaces the set of edges with the ground-truth relation set
The ground-truth graph is created by IoU-matching detections to ground-truth boxes and looking up the prior knowledge matrix
Graph priors improve the detection results of the base network by a large margin
The edge matrix is a function of the class-probability matrix, which is generated from the set of feature vectors
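The ground-truth (oracle) graph construction can be sketched as follows. This is a minimal sketch under assumptions: the matching rule (each detection takes the label of the ground-truth box with the highest IoU, provided it reaches 0.5), leaving unmatched detections unlabeled with zero edges, and the names `iou` and `oracle_graph` are illustrative choices, not details taken from the paper.

```python
from typing import List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def box_area(r: Box) -> float:
    return max(0.0, r[2] - r[0]) * max(0.0, r[3] - r[1])


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0


def oracle_graph(
    pred_boxes: Sequence[Box],
    gt_boxes: Sequence[Box],
    gt_labels: Sequence[int],
    prior: List[List[float]],
    thresh: float = 0.5,
):
    """Label each detection with its best-IoU ground-truth class (None if no
    GT box reaches `thresh`), then read edge values from the prior matrix."""
    labels: List[Optional[int]] = []
    for p in pred_boxes:
        best_label, best_iou = None, thresh
        for g, lab in zip(gt_boxes, gt_labels):
            v = iou(p, g)
            if v >= best_iou:
                best_label, best_iou = lab, v
        labels.append(best_label)
    n = len(pred_boxes)
    edges = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and labels[i] is not None and labels[j] is not None:
                edges[i][j] = prior[labels[i]][labels[j]]
    return labels, edges


# Toy usage: two matched detections and one background detection.
labels, edges = oracle_graph(
    pred_boxes=[(0, 0, 10, 10), (20, 20, 30, 30), (50, 50, 60, 60)],
    gt_boxes=[(0, 0, 10, 9), (21, 20, 30, 30)],
    gt_labels=[0, 1],
    prior=[[0.0, 0.5], [0.25, 0.0]],
)
```

Because the labels come from the ground truth rather than from the detector's predictions, this graph gives an upper bound on what the graph prior can contribute.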

Energy-based graph refinement

Energy-based models represent the joint probability distribution of two variables x and y
Computing the normalization constant Z(θ) is intractable
Maximizing the log-likelihood is equivalent to minimizing the KL-divergence between the data and model distributions
Stochastic gradient Langevin dynamics (SGLD) is used to draw approximate samples from the model distribution
Represent the image as a graph G = (N, E) with nodes N and edges E
Learn the conditional distribution p(z|N, E)
Model the joint distribution with the classification model and an energy function
Initialize the graph G0 with the base detector's predictions
Update the graph iteratively using the energy function
Optimize the model parameters by minimizing a joint loss function
Inference: pass the image through the base network, refine the graph t times, then feed it through the message-passing and classification models
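For reference, the standard energy-based-model relations behind these notes can be written out. The notation (energy E_θ, step size α, Gaussian noise ε_t, and v = (x, y) for the sampled pair) follows common EBM usage and is an assumption rather than a transcription of the paper's equations. The second line shows why sampling is needed: the gradient contains an expectation under the model distribution. The third line shows why maximizing log-likelihood minimizes the KL-divergence, since the data entropy H(p_data) does not depend on θ. The last line is the SGLD update that supplies approximate model samples.

```latex
\begin{align*}
p_\theta(x, y) &= \frac{\exp\left(-E_\theta(x, y)\right)}{Z(\theta)},
\qquad Z(\theta) = \int \exp\left(-E_\theta(x, y)\right) dx \, dy \\
\frac{\partial}{\partial \theta} \log p_\theta(x, y)
&= -\frac{\partial E_\theta(x, y)}{\partial \theta}
+ \mathbb{E}_{(x', y') \sim p_\theta}\left[\frac{\partial E_\theta(x', y')}{\partial \theta}\right] \\
\mathbb{E}_{p_{\text{data}}}\left[\log p_\theta(x, y)\right]
&= -\mathrm{KL}\left(p_{\text{data}} \,\|\, p_\theta\right) - H(p_{\text{data}}) \\
v_{t+1} &= v_t - \frac{\alpha}{2} \nabla_v E_\theta(v_t) + \sqrt{\alpha}\, \epsilon_t,
\qquad \epsilon_t \sim \mathcal{N}(0, I)
\end{align*}
```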
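The "refine the graph t times" step can be sketched as repeated SGLD updates on the state being refined. This is a hedged toy: a flat vector stands in for the graph state, and a hypothetical quadratic energy E(x) = 0.5 * ||x - mu||^2 with an analytic gradient stands in for the learned energy network, whose gradient would in practice come from automatic differentiation. `sgld_refine`, `make_quadratic_grad`, and the step size are illustrative assumptions, not the paper's settings.

```python
import math
import random
from typing import Callable, List, Optional

Vector = List[float]


def sgld_refine(
    x0: Vector,
    grad_energy: Callable[[Vector], Vector],
    steps: int,
    step_size: float,
    noise: bool = True,
    rng: Optional[random.Random] = None,
) -> Vector:
    """Apply `steps` SGLD updates x <- x - (a/2) * dE/dx + sqrt(a) * N(0, I).
    With noise=False this reduces to plain gradient descent on the energy."""
    if rng is None:
        rng = random.Random(0)
    x = list(x0)
    scale = math.sqrt(step_size)
    for _ in range(steps):
        g = grad_energy(x)
        x = [
            xi - 0.5 * step_size * gi + (scale * rng.gauss(0.0, 1.0) if noise else 0.0)
            for xi, gi in zip(x, g)
        ]
    return x


def make_quadratic_grad(mu: Vector) -> Callable[[Vector], Vector]:
    """Gradient of the hypothetical toy energy E(x) = 0.5 * ||x - mu||^2."""
    return lambda x: [xi - mi for xi, mi in zip(x, mu)]


# Refine an initial state (standing in for the base detector's graph G0) t = 10 times.
x_refined = sgld_refine(
    [0.0, 10.0], make_quadratic_grad([1.0, -1.0]), steps=10, step_size=0.1
)
```

The injected noise is what distinguishes SGLD from gradient descent: over many steps the iterates approximately sample from p(x) proportional to exp(-E(x)) instead of collapsing onto the energy minimum.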

Experiments

Datasets, evaluation and implementation

Experiments are conducted on the Visual Genome and MS-COCO 2017 datasets
The task is to localize objects and classify them into preset categories
The Visual Genome dataset is split into 87.9K images for training and 5K images for testing
Evaluation on MS-COCO 2017 covers its 80 object categories
The base detectors are Faster R-CNN and DETR
A message-passing model and an energy model are used on top of the base detector
Results and source code are made publicly available
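The state-of-the-art comparison reports average recall (AR) numbers, including results for different numbers of proposals. As a rough illustration of what such a metric measures, here is a simplified single-image, class-agnostic sketch: recall of the top-k score-sorted proposals under greedy one-to-one matching, averaged over IoU thresholds 0.50 to 0.95 in the COCO style. COCO-style evaluation additionally averages over categories and images and caps detections per image, so this is an assumption-laden sketch, not the benchmark's evaluation code; `recall_at_k` and `average_recall` are hypothetical names.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_k(
    proposals: Sequence[Box], gt_boxes: Sequence[Box], k: int, thresh: float
) -> float:
    """Greedy one-to-one matching of the top-k (score-sorted) proposals to
    ground-truth boxes; returns the fraction of GT boxes that get matched."""
    matched = set()
    for p in proposals[:k]:
        best, best_iou = None, thresh
        for gi, g in enumerate(gt_boxes):
            if gi in matched:
                continue
            v = iou(p, g)
            if v >= best_iou:
                best, best_iou = gi, v
        if best is not None:
            matched.add(best)
    return len(matched) / len(gt_boxes) if gt_boxes else 0.0


def average_recall(proposals: Sequence[Box], gt_boxes: Sequence[Box], k: int) -> float:
    """Recall averaged over IoU thresholds 0.50, 0.55, ..., 0.95."""
    thresholds = [round(0.5 + 0.05 * i, 2) for i in range(10)]
    return sum(recall_at_k(proposals, gt_boxes, k, t) for t in thresholds) / len(thresholds)


gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
props = [(0, 0, 10, 10), (100, 100, 110, 110), (20, 20, 30, 25)]
ar_top3 = average_recall(props, gts, k=3)
```

Raising k lets lower-ranked proposals contribute matches, which is why AR is typically reported at several proposal budgets.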

State-of-the-art comparison

Our method is compared with the state of the art on VG1000 and VG3000
It achieves state-of-the-art results on almost all metrics on VG1000 and VG3000
It improves average recall on VG1000 and VG3000
Results on MS-COCO are comparable to the state of the art
Graph prior ablation study on VG1000
Performance of our method for different numbers of proposals on VG1000

Object context information is exploited for object detection via graph priors
Graph priors are used during training, and the graph is refined at test time
The method sets a new state of the art on the Visual Genome partitions with 1,000 and 3,000 object categories
Enhanced features are obtained by aggregating node and edge information
Results show the positive effects of exploiting co-occurrence priors
Failure cases arise from overconfident wrong predictions

Detecting Objects with Graph Priors and Graph Refinement