Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

Recent advances in machine learning and computer vision are revolutionizing the field of animal behavior.
Large datasets of annotated images of animals for markerless pose tracking are still scarce.
A method is proposed that uses a motion capture system to obtain a large amount of annotated data on animal movement and posture.
The method extracts the 3D positions of morphological keypoints in reference to the positions of markers attached to the animals.
A new dataset, 3D-POP, is offered with approximately 300k annotated frames in the form of videos of groups of one to ten freely moving birds.
3D-POP is the first dataset of flocking birds with accurate keypoint annotations in 2D and 3D.

Paper Content

Introduction

Computer vision and machine learning are revolutionizing research methods
Dataset-driven machine learning methods have been successful in animal behavior tasks
Automatic methods reduce labor and errors associated with manual coding
Data on animal locomotion is used to reverse-engineer behaviors and movements
Creating large datasets with animals is difficult
Recently, datasets have been created with a focus on animal behavior
Most solutions are limited to 2D space
Marker-based motion capture technology has been used to create 3D datasets
Propose a new motion capture-based approach to create large-scale datasets with a bird species
Method enables a large amount of training data to be generated in a semi-automatic manner
Able to track a variety of naturalistic behaviors in a flock of up to 10 individuals
CNN models trained on dataset are able to predict postures of birds with no markers attached

State of the art

2D posture

Animal Kingdom is the largest dataset with 50 hours of video annotations
Other datasets contain images and focus on capturing variations in terms of specific taxa
Few datasets offer posture annotations for multiple individuals
Existing datasets have motivated the development of various methods for posture estimation
Manual annotations limit the complexity of datasets

3D posture

Obtaining 3D ground truth posture is difficult with a group of animals
Popular method is triangulation of 2D postures using multiple views
AcinoSet, Fly3D, and OpenMonkeyStudio use triangulation-based approaches
Alternative approach is marker-based motion capture with skeleton tracking
Rat 7M dataset uses motion capture with RGB cameras and 20 markers
Rat 7M dataset is first with 3D ground truth with more than one animal
Motion capture systems offer high accuracy and low noise
Marker placement is a limitation for smaller species and wild animals
2D keypoints and silhouettes, synthetic datasets, and toys used to predict 3D posture
Computer vision literature focuses on extracting detail
Head and body orientations are sufficient to quantify key behaviors in ground-foraging contexts
Measuring head direction in 3D allows gaze reconstruction
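
The triangulation approach mentioned above can be illustrated with a minimal sketch of linear (DLT) triangulation. This is a generic textbook formulation, not the procedure of any specific dataset listed here. The camera matrices `P1` and `P2`, the intrinsics `K`, and the test point are made-up values for illustration only.

```python
import numpy as np

def triangulate_dlt(proj_mats, points_2d):
    """Triangulate one 3D point from its 2D observations in several views.

    proj_mats: list of 3x4 projection matrices (assumed known from calibration).
    points_2d: list of (u, v) pixel coordinates, one per view.
    """
    rows = []
    for P, (u, v) in zip(proj_mats, points_2d):
        # Each view contributes two linear constraints on the homogeneous point.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]

def project(P, X):
    """Project a 3D point with a 3x4 projection matrix."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Two hypothetical cameras sharing intrinsics, offset by 0.5 m along x.
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.5], [0.0], [0.0]])])

X_true = np.array([0.1, -0.2, 3.0])
observations = [project(P1, X_true), project(P2, X_true)]
X_est = triangulate_dlt([P1, P2], observations)
```

With noise-free observations the estimate matches the true point. With real 2D detections the result depends on detection noise and calibration quality, which is one reason motion capture is attractive as a source of ground truth.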

Multi-object tracking with identity

Identity recognition is a critical problem in biological studies
Tracking and identification of multiple individuals in large groups is important
Existing solutions perform well with specific perspectives, but not with occlusion
Very few datasets offer the possibility of solving all problems simultaneously
3D-POP dataset includes video recordings of 18 unique pigeons from multiple views
Ground truth for identity, 2D-3D trajectories, and 2D-3D posture mapping is available
Annotations for object detection in the form of bounding boxes are included

Methods

Experimental setup

Dataset was collected from pigeons moving on a jute fabric
Grains were scattered to encourage the birds to feed in that area
Mo-cap system was used to track 3D positions of reflective markers
4 high-resolution cameras and an Arduino-based synchronization box were placed at the corners of the feeding area

Animal subjects

18 pigeons were studied over 6 days
10 pigeons were randomly selected each day
4 reflective markers were attached to each pigeon’s head
4 markers were attached to a customized backpack worn by each pigeon
Pigeons tolerated markers and quickly habituated to backpacks
Unique geometric configuration was used to track individual identities
11 trials were performed each day
Total frames and duration of samples are described in Table 1
An additional session was recorded with birds without markers
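
As an illustration of how a unique marker geometry can encode identity, the sketch below matches an observed marker set to a library of known patterns using sorted pairwise distances. These are unchanged by rotation, translation, and marker ordering. This is only an assumed, simplified scheme. The summary does not describe how the motion capture system resolves identities, and the pattern names and coordinates here are hypothetical.

```python
import numpy as np
from itertools import combinations

def distance_signature(markers):
    """Sorted pairwise distances of a marker set (pose- and order-invariant)."""
    markers = np.asarray(markers, dtype=float)
    dists = [np.linalg.norm(markers[i] - markers[j])
             for i, j in combinations(range(len(markers)), 2)]
    return np.sort(dists)

def identify(observed, templates):
    """Return the name of the template whose signature is closest to the observation."""
    sig = distance_signature(observed)
    return min(templates,
               key=lambda name: np.linalg.norm(sig - distance_signature(templates[name])))

# Hypothetical 4-marker patterns (metres) for two backpacks.
templates = {
    "bird_a": np.array([[0, 0, 0], [0.03, 0, 0], [0, 0.03, 0], [0.03, 0.03, 0]]),
    "bird_b": np.array([[0, 0, 0], [0.05, 0, 0], [0, 0.02, 0], [0.04, 0.04, 0]]),
}

# Observe bird_b after a rotation about z, a translation, and a marker reordering.
angle = np.deg2rad(40.0)
R = np.array([[np.cos(angle), -np.sin(angle), 0.0],
              [np.sin(angle), np.cos(angle), 0.0],
              [0.0, 0.0, 1.0]])
observed = (templates["bird_b"] @ R.T + np.array([1.0, 2.0, 0.3]))[[2, 0, 3, 1]]
match = identify(observed, templates)
```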

Data annotation pipeline

6-DOF pose of rigid objects can be tracked in 3D space
4 markers attached to head and body of bird used to compute 6-DOF pose
Pipeline designed to annotate position of features on head and body
Relationship between markers and features does not change during sequence
9 morphological keypoints annotated on 5-10 frames from all view angles
3D positions of keypoints transferred to global coordinate system using 6-DOF pose
Bounding box annotations derived from keypoint projections
Dataset provides accurate ground truth for 3D keypoints, 2D keypoints, bounding boxes, and individual identities
Dataset includes RGB images from 4 high-resolution cameras and up to 6 hours of recordings
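
The core of the pipeline above can be sketched as three steps: move keypoints from the rigid body's local frame to the global frame using the 6-DOF pose, project them into a camera, and take the bounding box of the projections. The sketch assumes an ideal pinhole camera without lens distortion. The function names, the pose, the intrinsics, and the `margin` parameter are illustrative and not taken from the authors' code.

```python
import numpy as np

def to_global(R, t, keypoints_local):
    """Map keypoints from the rigid body's local frame to the global frame (X = R x + t)."""
    return keypoints_local @ R.T + t

def project_points(K, R_cam, t_cam, points_global):
    """Pinhole projection of global 3D points to pixel coordinates (no distortion)."""
    cam = points_global @ R_cam.T + t_cam   # global frame -> camera frame
    uv = cam @ K.T                          # apply intrinsics
    return uv[:, :2] / uv[:, 2:3]           # perspective divide

def bounding_box(points_2d, margin=0.0):
    """Axis-aligned box around projected keypoints, padded by `margin` pixels."""
    x_min, y_min = points_2d.min(axis=0) - margin
    x_max, y_max = points_2d.max(axis=0) + margin
    return x_min, y_min, x_max, y_max

# Hypothetical 6-DOF pose of one rigid body: 90 degree rotation about z, 2 m in front of the camera.
R = np.array([[0.0, -1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])
t = np.array([0.0, 0.0, 2.0])

# Two keypoints expressed in the rigid body's local frame (metres).
keypoints_local = np.array([[0.1, 0.0, 0.0],
                            [0.0, 0.1, 0.0]])

K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])

keypoints_global = to_global(R, t, keypoints_local)
keypoints_2d = project_points(K, np.eye(3), np.zeros(3), keypoints_global)
box = bounding_box(keypoints_2d, margin=10.0)
```

Because the keypoint positions in the local frame are fixed for a sequence, annotating them once is enough to propagate them to every frame in which the rigid body is tracked.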

Customization

We released 3D-POPAP, a tool to manipulate annotations of a dataset
We designed the annotation approach to allow for easy addition of 2D/3D keypoint annotations
No other datasets are available with ground truth on 3D posture of birds

Dataset validation

3D-POP annotations are obtained automatically
3 tests designed to validate accuracy and consistency of annotations
First test compares accuracy of 3D features computed with 3D-POP and Kano et al.
Second test measures consistency of 3D/2D annotations across dataset
Third test checks variation in 3D pose captured in all sequences
Ground truth 3D position of eyes and beak compared to 3D position computed with 3D-POP
RMSE for all three features is sufficient for pigeons
Method alleviates need of using dedicated calibration rigs
2D keypoint detection model trained on 15177 images
Outlier analysis used to filter out 2.9% of dataset
Consistency check reveals annotations are largely consistent with model predictions
Visual examples of outlier frames with mo-cap errors
96.1% of gaps in dataset are less than 30 frames
74,924 unique orientations of head and 14,191 unique orientations of body in dataset
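
A gap statistic like the one reported above can be computed from a per-frame tracking mask. The sketch below is a generic run-length computation, not the authors' validation code. The mask and the `max_len` value used in the example are made up.

```python
import numpy as np

def gap_lengths(tracked):
    """Lengths of consecutive runs of untracked frames in a boolean mask."""
    missing = ~np.asarray(tracked, dtype=bool)
    # Pad with False so gaps touching either end are still delimited.
    padded = np.concatenate(([False], missing, [False]))
    change = np.diff(padded.astype(int))
    starts = np.flatnonzero(change == 1)
    ends = np.flatnonzero(change == -1)
    return ends - starts

def fraction_short_gaps(tracked, max_len=30):
    """Fraction of gaps shorter than `max_len` frames (1.0 if there are no gaps)."""
    lengths = gap_lengths(tracked)
    if lengths.size == 0:
        return 1.0
    return float(np.mean(lengths < max_len))

# Toy mask: 1 = tracked, 0 = lost. The gaps have lengths 2, 1, and 3.
tracked = np.array([1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1], dtype=bool)
lengths = gap_lengths(tracked)
short_fraction = fraction_short_gaps(tracked, max_len=3)
```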

Experiments

Marker-based + markerless hybrid approach

Markerless tracking algorithm trained on 3D-POP can solve 3D tracking when mo-cap fails
Hybrid tracking solution has potential applications for future behavior studies
Tested on a 5-minute sequence with 25% of mo-cap tracking data removed
Achieved an average RMS error of 9.2 mm with the proposed solution, compared to 52.1 mm with linear interpolation
Robust markerless 3D tracking solution needed for biologists to switch from motion-tracking technology
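
The linear-interpolation baseline and the error measure can be sketched as follows. The exact RMS definition used in the paper is not given in this summary. Here it is assumed to be the root mean square of per-frame Euclidean distances, and the trajectory is a toy example chosen so the result can be checked by hand.

```python
import numpy as np

def interpolate_gaps(positions, valid):
    """Fill missing 3D positions by linear interpolation along each axis."""
    positions = np.asarray(positions, dtype=float)
    valid = np.asarray(valid, dtype=bool)
    frames = np.arange(len(positions))
    filled = positions.copy()
    for axis in range(positions.shape[1]):
        filled[~valid, axis] = np.interp(
            frames[~valid], frames[valid], positions[valid, axis]
        )
    return filled

def rms_error(pred, truth):
    """Root mean square of per-frame Euclidean distances (assumed definition)."""
    dist = np.linalg.norm(np.asarray(pred) - np.asarray(truth), axis=1)
    return float(np.sqrt(np.mean(dist ** 2)))

# Toy trajectory with quadratic motion along x: x = frame ** 2.
frames = np.arange(5)
truth = np.stack([frames ** 2, np.zeros(5), np.zeros(5)], axis=1).astype(float)

# Simulate a mo-cap dropout at frame 2.
valid = np.array([True, True, False, True, True])
observed = truth.copy()
observed[~valid] = np.nan

filled = interpolate_gaps(observed, valid)
error = rms_error(filled[~valid], truth[~valid])
```

In this toy case the interpolated x value at frame 2 is 5 while the true value is 4, so the error is 1. Linear interpolation degrades when motion within a gap is not linear, which is the situation where a markerless estimate from images can help.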

Markerless bird tracking

Models trained with 3D-POP dataset can be used to track birds without markers.
Experiment works as a “sanity check” to ensure models are not biased.
Experiment demonstrates potential contribution of method to develop markerless 3D tracking, posture estimation, and identification.

Manual validation

Experiment demonstrated validity of assumption that body keypoints behave like points on a rigid body
Manual and automatic annotations were compared using the PCK05 and PCK10 metrics, giving an average PCK05 of 66% and PCK10 of 94%
Visual inspection showed that birds move their wings in only 2.8% of frames, so the rigid-body assumption is valid in over 97% of the dataset
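
For reference, a PCK (percentage of correct keypoints) score can be computed as sketched below. A keypoint counts as correct when its error is within a fraction `alpha` of a reference size. PCK05 and PCK10 are taken here to mean `alpha` values of 0.05 and 0.10. Normalizing by a per-frame bounding-box size is an assumption, and the numbers are made up.

```python
import numpy as np

def pck(pred, truth, bbox_sizes, alpha):
    """Fraction of keypoints whose 2D error is at most alpha * bounding-box size.

    pred, truth: arrays of shape (n_frames, n_keypoints, 2) in pixels.
    bbox_sizes: array of shape (n_frames,), e.g. the largest box dimension.
    """
    err = np.linalg.norm(np.asarray(pred) - np.asarray(truth), axis=-1)
    thresh = alpha * np.asarray(bbox_sizes, dtype=float)[:, None]
    return float(np.mean(err <= thresh))

# One frame, four keypoints, with errors of 2, 4, 8, and 12 px along x.
truth = np.zeros((1, 4, 2))
pred = np.array([[[2.0, 0.0], [4.0, 0.0], [8.0, 0.0], [12.0, 0.0]]])
bbox_sizes = np.array([100.0])  # hypothetical 100 px box

pck05 = pck(pred, truth, bbox_sizes, alpha=0.05)  # threshold of 5 px
pck10 = pck(pred, truth, bbox_sizes, alpha=0.10)  # threshold of 10 px
```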

Limitations and future work

Assumption that head and body behave as rigid bodies does not hold for certain body parts
Proposed approach does not support annotation for flying birds or birds that change shape of body parts
Approach depends on tracking accuracy of motion capture system
Outlier detection method is effective at identifying noisy annotations
Dataset was curated semi-automatically in existing motion tracking setup, limited to indoor environment

Conclusion

Introduced novel method to use mocap system for generating large-scale datasets with multiple animals
Semi-automated method offers alternative for generating high-quality datasets with animals without manual effort
3D-POP dataset offers ground truth for 3D posture prediction and identity tracking in birds
Intrinsic calibration used an A0 ChArUco board, and extrinsic calibration used a subject-based approach
RGB action cameras were synchronized with a camera control box and an Arduino-based synchronization device
3D-POP dataset contains ground truth annotation for pigeons with markers attached to their body
Experiment 2 shows models trained on 3D-POP dataset work well with pigeons recorded in same arena without markers
3D-POP dataset includes trials of freely moving birds without any marker attachment
Post-processing pipeline used to fix mis-labelled frames
Technique designed to measure 3D orientation of pigeon body parts relative to each axis
Dataset available for download
