Link to paper
The full paper is available here.
You can also find the paper on PapersWithCode here.
Abstract
- Recent advances in machine learning and computer vision are revolutionizing the field of animal behavior.
- Large datasets of annotated images of animals for markerless pose tracking are still scarce.
- A method is proposed that uses a motion capture system to obtain a large amount of annotated data on animal movement and posture.
- The method extracts the 3D positions of morphological keypoints in reference to the positions of markers attached to the animals.
- A new dataset, 3D-POP, is offered with approximately 300k annotated frames in the form of videos of groups of one to ten freely moving birds.
- 3D-POP is the first dataset of flocking birds with accurate keypoint annotations in 2D and 3D.
Paper Content
Introduction
- Computer vision and machine learning are revolutionizing research methods
- Dataset-driven machine learning methods have been successful in animal behavior tasks
- Automatic methods reduce labor and errors associated with manual coding
- Data on animal locomotion is used to reverse-engineer behaviors and movements
- Creating large datasets with animals is difficult
- Recently, datasets have been created with a focus on animal behavior
- Most solutions are limited to 2D space
- Marker-based motion capture technology has been used to create 3D datasets
- Propose a new motion capture-based approach to create large-scale datasets with a bird species
- Method enables a large amount of training data to be generated in a semi-automatic manner
- Able to track a variety of naturalistic behaviors in a flock of up to 10 individuals
- CNN models trained on dataset are able to predict postures of birds with no markers attached
State of the art
2D posture
- Animal Kingdom is the largest dataset with 50 hours of video annotations
- Other datasets contain images and focus on capturing variations in terms of specific taxa
- Few datasets offer posture annotations for multiple individuals
- Existing datasets have motivated the development of various methods for posture estimation
- Manual annotations limit the complexity of datasets
3D posture
- Obtaining 3D ground truth posture is difficult with a group of animals
- Popular method is triangulation of 2D postures using multiple views
- AcinoSet, Fly3D, OpenMonkeyStudio use triangulation-based approaches
- Alternative approach is marker-based motion capture with skeleton tracking
- Rat 7M dataset uses motion capture with RGB cameras and 20 markers
- Rat 7M dataset is first with 3D ground truth with more than one animal
- Motion capture systems offer high accuracy and low noise
- Marker placement is a limitation for smaller species and wild animals
- 2D keypoints and silhouettes, synthetic datasets, and toys used to predict 3D posture
- Computer vision literature focuses on extracting detail
- Head and body orientations sufficient to quantify key behaviors in ground-foraging contexts
- Measuring head direction in 3D allows gaze reconstruction
Multi-object tracking with identity
- Identity recognition is a critical problem in biological studies
- Tracking and identification of multiple individuals in large groups is important
- Existing solutions perform well with specific perspectives, but not with occlusion
- Very few datasets offer the possibility of solving all problems simultaneously
- 3D-POP dataset includes video recordings of 18 unique pigeons from multiple views
- Ground truth for identity, 2D-3D trajectories, and 2D-3D posture mapping is available
- Annotations for object detection in the form of bounding boxes are included
Methods
Experimental setup
- Dataset was collected from pigeons moving on a jute fabric
- Grains were scattered to encourage the birds to feed in that area
- Mo-cap system was used to track 3D positions of reflective markers
- 4 high-resolution cameras and an Arduino-based synchronization box were placed at the corners of the feeding area
Animal subjects
- 18 pigeons were studied over 6 days
- 10 pigeons were randomly selected each day
- 4 reflective markers were attached to each pigeon’s head
- 4 markers were attached to a customized backpack worn by each pigeon
- Pigeons tolerated markers and quickly habituated to backpacks
- Unique geometric configuration was used to track individual identities
- 11 trials were performed each day
- Total frames and duration of samples are described in Table 1
- An additional session was recorded with birds without markers
Data annotation pipeline
- 6-DOF pose of rigid objects can be tracked in 3D space
- 4 markers attached to head and body of bird used to compute 6-DOF pose
- Pipeline designed to annotate position of features on head and body
- Relationship between markers and features does not change during sequence
- 9 morphological keypoints annotated on 5-10 frames from all view angles
- 3D positions of keypoints transferred to global coordinate system using 6-DOF pose
- Bounding box annotations derived from keypoint projections
- Dataset provides accurate ground truth for 3D keypoints, 2D keypoints, bounding boxes, and individual identities
- Dataset includes RGB images from 4 high-resolution cameras and up to 6 hours of recordings
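The keypoint transfer and bounding-box steps above can be sketched as follows. This is a minimal illustration and not the authors' pipeline: it assumes the 6-DOF pose is given as a 3x3 rotation matrix and a translation vector, and it uses a simplified pinhole camera with no lens distortion whose frame coincides with the global frame. The function names (`local_to_global`, `project_pinhole`, `bounding_box`) and all numeric values are hypothetical.

```python
from typing import Sequence, Tuple

Point2D = Tuple[float, float]
Point3D = Tuple[float, float, float]


def local_to_global(rotation: Sequence[Sequence[float]],
                    translation: Sequence[float],
                    point_local: Sequence[float]) -> Point3D:
    """Map a keypoint from the marker (rigid-body) frame to the global frame.

    Computes p_global = R * p_local + t, where (R, t) is the 6-DOF pose.
    """
    x, y, z = (
        sum(rotation[i][j] * point_local[j] for j in range(3)) + translation[i]
        for i in range(3)
    )
    return (x, y, z)


def project_pinhole(point_cam: Sequence[float], focal_px: float,
                    cx: float, cy: float) -> Point2D:
    """Project a 3D point (already in the camera frame) to pixel coordinates.

    Assumption: ideal pinhole model, no distortion, point in front of camera.
    """
    x, y, z = point_cam
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    return (focal_px * x / z + cx, focal_px * y / z + cy)


def bounding_box(points_2d: Sequence[Point2D],
                 margin_px: float = 0.0) -> Tuple[float, float, float, float]:
    """Axis-aligned box (x_min, y_min, x_max, y_max) around projected keypoints."""
    xs = [p[0] for p in points_2d]
    ys = [p[1] for p in points_2d]
    return (min(xs) - margin_px, min(ys) - margin_px,
            max(xs) + margin_px, max(ys) + margin_px)


# Hypothetical example: a pose rotated 90 degrees about the z axis.
ROTATION_Z90 = [[0.0, -1.0, 0.0],
                [1.0, 0.0, 0.0],
                [0.0, 0.0, 1.0]]
TRANSLATION = (1.0, 2.0, 3.0)

# Keypoints annotated once in the marker frame (made-up values).
keypoints_local = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
keypoints_global = [local_to_global(ROTATION_Z90, TRANSLATION, p)
                    for p in keypoints_local]
keypoints_2d = [project_pinhole(p, focal_px=300.0, cx=320.0, cy=240.0)
                for p in keypoints_global]
box = bounding_box(keypoints_2d, margin_px=5.0)
```

Because the marker-to-keypoint relationship is fixed during a sequence, the local keypoints are annotated once and only the pose changes per frame. A real setup would also apply each camera's extrinsic calibration before projection.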
Customization
- We released 3D-POPAP, a tool to manipulate annotations of a dataset
- We designed the annotation approach to allow for easy addition of 2D/3D keypoint annotations
- There are no datasets available with ground truth on 3D posture of birds
Dataset validation
- 3D-POP annotations are obtained automatically
- 3 tests designed to validate accuracy and consistency of annotations
- First test compares accuracy of 3D features computed with 3D-POP and Kano et al.
- Second test measures consistency of 3D/2D annotations across dataset
- Third test checks variation in 3D pose captured in all sequences
- Ground truth 3D position of eyes and beak compared to 3D position computed with 3D-POP
- RMSE for all three features (both eyes and the beak) is judged sufficient for work with pigeons
- Method alleviates need of using dedicated calibration rigs
- 2D keypoint detection model trained on 15177 images
- Outlier analysis used to filter out 2.9% of dataset
- Consistency check reveals annotations are largely consistent with model predictions
- Visual examples of outlier frames with mo-cap errors
- 96.1% of gaps in dataset are less than 30 frames
- 74,924 unique orientations of head and 14,191 unique orientations of body in dataset
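Two of the quantities reported above, the RMSE between computed and ground-truth 3D positions and the share of tracking gaps shorter than 30 frames, can be computed along these lines. This is a sketch under assumptions: RMSE is taken here as the root of the mean squared Euclidean distance per point, which may differ from the exact definition used in the paper, and the function names and sample values are hypothetical.

```python
import math
from typing import List, Sequence


def rmse_3d(predicted: Sequence[Sequence[float]],
            reference: Sequence[Sequence[float]]) -> float:
    """Root of the mean squared Euclidean distance between paired 3D points."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [
        sum((p - r) ** 2 for p, r in zip(pred, ref))
        for pred, ref in zip(predicted, reference)
    ]
    return math.sqrt(sum(squared) / len(squared))


def gap_lengths(tracked: Sequence[bool]) -> List[int]:
    """Lengths of consecutive runs of frames where tracking was lost (False)."""
    gaps: List[int] = []
    run = 0
    for ok in tracked:
        if ok:
            if run:
                gaps.append(run)
            run = 0
        else:
            run += 1
    if run:  # a gap that extends to the end of the sequence
        gaps.append(run)
    return gaps


def fraction_short_gaps(gaps: Sequence[int], max_frames: int = 30) -> float:
    """Fraction of gaps strictly shorter than max_frames."""
    if not gaps:
        raise ValueError("no gaps to evaluate")
    return sum(1 for g in gaps if g < max_frames) / len(gaps)


# Hypothetical per-frame tracking flags for one bird.
flags = [True, False, False, True, False]
lengths = gap_lengths(flags)
short_share = fraction_short_gaps(lengths, max_frames=30)
```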
Experiments
Marker-based + markerless hybrid approach
- Markerless tracking algorithm trained on 3D-POP can solve 3D tracking when mo-cap fails
- Hybrid tracking solution has potential applications for future behavior studies
- Tested on 5 min sequence with 25% of mo-cap tracking data removed
- Achieved avg. RMS error of 9.2 mm with proposed solution, compared to 52.1 mm with linear interpolation
- Robust markerless 3D tracking solution needed for biologists to switch from motion-tracking technology
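The linear-interpolation baseline mentioned above can be illustrated with a short sketch. It assumes a track is a list of 3D positions with `None` where mo-cap data is missing, and that the error is evaluated only on the frames that were removed. The names `fill_gaps_linear` and `rms_error` are hypothetical, and the values are made up rather than taken from the paper.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point3D = Tuple[float, float, float]


def fill_gaps_linear(track: Sequence[Optional[Point3D]]) -> List[Optional[Point3D]]:
    """Fill interior gaps by interpolating linearly between tracked neighbours.

    Gaps at the very start or end have only one neighbour and stay None.
    """
    filled: List[Optional[Point3D]] = list(track)
    known = [i for i, p in enumerate(track) if p is not None]
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            w = (i - left) / span  # 0 at the left neighbour, 1 at the right
            a, b = track[left], track[right]
            filled[i] = (
                (1 - w) * a[0] + w * b[0],
                (1 - w) * a[1] + w * b[1],
                (1 - w) * a[2] + w * b[2],
            )
    return filled


def rms_error(estimate: Sequence[Optional[Point3D]],
              truth: Sequence[Point3D],
              indices: Sequence[int]) -> float:
    """RMS Euclidean error over the given frame indices."""
    if not indices:
        raise ValueError("need at least one frame to evaluate")
    squared = []
    for i in indices:
        if estimate[i] is None:
            raise ValueError(f"frame {i} has no estimate")
        squared.append(sum((e - t) ** 2 for e, t in zip(estimate[i], truth[i])))
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical track with three removed frames.
observed = [(0.0, 0.0, 0.0), None, None, None, (4.0, 8.0, 12.0)]
ground_truth = [(0.0, 0.0, 0.0), (1.0, 2.0, 3.0), (2.0, 4.0, 8.0),
                (3.0, 6.0, 9.0), (4.0, 8.0, 12.0)]
filled_track = fill_gaps_linear(observed)
baseline_error = rms_error(filled_track, ground_truth, indices=[1, 2, 3])
```

Linear interpolation cannot recover motion that deviates from a straight line inside a gap, which is one plausible reason a learned markerless estimate does better on longer gaps.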
Markerless bird tracking
- Models trained with 3D-POP dataset can be used to track birds without markers.
- Experiment works as a “sanity check” to ensure models are not biased.
- Experiment demonstrates potential contribution of method to develop markerless 3D tracking, posture estimation, and identification.
Manual validation
- Experiment demonstrated validity of assumption that body keypoints behave like points on a rigid body
- Compared manual and automatic annotations using the PCK05 and PCK10 metrics, giving an average PCK05 of 66% and PCK10 of 94%
- Visual quantification showed that birds are moving their wings in only 2.8% of frames, so the rigid-body assumption is valid in over 97% of the dataset
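For readers unfamiliar with PCK (percentage of correct keypoints), the metric can be sketched as below. A keypoint counts as correct when it lies within a fraction `alpha` of a reference length from the manual annotation, so PCK05 and PCK10 correspond to `alpha` values of 0.05 and 0.10. The choice of reference length (here assumed to be something like the larger side of the bounding box) is an assumption and may differ from the paper's. The function name and sample coordinates are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point2D = Tuple[float, float]


def pck(automatic: Sequence[Point2D],
        manual: Sequence[Point2D],
        ref_length: float,
        alpha: float) -> float:
    """Fraction of keypoints within alpha * ref_length of the manual label."""
    if len(automatic) != len(manual) or not manual:
        raise ValueError("inputs must be non-empty and of equal length")
    threshold = alpha * ref_length
    hits = sum(
        1 for a, m in zip(automatic, manual)
        if math.hypot(a[0] - m[0], a[1] - m[1]) <= threshold
    )
    return hits / len(manual)


# Made-up pixel coordinates; pixel errors are 3, 0, 8 and 6 respectively.
manual_points = [(0.0, 0.0), (10.0, 10.0), (20.0, 20.0), (30.0, 30.0)]
automatic_points = [(0.0, 3.0), (10.0, 10.0), (20.0, 28.0), (30.0, 36.0)]

pck05 = pck(automatic_points, manual_points, ref_length=100.0, alpha=0.05)
pck10 = pck(automatic_points, manual_points, ref_length=100.0, alpha=0.10)
```

A large difference between the two thresholds, as in the reported 66% versus 94%, indicates that most disagreements between manual and automatic labels are small offsets rather than gross errors.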
Limitations and future work
- Assumption that head and body behave as rigid bodies does not hold for certain body parts
- Proposed approach does not support annotation for flying birds or birds that change shape of body parts
- Approach depends on tracking accuracy of motion capture system
- Outlier detection method is effective at identifying noisy annotations
- Dataset was curated semi-automatically in existing motion tracking setup, limited to indoor environment
Conclusion
- Introduced novel method to use mocap system for generating large-scale datasets with multiple animals
- Semi-automated method offers alternative for generating high-quality datasets with animals without manual effort
- 3D-POP dataset offers ground truth for 3D posture prediction and identity tracking in birds
- Intrinsic calibration used an A0 ChArUco checkerboard and extrinsic calibration used a subject-based approach
- RGB action cameras were synchronized with a camera control box and an Arduino-based synchronization device
- 3D-POP dataset contains ground truth annotation for pigeons with markers attached to their body
- Experiment 2 shows models trained on 3D-POP dataset work well with pigeons recorded in same arena without markers
- 3D-POP dataset includes trials of freely moving birds without any marker attachment
- Post-processing pipeline used to fix mis-labelled frames
- Technique designed to measure 3D orientation of pigeon body parts relative to each axis
- Dataset available for download