Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

Reinforcement learning can enable robots to learn complex manipulation tasks
System provides a “programming-free” approach for users to define new tasks
System includes a framework for users to define tasks with image examples
Reinforcement learning procedure learns the task autonomously without interventions
Experimental results with a four-finger robotic hand learning tasks in the real world

Paper Content

I. Introduction

Robotic manipulation tasks are difficult to control
Reinforcement learning (RL) is an appealing choice for such settings
RL on real-world robotic platforms raises practical issues
Alternative solutions include transfer from simulation, imitation learning, and motion capture
Effective real-world RL typically requires substantial manual engineering, such as reward engineering and reset mechanisms
Propose a robotic learning system that can learn to control multi-fingered robotic hands from raw visual observations
System autonomously practices a sequence of sub-skills based on high-level milestone specifications
Multi-camera visual observations used to localize and manipulate objects
Policies are learned end-to-end from pixels, without motion capture
Simple physical instrumentation used to prevent object from falling out of reach
System can learn dexterous behaviors by practicing for up to 48 hours autonomously

II. Related work

Prior work has studied control of complex hands using a variety of methods
The majority of prior work assumed access to compact state representations or to accurate simulators and object models
Closer to the system described in the paper is prior work on learning visuomotor policies for dexterous manipulation
Prior systems for RL-based dexterous manipulation assume manually designed rewards or ground-truth object state observations
In contrast, this work specifies tasks without manual reward engineering by using intermediate milestone examples

Table comparing prior methods with this work (column headings: Method, No hand-engineered reward, Multi-task, Vision, High-DoFs; row label for this work: Ours; the per-method entries were not recoverable)

The combination of these building blocks enables learning image-based dexterous manipulation in the real world
The most closely related robotic RL systems are R3L and MTRF
Our assumptions regarding lightweight instrumentation and vision-based autonomous learning most closely resemble those of R3L
Our work tackles a more challenging setting than R3L
Multitask RL formulation builds on ideas from MTRF
Focus is on providing a framework for vision-based learning of object manipulation skills with high-dimensional hands
Automated RL system runs continuously for 48 hours
Lightweight physical instrumentation used (tethering object to prevent it from falling outside of grasping range)
Visual perception and reward specification challenges addressed, avoiding manual reward engineering and motion capture

III. Robotic platform and problem overview

Robot platform consists of 4-finger, 16-DoF robot hand mounted on 7-DoF Sawyer robotic arm
Action space is 22-dimensional, state space is 29-dimensional
System designed to operate for 48 hours in contact-rich environments
Two RGB image observations provided to robot via two low-cost web cameras
Tasks consist of manipulation behaviors such as reaching, grasping, in-hand and mid-air reorienting, and inserting
Supervision: the user places the robot and object in a desired configuration and captures image “snapshots”

IV. Problem formalism and user assumptions

Problem setting and supervision assumptions are formalized
The MDP is defined by the tuple $(S, A, p_{\mathrm{dyn}}, \rho, \gamma, R)$
Standard RL assumes a hand-engineered reward function $R$
This challenge is especially acute in the dexterous manipulation setting
Environment assumed to be episodic
The user supplies the robot with a set of sub-problems to practice
The sub-problems form a milestone graph of cardinality K (one node per sub-task), specified with M outcome images
A directed edge $(v, v')$ indicates that sub-task $v'$ is to be practiced after sub-task $v$
Optimize all sub-tasks in milestone graph simultaneously
Goal: learn a set of K policies, a reward function, and a sub-task transition function
Leverage user-provided milestone supervision and categorical labels
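To make the milestone-graph formalism concrete, here is a minimal Python sketch of the data structure, assuming sub-tasks are indexed 0..K-1 and each node stores its user-provided outcome images. The class `MilestoneGraph`, its methods, and the image file names are hypothetical illustrations, not the paper's code.

```python
from collections import defaultdict


class MilestoneGraph:
    """Hypothetical milestone graph: K sub-task nodes, each holding its
    user-provided outcome images, plus directed edges (v, v') meaning that
    sub-task v' is to be practiced after sub-task v."""

    def __init__(self, num_tasks):
        self.num_tasks = num_tasks              # K
        self.examples = defaultdict(list)       # task id -> outcome images
        self.successors = defaultdict(set)      # task id -> allowed next task ids

    def _check(self, task):
        if not 0 <= task < self.num_tasks:
            raise ValueError(f"task index {task} out of range")

    def add_examples(self, task, images):
        self._check(task)
        self.examples[task].extend(images)

    def add_edge(self, v, v_next):
        self._check(v)
        self._check(v_next)
        self.successors[v].add(v_next)

    def next_tasks(self, v):
        self._check(v)
        return sorted(self.successors[v])

    def total_images(self):
        # Total number of outcome images across all nodes.
        return sum(len(imgs) for imgs in self.examples.values())


# Toy usage: a two-node cycle, e.g. "hook" -> "unhook" -> "hook".
graph = MilestoneGraph(num_tasks=2)
graph.add_examples(0, ["hooked_cam0.png", "hooked_cam1.png"])
graph.add_examples(1, ["unhooked_cam0.png"])
graph.add_edge(0, 1)
graph.add_edge(1, 0)
```

A cycle like this is what lets a reset-free agent keep practicing: finishing one sub-task always leaves a successor to attempt.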

V. The AVAIL system: autonomy via

AVAIL: the system presented to address this problem
System reframes the RL training process as a multitask problem
System learns from sparse milestone examples provided by the user
System functions by deriving reward functions, optimizing rewards, and determining which tasks to perform

A. Visual multi-task policy and reward learning from user milestones

Ability for robot to assign rewards to its own experience
Relieves burden of manual reward engineering
Trade-off: it removes the designer's ability to provide task information
Leverage sparse milestone supervision to learn set of success classifiers
Learn binary classifier over user-provided examples and agent’s own experience
Classifier probability used as reward
Image encoders are learned jointly from raw image observations using the DroQ approach
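The classifier-based reward described above can be sketched as follows, under simplifying assumptions: a linear logistic-regression model on pre-computed feature vectors stands in for the image encoder and classifier network, and the names `SuccessClassifier`, `update`, and `reward` are hypothetical. Positives are user-provided milestone examples, negatives are states drawn from the agent's own experience, and the predicted success probability is returned as the reward.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class SuccessClassifier:
    """Binary success classifier for one milestone (sketch: a linear model
    on feature vectors stands in for an image-based network)."""

    def __init__(self, feature_dim, lr=0.1):
        self.w = np.zeros(feature_dim)
        self.b = 0.0
        self.lr = lr

    def update(self, positives, negatives, steps=100):
        # positives: user-provided milestone examples (label 1)
        # negatives: states sampled from the agent's own experience (label 0)
        x = np.concatenate([positives, negatives])
        y = np.concatenate([np.ones(len(positives)), np.zeros(len(negatives))])
        for _ in range(steps):
            p = sigmoid(x @ self.w + self.b)
            grad = p - y  # binary cross-entropy gradient w.r.t. the logits
            self.w -= self.lr * x.T @ grad / len(y)
            self.b -= self.lr * grad.mean()

    def reward(self, features):
        # The classifier's success probability is used directly as the reward.
        return sigmoid(features @ self.w + self.b)
```

In a running system the classifier would be refit as new agent experience accumulates; how often that happens is left open in this sketch.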

B. Multi-task learning without oracles

Agent must decide which task to attempt next based on current situation
The user provides categorical labels that are used to train a task dynamics model
Agent samples task from learned model and executes corresponding policy
Task inference and task learning are separated
Robot learns skills to autonomously practice hooking and unhooking
Robot learns skills to enable successful insertion
After 36 hours of training, success rate of 95% for hooking and 80% for insertion
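A minimal sketch of the learned task-transition model, assuming state features are available as vectors and the user's categorical labels are integer sub-task ids. A softmax (multinomial logistic) classifier is used here as a stand-in for whatever model the paper trains; `TaskTransitionModel` and its methods are hypothetical. It exposes both sampling the next sub-task and taking the most probable one.

```python
import numpy as np


def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


class TaskTransitionModel:
    """Sketch of a next-task model: maps state features to a distribution
    over the K sub-tasks, trained on user-provided categorical labels."""

    def __init__(self, feature_dim, num_tasks, lr=0.5, seed=0):
        self.W = np.zeros((feature_dim, num_tasks))
        self.b = np.zeros(num_tasks)
        self.lr = lr
        self.rng = np.random.default_rng(seed)

    def fit(self, features, labels, steps=300):
        # features: (N, d) state features; labels: (N,) next-task ids
        onehot = np.eye(self.b.size)[labels]
        for _ in range(steps):
            probs = softmax(features @ self.W + self.b)
            grad = (probs - onehot) / len(labels)  # cross-entropy gradient
            self.W -= self.lr * features.T @ grad
            self.b -= self.lr * grad.sum(axis=0)

    def probs(self, feature):
        return softmax(feature @ self.W + self.b)

    def sample_task(self, feature):
        # Sample the next sub-task from the predicted distribution.
        return int(self.rng.choice(self.b.size, p=self.probs(feature)))

    def most_probable_task(self, feature):
        # Greedy alternative: pick the highest-probability sub-task.
        return int(np.argmax(self.probs(feature)))
```

Keeping this model separate from the per-task policies mirrors the separation of task inference from task learning: it can be retrained on labels without touching any policy.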

C. Algorithm summary and implementation details

AVAIL performs supervised learning of the next task transitions
AVAIL chooses the most probable task using an observed state
AVAIL trains a separate policy for each of the K sets of example images
Each policy is parameterized as a deep neural network and trained using the soft actor-critic algorithm
Resampling the task is done every 100 steps
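The control flow summarized above (one policy per milestone, the active sub-task re-selected every 100 steps, no environment resets) can be sketched as a loop. The function `run_reset_free` and its callback arguments are hypothetical; `update_policy` is where a per-task SAC update using that task's classifier reward would go, and is left as a stub.

```python
RESAMPLE_EVERY = 100  # the active sub-task is re-selected every 100 steps


def run_reset_free(env_step, initial_obs, policies, choose_task,
                   update_policy, total_steps):
    """Hypothetical reset-free training loop over K per-milestone policies.

    env_step(action) -> next observation (the environment is never reset)
    policies[k](obs) -> action for sub-task k
    choose_task(obs) -> index of the sub-task to practice next
    update_policy(k, obs, action, next_obs) -> one learning update for policy k
    Returns the sub-task index that was active at each step.
    """
    obs = initial_obs
    task = choose_task(obs)
    schedule = []
    for t in range(total_steps):
        if t > 0 and t % RESAMPLE_EVERY == 0:
            task = choose_task(obs)  # pick the next sub-task from the current state
        action = policies[task](obs)
        next_obs = env_step(action)
        # In a full system: store the transition and run a SAC update on
        # policy `task`, with the reward given by that task's success classifier.
        update_policy(task, obs, action, next_obs)
        obs = next_obs
        schedule.append(task)
    return schedule


# Toy usage: the "observation" is just a step counter.
counter = {"x": 0}


def toy_step(action):
    counter["x"] += action
    return counter["x"]


schedule = run_reset_free(
    env_step=toy_step,
    initial_obs=0,
    policies=[lambda o: 1, lambda o: 1],
    choose_task=lambda o: (o // RESAMPLE_EVERY) % 2,
    update_policy=lambda *args: None,
    total_steps=300,
)
```

In the toy run the active task alternates 0, 1, 0 in blocks of 100 steps, showing that the task choice only changes at resampling boundaries.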

VI. Experimental evaluation

Evaluated AVAIL’s ability to learn complex manipulation skills in the real world
Evaluated AVAIL on three real world manipulation tasks
Evaluated AVAIL in simulation for comparison with prior methods

A. Real-world task descriptions

Use a two-sided cleaning brush to scrape a plate
Insert a cylindrical hose connector onto a peg connector
Attach a hook to a handle

B. Real-world evaluation

Evaluated system by saving policies at regular intervals and evaluating performance
Evaluation metric for each milestone is binary success measurement based on distance of hand and object to desired pose
Real-world skill learning: robot able to successfully perform all milestones with > 80% success rate
Real-world task graph learning: learned task graph outperforms hand-designed task graph in terms of sample efficiency
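The per-milestone binary success measure can be sketched as a distance-threshold check. This sketch assumes positions only (ignoring orientation) and uses illustrative 5 cm tolerances, which are not values from the paper; `milestone_success` and `success_rate` are hypothetical helper names.

```python
import numpy as np


def milestone_success(hand_pos, obj_pos, target_hand_pos, target_obj_pos,
                      hand_tol=0.05, obj_tol=0.05):
    """Return 1 if both the hand and the object are within a distance
    tolerance of their desired positions, else 0. Tolerances (metres) are
    illustrative assumptions; orientation is ignored in this sketch."""
    hand_err = np.linalg.norm(np.asarray(hand_pos) - np.asarray(target_hand_pos))
    obj_err = np.linalg.norm(np.asarray(obj_pos) - np.asarray(target_obj_pos))
    return int(hand_err <= hand_tol and obj_err <= obj_tol)


def success_rate(outcomes):
    # Fraction of evaluation trials that succeeded.
    return sum(outcomes) / len(outcomes)
```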

C. Simulated comparison

SAC (Soft Actor-Critic) is a state-of-the-art RL algorithm
Forward Backward Controller provides two milestones for the task
R3L interleaves training of the forward policy with a “perturbation controller”
SAC fails to progress in the reset-free setting
Forward Backward Controller is the only prior method that succeeds in making progress

VII. Discussion and future work

Proposed a method for multi-task learning from high dimensional image observations
Constructs a task graph from a modest number of user-provided milestone examples
Results illustrate the benefits of milestone supervision
The learned task graph converges faster on the real-world robotic task than a hand-coded, heuristic-based task graph

VIII. Acknowledgement

Research project was partially funded by Office of Naval Research
Computing resources donated by Microsoft Azure
