Abstract

  • Reinforcement learning can enable robots to learn complex manipulation tasks
  • System provides a “programming-free” approach for users to define new tasks
  • System includes a framework for users to define tasks with image examples
  • Reinforcement learning procedure learns the task autonomously without interventions
  • Experimental results with a four-finger robotic hand learning tasks in the real world

Paper Content

I. Introduction

  • Dexterous robotic manipulation tasks pose difficult control problems
  • Reinforcement learning (RL) is an appealing choice for such settings
  • RL on real-world robotic platforms raises practical issues
  • Alternatives include transfer from simulation, imitation learning, or motion capture
  • Effective real-world RL typically requires manual engineering: reward engineering, reset mechanisms, etc.
  • Propose a robotic learning system that can learn to control multi-fingered robotic hands from raw visual observations
  • System autonomously practices a sequence of sub-skills based on high-level milestone specifications
  • Multi-camera visual observations used to localize and manipulate objects
  • Policies are learned end-to-end from pixels, without motion capture
  • Simple physical instrumentation used to prevent object from falling out of reach
  • System can learn dexterous behaviors by practicing for up to 48 hours autonomously
  • Prior work studied control of complex hands using various methods
  • Majority of prior work assumed access to compact state representations or accurate simulators and object models
  • Closer to the system described in the paper is prior work on learning visuomotor policies for dexterous manipulation
  • Prior systems on RL for dexterous manipulation require assumptions on manually designed rewards or ground truth object state observations
  • Task specification without manual reward engineering by using intermediate milestone examples

Comparison with prior methods

  • A table compares prior methods with this system ("Ours") on four properties: no hand-engineered reward, multi-task, vision-based, and high-DoF (degrees of freedom)

  • Combination of building blocks enables learning image-based dexterous manipulation in the real world
  • Most closely related robotic RL systems are R3L and MTRF
  • Assumptions regarding lightweight instrumentation and vision-based autonomous learning most resemble R3L
  • Our work tackles a more challenging setting than R3L
  • Multitask RL formulation builds on ideas from MTRF
  • Focus is on providing a framework for vision-based learning of object manipulation skills with high-dimensional hands
  • Automated RL system runs continuously for 48 hours
  • Lightweight physical instrumentation used (tethering object to prevent it from falling outside of grasping range)
  • Visual perception and reward specification challenges addressed, avoiding manual reward engineering and motion capture

III. Robotic Platform and Problem Overview

  • Robot platform consists of 4-finger, 16-DoF robot hand mounted on 7-DoF Sawyer robotic arm
  • Action space is 22-dimensional, state space is 29-dimensional
  • System designed to operate for 48 hours in contact-rich environments
  • Two RGB image observations provided to robot via two low-cost web cameras
  • Tasks consist of manipulation behaviors such as reaching, grasping, in-hand and mid-air reorienting, and inserting
  • Supervision: the user places the robot and object in desired positions and captures image “snapshots”

IV. Problem Formalism and User Assumptions

  • Problem setting and supervision assumptions are formalized
  • MDP defined by tuple (S, A, p_dyn, ρ, γ, R)
  • Standard RL assumes hand-engineered reward function
  • Challenge is acute in dexterous manipulation setting
  • Environment assumed to be episodic
  • User supplies robot with set of sub-problems to practice
  • Sub-problems form a milestone graph of cardinality K (one node per sub-task), specified with M outcome images
  • A directed edge (v, v′) indicates that sub-task v′ is to be practiced after v
  • Optimize all sub-tasks in milestone graph simultaneously
  • Learn set of K policies, reward function, and sub-task transition function
  • Leverage user-provided milestone supervision and categorical labels
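The milestone graph described above can be pictured as a small data structure. The following is a minimal sketch only, not the paper's implementation: the class names `Milestone` and `MilestoneGraph`, the task names, and the image file names are all hypothetical, chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Milestone:
    """One sub-task, described by user-provided outcome images (hypothetical class)."""
    name: str
    outcome_images: list = field(default_factory=list)  # placeholder file names here


@dataclass
class MilestoneGraph:
    """Directed graph over K sub-tasks; an edge (v, v') means v' is practiced after v."""
    milestones: dict = field(default_factory=dict)  # name -> Milestone
    edges: dict = field(default_factory=dict)       # name -> set of successor names

    def add_milestone(self, name, outcome_images):
        self.milestones[name] = Milestone(name, list(outcome_images))
        self.edges.setdefault(name, set())

    def add_edge(self, v, v_next):
        if v not in self.milestones or v_next not in self.milestones:
            raise KeyError("both endpoints must be registered milestones")
        self.edges[v].add(v_next)

    def successors(self, v):
        return sorted(self.edges[v])


# Illustrative graph with K = 3 sub-tasks and 3 outcome images each.
graph = MilestoneGraph()
for name in ("reach", "grasp", "insert"):
    graph.add_milestone(name, [f"{name}_{i}.png" for i in range(3)])
graph.add_edge("reach", "grasp")
graph.add_edge("grasp", "insert")
graph.add_edge("insert", "reach")  # cycle back so practice can continue
```

Closing the graph into a cycle is one way to express that practice continues from one sub-task to the next; whether a given task graph is cyclic is an assumption of this sketch.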

V. The AVAIL System: Autonomy via

  • AVAIL is the system presented to address this problem
  • System reframes the RL training process as a multitask problem
  • System learns from sparse milestone examples provided by users
  • System functions by deriving reward functions, optimizing rewards, and determining which tasks to perform

A. Visual Multi-Task Policy and Reward Learning from User Milestones

  • Robot needs the ability to assign rewards to its own experience
  • This relieves the burden of manual reward engineering
  • Trade-off: the designer loses a direct way to provide task information
  • Leverage sparse milestone supervision to learn set of success classifiers
  • Learn binary classifier over user-provided examples and agent’s own experience
  • Classifier probability used as reward
  • Encoders are learned jointly from raw image observations using the DroQ approach
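The classifier-as-reward idea above can be illustrated with a toy example. This is a minimal sketch under strong assumptions: it uses logistic regression on hand-made 2-D feature vectors in place of a deep network on images, and the function names, learning rate, and data are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_success_classifier(positives, negatives, lr=0.5, epochs=200):
    """Fit logistic regression: label 1 for milestone examples, 0 for agent experience."""
    dim = len(positives[0])
    w, b = [0.0] * dim, 0.0
    data = [(x, 1.0) for x in positives] + [(x, 0.0) for x in negatives]
    for _ in range(epochs):
        for x, y in data:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            grad = p - y  # gradient of the cross-entropy loss w.r.t. the logit
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad
    return w, b


def classifier_reward(w, b, x):
    """Reward is the classifier's estimated probability that x is a success state."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


rng = random.Random(0)
# Toy stand-ins for image features: successes cluster near (1, 1), experience near (0, 0).
positives = [(1 + rng.gauss(0, 0.1), 1 + rng.gauss(0, 0.1)) for _ in range(20)]
negatives = [(rng.gauss(0, 0.1), rng.gauss(0, 0.1)) for _ in range(20)]
w, b = train_success_classifier(positives, negatives)
near_goal = classifier_reward(w, b, (1.0, 1.0))
far_from_goal = classifier_reward(w, b, (0.0, 0.0))
```

In this toy setup, states resembling the user's milestone examples receive a higher reward than states resembling ordinary agent experience, which is the property the bullets above rely on.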

B. Multi-Task Learning without Oracles

  • Agent must decide which task to attempt next based on current situation
  • User provides labels to train task dynamics model
  • Agent samples task from learned model and executes corresponding policy
  • Task inference and task learning are separated
  • Robot learns skills to autonomously practice hooking and unhooking
  • Robot learns skills to enable successful insertion
  • After 36 hours of training, success rate of 95% for hooking and 80% for insertion
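Task selection from user-labeled examples, as described above, might be sketched as follows. This is an illustrative stand-in, not the paper's model: it assumes low-dimensional feature vectors instead of images, and uses a nearest-centroid classifier with a softmax where the real system would use a learned network. All names, labels, and the `temperature` parameter are hypothetical.

```python
import math
import random


def fit_centroids(labeled_obs):
    """labeled_obs: list of (feature_vector, task_label) pairs provided by the user."""
    sums, counts = {}, {}
    for x, label in labeled_obs:
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, xi in enumerate(x):
            acc[i] += xi
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def task_probabilities(centroids, x, temperature=0.1):
    """Softmax over negative squared distances to each task's centroid."""
    logits = {
        label: -sum((xi - ci) ** 2 for xi, ci in zip(x, c)) / temperature
        for label, c in centroids.items()
    }
    top = max(logits.values())  # subtract the max for numerical stability
    exps = {label: math.exp(v - top) for label, v in logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def sample_task(probs, rng):
    """Draw the next task to practice from the predicted distribution."""
    labels = sorted(probs)
    return rng.choices(labels, weights=[probs[l] for l in labels], k=1)[0]


labeled = [((0.0, 0.0), "reach"), ((0.1, 0.0), "reach"),
           ((1.0, 1.0), "grasp"), ((0.9, 1.1), "grasp")]
centroids = fit_centroids(labeled)
probs = task_probabilities(centroids, (0.95, 1.0))
most_probable = max(probs, key=probs.get)
sampled = sample_task(probs, random.Random(0))
```

Both selection rules mentioned in this summary appear here: `sample_task` draws from the predicted distribution, while `most_probable` takes its argmax.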

C. Algorithm Summary and Implementation Details

  • AVAIL performs supervised learning of the next task transitions
  • AVAIL chooses the most probable task using an observed state
  • AVAIL trains a separate policy for each of the K sets of example images
  • Each policy is parameterized as a deep neural network and trained using the soft actor-critic algorithm
  • Resampling the task is done every 100 steps
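The control flow summarized above (pick a task, run its policy, resample every 100 steps) can be sketched as a loop. This is a structural sketch only: the environment, policies, and task chooser are hypothetical stand-ins so the code runs without a robot, and the learning updates (policy training and classifier training) are omitted.

```python
RESAMPLE_EVERY = 100  # from the summary above: the task is resampled every 100 steps


def run_reset_free(num_steps, choose_task, policies, env_step, obs):
    """Periodically pick a sub-task, then act with that sub-task's policy."""
    schedule = []
    task = None
    for t in range(num_steps):
        if t % RESAMPLE_EVERY == 0:
            task = choose_task(obs)  # e.g. most probable task under a learned model
            schedule.append((t, task))
        action = policies[task](obs)
        # A real system would also store the transition and update its networks here.
        obs = env_step(obs, action)
    return schedule, obs


# Hypothetical stand-ins: the "observation" is just a step counter.
tasks = ["reach", "grasp", "insert"]
policies = {name: (lambda obs, n=name: n + "_action") for name in tasks}
schedule, final_obs = run_reset_free(
    num_steps=350,
    choose_task=lambda obs: tasks[(obs // RESAMPLE_EVERY) % len(tasks)],
    policies=policies,
    env_step=lambda obs, action: obs + 1,
    obs=0,
)
```

With these stand-ins, the schedule cycles through the three tasks, switching at steps 0, 100, 200, and 300.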

VI. Experimental Evaluation

  • Evaluated AVAIL’s ability to learn complex manipulation skills in the real world
  • Evaluated AVAIL on three real world manipulation tasks
  • Evaluated AVAIL in simulation for comparison with prior methods

A. Real-World Task Descriptions

  • Use a two-sided cleaning brush to scrape a plate
  • Insert a cylindrical hose connector onto a peg connector
  • Attach a hook to a handle

B. Real-World Evaluation

  • Evaluated system by saving policies at regular intervals and evaluating performance
  • Evaluation metric for each milestone is binary success measurement based on distance of hand and object to desired pose
  • Real-world skill learning: robot able to successfully perform all milestones with > 80% success rate
  • Real-world task graph learning: learned task graph outperforms hand-designed task graph in terms of sample efficiency
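A binary, distance-based success metric of the kind described above could look like the following. This is a sketch under assumptions not stated in the summary: it compares 3-D positions only (ignoring orientation), and the tolerance values and function names are hypothetical.

```python
import math


def milestone_success(hand_pos, obj_pos, goal_hand_pos, goal_obj_pos,
                      hand_tol=0.05, obj_tol=0.05):
    """True when both hand and object are within tolerance of the desired pose."""
    return (math.dist(hand_pos, goal_hand_pos) <= hand_tol
            and math.dist(obj_pos, goal_obj_pos) <= obj_tol)


def success_rate(outcomes):
    """Fraction of evaluation trials that succeeded."""
    return sum(outcomes) / len(outcomes)


goal_hand, goal_obj = (0.5, 0.0, 0.25), (0.5, 0.0, 0.125)
trials = [
    ((0.5, 0.0, 0.25), (0.5, 0.0, 0.125)),    # exactly at the goal
    ((0.5, 0.0, 0.25), (0.5, 0.25, 0.125)),   # object too far away
    ((0.5, 0.03, 0.25), (0.5, 0.0, 0.125)),   # hand within tolerance
    ((0.75, 0.0, 0.25), (0.5, 0.0, 0.125)),   # hand too far away
]
outcomes = [milestone_success(h, o, goal_hand, goal_obj) for h, o in trials]
rate = success_rate(outcomes)
```

Evaluating saved policy checkpoints with such a function at regular intervals would give a success-rate curve over training time.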

C. Simulated Comparison

  • SAC (Soft Actor-Critic) is a state-of-the-art RL algorithm
  • Forward Backward Controller provides two milestones for the task
  • R3L interleaves training of the forward policy with a “perturbation controller”
  • SAC fails to progress in the reset-free setting
  • Forward Backward Controller is the only prior method that succeeds in making progress

VII. Discussion and Future Work

  • Proposed a method for multi-task learning from high dimensional image observations
  • Constructs a task graph from a modest number of user-provided milestone examples
  • Results illustrate the benefits of milestone supervision
  • Learned task graph converges faster on a real-world robotic task than a hand-coded, heuristic-based task graph

VIII. Acknowledgement

  • Research project was partially funded by Office of Naval Research
  • Computing resources donated by Microsoft Azure