Link to paper
The full paper is available here.
You can also find the paper on PapersWithCode here.
Abstract
- Reinforcement learning can enable robots to learn complex manipulation tasks
- System provides a “programming-free” approach for users to define new tasks
- System includes a framework for users to define tasks with image examples
- Reinforcement learning procedure learns the task autonomously without interventions
- Experimental results with a four-finger robotic hand learning tasks in the real world
Paper Content
I. introduction
- Robotic manipulation tasks are difficult to control
- Reinforcement learning (RL) is an appealing choice for such settings
- RL on real-world robotic platforms raises practical issues
- Alternative solutions such as transfer from simulation, imitation learning, or motion capture
- Effective real-world RL requires manual engineering, reward engineering, reset mechanisms, etc.
- Propose a robotic learning system that can learn to control multi-fingered robotic hands from raw visual observations
- System autonomously practices a sequence of sub-skills based on high-level milestone specifications
- Multi-camera visual observations used to localize and manipulate objects
- Policies learned end-to-end from pixels and no motion capture
- Simple physical instrumentation used to prevent object from falling out of reach
- System can learn dexterous behaviors by practicing for up to 48 hours autonomously
Ii. related work
- Prior work studied control of complex hands using various methods
- Majority of prior work assumed access to compact state representations or accurate simulators and object models
- Closer to the system described in the paper is prior work on learning visuomotor policies for dexterous manipulation
- Prior systems on RL for dexterous manipulation require assumptions on manually designed rewards or ground truth object state observations
- Task specification without manual reward engineering by using intermediate milestone examples
Method no hand engineered reward
- Multi-task vision involves using computer vision to complete multiple tasks.
- High-DoFs (degrees of freedom) are used to increase the accuracy of the tasks.
Ours
- Combination of building blocks enables learning image-based dexterous manipulation in the real world
- Most closely related robotic RL systems are R3L and MTRF
- Assumptions regarding lightweight instrumentation and vision-based autonomous learning most resemble R3L
- Our work tackles a more challenging setting than R3L
- Multitask RL formulation builds on ideas from MTRF
- Focus is on providing a framework for vision-based learning of object manipulation skills with high-dimensional hands
- Automated RL system runs continuously for 48 hours
- Lightweight physical instrumentation used (tethering object to prevent it from falling outside of grasping range)
- Visual perception and reward specification challenges addressed, avoiding manual reward engineering and motion capture
Iii. robotic platform and problem overview
- Robot platform consists of 4-finger, 16-DoF robot hand mounted on 7-DoF Sawyer robotic arm
- Action space is 22-dimensional, state space is 29-dimensional
- System designed to operate for 48 hours in contact-rich environments
- Two RGB image observations provided to robot via two low-cost web cameras
- Tasks consist of manipulation behaviors such as reaching, grasping, in-hand and mid-air reorienting, and inserting
- Supervision is to allow user to place robot and object in desired position and capture image “snapshots”
Iv. problem formalism and user assumptions
- Problem setting and supervision assumptions are formalized
- MDP defined by tuple (S, A, p dyn , ρ, γ, R)
- Standard RL assumes hand-engineered reward function
- Challenge is acute in dexterous manipulation setting
- Environment assumed to be episodic
- User supplies robot with set of sub-problems to practice
- Graph structure of cardinality K with M outcome images
- Directed edge (v, v ) indicates which sub-task is to be practiced next
- Optimize all sub-tasks in milestone graph simultaneously
- Learn set of K policies, reward function, and sub-task transition function
- Leverage user-provided milestone supervision and categorical labels
V. the avail system: autonomy via
- System presented to address problem: AVAIL
- System reframes RL training process as multitask problem
- System learns from sparse milestone examples provided by users
- System leverages user-provided milestones
- System functions by deriving reward functions, optimizing rewards, and determining which tasks to perform
A. visual multi-task policy and reward learning from user milestones
- Ability for robot to assign rewards to its own experience
- Relieves burden of manual reward engineering
- Trade-off of removing ability of designer to provide task information
- Leverage sparse milestone supervision to learn set of success classifiers
- Learn binary classifier over user-provided examples and agent’s own experience
- Classifier probability used as reward
- Learn joint encoders and raw image observations using DroQ approach
B. multi-task learning without oracles
- Agent must decide which task to attempt next based on current situation
- User provides labels to train task dynamics model
- Agent samples task from learned model and executes corresponding policy
- Task inference and task learning are separated
- Robot learns skills to autonomously practice hooking and unhooking
- Robot learns skills to enable successful insertion
- After 36 hours of training, success rate of 95% for hooking and 80% for insertion
C. algorithm summary and implementation details
- AVAIL performs supervised learning of the next task transitions
- AVAIL chooses the most probable task using an observed state
- AVAIL trains a set of separate policies for each of the K sets of example images
- Each policy is parameterized as a deep neural network and trained using the soft actor-critic algorithm
- Resampling the task is done every 100 steps
Vi. experimental evaluation
- Evaluated AVAIL’s ability to learn complex manipulation skills in the real world
- Evaluated AVAIL on three real world manipulation tasks
- Evaluated AVAIL in simulation for comparison with prior methods
A. real-world task descriptions
- Use a two-sided cleaning brush to scrape a plate
- Insert a cylindrical hose connector onto a peg connector
- Attach a hook to a handle
B. real-world evaluation
- Evaluated system by saving policies at regular intervals and evaluating performance
- Evaluation metric for each milestone is binary success measurement based on distance of hand and object to desired pose
- Real-world skill learning: robot able to successfully perform all milestones with > 80% success rate
- Real-world task graph learning: learned task graph outperforms hand-designed task graph in terms of sample efficiency
C. simulated comparison
- SAC (Soft Actor-Critic) is a state-of-the-art RL algorithm
- Forward Backward Controller provides two milestones for the task
- R3L interleaves training of the forward policy with a “perturbation controller”
- SAC fails to progress in the reset-free setting
- Forward Backward Controller is the only prior method that succeeds in making progress
Vii. discussion and future work
- Proposed a method for multi-task learning from high dimensional image observations
- Constructs a task graph from a modest number of user-provided milestone examples
- Results illustrate the benefits of milestone supervision
- Learned task graph results in faster convergence on real world robotic task compared to handcoded heuristic based task graph
Viii. acknowledgement
- Research project was partially funded by Office of Naval Research
- Computing resources donated by Microsoft Azure