Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

  • Testing conditions can affect the reliability of black-box learned components in robot autonomy.
  • Coping with out-of-distribution (OOD) data is an important challenge for trustworthy learning-enabled open-world autonomy.
  • This paper aims to demystify OOD data and its associated challenges in the context of data-driven robotic systems.
  • Reasoning about the overall system-level competence of a robot in OOD conditions is important.

Paper Content

I. introduction

  • Machine learning systems are being used in robot autonomy stacks.
  • ML models are used to estimate and forecast the state of the environment.
  • ML models can behave unreliably on data that is dissimilar from the training data.
  • Coping with out-of-distribution inputs is a key challenge for reliable and safe open-world autonomy.

II. running examples

  • Autonomous Drone Delivery Service uses ML in its autonomy stack
  • Robotic Manipulators Assisting in the Home are trained with RL in a controlled environment
  • Autonomous Drone Delivery Service must detect and manage OOD inputs
  • Robotic Manipulators Assisting in the Home must perform reliably in OOD test environments

III. what makes data out-of-distribution?

  • Standard ML pipelines produce models that generalize well to test data drawn from the same distribution as the training data
  • When a model fails to generalize, the failure is commonly attributed to “OOD data”
  • OOD data is data from a different distribution than the training data
  • Distributional shifts occur when test data is from a different distribution than the training data
  • Functional uncertainty occurs when the model is uncertain of its predictions, even when the test data is from the same distribution as the training data
  • The OOD problem is an open challenge in the ML community
  • State-of-the-art models are sensitive to distributional shifts
  • Classical, core formulations and techniques from the ML literature guide the approach to tackling the OOD challenge

A. coping with distributional shift

  • Standard ML techniques assume P_test = P_train
  • A major line of ML research aims to relax this assumption
  • Domain generalization considers the capacity of a model trained on P_train to generalize to P_test
  • One research direction aims to improve distributional robustness
  • A complementary research direction targets the root cause of poor generalization from a causal inference perspective
  • Domain adaptation leverages the training dataset and test inputs to optimize the model on P_test
  • Domain adaptation often yields drastic performance improvements with simple algorithms
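
As a rough illustration of the points above, the following Python sketch shows a model fit on P_train losing accuracy under a shifted P_test, and a very simple adaptation step that uses unlabeled test inputs to recover it. The toy data, the constant +2.0 input shift, and the mean-matching step are all assumptions chosen for illustration (mean matching further assumes the class balance is unchanged); none of them are taken from the paper.

```python
from statistics import mean


def fit_threshold(xs, ys):
    """Place the decision threshold midway between the two class means."""
    m0 = mean(x for x, y in zip(xs, ys) if y == 0)
    m1 = mean(x for x, y in zip(xs, ys) if y == 1)
    return (m0 + m1) / 2.0


def accuracy(xs, ys, threshold):
    """Fraction of inputs classified correctly by 'predict 1 if x > threshold'."""
    preds = [1 if x > threshold else 0 for x in xs]
    return sum(p == y for p, y in zip(preds, ys)) / len(ys)


# Samples from P_train: class 0 clusters near 1.0, class 1 near 3.0.
train_x = [0.8, 1.0, 1.2, 2.8, 3.0, 3.2]
train_y = [0, 0, 0, 1, 1, 1]

# Samples from P_test: same labels, but every input is offset by a
# hypothetical constant (for example, a sensor bias).
shift = 2.0
test_x = [x + shift for x in train_x]
test_y = list(train_y)

threshold = fit_threshold(train_x, train_y)
acc_train = accuracy(train_x, train_y, threshold)
acc_shifted = accuracy(test_x, test_y, threshold)

# Simple unsupervised adaptation: re-center the unlabeled test inputs so
# that their mean matches the training mean, then reuse the same model.
offset = mean(test_x) - mean(train_x)
adapted_x = [x - offset for x in test_x]
acc_adapted = accuracy(adapted_x, test_y, threshold)
```

On this toy data the unadapted model misclassifies every shifted class-0 input, while the re-centered inputs are classified as well as the training set. Real shifts are rarely a pure offset, so this only conveys the flavor of adaptation with test inputs.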

B. assessing functional uncertainty

  • Domain adaptation and generalization focus on methods to select or improve a learned model
  • Anomaly detection considers predicting if an individual input is dissimilar to training data
  • Predictive uncertainty measures confidence in model predictions
  • Calibration algorithms and design choices encourage high predictive uncertainty on anomalous inputs
  • Bayesian ML allows us to quantify epistemic uncertainty by incorporating prior beliefs
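
One way to make the idea of predictive uncertainty concrete is to measure how much an ensemble of models disagrees on a given input. The sketch below is an assumption-laden illustration, not the paper's method: it uses a leave-one-out ensemble of least-squares lines on made-up data as a crude stand-in for epistemic uncertainty, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


# Made-up training data: roughly y = x with small noise, x in [0, 4].
train_x = [0.0, 1.0, 2.0, 3.0, 4.0]
train_y = [0.1, 0.9, 2.2, 2.9, 4.1]

# Build the ensemble: one model per leave-one-out subset of the data.
ensemble = []
for i in range(len(train_x)):
    xs = train_x[:i] + train_x[i + 1:]
    ys = train_y[:i] + train_y[i + 1:]
    ensemble.append(fit_line(xs, ys))


def disagreement(x):
    """Standard deviation of the ensemble's predictions at input x."""
    preds = [a + b * x for a, b in ensemble]
    return pstdev(preds)


near = disagreement(2.0)   # input inside the training range
far = disagreement(20.0)   # input far outside the training range
```

Here `far` comes out larger than `near`: the ensemble members agree where they saw data and diverge when extrapolating. Disagreement of this kind is only a heuristic signal and is not guaranteed to be calibrated.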

C. evaluation

  • Researchers have developed benchmark datasets to evaluate OOD performance
  • OOD test sets can include synthetic corruptions and naturally occurring distribution shifts
  • Recent datasets emphasize tasks relevant to robotics
  • Datasets provide an intuitive foothold to develop algorithms to isolate reliability problems rooted in OOD data

V. open challenges for OOD in robotics

  • Robotics is focused on building systems that work in the real world.
  • Robotics must reason about the reliability and competence of ML-enabled autonomous systems that use learned models in a feedback loop over time.
  • This system-level perspective presents unique challenges for the robotics community related to detecting, responding to, and improving OOD closed-loop performance.
  • The paper considers three timescales of data-driven robotic systems, each with distinct OOD challenges: real-time decision making, episodic closed-loop interaction, and the data lifecycle.
  • For each, it poses open research questions toward autonomous systems that leverage ML while being robust to OOD conditions.

A. real-time ML-enabled decision making

  • Need to reason about downstream impact of OOD inputs on decision-making system in real-time
  • Need to construct safeguards to ensure inference errors don’t lead to system failure
  • Need to monitor competence of full decision-making stack on individual inputs at test time

RQ 2 (OOD aware decision making). Can we design decision-making systems compatible with runtime monitors robust to high functional uncertainty?

  • Robot must choose an action even if runtime monitors suggest model is operating OOD
  • Design systems to assess and account for model uncertainties
  • Fallback strategies may need to rely on redundancy or alternate sources of information
  • Runtime monitors can flag when model outputs are unreliable
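
The interplay between a runtime monitor and a fallback strategy can be sketched as a simple gate around the learned policy. Everything named below (`choose_action`, the action strings, the anomaly score, and the threshold value) is hypothetical and chosen for illustration; the paper does not prescribe this design.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    action: str
    used_fallback: bool


def choose_action(learned_action, anomaly_score, threshold,
                  fallback_action="hover_in_place"):
    """Gate the learned policy's action behind a runtime monitor.

    The robot must always act, so when the monitor's anomaly score exceeds
    the threshold we return a conservative fallback instead of refusing.
    """
    if anomaly_score > threshold:
        return Decision(fallback_action, True)
    return Decision(learned_action, False)


# Nominal input: the monitor reports a low score, so the learned action is used.
nominal = choose_action("continue_to_waypoint", anomaly_score=0.2, threshold=1.0)

# Suspected OOD input: the monitor's score exceeds the threshold.
flagged = choose_action("continue_to_waypoint", anomaly_score=3.5, threshold=1.0)
```

A hard threshold is the simplest possible choice. A system that accounts for model uncertainty more carefully might instead blend in alternate information sources or scale its caution with the score.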

B. episodic closed-loop interaction

  • Learning-enabled robots actively interact with their environment to perform tasks
  • Reliable robotic systems should reason about the influence of OOD conditions on the closed-loop decision-making system
  • Temporally correlated OOD events should be accounted for
  • Robots should act to mitigate distributional shifts
  • Externally shifting conditions can degrade the perception system
  • The drone's ability to control, through its own actions, the shifts it experiences should be exploited
  • Quickly adapting to shifted or evolving conditions is necessary
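
To illustrate why temporal correlation matters, the sketch below aggregates per-step monitor flags over a sliding window, so that an isolated flag is ignored while a sustained run of flags raises an episode-level alarm. The class name, window length, and flag count are assumptions made for this example rather than anything specified in the paper.

```python
from collections import deque


class WindowedMonitor:
    """Raise an alarm when enough of the most recent steps were flagged."""

    def __init__(self, window=5, min_flags=3):
        # deque(maxlen=...) silently drops the oldest flag once full.
        self.flags = deque(maxlen=window)
        self.min_flags = min_flags

    def update(self, flagged):
        """Record one per-step flag and return the current alarm state."""
        self.flags.append(bool(flagged))
        return sum(self.flags) >= self.min_flags


def run(flag_sequence):
    """Feed a sequence of per-step flags to a fresh monitor."""
    monitor = WindowedMonitor(window=5, min_flags=3)
    return [monitor.update(f) for f in flag_sequence]


# Two isolated flags, far apart: the alarm never fires.
isolated = run([0, 0, 1, 0, 0, 0, 1, 0])

# A sustained run of flags: the alarm fires once three fall in the window
# and clears again after the run has mostly left the window.
sustained = run([0, 1, 1, 1, 1, 0, 0, 0])
```

The window trades detection delay against false alarms: a longer window with a higher count is slower to react but less sensitive to one-off anomalies.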

C. data lifecycle

  • Data collected during operation can be used to improve system performance
  • Data collected during different episodes of robot execution can be grouped into different operational domains
  • Data collected during operation should be used to efficiently improve models
  • Data collected during operation can be used to mitigate influence of OOD conditions
  • Training data should match test conditions
  • Need to increase diversity of data
  • Need to efficiently collect and label data
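
One plausible way to collect and label data efficiently is to spend a fixed labeling budget on the logged samples that a monitor scored as most anomalous. The following sketch assumes each logged sample already carries such a score; the sample identifiers, scores, and function name are invented for illustration.

```python
def select_for_labeling(samples, budget):
    """Return the ids of the `budget` samples with the highest anomaly score.

    `samples` is a list of (sample_id, anomaly_score) pairs.
    """
    ranked = sorted(samples, key=lambda s: s[1], reverse=True)
    return [sample_id for sample_id, _ in ranked[:budget]]


# Hypothetical log of operational data with per-sample anomaly scores.
logged = [
    ("flight_01", 0.3),
    ("flight_02", 2.7),
    ("flight_03", 0.9),
    ("flight_04", 4.1),
    ("flight_05", 1.2),
]

to_label = select_for_labeling(logged, budget=2)
```

Ranking purely by anomaly score can over-sample near-duplicate failures, so a practical pipeline would likely also weigh diversity across operational domains.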

VI. conclusion

  • Robotics requires a system-level perspective on the OOD problem.
  • Future work should investigate how OOD data impacts the reliability of the full autonomy stack.