Link to paper
The full paper is available here.
You can also find the paper on PapersWithCode here.
Abstract
- Testing conditions can affect the reliability of black-box learned components in robot autonomy.
- Coping with out-of-distribution (OOD) data is an important challenge for trustworthy learning-enabled open-world autonomy.
- This paper aims to demystify OOD data and its associated challenges in the context of data-driven robotic systems.
- Reasoning about the overall system-level competence of a robot in OOD conditions is important.
Paper Content
I. Introduction
- Machine learning systems are being used in robot autonomy stacks.
- ML models are used to estimate and forecast the state of the environment.
- ML models can behave unreliably on data that is dissimilar from the training data.
- Coping with out-of-distribution inputs is a key challenge for reliable and safe open-world autonomy.
II. Running Examples
- Autonomous Drone Delivery Service uses ML in its autonomy stack
- Robotic Manipulators Assisting in the Home use RL in a controlled environment
- Autonomous Drone Delivery Service must detect and manage OOD inputs
- Robotic Manipulators Assisting in the Home must perform reliably in OOD test environments
III. What Makes Data Out-of-Distribution?
- ML pipelines are designed to produce models that generalize well to test data
- When models fail to generalize, it is attributed to “OOD data”
- OOD data is data from a different distribution than the training data
- Distributional shifts occur when test data is from a different distribution than the training data
- Functional uncertainty occurs when the model is uncertain of its predictions, even when the test data is from the same distribution as the training data
IV. Trends in OOD in Machine Learning
- OOD problem is an open challenge in ML community
- State-of-the-art models are sensitive to distributional shifts
- Classical formulations and core techniques from the ML literature guide approaches to tackling the OOD challenge
A. Coping with Distributional Shift
- Standard ML techniques assume P_test = P_train
- Major line of ML research aims to relax this assumption
- Domain generalization considers the capacity of a model trained on P_train to generalize to P_test
- Research direction aims to improve distributional robustness
- Complementary research direction targets root cause of poor generalization from a causal inference perspective
- Domain adaptation leverages the training dataset and test inputs to optimize the model on P_test
- Domain adaptation often yields drastic performance improvements with simple algorithms
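
To make the domain adaptation idea concrete, here is a minimal sketch, not the paper's method. It assumes a one-dimensional feature, a two-class threshold classifier, and a shift that is a pure constant offset (for example a sensor bias). The adaptation step simply re-estimates the input normalization statistics from unlabeled test inputs. All function names and data values are hypothetical and chosen for illustration only.

```python
from statistics import mean, pstdev


def standardize(xs, mu, sigma):
    """Shift and scale inputs with the given statistics."""
    return [(x - mu) / sigma for x in xs]


def fit_threshold(xs, ys):
    """Hypothetical classifier: threshold halfway between the two class means."""
    m0 = mean(x for x, y in zip(xs, ys) if y == 0)
    m1 = mean(x for x, y in zip(xs, ys) if y == 1)
    return (m0 + m1) / 2


def accuracy(xs, ys, threshold):
    """Fraction of inputs whose thresholded prediction matches the label."""
    preds = [1 if x > threshold else 0 for x in xs]
    return sum(p == y for p, y in zip(preds, ys)) / len(ys)


# Toy training set drawn from P_train (illustrative values).
train_x = [0.0, 1.0, 2.0, 3.0, 7.0, 8.0, 9.0, 10.0]
train_y = [0, 0, 0, 0, 1, 1, 1, 1]

# Toy test set from P_test: same structure, inputs offset by +6 (assumed shift).
test_x = [x + 6.0 for x in train_x]
test_y = list(train_y)  # labels are used only for evaluation, never for adaptation

mu_tr, sd_tr = mean(train_x), pstdev(train_x)
threshold = fit_threshold(standardize(train_x, mu_tr, sd_tr), train_y)

# Without adaptation: reuse the training statistics on shifted inputs.
acc_source_only = accuracy(standardize(test_x, mu_tr, sd_tr), test_y, threshold)

# With adaptation: re-estimate the statistics from the unlabeled test inputs.
mu_te, sd_te = mean(test_x), pstdev(test_x)
acc_adapted = accuracy(standardize(test_x, mu_te, sd_te), test_y, threshold)

print(acc_source_only, acc_adapted)
```

In this toy setting the unadapted model labels every shifted input as class 1, while re-normalizing with test statistics restores the original decision boundary. Real shifts are rarely a pure offset, so this only illustrates why simple adaptation can help.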
B. Assessing Functional Uncertainty
- Domain adaptation and generalization focus on methods to select or improve a learned model
- Anomaly detection considers predicting if an individual input is dissimilar to training data
- Predictive uncertainty measures confidence in model predictions
- Calibration algorithms and design choices encourage high predictive uncertainty on anomalous inputs
- Bayesian ML allows us to quantify epistemic uncertainty by incorporating prior beliefs
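
The two notions above, anomaly detection on inputs and predictive uncertainty on outputs, can be sketched as follows. This is an illustrative sketch under stated assumptions: it uses disagreement within a small ensemble as a common stand-in for epistemic uncertainty rather than a full Bayesian treatment, and a scalar z-score as the anomaly score. The helper names are hypothetical.

```python
import math
from statistics import mean, pstdev


def predictive_entropy(probs):
    """Shannon entropy (in nats) of a categorical predictive distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def ensemble_predict(member_probs):
    """Average the class probabilities predicted by each ensemble member."""
    n = len(member_probs)
    return [sum(col) / n for col in zip(*member_probs)]


def z_score_anomaly(x, train_xs):
    """Distance of a scalar feature from the training mean, in standard deviations."""
    return abs(x - mean(train_xs)) / pstdev(train_xs)


# Ensemble members that agree give a confident averaged prediction.
agree = ensemble_predict([[0.9, 0.1], [0.9, 0.1]])
# Members that disagree average out to a flat, high-entropy prediction.
disagree = ensemble_predict([[0.9, 0.1], [0.1, 0.9]])

# Anomaly score for an input far from the (toy) training features.
train_features = [1.0, 2.0, 3.0, 4.0, 5.0]
score_typical = z_score_anomaly(3.0, train_features)
score_far = z_score_anomaly(12.0, train_features)

print(predictive_entropy(agree), predictive_entropy(disagree))
print(score_typical, score_far)
```

The entropy of the averaged prediction rises when members disagree, which is the behavior a downstream monitor would threshold on. The z-score looks only at the input and does not depend on the model.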
C. Evaluation
- Researchers have developed benchmark datasets to evaluate OOD performance
- OOD test sets can include synthetic corruptions and naturally occurring distribution shifts
- Recent datasets emphasize tasks relevant to robotics
- Datasets provide an intuitive foothold to develop algorithms to isolate reliability problems rooted in OOD data
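
As a rough illustration of evaluating on synthetic corruptions, the sketch below reports accuracy at increasing corruption severity. The corruption (a constant offset that grows with severity), the fixed threshold classifier, and the data are all assumptions made for this example and are not taken from any benchmark.

```python
def corrupt(xs, severity, step=1.5):
    """Synthetic corruption: add an offset proportional to the severity level."""
    return [x + severity * step for x in xs]


def classify(x, threshold=5.0):
    """Hypothetical fixed model: predict class 1 above the threshold."""
    return 1 if x > threshold else 0


def accuracy(xs, ys):
    """Fraction of inputs classified correctly."""
    return sum(classify(x) == y for x, y in zip(xs, ys)) / len(ys)


clean_x = [1.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0, 9.0]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

# Severity 0 is the clean test set; higher levels are progressively more shifted.
report = {s: accuracy(corrupt(clean_x, s), labels) for s in range(4)}
print(report)
```

Plotting accuracy against severity in this way shows how quickly a model degrades as test inputs move away from the training conditions.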
V. Open Challenges for OOD in Robotics
- Robotics is focused on building systems that work in the real world.
- A central challenge is reasoning about the reliability and competence of ML-enabled autonomous systems when they use learned models in a feedback loop over time.
- System-level perspectives present unique challenges for the robotics community related to detecting, responding to, and improving OOD closed-loop performance.
- Data-driven robotic systems operate on three different timescales, each with distinct OOD challenges.
- Open research questions point towards autonomous systems that leverage ML while being robust to OOD conditions.
A. Real-Time ML-Enabled Decision Making
- Need to reason about downstream impact of OOD inputs on decision-making system in real-time
- Need to construct safeguards to ensure inference errors don’t lead to system failure
- Need to monitor competence of full decision-making stack on individual inputs at test time
RQ 2 (OOD-aware decision making). Can we design decision-making systems compatible with runtime monitors robust to high functional uncertainty?
- Robot must choose an action even if runtime monitors suggest model is operating OOD
- Design systems to assess and account for model uncertainties
- Fallback strategies may need to rely on redundancy or alternate sources of information
- Runtime monitors can flag when model outputs are unreliable
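
The interplay between a runtime monitor and a fallback strategy can be sketched as a simple gate. This is one possible design under stated assumptions, not the paper's proposal: the OOD score, its threshold, and the action names ("land_at_target", "hover_and_hold") are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    action: str
    source: str  # "learned" or "fallback"


def choose_action(learned_action, ood_score, ood_threshold,
                  fallback_action="hover_and_hold"):
    """Use the learned policy's action unless the monitor flags the input as OOD."""
    if ood_score > ood_threshold:
        # The monitor considers the model unreliable here, so the robot still
        # acts, but through a conservative fallback instead of the learned output.
        return Decision(fallback_action, "fallback")
    return Decision(learned_action, "learned")


nominal = choose_action("land_at_target", ood_score=0.1, ood_threshold=0.5)
flagged = choose_action("land_at_target", ood_score=0.9, ood_threshold=0.5)
print(nominal, flagged)
```

A hard threshold is the simplest choice. A system that accounts for model uncertainty more gradually would need a richer interface between the monitor and the planner.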
B. Episodic Closed-Loop Interaction
- Learning-enabled robots actively interact with their environment to perform tasks
- Reliable robotic systems should reason about the influence of OOD conditions on the closed-loop decision-making system
- Temporally correlated OOD events should be accounted for
- Mitigating distributional shifts should be considered
- Externally shifting conditions can degrade the perception system
- Exploiting the drone’s ability to control shifts is important
- Quickly adapting to shifted or evolving conditions is necessary
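
One way to account for temporally correlated OOD events is to aggregate per-frame monitor flags over a sliding window, so that a single spurious flag is ignored while a sustained run raises an alarm. The sketch below assumes a per-frame boolean flag is already available. The class name, window length, and flag count are hypothetical.

```python
from collections import deque


class WindowedOODMonitor:
    """Raise an alarm when at least `min_flags` of the last `window` frames are OOD."""

    def __init__(self, window, min_flags):
        self.flags = deque(maxlen=window)  # oldest flags drop off automatically
        self.min_flags = min_flags

    def update(self, is_ood):
        """Record the newest per-frame flag and return the current alarm state."""
        self.flags.append(bool(is_ood))
        return sum(self.flags) >= self.min_flags


monitor = WindowedOODMonitor(window=5, min_flags=3)
stream = [False, True, False, False, True, True, True, False]
alarms = [monitor.update(flag) for flag in stream]
print(alarms)
```

Here the isolated flag at the second frame does not trigger the alarm, while the later run of three consecutive flags does.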
C. Data Lifecycle
- Data collected during operation can be used to improve system performance
- Data collected during different episodes of robot execution can be grouped into different operational domains
- Data collected during operation should be used to efficiently improve models
- Data collected during operation can be used to mitigate influence of OOD conditions
- Training data should match test conditions
- Need to increase diversity of data
- Need to efficiently collect and label data
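
Efficient data collection and labeling could, for instance, prioritize the operational samples on which the model was least certain. The sketch below assumes each logged sample already carries an uncertainty or OOD score and that labeling is limited by a fixed budget. The function name and the scores are hypothetical.

```python
def select_for_labeling(scores, budget):
    """Return the indices of the `budget` samples with the highest scores."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:budget])


# Uncertainty scores logged for five samples during operation (toy values).
logged_scores = [0.1, 0.9, 0.3, 0.8, 0.2]
to_label = select_for_labeling(logged_scores, budget=2)
print(to_label)
```

Selecting purely by score can oversample one failure mode, so a practical pipeline would likely also enforce diversity across operational domains.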
VI. Conclusion
- Robotics requires a system-level perspective on the OOD problem.
- Future work should investigate how OOD data impacts the reliability of the autonomy stack.