Link to paper
The full paper is available here.
You can also find the paper on PapersWithCode here.
Abstract
- Conformal prediction is a tool for uncertainty quantification
- It produces valid prediction intervals with finite-sample guarantees
- Self-supervised learning can be used to improve the quality of conformal regressors
- Self-supervised pretext tasks can improve the adaptability of conformal intervals
- We use self-supervised error as an additional feature to estimate nonconformity scores
- We demonstrate the benefit of the additional information using synthetic and real data
Paper Content
Introduction
- Machine Learning research is focused on minimizing a model’s predictive error on unseen data.
- Conformal prediction provides finite-sample, frequentist guarantees on the marginal coverage of prediction intervals.
- Locally adaptive conformal prediction aims to provide wider intervals for challenging samples and narrower intervals for easier samples.
- Self-Supervised Conformal Prediction (SSCP) uses information from self-supervised tasks to improve prediction intervals.
- SSCP does not impact the theoretical guarantees of conformal prediction.
- SSCP is the first work to integrate and examine the benefit of self-supervised errors to improve conformal prediction.
Related work
- Standard conformal prediction provides marginal coverage
- Exact conditional coverage cannot be guaranteed in finite samples
- Locally adaptive conformal prediction uses an independent model to induce adaptiveness
- Self-supervised learning has been used to improve performance of downstream models
Background: conformal prediction
- ICP and CRF are two methods that are fundamental to the proposed framework.
- The supervised learning setting involves features X and labels Y.
- The goal is to learn a prediction interval Ĉ that ensures Equation 1 holds.
- Labeled data D labeled is used to train the model.
- The label y i is a scalar.
Conformal residual fitting (crf)
- ICP confidence intervals are constant for all instances
- They do not reflect the difficulty of individual samples
- CRF proposes to produce locally adaptive intervals
- A normalized nonconformity function is used to enable locally adaptive intervals
- A disjoint dataset is used to train the conformal normalization model
Method: self-supervised conformal prediction
- Train predictive model to solve regression task
- Train self-supervised model on top of predictive model
- Train conformal normalizer to learn residuals of predictive model
- Apply CRF using predictive, self-supervised and residual models
Data splits and assumptions
- Predictive and self-supervised models are trained on a mixture of labeled and unlabeled data
- Residual model should be trained on disjoint data from the predictive model
- Calibration and testing data are exchangeable
- Model performance is affected by divergence between data distributions
Conformal normalizer phase (s2)
- The conformal normalization model is responsible for the adaptiveness of the prediction intervals.
- Self-supervised error is used as an additional feature to the model.
- Self-supervised error can be computed at both training and test time.
- Random forest and neural network are used in the experiments.
Conformalization phase (s3)
- Train predictive model f
- Train self-supervised model f ss
- Train residual model σ
- Apply calibration procedure to obtain α-quantile nonconformity score
- For new test example x, obtain adaptive and valid intervals
- Summarize complete SSCP framework algorithmically in Algorithm 1
Other locally adaptive methods
- SSCP framework focuses on CRF for locally adaptive intervals
- CQR is a powerful alternative to CRF
- Appendix C.3 describes adaption to integrate CQR into SSCP framework
Remark on sscp and coverage guarantees
- Conformal prediction provides theoretical guarantees on the validity of coverage of the prediction intervals.
- SSCP inherits these same guarantees.
Experiments
- Synthetic example to show why adding self-supervised loss improves prediction intervals
- Performance of SSCP on real-world datasets
- Quality of prediction intervals assessed using commonly used metrics
- Locally adaptive conformal prediction used as baseline
Synthetic demonstration
- Synthetic dataset used to illustrate method
- Self-supervision used to improve predictive model and conformal normalization model
- Synthetic residuals generated from hypothetical predictive model
- Inputs and residuals generated as a function of single, uniformly distributed latent dimension
- Autoencoder used as self-supervised task
- Comparing conformal prediction intervals with and without self-supervised feature
- Self-supervised feature results in more adaptive intervals
- Self-supervised loss better models heteroscedasticity in predictive model’s residuals
Real data
- Self-supervision can be used to improve the quality of conformal intervals.
- Self-supervision can be used with only labeled data, by ignoring the label.
- Self-supervision with labeled and unlabeled data improves the width of prediction intervals.
- The benefit of unlabeled data decreases as the proportion of labeled data increases.
- Different pretext tasks may provide superior signal to the normalization model.
Insights
- SSCP improves prediction interval width compared to CRF
- SSCP helps most on samples with largest interval width
- SSCP helps in sparser regions of PC1 space
- SSCP improves conformal normalization model by ±14% and ±5% in Fig. 7 (a) and (b) respectively
Discussion
- Improving the quality of prediction intervals is important for reliable uncertainty quantification
- Self-supervised learning can be used to improve conformal prediction intervals
- Self-supervision assists in more challenging and sparser regions
- Future research should explore alternative self-supervised approaches, new conformal-aware self-supervised tasks, and the use of self-supervision with other data modalities
C additional experiments
- Goal of experiment: Understand benefit of unlabeled data to SSCP
- Analysis: Unlabeled data increases average interval width when labeled dataset is small
- Goal: Assess value of alternative sources of signal
- Analysis: Isolation forest and self-supervised signal provide best intervals
- Goal: Examine VIME and AE as self-supervised tasks
- Analysis: VIME provides greater performance improvements
- Goal: Deep-dive to understand impact of self-supervised task
- Analysis: Self-supervision helps to improve prediction intervals on most uncertain examples
- Goal: Assess robustness of intervals
- Analysis: SSCP has net gain over CRF across all datasets