Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

  • Conformal prediction is a tool for uncertainty quantification
  • It produces valid prediction intervals with finite-sample guarantees
  • Self-supervised learning can be used to improve the quality of conformal regressors
  • Self-supervised pretext tasks can improve the adaptability of conformal intervals
  • We use self-supervised error as an additional feature to estimate nonconformity scores
  • We demonstrate the benefit of the additional information using synthetic and real data

Paper Content

Introduction

  • Machine Learning research is focused on minimizing a model’s predictive error on unseen data.
  • Conformal prediction provides finite-sample, frequentist guarantees on the marginal coverage of prediction intervals.
  • Locally adaptive conformal prediction aims to provide wider intervals for challenging samples and narrower intervals for easier samples.
  • Self-Supervised Conformal Prediction (SSCP) uses information from self-supervised tasks to improve prediction intervals.
  • SSCP does not impact the theoretical guarantees of conformal prediction.
  • SSCP is the first work to integrate and examine the benefit of self-supervised errors to improve conformal prediction.
  • Standard conformal prediction provides marginal coverage
  • Exact conditional coverage cannot be guaranteed in finite samples
  • Locally adaptive conformal prediction uses an independent model to induce adaptiveness
  • Self-supervised learning has been used to improve performance of downstream models

Background: conformal prediction

  • ICP and CRF are two methods that are fundamental to the proposed framework.
  • The supervised learning setting involves features X and labels Y.
  • The goal is to learn a prediction interval Ĉ that ensures Equation 1 holds.
  • Labeled data D labeled is used to train the model.
  • The label y i is a scalar.

Conformal residual fitting (crf)

  • ICP confidence intervals are constant for all instances
  • They do not reflect the difficulty of individual samples
  • CRF proposes to produce locally adaptive intervals
  • A normalized nonconformity function is used to enable locally adaptive intervals
  • A disjoint dataset is used to train the conformal normalization model

Method: self-supervised conformal prediction

  • Train predictive model to solve regression task
  • Train self-supervised model on top of predictive model
  • Train conformal normalizer to learn residuals of predictive model
  • Apply CRF using predictive, self-supervised and residual models

Data splits and assumptions

  • Predictive and self-supervised models are trained on a mixture of labeled and unlabeled data
  • Residual model should be trained on disjoint data from the predictive model
  • Calibration and testing data are exchangeable
  • Model performance is affected by divergence between data distributions

Conformal normalizer phase (s2)

  • The conformal normalization model is responsible for the adaptiveness of the prediction intervals.
  • Self-supervised error is used as an additional feature to the model.
  • Self-supervised error can be computed at both training and test time.
  • Random forest and neural network are used in the experiments.

Conformalization phase (s3)

  • Train predictive model f
  • Train self-supervised model f ss
  • Train residual model σ
  • Apply calibration procedure to obtain α-quantile nonconformity score
  • For new test example x, obtain adaptive and valid intervals
  • Summarize complete SSCP framework algorithmically in Algorithm 1

Other locally adaptive methods

  • SSCP framework focuses on CRF for locally adaptive intervals
  • CQR is a powerful alternative to CRF
  • Appendix C.3 describes adaption to integrate CQR into SSCP framework

Remark on sscp and coverage guarantees

  • Conformal prediction provides theoretical guarantees on the validity of coverage of the prediction intervals.
  • SSCP inherits these same guarantees.

Experiments

  • Synthetic example to show why adding self-supervised loss improves prediction intervals
  • Performance of SSCP on real-world datasets
  • Quality of prediction intervals assessed using commonly used metrics
  • Locally adaptive conformal prediction used as baseline

Synthetic demonstration

  • Synthetic dataset used to illustrate method
  • Self-supervision used to improve predictive model and conformal normalization model
  • Synthetic residuals generated from hypothetical predictive model
  • Inputs and residuals generated as a function of single, uniformly distributed latent dimension
  • Autoencoder used as self-supervised task
  • Comparing conformal prediction intervals with and without self-supervised feature
  • Self-supervised feature results in more adaptive intervals
  • Self-supervised loss better models heteroscedasticity in predictive model’s residuals

Real data

  • Self-supervision can be used to improve the quality of conformal intervals.
  • Self-supervision can be used with only labeled data, by ignoring the label.
  • Self-supervision with labeled and unlabeled data improves the width of prediction intervals.
  • The benefit of unlabeled data decreases as the proportion of labeled data increases.
  • Different pretext tasks may provide superior signal to the normalization model.

Insights

  • SSCP improves prediction interval width compared to CRF
  • SSCP helps most on samples with largest interval width
  • SSCP helps in sparser regions of PC1 space
  • SSCP improves conformal normalization model by ±14% and ±5% in Fig. 7 (a) and (b) respectively

Discussion

  • Improving the quality of prediction intervals is important for reliable uncertainty quantification
  • Self-supervised learning can be used to improve conformal prediction intervals
  • Self-supervision assists in more challenging and sparser regions
  • Future research should explore alternative self-supervised approaches, new conformal-aware self-supervised tasks, and the use of self-supervision with other data modalities

C additional experiments

  • Goal of experiment: Understand benefit of unlabeled data to SSCP
  • Analysis: Unlabeled data increases average interval width when labeled dataset is small
  • Goal: Assess value of alternative sources of signal
  • Analysis: Isolation forest and self-supervised signal provide best intervals
  • Goal: Examine VIME and AE as self-supervised tasks
  • Analysis: VIME provides greater performance improvements
  • Goal: Deep-dive to understand impact of self-supervised task
  • Analysis: Self-supervision helps to improve prediction intervals on most uncertain examples
  • Goal: Assess robustness of intervals
  • Analysis: SSCP has net gain over CRF across all datasets