Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

The Predictive Intervals (PIs) produced by Conformal Prediction (CP) may not reflect the uncertainty of a given model.
We propose using a Quantile Regression Forest (QRF) to learn the distribution of nonconformity scores and utilizing the QRF’s weights to assign more importance to samples with residuals similar to the test point.
This approach results in PI lengths that are more aligned with the model’s uncertainty.
Our approach enjoys assumption-free finite-sample marginal and training-conditional coverage.
Experiments on simulated and real-world data demonstrate significant improvements compared to existing methods.

Random Forest Localizer is used to construct adaptive PI that depends on the test point X_{n+1}
RF algorithm partitions the input space by recursively splitting the data
Weights of each calibration sample for X_{n+1} are determined by the number of times it appears in the leaves of the trees where X_{n+1} falls
RF is grown as an ensemble of k trees based on random node and split point selection
Random Forests can be used to estimate more complex quantities
Quantile Regression Forests use the same weights as Random Forests to approximate the c.d.f. F(y | x)
Localized Random Forest is used to approximate the distribution of the estimated residuals V | X = x
PI is calibrated using the Localized Conformal Prediction (LCP) framework
LCP framework is used to select an appropriate level α for the quantile used in the PI to ensure marginal coverage at level 1 − α
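
The weighting scheme described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the names `forest_weights` and `weighted_quantile` are hypothetical, leaf sizes are counted over the calibration samples passed in (an assumption of this sketch), and the leaf indices are taken as given. With scikit-learn they could be obtained from `RandomForestRegressor.apply`, which returns the leaf index of every sample in every tree. The sketch stops at the localized quantile of the scores and does not include the LCP recalibration step.

```python
def forest_weights(leaves_cal, leaves_test):
    """Random-forest weights of each calibration sample for one test point.

    leaves_cal[i][t] is the leaf index of calibration sample i in tree t;
    leaves_test[t] is the leaf index of the test point in tree t.
    Assumption: leaf sizes are counted over the calibration samples only,
    and the weights are renormalized to sum to 1.
    """
    n, k = len(leaves_cal), len(leaves_test)
    weights = [0.0] * n
    for t in range(k):
        # Calibration samples sharing the test point's leaf in tree t.
        same_leaf = [i for i in range(n) if leaves_cal[i][t] == leaves_test[t]]
        for i in same_leaf:
            weights[i] += 1.0 / len(same_leaf)
    total = sum(weights)
    return [w / total for w in weights] if total > 0 else weights


def weighted_quantile(values, weights, q):
    """Smallest value whose cumulative weight reaches q (inverse weighted c.d.f.)."""
    cum = 0.0
    for v, w in sorted(zip(values, weights)):
        cum += w
        if cum >= q - 1e-12:  # small tolerance for floating-point sums
            return v
    return float("inf")


# Toy example: 4 calibration samples, 2 trees.
leaves_cal = [[1, 1], [1, 2], [2, 2], [2, 1]]
leaves_test = [1, 2]
scores = [0.1, 0.4, 0.9, 5.0]  # nonconformity scores V_i

w = forest_weights(leaves_cal, leaves_test)   # [0.25, 0.5, 0.25, 0.0]
q90 = weighted_quantile(scores, w, 0.9)       # 0.9
```

The sample with score 5.0 never shares a leaf with the test point, so it receives zero weight and cannot inflate the localized quantile.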

LCP framework of (Guan, 2022) with Random Forest localizer is described
Calibration approach guarantees training-conditional coverage
Weights of RF are used to improve LCP calibration process
Proofs of theorems and lemmas are in the appendix
Lemma 4.1 shows how to achieve marginal coverage by selecting level α of the quantile of the localizer
Theorem 4.2 shows that the resulting PI has marginal coverage
Lemma 4.3 describes an algorithm to compute the largest accepted value v
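
To make the role of the quantile level concrete, the sketch below shows the simplest, unlocalized case: split-CP with absolute-residual scores, where the level is inflated from 1 − α to ⌈(1 − α)(n + 1)⌉ / n to obtain finite-sample marginal coverage. This is the standard split-CP rule, not the LCP-RF calibration of Lemma 4.1; the function names are hypothetical.

```python
import math


def split_cp_quantile(scores, alpha):
    """Score threshold giving marginal coverage >= 1 - alpha under exchangeability."""
    n = len(scores)
    rank = math.ceil((1 - alpha) * (n + 1))  # 1-based rank among sorted scores
    if rank > n:
        # Too few calibration points: only the trivial interval is valid.
        return float("inf")
    return sorted(scores)[rank - 1]


def split_cp_interval(prediction, scores, alpha):
    """Symmetric interval around a point prediction (absolute-residual score)."""
    q = split_cp_quantile(scores, alpha)
    return (prediction - q, prediction + q)


calibration_scores = [3, 1, 7, 5, 2, 6, 4]
threshold = split_cp_quantile(calibration_scores, alpha=0.25)   # 6
interval = split_cp_interval(10.0, calibration_scores, alpha=0.25)  # (4.0, 16.0)
```

The threshold here is the same for every test point, which is why the resulting intervals have constant length; a localizer replaces the plain empirical quantile with a weighted one that depends on the test point.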

Random Forest Localizer offers faster computation of PIs and more adaptive PIs than traditional kernel-based localizers
Weights of Random Forest Localizer are sparse
We can group similar observations together before applying calibration steps
We can view weights of Random Forest as a transition matrix or weighted adjacency matrix
We can group observations that are connected to each other and separate observations that are not connected
We can apply calibration steps separately on each group
We can regroup calibration observations by (non-overlapping) communities using the weights
We can get marginal/PAC coverage by applying calibration step conditionally on the groups
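
The grouping idea can be illustrated by treating the weight matrix as a weighted adjacency matrix and extracting its connected components. This is a hedged sketch: connected components are used here as the simplest possible notion of non-overlapping groups, which is an assumption and may differ from the community construction used in the paper. The function name `connected_groups` and the threshold `tol` are hypothetical. For large sparse matrices, `scipy.sparse.csgraph.connected_components` performs the same task.

```python
def connected_groups(weights, tol=0.0):
    """Label each observation with the index of its connected component.

    weights[i][j] is the forest weight linking observations i and j;
    an edge exists when the weight exceeds `tol` in either direction.
    """
    n = len(weights)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(n):
            if i != j and weights[i][j] > tol:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[max(ri, rj)] = min(ri, rj)

    # Relabel roots as 0, 1, 2, ... in order of first appearance.
    relabel, labels = {}, []
    for i in range(n):
        root = find(i)
        relabel.setdefault(root, len(relabel))
        labels.append(relabel[root])
    return labels


# Block-structured toy weights: {0, 1} and {2, 3} never share a leaf.
W = [
    [0.5, 0.5, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.7, 0.3],
    [0.0, 0.0, 0.3, 0.7],
]
groups = connected_groups(W)  # [0, 0, 1, 1]
```

A calibration step could then be run separately on the scores of each group, using only the observations that carry the same label as the test point.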

The paper also studies the conditional coverage of LCP-RF
Assumptions 5.1-5.3 are necessary to get uniform convergence of the RF estimator
Assumption 5.2 allows for control of the approximation error of the RF estimator
Assumption 5.3 means that the cells should contain a sufficiently large number of points
Theorem 5.4 states that the selected α(v) when V_{n+1} = v given by the LCP-RF converges to 1 − α

Evaluated performance of 3 proposed methods against competitors
Used original implementations of SLCP and LCP
Tested on simulated data and 4 real-world datasets
Used mean and quantile scores to measure nonconformity
Used Random Forest as mean estimate
Compared PI of each method to oracle PI
Our methods outperformed competitors in terms of uncertainty fidelity and adaptiveness of lengths
SPLIT-G improved PI of split-CP
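
The quantities involved in such an evaluation can be sketched as follows. The definitions below are the standard ones (absolute residual for a mean estimate, the conformalized-quantile-regression score for quantile estimates, empirical coverage and average length for intervals). Treating them as the "mean and quantile scores" and metrics referred to above is an assumption, and all function names are hypothetical.

```python
def mean_score(y, mean_prediction):
    """Nonconformity score for a mean estimate: absolute residual."""
    return abs(y - mean_prediction)


def quantile_score(y, q_low, q_high):
    """Nonconformity score for quantile estimates: signed distance of y
    outside [q_low, q_high] (negative when y lies inside the band)."""
    return max(q_low - y, y - q_high)


def empirical_coverage(intervals, ys):
    """Fraction of test responses that fall inside their interval."""
    hits = sum(1 for (lo, hi), y in zip(intervals, ys) if lo <= y <= hi)
    return hits / len(ys)


def mean_length(intervals):
    """Average interval length."""
    return sum(hi - lo for lo, hi in intervals) / len(intervals)


intervals = [(0.0, 1.0), (0.0, 1.0), (2.0, 3.0), (0.0, 2.0)]
ys = [0.5, 2.0, 2.5, 1.0]

coverage = empirical_coverage(intervals, ys)  # 0.75
length = mean_length(intervals)               # 1.25
```

Coverage alone does not show adaptiveness: two methods can reach the same coverage with very different lengths, which is why lengths are also compared against an oracle PI.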