Abstract
- Proposed a new method for calibrating predictors of heterogeneous treatment effects
- Introduced a data-efficient variant of calibration that avoids the need for hold-out calibration sets
- Established that proposed method achieves fast doubly-robust calibration rates
- Wrapping proposed method around any black-box learning algorithm provides strong calibration guarantees while preserving predictive performance
Paper Content
Introduction
- Estimation of causal effects is important for understanding interventions and informing policy
- Treatment effect heterogeneity can provide more insights than overall population effects
- Applications of HTEs include prioritizing treatment and individualizing treatment assignments
- CATE estimation is of great interest in statistics and data science
- CATE estimators build upon estimators of the conditional mean outcome and the probability of treatment given covariates
- CATE estimation can be challenging due to non-smooth, high-dimensional nuisance parameters
- Predictions from a given treatment effect predictor can still be useful for decision-making
- Theoretical guarantees for rational decision-making typically hinge on the predictor being a good approximation of the true CATE
- Calibration is a desirable property of a treatment effect predictor
- Calibration has been widely used to enhance prediction models for classification and regression
- Little research has been done on calibration of treatment effect predictors
- This paper proposes a nonparametric doubly-robust method for calibrating treatment effect predictors
Statistical setup
Notation and definitions
- Data unit O consists of three components: W, A, and Y
- W is a vector of baseline covariates
- A is a binary indicator of treatment
- Y is an outcome
- Dn is the observed dataset
- π0 is the propensity score
- µ0 is the outcome regression, the conditional mean of Y given A and W
- Higher values of Y1-Y0 are desirable
- τ0 is the true CATE
- γ0 is the conditional mean of the individual treatment effect given the predicted value τ(W)
- The solution to the isotonic regression problem is generally non-unique
- A particular solution is selected following Groeneboom and Lopuhaa (1993)
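The notation above can be written out as follows. This is a sketch using the standard definitions of these quantities; the exact conditioning arguments are assumptions, since the summary only names the symbols.

```latex
\begin{align*}
\pi_0(w) &= P(A = 1 \mid W = w), \\
\mu_0(a, w) &= E[Y \mid A = a, W = w], \\
\tau_0(w) &= E[Y_1 - Y_0 \mid W = w], \\
\gamma_0(\tau, w) &= E[Y_1 - Y_0 \mid \tau(W) = \tau(w)].
\end{align*}
```

Under the usual causal identification conditions, \tau_0(w) = \mu_0(1, w) - \mu_0(0, w), which is why CATE estimators build on estimates of \mu_0 and \pi_0.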
Measuring calibration and the calibration-distortion decomposition
- Various definitions of risk predictor calibration have been proposed
- The paper outlines its definition of calibration and the rationale for it
- Given only the prediction τ(w), the best predictor of the individual treatment effect is w ↦ γ0(τ, w)
- Perfect calibration cannot be achieved in finite samples
- Calibration measure is the ℓ2-expected calibration error
- Calibration measure plays role in mean squared error between treatment predictor and true CATE
- Calibration-distortion decomposition shows better-calibrated treatment effect predictors have lower mean-squared error
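The decomposition can be checked numerically on a toy example. The sketch below assumes the decomposition takes the form MSE = CAL + DIS, where CAL is the mean squared gap between γ0 and the prediction and DIS is the mean squared gap between the true CATE and γ0. The four-point population and the function name are hypothetical and not taken from the paper.

```python
from collections import defaultdict

# Hypothetical finite population: (probability, true CATE tau0(w), prediction tau(w)).
population = [
    (0.25, 0.0, 0.5),
    (0.25, 2.0, 0.5),
    (0.25, 1.0, 2.0),
    (0.25, 4.0, 2.0),
]


def calibration_distortion(pop):
    """Return (MSE, CAL, DIS) for a discrete population."""
    mass = defaultdict(float)
    weighted_cate = defaultdict(float)
    for p, tau0, tau in pop:
        mass[tau] += p
        weighted_cate[tau] += p * tau0
    # gamma0: mean true effect among units that share the same prediction.
    gamma0 = {tau: weighted_cate[tau] / mass[tau] for tau in mass}
    mse = sum(p * (tau0 - tau) ** 2 for p, tau0, tau in pop)
    cal = sum(p * (gamma0[tau] - tau) ** 2 for p, tau0, tau in pop)
    dis = sum(p * (tau0 - gamma0[tau]) ** 2 for p, tau0, tau in pop)
    return mse, cal, dis


mse, cal, dis = calibration_distortion(population)
print(mse, cal, dis)
```

Replacing each prediction by its γ0 value would set CAL to zero while leaving DIS unchanged, which is the sense in which better calibration lowers mean squared error.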
Calibrating predictors: desiderata and classical methods
- Calibration methods learn a transformation (a calibrator) of a predictor's output so that the transformed predictor is well calibrated without losing predictive accuracy
- Platt’s scaling is used for binary outcomes and is based on strong parametric assumptions
- Histogram binning partitions the sorted values of the predictor into a fixed number of bins
- Bayesian binning considers multiple binning models and their combinations
- Isotonic calibration learns the bins from data using isotonic regression
- Isotonic calibration satisfies a distribution-free calibration guarantee and is at least as predictive as the original predictor
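As a point of comparison for the classical methods above, here is a minimal sketch of histogram binning for a regression-style predictor. It assumes equal-frequency bins and uses the mean outcome in each bin as the calibrated value. The function name and the toy data are hypothetical.

```python
from bisect import bisect_left


def fit_histogram_binning(predictions, outcomes, n_bins):
    """Split sorted predictions into equal-frequency bins; each bin predicts its mean outcome."""
    order = sorted(range(len(predictions)), key=lambda i: predictions[i])
    n = len(order)
    upper_edges, bin_means = [], []
    for b in range(n_bins):
        idx = order[b * n // n_bins:(b + 1) * n // n_bins]
        if not idx:
            continue  # skip empty bins when n < n_bins
        upper_edges.append(predictions[idx[-1]])
        bin_means.append(sum(outcomes[i] for i in idx) / len(idx))

    def calibrator(prediction):
        # First bin whose upper edge is >= prediction; clamp to the last bin.
        j = bisect_left(upper_edges, prediction)
        return bin_means[min(j, len(bin_means) - 1)]

    return calibrator


calibrate = fit_histogram_binning(
    [0.1, 0.2, 0.3, 0.4, 0.5, 0.6], [0.0, 2.0, 1.0, 3.0, 5.0, 7.0], n_bins=3
)
print(calibrate(0.15), calibrate(0.4), calibrate(0.9))
```

The number of bins is fixed in advance here. Isotonic calibration differs in that the bins themselves are chosen by the monotone fit.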
Causal isotonic calibration
- Inspired by isotonic calibration, a doubly-robust calibration method for treatment effects is proposed, called causal isotonic calibration
- Takes a given predictor trained on some dataset and performs calibration using an independent (or hold-out) dataset
- Automatically learns uncalibrated regions of the given predictor
- Consolidates individual predictions within each region into a single value using a doubly-robust estimator of the ATE
- Introduces a novel data-efficient variant of calibration called cross-calibration
- Cross-fitted predictors are used and a single calibrated predictor is obtained using all available data
- Implemented using standard isotonic regression software
- An estimate of χ0 is obtained using the dataset Em
- χ0(O) is referred to as a pseudo-outcome; isotonic regression of the estimated pseudo-outcomes on the predictions τ(W) yields the calibrator θn
- Calibrated predictor is given by the composition θn ∘ τ
- Sample splitting or cross-fitting is recommended to obtain pseudo-outcomes
- Algorithm 2 provides a means to fully utilize the entire dataset for both fitting an initial estimate of τ0 and calibration
- Algorithm 3 is a computationally simpler variant of Algorithm 2
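The hold-out version of the procedure can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation. The pseudo-outcome is assumed to be the standard doubly-robust (AIPW) form, the nuisance estimates `pi`, `mu1`, and `mu0` are assumed to come from a separate training split, and all function names are hypothetical. Isotonic regression is done with a plain pool-adjacent-violators routine.

```python
from bisect import bisect_right


def aipw_pseudo_outcome(a, y, pi, mu1, mu0):
    """Doubly-robust pseudo-outcome for one unit (assumed AIPW form).

    pi is the estimated P(A=1 | W); mu1 and mu0 are the estimated E[Y | A=1, W] and E[Y | A=0, W].
    """
    mu_a = mu1 if a == 1 else mu0
    return mu1 - mu0 + (a - pi) / (pi * (1.0 - pi)) * (y - mu_a)


def pava(values, weights):
    """Weighted least-squares non-decreasing fit (pool adjacent violators)."""
    blocks = []  # each block is [mean, weight, number of pooled points]
    for v, w in zip(values, weights):
        blocks.append([v, w, 1])
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, c1 + c2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


def fit_causal_isotonic_calibrator(predictions, pseudo_outcomes):
    """Isotonic regression of pseudo-outcomes on predictions from the calibration set."""
    pooled = {}
    for p, chi in zip(predictions, pseudo_outcomes):
        total, count = pooled.get(p, (0.0, 0))
        pooled[p] = (total + chi, count + 1)  # average tied predictions first
    knots = sorted(pooled)
    means = [pooled[k][0] / pooled[k][1] for k in knots]
    counts = [float(pooled[k][1]) for k in knots]
    levels = pava(means, counts)

    def calibrator(prediction):
        # Right-continuous step function with jumps only at observed predictions.
        i = bisect_right(knots, prediction) - 1
        return levels[max(i, 0)]

    return calibrator


theta = fit_causal_isotonic_calibrator([0.1, 0.2, 0.3, 0.4], [0.0, 1.0, 0.5, 2.0])
print([theta(p) for p in (0.1, 0.25, 0.35, 0.4)])
```

The calibrated prediction for a new unit is `theta(tau(w))`, where `tau` is the original predictor. Each flat stretch of `theta` corresponds to one learned region, and its level is the average pseudo-outcome in that region.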
Large-sample theoretical properties
- Algorithm 1 and Algorithm 2 are presented for causal isotonic calibration
- Properties 1 and 2 are argued to be satisfied
- Data is split into a training dataset and a calibration dataset
- Conditions 1-5 are assumed
- Theorem 1 establishes the calibration rate of the calibrated predictor
- Theorem 2 states that the pointwise median preserves calibration
- Theorem 3 states that the mean squared error is not inflated much
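The pointwise median mentioned in Theorem 2 is simple to compute. The sketch below combines several predictors (for example, calibrated predictors built from different cross-fitting folds) by taking the median of their values at each input. The function name and the three toy predictors are hypothetical.

```python
from statistics import median


def pointwise_median(predictors):
    """Combine predictors by taking the median of their predictions at each point."""
    def combined(w):
        return median(f(w) for f in predictors)
    return combined


# Three hypothetical calibrated predictors of a scalar covariate w.
combined = pointwise_median([lambda w: w, lambda w: 2 * w, lambda w: w + 10])
print(combined(1), combined(-20))
```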
Data-generating mechanisms
- Examined behavior of proposal under two data-generating mechanisms
- Scenario 1: binary outcome, 4 confounders, treatment interactions
- Scenario 2: continuous outcome, linear on covariates, 20 true confounders
- Propensity score follows logistic regression model
- Covariates independent and uniformly distributed on (-1, +1)
- Sample sizes of 1,000, 2,000, 5,000 and 10,000
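A data-generating mechanism in the spirit of Scenario 1 might be simulated as follows. Only the structure comes from the summary: uniform covariates on (-1, +1), a logistic propensity score, and a binary outcome with a treatment interaction. Every coefficient and the function name are hypothetical placeholders, not the values used in the paper.

```python
import math
import random


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def simulate(n, n_covariates=4, seed=0):
    """Draw n units (W, A, Y) with hypothetical coefficients."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        w = [rng.uniform(-1.0, 1.0) for _ in range(n_covariates)]
        # Logistic propensity score; the coefficient 0.5 is a placeholder.
        pi = expit(0.5 * sum(w))
        a = 1 if rng.random() < pi else 0
        # Binary outcome with a treatment-by-covariate interaction (placeholder form).
        p_y = expit(w[0] + a * (0.5 + w[1]))
        y = 1 if rng.random() < p_y else 0
        data.append((w, a, y))
    return data


sample = simulate(1000)
print(sum(a for _, a, _ in sample) / len(sample))  # share of treated units
```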
CATE estimation
- Implemented GBRT, RF, GLMnet, GAM, and MARS for Scenario 1
- Implemented RF, GLMnet, and combination of variable screening with lasso regularization for Scenario 2
- Used R package sl3 for implementation of estimators
- Used causal isotonic cross-calibration for calibration
Performance metrics
- Compared the performance of each CATE predictor before and after causal isotonic calibration
- Used 3 metrics to compare performance: calibration measure, mean squared error, and calibration bias within bins
- Estimated metric empirically using an independent sample of size 10,000
- Averaged metric estimates across 500 simulations
Simulation results
- In Scenario 1, GLMnet, RF, GAM, and MARS were already well calibrated and did not benefit from calibration
- In Scenario 1, GBRT benefited from calibration, with lower calibration error and improved mean squared error
- In Scenario 2, RF and GBRT with GLMnet screening were poorly calibrated and benefited from calibration
- Cross-calibration improved mean squared error and calibration more than conventional calibration
Conclusion
- Proposed causal isotonic calibration as a novel method to calibrate treatment effect predictors
- Established that the pointwise median of calibrated predictors is also calibrated
- Developed a data-efficient variant of causal isotonic calibration using cross-fitted predictors
- Calibration error vanishes at a fast rate of n^(-2/3) with little or no loss in predictive power
- Directly calibrate HTE predictors without requiring trial data or parametric assumptions
- Potential applications of method include data-driven decision-making with strong robustness guarantees
- Limitations: need to estimate µ0 or π0 sufficiently well
- Found that calibration generally preserves predictive power, and in some cases improves accuracy
- Found that cross-calibration substantially improved mean squared error
- Theoretical arguments can be adapted to provide guarantees for isotonic calibration in regression and classification problems
- Implementation of algorithms in R provided in GitHub package causalCalibration