Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

Proposed a new method for calibrating predictors of heterogeneous treatment effects
Introduced a data-efficient variant of calibration that avoids the need for hold-out calibration sets
Established that proposed method achieves fast doubly-robust calibration rates
Wrapping proposed method around any black-box learning algorithm provides strong calibration guarantees while preserving predictive performance

Paper Content

Introduction

Estimation of causal effects is important for understanding interventions and informing policy
Treatment effect heterogeneity can provide more insights than overall population effects
Applications of HTEs include prioritizing treatment and individualizing treatment assignments
CATE estimation is of great interest in statistics and data science
CATE estimators build upon estimators of the conditional mean outcome and the probability of treatment given covariates
CATE estimation can be challenging due to non-smooth, high-dimensional nuisance parameters
Predictions from a given treatment effect predictor can still be useful for decision-making
Theoretical guarantees for rational decision-making typically hinge on the predictor being a good approximation of the true CATE
Calibration is a desirable property of a treatment effect predictor
Calibration has been widely used to enhance prediction models for classification and regression
Little research has been done on calibration of treatment effect predictors
This paper proposes a nonparametric doubly-robust method for calibrating treatment effect predictors

Statistical setup

Notation and definitions

Data unit O consists of three components: W, A, and Y
W is a vector of baseline covariates
A is a binary indicator of treatment
Y is an outcome
Dn is the observed dataset
π0 is the propensity score
µ0 is the potential outcome
Higher values of Y1-Y0 are desirable
τ0 is the true CATE
γ0 is the conditional mean of the individual treatment effect
Solution to isotonic regression problem is non-unique
Solution follows Groeneboom and Lopuhaa (1993)

Measuring calibration and the calibration-distortion decomposition

Various definitions of risk predictor calibration have been proposed
Outline definition of calibration and rationale
Best predictor of individual treatment effect is w → γ 0 (τ, w)
Perfect calibration cannot be achieved in finite samples
Calibration measure is 2 -expected calibration error
Calibration measure plays role in mean squared error between treatment predictor and true CATE
Calibration-distortion decomposition shows better-calibrated treatment effect predictors have lower mean-squared error

Calibrating predictors: desiderata and classical methods

Calibration methods aim to find a function that makes a predictor more accurate
Platt’s scaling is used for binary outcomes and is based on strong parametric assumptions
Histogram binning partitions the sorted values of the predictor into a fixed number of bins
Bayesian binning considers multiple binning models and their combinations
Isotonic calibration learns the bins from data using isotonic regression
Isotonic calibration satisfies a distribution-free calibration guarantee and is at least as predictive as the original predictor

Causal isotonic calibration

Inspired by isotonic calibration, a doubly-robust calibration method for treatment effects is proposed, called causal isotonic calibration
Takes a given predictor trained on some dataset and performs calibration using an independent (or hold-out) dataset
Automatically learns uncalibrated regions of the given predictor
Consolidates individual predictions within each region into a single value using a doubly-robust estimator of the ATE
Introduces a novel data-efficient variant of calibration called crosscalibration
Cross-fitted predictors are used and a single calibrated predictor is obtained using all available data
Implemented using standard isotonic regression software
Estimate χ 0 of χ 0 is obtained using E m
Isotonic regression is used to find and refer to χ 0 (O) as a pseudo-outcome
Calibrated predictor is given by θ n τ
Sample splitting or cross-fitting is recommended to obtain pseudo-outcomes
Algorithm 2 provides a means to fully utilize the entire dataset for both fitting an initial estimate of τ 0 and calibration
Algorithm 3 is a computationally simpler variant of Algorithm 2

Sample theoretical properties

Algorithm 1 and Algorithm 2 are presented for causal isotonic calibration
Properties 1 and 2 are argued to be satisfied
Data is split into a training dataset and a calibration dataset
Conditions 1-5 are assumed
Theorem 1 establishes the calibration rate of the calibrated predictor
Theorem 2 states that the pointwise median preserves calibration
Theorem 3 states that the mean squared error is not inflated much

Data-generating mechanisms

Examined behavior of proposal under two data-generating mechanisms
Scenario 1: binary outcome, 4 confounders, treatment interactions
Scenario 2: continuous outcome, linear on covariates, 20 true confounders
Propensity score follows logistic regression model
Covariates independent and uniformly distributed on (-1, +1)
Sample sizes of 1,000, 2,000, 5,000 and 10,000

Cate estimation

Implemented GBRT, RF, GLMnet, GAM, and MARS for Scenario 1
Implemented RF, GLMnet, and combination of variable screening with lasso regularization for Scenario 2
Used R package sl3 for implementation of estimators
Used causal isotonic cross-calibration for calibration

Performance metrics

Compared performance of calibrated and uncalibrated versions of a causal isotonic calibrator
Used 3 metrics to compare performance: calibration measure, mean squared error, and calibration bias within bins
Estimated metric empirically using an independent sample of size 10,000
Averaged metric estimates across 500 simulations

Simulation results

GLMnet, RF, GAM, and MARS were well-calibrated and did not benefit from calibration
GBRT benefited from calibration, reducing calibration error and improving mean squared error
RF and GBRT with GLMnet screening were poorly calibrated and benefited from calibration
Cross-calibration improved mean squared error and calibration more than conventional calibration

Conclusion

Proposed causal isotonic calibration as a novel method to calibrate treatment effect predictors
Established that the pointwise median of calibrated predictors is also calibrated
Developed a data-efficient variant of causal isotonic calibration using cross-fitted predictors
Calibration error vanishes at a fast rate of -2/3 with little or no loss in predictive power
Directly calibrate HTE predictors without requiring trial data or parametric assumptions
Potential applications of method include data-driven decision-making with strong robustness guarantees
Limitations: need to estimate µ 0 or π 0 sufficiently well
Found that calibration generally preserves predictive power, and in some cases improves accuracy
Found that cross-calibration substantially improved mean squared error
Theoretical arguments can be adapted to provide guarantees for isotonic calibration in regression and classification problems
Implementation of algorithms in R provided in Github package causalCalibration

Link to paper#

Abstract#

Paper Content#

Introduction#

Statistical setup#

Notation and definitions#

Measuring calibration and the calibration-distortion decomposition#

Calibrating predictors: desiderata and classical methods#

Causal isotonic calibration#

Sample theoretical properties#

Data-generating mechanisms#

Cate estimation#

Performance metrics#

Simulation results#

Conclusion#

Link to paper

Abstract

Paper Content

Introduction

Statistical setup

Notation and definitions

Measuring calibration and the calibration-distortion decomposition

Calibrating predictors: desiderata and classical methods

Causal isotonic calibration

Sample theoretical properties

Data-generating mechanisms

Cate estimation

Performance metrics

Simulation results

Conclusion