Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

  • Unit selection problem aims to identify objects that exhibit desired behavior when subjected to stimuli
  • Existing work focuses on bounding a specific class of objective functions
  • Proposed algorithm for finding optimal units given a broad class of causal objective functions and a fully specified structural causal model
  • Unit selection under this class of objective functions is $\text{NP}^\text{PP}$-complete
  • Treewidth-based complexity bounds on proposed algorithm

Paper Content

Introduction

  • Theory of causality based on two parallel hierarchies: information and reasoning
  • Three levels of reasoning: associational, interventional and counterfactual
  • Knowledge encoded as associational, causal and functional models
  • Unit selection problem: selecting customers to target with an encouragement offer
  • Four types of customers: responders, always-takers, always-deniers, contrarians
  • Benefit function to score customers and identify most promising ones
  • Contrast with classical loss functions
  • Structured units: decisions, policies, people, situations, regions, activities
  • Fully specified SCM to obtain point values for any causal objective function
  • Computational problem of finding units that optimize causal objective functions
  • Exact algorithm to solve unit optimization problem: Reverse-MAP
  • Complexity of algorithm characterized by treewidth

Counterfactual queries on structural causal models

  • Structural causal models (SCMs) are used to define the unit selection problem

Causal objective functions and unit selection

  • Causal objective functions involve observational, interventional or counterfactual probabilities
  • Goal is to find objects (units) that optimize the function
  • Linear combination of counterfactual probabilities
  • Unit variables are exogenous in the SCM

The complexity of unit selection

  • Unit selection is NP-PP-complete for the class of causal objective functions given in Equation (1).
  • Unit selection is NP-complete when unit variables correspond to all exogenous variables in the SCM.
  • We can evaluate the objective L(u) by evaluating a single observational probability involving unit variables U.
  • We can optimize the objective function L(u) on an SCM G by computing the instantiation argmax u Pr (y, w|x, v, e, u) on an objective model G.
  • D-Reverse-MAP is NP PP -complete.
  • D-Reverse-MAP is NP-complete if its target variables are all the SCM root variables.
  • Unit selection is NP-complete when the unit variables are all the SCM exogenous (root) variables.

Unit selection using variable elimination

  • Reduction from unit selection on SCM to Reverse-MAP on objective model
  • Variable elimination algorithm for Reverse-MAP to solve unit selection
  • Analyzed complexity of method and compared to Reverse-MAP on SCM

Reverse-map using variable elimination

  • VE algorithm for MAP uses factors to map variables to non-negative numbers
  • SCM distribution is given by multiplying all factors
  • MAP probability is given by maximizing out target variables from a factor
  • Naive evaluation of MAP probability has complexity of O(n exp(n))
  • VE algorithm for MAP has complexity of O(n exp(w)) where w is the width of the used elimination order
  • VE algorithm for Reverse-MAP runs two passes of elimination
  • First pass sums out variables under evidence e1, e2
  • Second pass sums out variables under evidence e2
  • Divide factors from first pass by factors from second pass to obtain MAP probability
  • Complexity of Reverse-MAP VE is O(n exp(w))

Bounding the complexity of unit selection using variable elimination

  • RMAP VE is expected to be more expensive on an objective model than an SCM
  • Treewidth is used to analyze elimination algorithms
  • An elimination order for an objective model can be constructed from an SCM
  • Theorem 14 provides a bound on the width of an elimination order for an objective model
  • Corollary 15 states that the treewidth of an objective model is less than or equal to 3 times the treewidth of an SCM
  • U-constrained elimination orders must place the mixture variable before unit variables
  • Theorem 17 provides a bound on the U-constrained treewidth of an objective model
  • Corollary 18 states that the U-constrained treewidth of an objective model is less than or equal to the maximum of 3 times the treewidth of an SCM and the number of unit variables
  • The bound on U-constrained treewidth can be tighter depending on the objective function properties
  • An experiment was conducted to compare the complexities of MAP VE, RMAP VE and a bruteforce method
  • The gap between the complexities of MAP VE and RMAP VE narrows as the size of the problem grows
  • The gap between the complexities of RMAP VE and the bruteforce method grows

Conclusion

  • Studied unit selection problem in a computational setting
  • Assumed a fully specified structural causal model
  • Computed point values of causal objective functions
  • Unit selection problem with this class of objective functions is NP PP -complete
  • Identified an intuitive condition under which it is NP-complete
  • Provided an exact algorithm for the unit selection problem
  • Characterized complexity in terms of treewidth
  • Defined a new inference problem, Reverse-MAP, which is also NP PP -complete
  • Lemma 20 complements Theorem 13
  • Lemma 22 concerns the augmentation of an SCM
  • Theorem 17 holds for augmented objective model
  • Lemmas 20 and 22 used in proof
  • Time complexity of RMAP VE is O(n 1 โ€ข exp(w 1 ))
  • Bruteforce method time complexity is O(n 2 โ€ข exp(w 2 ))
  • Class of problems for which U-constrained treewidth is no smaller than number of unit variables
  • MAP VE and RMAP VE must be exponential in number of unit variables
  • Baseline method can be significantly worse