Link to paper
The full paper is available here.
You can also find the paper on PapersWithCode here.
Abstract
- Unit selection problem aims to identify objects that exhibit desired behavior when subjected to stimuli
- Existing work focuses on bounding a specific class of objective functions
- Proposed algorithm for finding optimal units given a broad class of causal objective functions and a fully specified structural causal model
- Unit selection under this class of objective functions is $\text{NP}^\text{PP}$-complete
- Treewidth-based complexity bounds on proposed algorithm
Paper Content
Introduction
- Theory of causality based on two parallel hierarchies: information and reasoning
- Three levels of reasoning: associational, interventional and counterfactual
- Knowledge encoded as associational, causal and functional models
- Unit selection problem: selecting customers to target with an encouragement offer
- Four types of customers: responders, always-takers, always-deniers, contrarians
- Benefit function to score customers and identify most promising ones
- Contrast with classical loss functions
- Structured units: decisions, policies, people, situations, regions, activities
- Fully specified SCM to obtain point values for any causal objective function
- Computational problem of finding units that optimize causal objective functions
- Exact algorithm to solve unit optimization problem: Reverse-MAP
- Complexity of algorithm characterized by treewidth
Counterfactual queries on structural causal models
- Structural causal models (SCMs) are used to define the unit selection problem
Causal objective functions and unit selection
- Causal objective functions involve observational, interventional or counterfactual probabilities
- Goal is to find objects (units) that optimize the function
- Linear combination of counterfactual probabilities
- Unit variables are exogenous in the SCM
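As an illustration of a causal objective function built as a linear combination of counterfactual probabilities, the sketch below scores units on a toy, fully specified SCM. Everything in it is assumed for illustration and not taken from the paper: the variables `U`, `N`, `X`, `Y`, the noise distribution `P_N`, the response table `RESPONSE`, and the weights in `WEIGHTS` are hypothetical.

```python
# Toy SCM (hypothetical): U is the unit variable and N is a noise variable,
# both exogenous; X is the offer and Y the purchase, with Y = f_y(X, U, N).
P_N = [0.4, 0.3, 0.2, 0.1]  # assumed distribution of N over {0, 1, 2, 3}

# Assumed response type of each unit u under each noise value n.
RESPONSE = {
    0: ["responder", "always_taker", "always_denier", "contrarian"],
    1: ["responder", "responder", "always_taker", "always_denier"],
}

# Assumed weights of the linear objective, one per customer type.
WEIGHTS = {"responder": 1.0, "always_taker": -0.5,
           "always_denier": 0.0, "contrarian": -1.0}

# Each type is the pair (Y when X=1, Y when X=0).
TYPE_OUTCOMES = {"responder": (1, 0), "always_taker": (1, 1),
                 "always_denier": (0, 0), "contrarian": (0, 1)}


def f_y(x, u, n):
    """Structural equation for Y: the purchase decision given offer x."""
    with_offer, without_offer = TYPE_OUTCOMES[RESPONSE[u][n]]
    return with_offer if x == 1 else without_offer


def counterfactual_prob(u, y_with, y_without):
    """Pr(Y_{X=1} = y_with, Y_{X=0} = y_without | u).

    Both hypothetical worlds share the same exogenous noise n, which is
    what makes this a counterfactual rather than interventional quantity.
    """
    return sum(p for n, p in enumerate(P_N)
               if f_y(1, u, n) == y_with and f_y(0, u, n) == y_without)


def objective(u):
    """Linear combination of the four counterfactual probabilities."""
    return sum(WEIGHTS[t] * counterfactual_prob(u, *TYPE_OUTCOMES[t])
               for t in WEIGHTS)


# Brute-force unit selection: feasible only because U has two values here.
best_unit = max(RESPONSE, key=objective)
print(best_unit, objective(best_unit))
```

With a single binary unit variable the maximization is a trivial scan. The computational difficulty discussed in the following sections arises when the number of unit variables, and hence the number of candidate instantiations, grows.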
The complexity of unit selection
- Unit selection is $\text{NP}^\text{PP}$-complete for the class of causal objective functions given in Equation (1).
- Unit selection is NP-complete when unit variables correspond to all exogenous variables in the SCM.
- We can evaluate the objective L(u) by evaluating a single observational probability involving unit variables U.
- We can optimize the objective function $L(u)$ on an SCM $G$ by computing the instantiation $\operatorname{argmax}_u \Pr(y, w \mid x, v, e, u)$ on an objective model $G'$.
- D-Reverse-MAP is $\text{NP}^\text{PP}$-complete.
- D-Reverse-MAP is NP-complete if its target variables are all the SCM root variables.
- Unit selection is NP-complete when the unit variables are all the SCM exogenous (root) variables.
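To see why the objective-model query differs from classical MAP, it may help to expand the conditional probability using its definition (a standard identity, written in the notation of the bullets above):

$$\operatorname{argmax}_u \Pr(y, w \mid x, v, e, u) = \operatorname{argmax}_u \frac{\Pr(y, w, x, v, e, u)}{\Pr(x, v, e, u)}$$

In classical MAP the target variables sit on the left of the conditioning bar, so the normalizing denominator does not depend on $u$ and can be ignored during maximization. Here $u$ appears on the conditioning side, so the denominator generally varies with $u$ and has to be computed as well.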
Unit selection using variable elimination
- Reduction from unit selection on SCM to Reverse-MAP on objective model
- Variable elimination algorithm for Reverse-MAP to solve unit selection
- Analyzed complexity of method and compared to Reverse-MAP on SCM
Reverse-map using variable elimination
- VE algorithm for MAP uses factors to map variables to non-negative numbers
- SCM distribution is given by multiplying all factors
- MAP probability is given by maximizing out target variables from a factor
- Naive evaluation of MAP probability has complexity of O(n exp(n))
- VE algorithm for MAP has complexity of O(n exp(w)) where w is the width of the used elimination order
- VE algorithm for Reverse-MAP runs two passes of elimination
- First pass sums out variables under evidence $e_1, e_2$
- Second pass sums out variables under evidence $e_2$
- Divide the factor from the first pass by the factor from the second pass, then maximize over the target variables to obtain the Reverse-MAP probability
- Complexity of Reverse-MAP VE is O(n exp(w))
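The two-pass scheme above can be sketched on a tiny network of binary variables. This is a minimal illustration, not the paper's implementation: the network (`U`, `Z`, `A`, `B`, `C`), its probabilities, the evidence, the elimination order, and the helper names (`make_factor`, `multiply`, `sum_out`, `apply_evidence`, `eliminate`, `cpt`) are all assumptions. Evidence $e_1$ is `B=1`, evidence $e_2$ is `C=1`, and `U` is the single target variable.

```python
from itertools import product


def make_factor(variables, fn):
    """Tabulate fn (a function of an assignment dict) over binary variables."""
    variables = tuple(variables)
    table = {vals: fn(dict(zip(variables, vals)))
             for vals in product((0, 1), repeat=len(variables))}
    return variables, table


def multiply(f, g):
    """Pointwise product of two factors."""
    (fv, ft), (gv, gt) = f, g
    variables = fv + tuple(v for v in gv if v not in fv)
    return make_factor(
        variables,
        lambda a: ft[tuple(a[v] for v in fv)] * gt[tuple(a[v] for v in gv)])


def sum_out(f, var):
    """Sum a variable out of a factor."""
    fv, ft = f
    i = fv.index(var)
    table = {}
    for vals, p in ft.items():
        key = vals[:i] + vals[i + 1:]
        table[key] = table.get(key, 0.0) + p
    return fv[:i] + fv[i + 1:], table


def apply_evidence(f, evidence):
    """Zero out the rows of a factor that contradict the evidence."""
    fv, ft = f
    table = {vals: (p if all(evidence.get(v, x) == x
                             for v, x in zip(fv, vals)) else 0.0)
             for vals, p in ft.items()}
    return fv, table


def eliminate(factors, order, evidence):
    """Sum out the variables in `order` under `evidence`."""
    factors = [apply_evidence(f, evidence) for f in factors]
    for var in order:
        related = [f for f in factors if var in f[0]]
        if not related:
            continue
        prod = related[0]
        for f in related[1:]:
            prod = multiply(prod, f)
        factors = [f for f in factors if var not in f[0]] + [sum_out(prod, var)]
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f)
    return result


def cpt(child, parents, p_one):
    """Factor for Pr(child | parents); p_one gives Pr(child=1 | parents)."""
    def fn(a):
        p = p_one[tuple(a[v] for v in parents)]
        return p if a[child] == 1 else 1.0 - p
    return make_factor(parents + (child,), fn)


# Hypothetical network: U and Z are roots, A depends on (U, Z),
# B depends on A, and C depends on Z.
factors = [
    cpt("U", (), {(): 0.5}),
    cpt("Z", (), {(): 0.3}),
    cpt("A", ("U", "Z"), {(0, 0): 0.2, (0, 1): 0.6, (1, 0): 0.7, (1, 1): 0.1}),
    cpt("B", ("A",), {(0,): 0.1, (1,): 0.8}),
    cpt("C", ("Z",), {(0,): 0.3, (1,): 0.9}),
]
order = ["Z", "A", "B", "C"]  # every variable except the target U

# Pass 1 yields Pr(u, e1, e2); pass 2 yields Pr(u, e2).
_, joint = eliminate(factors, order, {"B": 1, "C": 1})
_, marginal = eliminate(factors, order, {"C": 1})

# Divide, then maximize over the target variable.
conditional = {u: joint[u] / marginal[u] for u in joint}
best_u = max(conditional, key=conditional.get)
print(best_u, conditional[best_u])
```

The division happens only after both passes have reduced the network to a factor over the target variables, so the ratio is computed once per instantiation of `U`. Both passes reuse the same elimination order in this sketch, which is a simplifying choice rather than a requirement.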
Bounding the complexity of unit selection using variable elimination
- RMAP VE is expected to be more expensive on an objective model than an SCM
- Treewidth is used to analyze elimination algorithms
- An elimination order for an objective model can be constructed from an SCM
- Theorem 14 provides a bound on the width of an elimination order for an objective model
- Corollary 15 states that the treewidth of an objective model is less than or equal to 3 times the treewidth of an SCM
- U-constrained elimination orders must place the mixture variable before unit variables
- Theorem 17 provides a bound on the U-constrained treewidth of an objective model
- Corollary 18 states that the U-constrained treewidth of an objective model is less than or equal to the maximum of 3 times the treewidth of an SCM and the number of unit variables
- The bound on U-constrained treewidth can be tighter depending on the objective function properties
- An experiment was conducted to compare the complexities of MAP VE, RMAP VE and a bruteforce method
- The gap between the complexities of MAP VE and RMAP VE narrows as the size of the problem grows
- The gap between the complexities of RMAP VE and the bruteforce method grows
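Since the bounds above are stated in terms of the width of elimination orders, a small sketch of how that width is computed may help. The graph below is the moral graph of a hypothetical DAG (`u1 -> h1`, `u2, h1 -> h2`, `u3, h2 -> h3`) and is not from the paper. The sketch also assumes, as is usual for MAP-style elimination, that a constrained order eliminates the unit variables `u1, u2, u3` last.

```python
def elimination_width(edges, order):
    """Width of an elimination order on an undirected graph.

    The width is the largest number of neighbors a variable has at the
    moment it is eliminated (its cluster size minus one).
    """
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    for v in order:
        adj.setdefault(v, set())

    width = 0
    for v in order:
        nbrs = adj.pop(v)
        width = max(width, len(nbrs))
        # Eliminating v connects all of its remaining neighbors pairwise.
        for a in nbrs:
            adj[a].discard(v)
            adj[a] |= nbrs - {a}
    return width


# Moral graph of the hypothetical DAG: parent-child edges plus edges
# between parents that share a child.
EDGES = [("u1", "h1"), ("h1", "h2"), ("u2", "h2"), ("u2", "h1"),
         ("h2", "h3"), ("u3", "h3"), ("u3", "h2")]

free_order = ["u1", "h1", "u2", "h2", "u3", "h3"]          # no constraint
constrained_order = ["h3", "h2", "h1", "u1", "u2", "u3"]   # unit variables last

print(elimination_width(EDGES, free_order))
print(elimination_width(EDGES, constrained_order))
```

For these two particular orders the unconstrained width is 2 and the constrained width is 3, which equals the number of unit variables in this toy graph. Forcing the unit variables to the end connects them to one another as the hidden variables are summed out, which is one way to see why a constrained order can be wider.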
Conclusion
- Studied unit selection problem in a computational setting
- Assumed a fully specified structural causal model
- Computed point values of causal objective functions
- Unit selection problem with this class of objective functions is $\text{NP}^\text{PP}$-complete
- Identified an intuitive condition under which it is NP-complete
- Provided an exact algorithm for the unit selection problem
- Characterized complexity in terms of treewidth
- Defined a new inference problem, Reverse-MAP, which is also $\text{NP}^\text{PP}$-complete
- Lemma 20 complements Theorem 13
- Lemma 22 concerns the augmentation of an SCM
- Theorem 17 holds for augmented objective model
- Lemmas 20 and 22 used in proof
- Time complexity of RMAP VE is $O(n_1 \cdot \exp(w_1))$
- Bruteforce method time complexity is $O(n_2 \cdot \exp(w_2))$
- Class of problems for which U-constrained treewidth is no smaller than number of unit variables
- MAP VE and RMAP VE must be exponential in number of unit variables
- Baseline method can be significantly worse