Abstract
- Explanations of AI models must be both human-intelligible and consistent with the model’s internal structure.
- Theory of causal abstraction provides the mathematical foundations for these explanations.
- Contributions include generalizing causal abstraction to cyclic structures, using multi-source interventions, defining approximate causal abstraction, and formalizing XAI methods.
Paper Content
Introduction
- XAI seeks to explain why deep learning models make the predictions they do
- Causal analysis is the gold standard for explaining model behavior and internal reasoning
- Low-level causal explanations of behavior and internal reasoning can be easily provided, but are not interpretable to humans
- High-level explanations are easier to interpret, but difficult to trust
- Causal abstraction provides a framework for analyzing a system at multiple levels of detail simultaneously
- Causal abstraction has been applied to deep learning AI models, weather patterns, and human brains
- This paper develops the theory of causal abstraction as a mathematical framework for XAI
- Low-level variables are partitioned into clusters, each associated with a high-level variable
- Approximate causal abstraction is explored, connecting interchange intervention analysis with existing definitions
Faithful and interpretable causal explanations of AI
- Causal explanations are privileged when explaining how an artifact works
- Causal explanations allow for manipulation and control of the system
- Appropriate level of abstraction is important for causal explanations
- Intervention is a fundamental operation of causal explanations
- Causal abstraction supports interpretable explanations of AI
- Faithfulness is defined as the degree to which an explanation accurately represents the 'true reasoning process behind a model's behavior'
Methods for explaining AI behavior
- AI model behavior is a function from inputs to outputs
- Behavior can be represented by a two-variable causal model
- XAI methods learn interpretable models to approximate uninterpretable models
- Behavioral XAI methods are model-agnostic: they give the same explanation for any two models with the same input-output behavior
- Need to ground notions of faithfulness in causality to compare XAI methods
Methods for explaining the internal structure of AI
- AI models have internal reasoning that can be represented as a program or algorithm
- Recent research aims to understand the causal mechanisms inside black box models
- Causal abstraction provides a mathematical foundation for understanding the high-level semantics of neural representations
- Interchange interventions are used to show that neural representations represent propositional content
- Iterative nullspace projection is used to evaluate whether neural representations encode concepts with ‘mental’ causes and effects
- Causal mediation analysis is used to analyze gender bias in pretrained language models
- Circuit-based explanations reverse engineer the mechanisms of a network at the level of individual neurons
- Probing is used to determine whether a concept is present in a neural representation
- Feature attribution methods ascribe scores to neural representations to capture their ‘impact’ on model behavior
Causal models
- Notation: V denotes a set of variables, X denotes a variable, x denotes a value, Val(X) denotes the range of possible values for X
- Value spaces are assumed disjoint: Val(X) ∩ Val(Y) = ∅ whenever X ≠ Y, so no two variables can take on the same value
- Capital letters denote variables, lower case letters denote values, bold letters denote sets of variables/values
- Useful constructs: Domain(f), Uniform(X), and the indicator function 𝟙[φ]
- Projection: given a partial setting u for a set of variables U, Proj(u, X) is the restriction of u to the variables in X
- Definition 4: causal model is a pair (V, F) where V is a set of variables and F is a set of structural functions
- Remark 5: no explicit reference to a graphical structure defining a causal ordering on the variables
- Remark 6: acyclic model notation
- Definition 7: the set of solutions Solve(M) is the set of all v ∈ Val(V) such that the equation Proj(v, X) = f_X(v) is satisfied for every X ∈ V
- Definition 8: an intervention is a partial setting i ∈ Val(I) for I ⊆ V; M_i is just like M except that f_X is replaced with the constant function v ↦ Proj(i, X) for each X ∈ I
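Definitions 4, 7, and 8 can be made concrete with a small brute-force sketch. The three-variable model below and the helper names `solve` and `intervene` are illustrative assumptions, not taken from the paper, and for brevity the variables share Boolean values instead of having disjoint value spaces.

```python
from itertools import product

def solve(variables, values, functions):
    """Return every total setting v with v[X] == f_X(v) for all X."""
    solutions = []
    for combo in product(*(values[X] for X in variables)):
        v = dict(zip(variables, combo))
        if all(functions[X](v) == v[X] for X in variables):
            solutions.append(v)
    return solutions

def intervene(functions, i):
    """Structural functions of M_i: f_X becomes constant for each X in i."""
    new_functions = dict(functions)
    for X, x in i.items():
        new_functions[X] = lambda v, x=x: x  # bind x now, not at call time
    return new_functions

# Hypothetical toy model: A is constant, B copies A, C = A and B.
variables = ["A", "B", "C"]
values = {X: [False, True] for X in variables}
functions = {
    "A": lambda v: True,
    "B": lambda v: v["A"],
    "C": lambda v: v["A"] and v["B"],
}

base = solve(variables, values, functions)
after = solve(variables, values, intervene(functions, {"B": False}))
```

Enumerating all total settings is only feasible for tiny finite models, but it handles cyclic models too, since it never relies on a causal ordering.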
Example of causal models: a symbolic algorithm and neural network
- Two causal models are defined to demonstrate potential to model a variety of computational processes
- The first model is a tree-structured algorithm
- The second model is a fully-connected feed-forward neural network
- Both models solve the same task
Hierarchical equality task
- Hierarchical equality task is to determine if two pairs of objects have identical relations
- Input is two pairs of objects; output is True if both pairs are equal or both pairs are unequal, and False otherwise
- Domain of objects consists of triangle, square, and pentagon
- Obvious tree-structured symbolic algorithm solves the task
- Equality reasoning is ubiquitous and has been studied for broader questions about relational reasoning
- Hierarchical equality serves as a case study for explaining how abstract tree-structured composition can be implemented by a fully-connected neural network
- Neural network is trained to implement the hierarchical equality task
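The task itself fits in a few lines. This is a minimal sketch that encodes the three shapes as strings; the name `hierarchical_equality` is an assumption made for illustration.

```python
from itertools import product

SHAPES = ["triangle", "square", "pentagon"]

def hierarchical_equality(a, b, c, d):
    """True when the pairs (a, b) and (c, d) are both equal or both unequal."""
    return (a == b) == (c == d)

# Enumerate the full input space of four shapes.
inputs = list(product(SHAPES, repeat=4))
positives = sum(hierarchical_equality(*x) for x in inputs)
```

With three shapes there are 3^4 = 81 inputs, of which 45 are labeled True (9 with both pairs equal, 36 with both pairs unequal).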
A tree-structured algorithm for hierarchical equality
- Algorithm A consists of four input variables, two intermediate variables, and one output variable
- Acyclic causal graph is depicted in Figure 1a
- Each f_{X_i} is a constant function
- Default total setting is [ , , , , True, True, True]
- Counterfactual result is [ , , △, , True, True, True]
A fully connected neural network for hierarchical equality
- Neural network N consists of 8 input neurons
- Values for each variable are real numbers R
- 4 sets of variables for first 4 layers
- Each f_{R_k}, for 1 ≤ k ≤ 8, is a constant function
- Output neurons determined by network weights
- Network outputs True/False based on output logit values
Causal abstraction and interchange intervention analysis
- Structural conditions must be in place for H to be a high-level abstraction of the low-level model L
- The network N and the algorithm A from the previous section serve as the running example
Alignments between causal models
- Abstraction involves associating high-level variables with clusters of low-level variables
- Alignment between low-level and high-level causal models is introduced
- Alignment consists of a partition and a family of maps
- Alignment induces a unique translation
- Translation is a partial function from low-level interventions to high-level interventions
- Low-level interventions that correspond to high-level interventions are defined by cell-wise maps
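A simplified sketch of how an alignment induces a partial translation is given below. The cell `("n1", "n2")`, the high-level variable `"Y"`, and the thresholding map are all hypothetical, and the paper's formal definition of the translation is more general than this dictionary-based version.

```python
# Hypothetical alignment: high-level variable "Y" is realized by the
# low-level cell ("n1", "n2"); the cell-wise map reads off a Boolean.
partition = {"Y": ("n1", "n2")}
cell_maps = {"Y": lambda n1, n2: (n1 + n2) > 0}

def translate(low_intervention):
    """Translate a low-level intervention (dict) into a high-level one.

    Returns None where this simplified translation is undefined.
    """
    aligned = {n for cell in partition.values() for n in cell}
    if any(n not in aligned for n in low_intervention):
        return None  # touches a variable outside every cell
    high = {}
    for H, cell in partition.items():
        fixed = [n for n in cell if n in low_intervention]
        if not fixed:
            continue  # cell untouched, so H is left alone
        if len(fixed) < len(cell):
            return None  # cell only partly fixed
        high[H] = cell_maps[H](*(low_intervention[n] for n in cell))
    return high
```

Returning `None` models partiality: only low-level interventions that fix whole cells have a high-level counterpart in this sketch.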
Causal consistency and constructive abstraction
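- Definition 10: An alignment between a low-level and a high-level model is causally consistent if, for every low-level intervention i in Domain(τ), translating the solutions of the low-level model under i yields the same high-level total settings as solving the high-level model under the translated intervention τ(i).
- Definition 11: The high-level model is a constructive abstraction of the low-level model if there is an alignment between them that satisfies the causal consistency condition.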
- Remark 12: Constructive abstraction was introduced in Beckers and Halpern (2019) and Beckers et al. (2019).
- Remark 13: Abstract interpretation is a special case of causal abstraction.
- Definition 14: A typing is a function that assigns types to variables and an equivalence relation between values of the same type.
- Definition 15: A typed causal abstraction holds when an alignment is both causally consistent and type consistent.
Interchange intervention analysis
- Interchange intervention analysis is a method of operationalizing claims of causal abstraction.
- Geiger et al. (2022b) provides a specialized theory of interchange interventions that only covers cases with a single intermediate variable.
- This paper expands on this work, presenting a general theory of interchange interventions for high-level causal models with multiple intermediate variables.
- Alignment between input and output variables is stipulated by the researcher.
- Alignment between intermediate variables must be searched for.
- Interchange interventions are limited to those that fix the entirety of a low-level partition cell or fix none of the cell.
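An interchange intervention on the high-level algorithm can be sketched as follows: run the model on a source input, read off an intermediate variable, and fix that variable to the observed value while running on the base input. The variable names `Y1`, `Y2`, `O` and the string encoding of shapes are assumptions made for this illustration.

```python
def run_algorithm(inputs, forced=None):
    """Tree-structured hierarchical equality with optional forced values."""
    forced = forced or {}
    x1, x2, x3, x4 = inputs
    v = {"Y1": forced.get("Y1", x1 == x2),
         "Y2": forced.get("Y2", x3 == x4)}
    v["O"] = forced.get("O", v["Y1"] == v["Y2"])
    return v

def interchange(base, sources):
    """Fix each variable in `sources` to the value it takes on its source input."""
    forced = {var: run_algorithm(src)[var] for var, src in sources.items()}
    return run_algorithm(base, forced)

base = ("square", "pentagon", "triangle", "triangle")
source_1 = ("square", "square", "pentagon", "triangle")
source_2 = ("triangle", "triangle", "square", "pentagon")

single = interchange(base, {"Y1": source_1})                  # one source
multi = interchange(base, {"Y1": source_1, "Y2": source_2})   # two sources
```

On the base input alone the output is False. Taking `Y1` from `source_1` flips it to True, and additionally taking `Y2` from `source_2` flips it back to False, which shows how multi-source interventions probe several intermediate variables at once.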
Explanation and generalization
- Interchange interventions set variables to values based on an input.
- Definition 10 requires a commuting diagram to hold for all interventions in the range of the partial function τ.
- Explanations should generalize to unseen real-world input.
- Generalizing from training to testing data is a central question of machine learning.
Decomposing constructive causal abstraction
- Marginalization removes a set of variables from a causal model
- Variable merge collapses a partition of variables from a causal model
- Value merge collapses a partition of values for each variable from a causal model
- Marginalization links the parents and children of each variable
- Variable merge and value merge are valid only if the partition cells respect the causal dynamics of the model
- Marginalization guarantees perfect insensitivity/stability
- Value merge alters the value space of each variable
- Variable merge determines the children of their partition
- Value merge is viable when collapsed values play the same role in the model
- Constructive abstraction is a matter of being able to construct the high-level model from the low-level model with marginalization, variable merge, and value merge
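Of the three operations, marginalization is the easiest to illustrate. The sketch below assumes an acyclic model stored as a dictionary of structural functions and removes a variable by substituting its function into every remaining function, which is one way to read "links the parents and children". The helper `marginalize` and the toy chain are hypothetical.

```python
def marginalize(functions, removed):
    """Remove `removed` by substituting its structural function into the rest.

    Assumes f_removed does not depend on `removed` itself, and that the
    returned functions are called on total settings of the remaining variables.
    """
    f_removed = functions[removed]

    def wrap(f):
        # Recompute the removed variable on the fly, then call the original f.
        return lambda v: f({**v, removed: f_removed(v)})

    return {X: wrap(f) for X, f in functions.items() if X != removed}

# Hypothetical chain X -> B -> C.
functions = {
    "X": lambda v: 3,
    "B": lambda v: 2 * v["X"],
    "C": lambda v: v["B"] + 1,
}
reduced = marginalize(functions, "B")
```

After the call, `C` depends directly on `X`, the former parent of `B`, and computes 2 * X + 1.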
Example of causal abstraction: tree-structure in neural computation
- Hierarchical equality task is important for understanding relationship between artificial and biological neural networks and modular symbolic algorithms
- Tree-structured algorithm and neural network trained to implement it (Geiger et al., 2022b)
- Causal abstraction theory explains implementation relationship between network and algorithm
An alignment between the algorithm and the neural network
- Neural network parameters have no obvious relationship to algorithm A
- Network N was constructed to be abstracted by algorithm A
- Intervention i has output values from the real numbers and input values from {r_□, r_⬠, r_△}^4
- Intermediate neurons are assigned high-level alignment by stipulation
- Constructive abstraction will hold only if alignments to intermediate variables do not violate causal laws of A
The algorithm abstracts the neural network
- Inputs are a sequence of four shapes from the set {△, □, ⬠}
- Domain of τ is restricted to the 3^4 = 81 input interventions
- Neural network was created using interchange intervention training with the alignment (Π, τ) and the high-level model A
- Relation of constructive causal abstraction holds between the high-level model A and the low-level model N
- Code provided for interchange intervention training and verifying that the network N is abstracted by the algorithm A
- Interchange intervention performed on A and N with the base input ( , , △, ) and a single source input ( , , △, △)
- Network and algorithm have the same counterfactual behavior
The algorithm can be constructed from the neural network
- Network N can be transformed into algorithm A
- Transformation involves marginalization, variable merge, and value merge
- Visual depiction of transformation in Figure 5
Approximate abstraction and interchange intervention accuracy
- Constructive causal abstraction is an all-or-nothing notion
- Early applications of interchange interventions found subsets of the input space on which the causal abstraction holds
- Geiger et al. proposed interchange intervention accuracy, which is the proportion of interchange interventions where the neural network and high-level algorithm have the same input-output behavior
- Geiger et al. proposed a new notion of approximate abstraction, α-on-average constructive abstraction, which is tightly connected to interchange intervention accuracy
- Theorem 31 states that if interchange intervention accuracy is α, then the high-level algorithm is an α-on-average constructive abstraction of the neural network
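Interchange intervention accuracy can be sketched as a simple proportion. In the code below, `low` is a hand-built stand-in for a network with two hidden units (not a trained model), and the names `h1`, `h2`, `Y1` are assumptions. The accuracy is computed for a correct alignment (Y1 with `h1`) and for a deliberately wrong one (Y1 with `h2`).

```python
from itertools import product

SHAPES = ("triangle", "square", "pentagon")

def high(inputs, forced_y1=None):
    """High-level algorithm; optionally force the intermediate Y1."""
    x1, x2, x3, x4 = inputs
    y1 = (x1 == x2) if forced_y1 is None else forced_y1
    y2 = x3 == x4
    return {"Y1": y1, "O": y1 == y2}

def low(inputs, forced=None):
    """Hand-built stand-in for a network with hidden units h1 and h2."""
    x1, x2, x3, x4 = inputs
    h = {"h1": 1.0 if x1 == x2 else -1.0,
         "h2": 1.0 if x3 == x4 else -1.0}
    h.update(forced or {})
    return {**h, "O": h["h1"] * h["h2"] > 0}

def interchange_accuracy(site):
    """Fraction of (base, source) pairs on which swapping Y1 in `high`
    and `site` in `low` gives the same output."""
    inputs = list(product(SHAPES, repeat=4))
    agree = 0
    for base, source in product(inputs, repeat=2):
        high_out = high(base, forced_y1=high(source)["Y1"])["O"]
        low_out = low(base, forced={site: low(source)[site]})["O"]
        agree += high_out == low_out
    return agree / len(inputs) ** 2

good = interchange_accuracy("h1")  # Y1 aligned with h1
bad = interchange_accuracy("h2")   # Y1 misaligned with h2
```

The correct alignment reaches an accuracy of 1.0, while the misaligned one falls strictly below 1, which is the graded signal that the all-or-nothing notion cannot provide.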
XAI methods grounded in causal abstraction
- Causal abstraction can be used as a general theoretical foundation for XAI.
- Many popular XAI methods can be viewed as special cases of causal abstraction analysis.
- Causal abstraction can capture a variety of popular XAI methods with high-level models containing no more than three variables.
LIME: behavioral fidelity as approximate abstraction by a two-variable chain
- LIME is a popular XAI method
- LIME learns an interpretable model A that approximates the behavior of an uninterpretable model N
- The fidelity of the explainer model A is a measure of how the input-output behavior of A differs from that of N
- Iterative nullspace projection attempts to determine whether a concept is used by a model
- Deep learning models are highly non-linear and can make decisions using information that is not linearly accessible
- Elazar et al. (2022) and Lovering and Pavlick (2022) present methods to mitigate these concerns
Causal effect estimation as abstraction by a two-variable chain
- CEBaB benchmark evaluates explainer models on their ability to estimate the causal effect of changing the quality of food, service, ambiance, and noise in a real-world dining experience on the prediction of a sentiment classifier.
- CEBaB is represented by a single causal model with real-valued vectors for input data, prediction output, and neural representations.
- Interested in the causal effect of food quality on model output, the model is marginalized to two endogenous variables.
Causal mediation as abstraction by a three-variable chain
- Changing the value of a variable X affects a second variable Y
- Causal mediation analysis determines how this effect is mediated by a third variable Z
- Total, direct, and indirect effects can be defined with interchange interventions
- This method has been applied to the analysis of neural networks
- Goal is to identify sets of neurons that completely mediate the causal effect
- Models in causal effect estimation are probabilistic models
- Partial mediation is approximate abstraction by a three-variable chain
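The total, direct, and indirect effects can be written as differences between intervened runs. The linear toy model below (Z = 2X, Y = Z + 3X) is a hypothetical example, and the quantities follow the usual natural direct and indirect effect definitions from the mediation literature.

```python
def run(x, forced_z=None):
    """Toy chain X -> Z -> Y with an extra direct edge X -> Y."""
    z = 2 * x if forced_z is None else forced_z
    y = z + 3 * x
    return {"X": x, "Z": z, "Y": y}

base, source = 0, 1

# Total effect: change X from base to source and let Z follow.
total = run(source)["Y"] - run(base)["Y"]

# Indirect effect: keep X at base, set Z to the value it takes under source.
indirect = run(base, forced_z=run(source)["Z"])["Y"] - run(base)["Y"]

# Direct effect: change X to source, hold Z at the value it takes under base.
direct = run(source, forced_z=run(base)["Z"])["Y"] - run(base)["Y"]
```

Here Z only partially mediates the effect, since the direct effect is nonzero. Complete mediation would correspond to a direct effect of zero.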
Iterative nullspace projection as abstraction by a three-variable chain
- Iterative nullspace projection is a method of removing a concept C from a target hidden representation H of a neural network N.
- The performance of N is measured to determine if it makes use of the concept C.
- A three-variable causal model is used to model iterative nullspace projection as abstraction.
- The high-level causal model is an abstraction of the low-level neural model under alignment.
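A single projection step of this kind can be sketched in plain Python. The direction `w` is assumed to be given (in practice it would come from a linear classifier trained to predict the concept C from H), and the full method repeats the train-and-project cycle. The vectors here are hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))

def project_out(h, w):
    """Project h onto the nullspace of the direction w (w must be nonzero)."""
    scale = dot(h, w) / dot(w, w)
    return [a - scale * b for a, b in zip(h, w)]

h = [3.0, 4.0]   # hypothetical hidden representation
w = [1.0, 0.0]   # hypothetical concept direction
h_clean = project_out(h, w)
```

After the projection, `h_clean` is orthogonal to `w`, so a linear readout along `w` can no longer recover the concept from this representation.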
Operationalizing circuit-based explanations with causal abstraction
- Linear combinations of neural activations encode high-level concepts
- Circuits defined by model weights encode meaningful algorithms over high-level concepts
- Low-level causal model encodes neural representations and circuits
- High-level causal model encodes high-level concepts and meaningful algorithms
Interchange interventions from integrated gradients
- Integrated gradients is a neural network analysis method that assigns values to neurons based on their impact on model predictions.
- Integrated gradients can be used to compute interchange interventions.
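For reference, integrated gradients itself can be approximated with a Riemann sum along the straight path from a baseline to the input. The sketch below uses a hypothetical two-input function in place of a network and a finite-difference gradient, so it shows the attribution method only, not the paper's construction of interchange interventions from it.

```python
def f(x):
    """Hypothetical stand-in for a model output."""
    return x[0] * x[1] + x[0] ** 2

def grad(func, x, eps=1e-6):
    """Central finite-difference gradient of func at x."""
    g = []
    for k in range(len(x)):
        up, down = list(x), list(x)
        up[k] += eps
        down[k] -= eps
        g.append((func(up) - func(down)) / (2 * eps))
    return g

def integrated_gradients(func, x, baseline, steps=200):
    """Midpoint Riemann approximation of integrated gradients."""
    attributions = [0.0] * len(x)
    for s in range(steps):
        alpha = (s + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        g = grad(func, point)
        for k in range(len(x)):
            attributions[k] += g[k] * (x[k] - baseline[k]) / steps
    return attributions

x, baseline = [2.0, 3.0], [0.0, 0.0]
attr = integrated_gradients(f, x, baseline)
```

The attributions sum to approximately f(x) - f(baseline), the completeness property of integrated gradients.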
Future applications: types, infinite variables, and cycles
- Causal abstraction is a general purpose framework
- Existing XAI methods are limited to finite and acyclic models
- Demonstrating expressive capacity of causal abstraction to support arbitrary symbolic algorithms
- Articulating conditions for recursive deep learning model to implement bubble sort algorithm
- Defining a causal model S with infinite variables and values
- Structural equations of S defined for any natural numbers
- Abstraction of S can be verified through behavioral evaluation
- Abstraction with a model that encodes variables with types of integer and Boolean
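As a point of reference for the bubble sort example, the sketch below records the sequence after every pass. These per-pass states are one candidate for the intermediate variables a high-level model could expose, and the function names are assumptions, not the paper's construction of the model S.

```python
def bubble_pass(seq):
    """One left-to-right pass of adjacent compare-and-swap."""
    seq = list(seq)
    for j in range(len(seq) - 1):
        if seq[j] > seq[j + 1]:
            seq[j], seq[j + 1] = seq[j + 1], seq[j]
    return seq

def bubble_sort_trace(seq):
    """Return the input followed by the state after each pass."""
    trace = [list(seq)]
    for _ in range(max(len(seq) - 1, 0)):
        trace.append(bubble_pass(trace[-1]))
    return trace

trace = bubble_sort_trace([3, 1, 2])
```

Behavioral evaluation compares only the last entry of the trace with the expected sorted output, whereas an abstraction analysis would also intervene on the intermediate entries.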
Coda: abstraction for probabilistic models
- Focus on deterministic neural models
- Probabilistic models are quadruples (V, U, F, P)
- Solutions to probabilistic models are probability distributions on total settings
- Existing treatments of causal abstraction for probabilistic models require agreement between low-level and high-level models with respect to interventional distributions
- Counterfactual distributions include important quantities such as probability of necessity
- Preservation of interventional probabilities is not enough to preserve causal explanations
- Example of two models with different explanations for outcomes
- Alternative characterizations of abstraction relation explored in future work
Conclusion
- Causal abstraction is a theoretical framework for XAI
- Constructive causal abstraction can be decomposed into operations of marginalizing variables, merging variables, and merging values
- Interchange interventions provide high-level interpretations for neural representations
- Popular XAI methods can be seen as special cases of causal abstraction analysis
- Causal abstraction lays groundwork for future development of XAI methods
- Constructive probabilistic abstraction involves sets of interventions
- Counterfactual quantities reduce to interventional quantities in the deterministic setting
- Any sequence of variable merges, value merges, and marginalizations will produce a model that is a constructive abstraction of the original
- Constructive abstraction is transitive
- Aligned interchange intervention can be used to compare low-level and high-level models
- Causal structure of Δ(Π(ε_{B∪Y}(S))) represents input-output behavior of bubble sort
- Definition 42 (Constructive Probabilistic Abstraction) proposed
- For all sets of interventions I ⊆ Domain(τ), the following commutes: π(Solve(M_i)) = Solve(Π(M)_{π(i)})