Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

  • State-of-the-art causal models assume inherent node factors that govern link formation.
  • Link formation can be path-dependent, meaning the outcome of link interventions depends on existing links.
  • Existing causal methods are impractical in these scenarios.
  • This work develops the first causal model capable of dealing with path dependencies in link prediction.

Paper Content

Introduction

  • Charles Spearman published The Abilities of Man in 1927, which described mathematical tools to uncover latent common factors of intelligence.
  • Spearman’s work started a revolution that gave us matrix and tensor factorizations, PCA, and ICA.
  • Spearman warned against interpreting the factors of subject i as innate rather than acquired abilities.
  • Two competing causal hypotheses exist to describe link formation between young children and their abilities: path-dependent and innate factors.
  • Current causal models for link prediction operate under the assumption of innate factors.
  • Causal lifting is an invariance property of causal models that is able to serve link prediction under interventions in both path-dependent and innate factor models.
  • Observe graph data at pre-trial time t0 from unknown causal graph generation process
  • Denote observed graph at pre-trial time by G (t0) with corresponding adjacency matrix a (t0)
  • Probe into relation of (I, J) ∼ µ(G (t0) )
  • Probe induces intervention in graph formation process
  • Task is to predict what would have happened had we probed into relation of different pair (U, V ) ∼ µ(G (t0) )
  • Challenges include unknown data generating process, spillover and carryover effects
  • Introduce concept of causal lifting and causal link prediction learning task
  • Introduce three situations where theoretical results can be leveraged
  • Invariant pairwise embedding methods outperform node embedding ones in causal link prediction tasks
  • Link prediction has been studied since the early 1900s
  • Link prediction is either treated as an associational or causal task
  • Self-supervised learning task is used to predict true edges and nonedges in the adjacency matrix
  • Matrix factorization and GNN node embeddings are used to encode associations between graph topology and node/edge attributes
  • Model-based methods fit a random graph model to the observed graph
  • Mechanistic methods assume links are more likely created by nodes with many common neighbors
  • Inner factors literature merges matrix factorization and interventional data
  • Recommendations as treatments literature evaluates the effectiveness of a recommendation algorithm
  • Causal effect estimation on networks estimates the difference in potential outcomes of two treatments given in a network

Causal lifting

  • Causal lifting is a causal extension to the associational definition of lifting in probabilistic inference
  • Definition 1 is used to avoid unnecessary computations in probabilistic inference algorithms
  • Definition 2 and 3 extend lifting to interventional and counterfactual distributions
  • Interventional lifting can play a key role in identifying causal link prediction tasks
  • Goal is to show how interventional lifting can serve as an identification strategy and experimental design tool for causal link prediction task
  • Main result (Theorem 1) presented in context of idealized single probe task
  • Sections 4.2 and 4.3 provide more details behind concepts used in Theorem 1
  • Sections 4.4 and 4.5 show extra causal mechanism assumptions to learn general task of Equation (5) with multiple probes
  • Interventional lifting can allow us to answer counterfactual query of Equation (2).
  • Two node pairs (i, j), (u, v) from G (t0) are isomorphic if they are structurally indistinguishable.
  • Under certain conditions, interventional lifting can be obtained.
  • The observed graph G (t0) is sufficiently expressive of the causal link formation process.
  • Theorem 1 shows that if (u, v) is in (i, j)’s orbit in G (t0), the probe outcome Y uv is the same.
  • The invariances in causal mechanisms ensure interventional lifting.
  • Corollary 1 provides an identification and estimation procedure for causal link prediction.
  • The hidden parameters associated with the orbit O (t0) ij of a node pair (i, j) in G (t0) d-separates the probe from the generating process of G (t0).
  • Theorem 1 relies on a universal class of SCMs for graphs and a few symmetry conditions in their causal mechanisms.

A universal family of causal models for graphs

  • SCM generates edges and nonedges sequentially
  • SCM decides which pair of nodes will have an edge or nonedge
  • SCM can generate attributed graphs
  • SCM takes sequence of all previously generated edges and nonedges as input
  • X is invariant to the SCM intermediate states between the time the intervention probe is performed and the instant before we see its effect
  • X is invariant to the order in which edges and nonedges have been generated
  • X is invariant to which pairs of nodes were generated as non-links or were not generated at all
  • X is invariant to permutations of the node identifiers
  • Outcome of the probe is invariant to whatever happened between the observation of the graph and the probe
  • Outcome of the probe is invariant to whether an observed non-link is indeed a non-link generated by the causal model or a node pair not-yet-executed
  • Exogenous variables between node pairs in an SCM are independent and identical
  • We can identify symmetric causal link prediction queries under certain conditions.
  • Corollary 1 shows how a probe in (i, j) can be used to estimate a counterfactual query on a pair (u, v) in (i, j)’s orbit.
  • We can extend our causal assumptions to learn to answer counterfactual queries about any pair using multiple probes as training data.
  • Non-interfering probes is an invariance condition depending on both the causal mechanisms and on the pairs (I (tm) , J (tm) )’s we probe at each time t m .
  • Non-interference tends to be satisfied with higher probability if the probes are performed in pairs far away in the graph.
  • We can use multiple probes to estimate counterfactual queries, but are still restricted to probing in the orbit of the first queried pair.
  • We can sample pairs according to an arbitrary policy µ(G (t0) ) and learn a model capable of predicting our counterfactual query for any (U, V ) ∼ µ(G (t0) ).
  • We need two extra assumptions: i.i.d. exogenous variables and shared parameters of the probes’ distributions across all orbits.
  • We can approximate the equation with a MAP estimate and a prior P (W Γ ).
  • Proposition 1 describes when it is possible to estimate a causal link prediction task with multiple probes in different orbits.
  • A supervised learning-based approach is used, with probed pairs and their observed outcomes as labels.
  • Prediction is given by a graph embedding model.
  • Equation (13) estimates parameters and relies on assumptions from Proposition 1.
  • Node embedding methods build a graph by computing the node embeddings of the two nodes in the represented pair and merging them with a binding function.
  • Structural (joint) pairwise embeddings have lower bias than structural node embeddings and represent the causal structure of the task.
  • Proposition 1 and Figure 3 show that the natural representation for the causal link prediction task is a most-expressive pairwise representation.
  • Theorem 1(iv) states that most-expressive graph models capture all causal invariances in the task.
  • Structural node embeddings are a popular attempt to design structural representations of node pairs.
  • Structural joint pairwise embeddings are defined to distinguish structural representations acting jointly on the node pair from structural representations acting separately on its nodes.
  • Structural pairwise embeddings capture properties such as distances and common neighbors which play a central role in link formation mechanisms.
  • Joint embeddings and Graph Neural Networks (GNNs) are permutation-invariant node embeddings
  • Structural node embeddings encompass GNNs and classical structural node roles
  • Structural pairwise embeddings have lower bias than structural node embeddings
  • Pairwise symmetric graphs are important in link prediction tasks
  • Theorem 1 states when structural pairwise embeddings have lower bias than structural node embeddings
  • Positional node embeddings are samples from an equivariant distribution
  • Strictly positional node embeddings assign distinct representations to every node in a symmetric graph
  • Matrix factorization and other factor models reflect some notion of node distances in the graph
  • SVD and neural matrix factorization graph embeddings are strictly positional
  • SVD assigns the same embedding to a node pair if they are isomorphic and have the exact same neighborhood
  • Strictly positional node embeddings assume an incorrect causal structure for the causal link prediction task

Experimental results

  • Evaluated theoretical findings on identifying and estimating counterfactual query from Equation (5)
  • Investigated accuracy of positional node embeddings, structural node embeddings, and pairwise invariant embeddings
  • Investigated three questions: (Q1) How do structural node embeddings perform when test set contains tuples that are nodewise isomorphic to training? (Q2) How do positional node embeddings perform when test set contains tuples that did not participate in training? (Q3) How do positional and structural embedding solutions perform in real-world scenarios?
  • Evaluated Q1 and Q2 in knowledge base completion task, Q2 in covariance matrix estimation task, and Q2 and Q3 in two user-item recommendation tasks
  • Used Nonnegative Matrix Factorization (NMF), SVD, positional GCN, classical GCN, UniMP, TransE, DistMult, ComplEX, SEAL, and Neo-GNNs

Impact on knowledge graph queries

  • Task is causal link prediction in knowledge graphs
  • Constructed knowledge base of 100 non-isomorphic family trees
  • Dataset contains 28 relation types
  • Goal is to predict other family relations from parentOf relations
  • Model satisfies conditions from Theorem 1 and Proposition 1
  • 30% of family trees contain two isomorphic subtrees
  • Structural node embeddings assign same representation to isomorphic nodes
  • Positional node embeddings can distinguish relations but may not make same prediction for isomorphic pairs
  • Structural pairwise embeddings do not suffer from these problems
  • Figure 5 confirms that structural pairwise embeddings perform better than positional node embeddings and structural node embeddings

Impact on covariance matrix estimation

  • Task is mostly unexplored in causal link prediction literature
  • Consider observing samples from a joint distribution (e.g. list of measurements from a patient)
  • Construct estimated covariance matrix for attribute variables
  • Counterfactual query: what would have been the re-estimated values for entries of attributes not collected?
  • Experiments are made simultaneously, samples are i.i.d.
  • Assumption needed is that exogenous variables from patients are i.i.d.
  • Evaluated methods: Label-GCN, positional node embeddings, structural node embeddings

Impact on recommender systems

  • Two user-item recommendation datasets used: Amazon Electronics and Last FM
  • Graph built from user-item interactions between 11/24/2015 and 12/24/2015 in AE and between 2007 and 2013 in LFM
  • Counterfactual problem: what would other users consume if exposed to items?
  • Male users probed in both datasets
  • Interactions between 12/24/2015 and 12/31/2015 for AE and in 2014 for LFM
  • Nonedge probe examples selected by sampling nodes uniformly at random
  • Unclear whether causal modeling assumptions hold
  • Node embedding methods doomed to failure
  • Structural pairwise embeddings present better performance
  • GCN achieves reasonable performance in AE dataset