Abstract
- Recommender systems (RSs) are used to estimate user interests and predict their future behaviors.
- Traditional RSs do not consider the causal reasons that lead to observed user behaviors, leading to biases in generated recommendations.
- Recent years have seen an upsurge of interest in enhancing traditional RSs with causal inference techniques.
- This survey provides an overview of causal RSs and discusses how different causal inference techniques can be introduced to address challenges.
Paper Content
Introduction
- Recommender systems (RSs) are used to deliver items to users based on their personalized interests
- Traditional RSs can be categorized into three classes: collaborative filtering-based, content-based, and hybrid methods
- Traditional RSs can only estimate user interests and predict future behaviors from correlations in observed historical user behaviors and user/item features
- Causal questions ask about the effects of an intervention or a counterfactual outcome, rather than mere associations in the observational data
- Failing to address causal questions may incur bias in recommendations
- Understanding the cause of user activities can help improve the explainability of recommendations
- Causal inference allows us to identify and base recommendations on causal relations that are stable and invariant
- This survey provides an overview of recent advances in causal RS research
Recommender system basics
- RSs have users and items
- Data for RSs is represented by a user-item rating matrix
- Non-zero elements in the matrix denote a user's rating of an item
- Zero elements indicate missing rating
- RSs are trained on observed ratings
- RSs have access to user and item side information
- Main purpose of RSs is to predict users’ ratings for uninteracted items
Collaborative filtering
- Collaborative filtering-based RSs use user ratings to recommend new items
- Three widely-used collaborative filtering-based RSs are Matrix Factorization, Deep Matrix Factorization and Auto-encoder-based RSs
- Ratings are assumed to be generated from user and item latent variables
- Models learn latent variables and associated functions by fitting on observed ratings
- Ratings are predicted for previously uninteracted items
- Models capture co-occurrence patterns in users’ past behaviors, not causal influence
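The fitting procedure described above can be sketched for plain matrix factorization. This is a minimal illustration, not the survey's own formulation: the function names, the toy ratings, and the hyperparameters (`dim`, `lr`, `reg`, `epochs`) are all assumptions chosen for the example.

```python
import random

def train_mf(ratings, n_users, n_items, dim=2, lr=0.05, reg=0.01, epochs=500, seed=0):
    """Fit user/item latent vectors to observed (user, item, rating) triples by SGD."""
    rng = random.Random(seed)
    U = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(n_users)]
    V = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(a * b for a, b in zip(U[u], V[i]))
            for k in range(dim):
                uk, vk = U[u][k], V[i][k]
                # Gradient step on squared error with L2 regularization.
                U[u][k] += lr * (err * vk - reg * uk)
                V[i][k] += lr * (err * uk - reg * vk)
    return U, V

def predict(U, V, u, i):
    """Predicted rating is the inner product of the user and item latent vectors."""
    return sum(a * b for a, b in zip(U[u], V[i]))

# Toy observed ratings (hypothetical): (user, item, rating).
observed = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
U, V = train_mf(observed, n_users=3, n_items=3)
train_rmse = (sum((r - predict(U, V, u, i)) ** 2 for u, i, r in observed) / len(observed)) ** 0.5
unseen_prediction = predict(U, V, 0, 2)  # user 0 never rated item 2
```

The model only ever sees which ratings co-occur, so `unseen_prediction` reflects patterns in past behavior rather than any causal effect of exposing user 0 to item 2.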
Content-based recommender systems
- Personalized content-based RSs estimate user interests based on the features of the items they have interacted with.
- Ratings are generated by matching user interests with item content.
- Training of personalized CBRSs follows similar steps as collaborative filtering.
- Key step of building a CBRS is to create item features that can best reflect user interests.
- Factors other than users’ interests in the item content can create an undesirable association between item content and user ratings.
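A rough sketch of the matching step, assuming hand-made binary item features and a rating-weighted average as the user profile. The feature layout, item names, and the choice of cosine similarity are illustrative assumptions, not something the survey prescribes.

```python
import math

def user_profile(rated, item_features):
    """Rating-weighted average of the feature vectors of items the user rated."""
    dim = len(next(iter(item_features.values())))
    total = sum(r for _, r in rated)
    return [sum(r * item_features[i][k] for i, r in rated) / total for k in range(dim)]

def cosine(a, b):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical features: [action, sci-fi, romance].
item_features = {
    "action_1": [1, 0, 0],
    "scifi_action_1": [1, 1, 0],
    "romance_1": [0, 0, 1],
    "scifi_action_2": [1, 1, 0],
    "romance_2": [0, 0, 1],
}
rated = [("action_1", 5), ("scifi_action_1", 4), ("romance_1", 1)]
profile = user_profile(rated, item_features)
scores = {i: cosine(profile, item_features[i]) for i in ("scifi_action_2", "romance_2")}
```

If the high ratings were driven by something other than interest in the content (for example, the items being heavily promoted), the profile would still absorb that association, which is the limitation the last bullet points to.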
Hybrid recommendation
- Hybrid RSs combine user/item side information with collaborative filtering to enhance recommendations.
- Hybrid strategies adjust user and item latent variables to make them compatible in the model.
- Factorization machines and extensions can be viewed as learning a bi-linear function.
- Hybrid strategies cannot overcome the correlational-reasoning limitation of collaborative filtering and content-based RSs.
- Side information combined with domain knowledge can help form causal relations among variables of interest.
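To make the bi-linear view concrete, here is a minimal sketch of a second-order factorization machine score, written in the standard form with a bias, linear weights, and pairwise interactions through factor vectors. The parameter values and the feature layout (one-hot user, one-hot item, one side feature) are made up for illustration.

```python
def fm_predict(x, w0, w, V):
    """Factorization machine score using the O(n*d) reformulation of the pairwise term."""
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    for f in range(len(V[0])):
        s = sum(V[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(len(x)))
        pairwise += 0.5 * (s * s - s_sq)
    return linear + pairwise

def fm_predict_naive(x, w0, w, V):
    """Same score written as an explicit sum over feature pairs i < j."""
    score = w0 + sum(wi * xi for wi, xi in zip(w, x))
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            score += sum(a * b for a, b in zip(V[i], V[j])) * x[i] * x[j]
    return score

# Hypothetical input: user one-hot (2 dims), item one-hot (2 dims), one side feature.
x = [1, 0, 0, 1, 0.5]
w0 = 0.1
w = [0.2, -0.1, 0.3, 0.0, 0.5]
V = [[0.1, 0.2], [0.0, 0.3], [0.4, -0.1], [0.2, 0.2], [-0.3, 0.1]]
fast = fm_predict(x, w0, w, V)
slow = fm_predict_naive(x, w0, w, V)
```

Both functions compute the same quantity; the pairwise term is bi-linear in the inputs because each interaction weight is an inner product of two learned factor vectors.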
Causal recommender systems: preliminaries
- Traditional RSs have limitations due to correlational reasoning on observational user behaviors
- Two causal inference frameworks, RCM and SCM, can be used to build RSs with causal reasoning ability
- RCM and SCM are best suited for different tasks and questions
Rubin’s potential outcome framework
- Traditional RSs can only answer the question “what the rating would be if we observe an item was exposed to the user”
- Item exposure is not randomized in the collected dataset
- RCM-based RSs draw inspiration from clinical trials
- RCM-based RSs aim to estimate the causal effects of the treatments (exposing a user to an item) on the outcomes (user ratings)
- Traditional collaborative filtering-based RSs estimate the conditional distribution $p(r_{ij} \mid u_i, v_j)$ from observed ratings
- Causal graph 1 of Fig. 4-(a) is tacitly assumed
- The causal path from item exposure to rating is confounded by a third variable (e.g., item popularity)
- The conditional distribution $p(r_{ij} \mid u_i, v_j)$ estimated from the confounded data is given by Eq. (4) of the paper
- The classic solution from the RCM-based framework to address exposure bias is to find user and item covariates that account for the confounding
- Given these covariates, the conditional unconfoundedness assumption is formulated, and the law of total probability is used to adjust for them
- An uncontrolled confounder leaves open a backdoor path (i.e., a non-causal path) between item exposure and rating
- RCM-based RSs need two extra assumptions to identify the causal effects of item exposures on ratings: SUTVA and positivity assumption
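The adjustment argument in the bullets above can be written out in a simplified form. The symbols are assumptions made for this sketch rather than the paper's exact notation: $r$ is the rating, $a$ is the binary exposure indicator, $c$ denotes the covariates that satisfy unconfoundedness, and conditioning on user and item variables is omitted for brevity.

```latex
\begin{align}
  % Conditional unconfoundedness: potential ratings are independent of exposure given c
  r(a) &\perp a \mid c \\
  % What observational data gives: c is weighted by how exposure selected it
  p(r \mid a = 1) &= \sum_{c} p(r \mid a = 1, c)\, p(c \mid a = 1) \\
  % What we want: c is weighted by its marginal distribution
  p\big(r(a = 1)\big) &= \sum_{c} p(r \mid a = 1, c)\, p(c) \\
  % Positivity: every covariate stratum has some chance of exposure
  0 &< p(a = 1 \mid c) < 1
\end{align}
```

The second and third lines differ only in the weight placed on each stratum of $c$, which is where the bias from non-randomized exposure enters.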
Pearl’s structural causal model
- RCM uses rating potential outcomes to reason with causality and attributes biases in observed user behaviors to non-randomized item exposures
- SCM delves deep into the causal mechanism that generates the observed outcomes and biases and represents it with a causal graph
- Causal graph nodes specify variables of interest, such as user interests, item attributes, observed ratings, and other important covariates
- Directed edges between nodes represent causal relations determined by researchers’ domain knowledge
- SCM is more flexible than RCM as it can represent and reason with the causal effects between any subset of nodes and along specific paths
- SCM-based causal RSs involve user and item latent variables that are inferred alongside the estimation of structural equations
- Causal graphs should describe the causal mechanism that generates the observed data to distinguish stable, causal relations from other undesirable correlations
- Atomic graph structures include chains, forks, and V-structures
- Confounders can lead to non-causal dependencies among variables in the observational dataset
- SCM allows for interventions to calculate the causal effect of two variables on an outcome
- SCM-based causal inference methods can estimate causal effects with unknown confounders
- Causal graphs allow for debiasing, causal disentanglement, and causal generalization
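A toy simulation may help illustrate how a confounder creates a non-causal dependency and how adjusting for it recovers the causal effect. The graph here (confounder C affects both exposure A and outcome R, and A affects R) and every number in it are assumptions invented for the example; the outcome is a generic engagement score that is defined for unexposed units too.

```python
import random

def simulate(n=20000, true_effect=1.0, seed=0):
    """Draw (c, a, r) from a toy SCM with edges C -> A, C -> R, and A -> R."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        c = 1 if rng.random() < 0.5 else 0                   # confounder, e.g. popular vs. niche
        a = 1 if rng.random() < (0.8 if c else 0.2) else 0   # exposure depends on c
        r = true_effect * a + 2.0 * c + rng.gauss(0, 1)      # outcome depends on a and c
        rows.append((c, a, r))
    return rows

def mean(xs):
    return sum(xs) / len(xs)

def naive_effect(rows):
    """Difference in mean outcome between exposed and unexposed units (confounded)."""
    return mean([r for _, a, r in rows if a == 1]) - mean([r for _, a, r in rows if a == 0])

def backdoor_effect(rows):
    """Backdoor adjustment: average the within-stratum differences, weighted by p(c)."""
    total = 0.0
    for c_val in (0, 1):
        stratum = [(a, r) for c, a, r in rows if c == c_val]
        p_c = len(stratum) / len(rows)
        diff = mean([r for a, r in stratum if a == 1]) - mean([r for a, r in stratum if a == 0])
        total += p_c * diff
    return total

rows = simulate()
naive = naive_effect(rows)
adjusted = backdoor_effect(rows)
```

With these settings the naive contrast overstates the effect of exposure because exposed units are mostly the popular ones, while the adjusted estimate should land near the `true_effect` used in the simulation.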
Causal recommender systems: the state-of-the-art
- Bias mitigation, explainability promotion, and generalization improvement are topics of focus in state-of-the-art causal RSs
- Traditional RSs have limitations due to correlational reasoning that can be addressed by these topics
Causal debiasing for recommendations
- Traditional RSs can inherit multiple types of biases in the observational user behaviors.
- These biases can lead to reduced recommendation quality, offensive recommendations, etc.
- Causal inference can distinguish stable causal relations from spurious correlations and biases.
- Exposure bias in RSs broadly refers to the bias in observed ratings due to nonrandomized item exposures.
- The Balancing Property of Propensity Scores can be used to prove the unbiasedness of IPW-based RSs.
- IPW-based RSs reweight the biased observational dataset to create a pseudo randomized dataset.
- Confounder adjustment-based methods estimate confounders and adjust their effects in the rating prediction model.
- Poisson factorization can be used to crudely approximate propensity scores.
- Logistic regression can be used to estimate propensity scores if user/item features are available.
- Substitute confounder estimation-based causal RSs can address exposure bias with weaker assumptions.
- Deep neural networks can be used to infer user-specific substitute confounders from bundle treatment.
- Popularity bias can be addressed with IPW-based methods or the structural causal model.
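The reweighting idea behind IPW can be sketched on synthetic data. In this example the propensities are known by construction, which is an assumption; in practice they would have to be estimated, for instance with the Poisson factorization or logistic regression approaches mentioned above. All numbers are invented.

```python
import random

def simulate(n=20000, seed=0):
    """Toy data: popular items are both rated higher and far more likely to be exposed."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        popular = rng.random() < 0.5
        propensity = 0.8 if popular else 0.2           # probability the rating is observed
        rating = (4.0 if popular else 2.0) + rng.gauss(0, 0.5)
        observed = rng.random() < propensity
        rows.append((rating, propensity, observed))
    return rows

rows = simulate()

# Ground truth, available only because this is a simulation.
true_mean = sum(r for r, _, _ in rows) / len(rows)

# Naive estimate: average over observed ratings only (biased toward popular items).
observed = [(r, p) for r, p, o in rows if o]
naive_mean = sum(r for r, _ in observed) / len(observed)

# IPW estimate: each observed rating is weighted by 1 / propensity,
# then normalized by the size of the full population.
ipw_mean = sum(r / p for r, p in observed) / len(rows)
```

Weighting by the inverse propensity makes each observed rating stand in for the unobserved ones that shared its exposure probability, which is the sense in which the reweighted data resembles a randomized dataset.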
Causal explanation in recommendations
- Causality is used to explain user decision process
- Question is to disentangle user intent from past behaviors
- Popularity of each item is calculated
- Causal relation between user interests, user conformity and observed ratings is represented as a V-structure
- DICE exploits colliding effect to achieve disentanglement
- Ratings can be decomposed into conformity part and user interests part
- Triplets in dataset are split into two parts based on popularity of positive and negative items
- Inequalities are formed to disentangle user interests from conformity
- Embeddings and match functions are learned from dataset
- Alternative explanations can be used for other recommendation tasks
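The triplet split and the resulting inequalities can be sketched as follows. This is a loose illustration of the idea rather than the DICE implementation: the inequality sets per case reflect a reading of the method that should be checked against the original paper, the pairwise loss is a BPR-style choice assumed here, and the `interest` and `conformity` score functions are hypothetical callables.

```python
import math
from collections import Counter

def item_popularity(interactions):
    """Popularity of each item, taken here as its interaction count."""
    return Counter(item for _, item in interactions)

def split_triplets(triplets, popularity):
    """Split (user, positive, negative) triplets by which item is more popular; ties are dropped."""
    pos_popular, neg_popular = [], []
    for user, pos, neg in triplets:
        if popularity[pos] > popularity[neg]:
            pos_popular.append((user, pos, neg))
        elif popularity[pos] < popularity[neg]:
            neg_popular.append((user, pos, neg))
    return pos_popular, neg_popular

def bpr(diff):
    """Pairwise loss that is small when diff is large and positive."""
    return -math.log(1.0 / (1.0 + math.exp(-diff)))

def triplet_loss(triplet, interest, conformity, pos_is_more_popular):
    u, i, j = triplet
    d_int = interest(u, i) - interest(u, j)
    d_con = conformity(u, i) - conformity(u, j)
    loss = bpr(d_int + d_con)           # overall score: positive item beats negative item
    if pos_is_more_popular:
        loss += bpr(d_con)              # conformity alone may explain the click
    else:
        loss += bpr(d_int)              # clicked despite lower popularity: interest must win
        loss += bpr(-d_con)             # conformity favors the more popular negative item
    return loss

interactions = [(0, "a"), (1, "a"), (2, "a"), (0, "b"), (1, "c"), (2, "c")]
pop = item_popularity(interactions)
triplets = [(0, "a", "c"), (0, "b", "c"), (1, "a", "b")]
pos_popular, neg_popular = split_triplets(triplets, pop)
zero = lambda u, i: 0.0
loss_pos = triplet_loss(pos_popular[0], zero, zero, True)
loss_neg = triplet_loss(neg_popular[0], zero, zero, False)
```

The second case is the informative one for disentanglement: when a user picks the less popular item, the choice cannot be attributed to conformity, so the interest scores must account for it.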
Causal generalization of recommendations
- RSs can be improved with causal intervention and disentanglement
- The PD algorithm can be used to improve the generalization of RSs in dynamic environments
- PD disentangles causal influences of user interests and item popularity on ratings
- RSs can mistakenly capture influence of current popularity level of items on ratings
- Causal disentanglement can promote generalization of RSs by identifying and basing recommendations on causes that are more robust to potential changes in the environment
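One way to picture the disentanglement is a score that factors into an interest term and a popularity term during training, with the popularity term dropped at recommendation time. The multiplicative form, the exponent `gamma`, and the item values below are assumptions made for this sketch and should not be read as the exact PD formulation.

```python
def train_score(interest, popularity, gamma=0.5):
    """Score fitted to observed interactions: interest modulated by current popularity."""
    return interest * popularity ** gamma

def inference_score(interest):
    """Score used for ranking after removing the influence of current popularity."""
    return interest

# Hypothetical items: name -> (learned interest match, current popularity in [0, 1]).
items = {"hit": (0.4, 0.9), "niche": (0.7, 0.1)}

rank_with_popularity = sorted(items, key=lambda k: train_score(*items[k]), reverse=True)
rank_interest_only = sorted(items, key=lambda k: inference_score(items[k][0]), reverse=True)
```

Because popularity levels shift over time, a ranking that depends only on the interest term is less tied to the popularity snapshot present in the training data.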
Evaluation strategies for causal RSs
- Causal inference techniques can address multiple types of biases, entanglement, and generalization problems in traditional RSs.
- Evaluating causal models is difficult because the ground truth is usually infeasible to obtain.
- Strategies exist to reliably evaluate causal RSs with biased real-world data.
- Some real-world datasets include randomized exposures, which yield test sets free of exposure bias.
Evaluation strategies for traditional RSs
- Split observed ratings into training and test sets
- Train RS on ratings in training set
- Predict missing ratings in test set
- Evaluate model performance with accuracy-based and ranking-based metrics
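The steps above can be sketched with one accuracy-based metric (RMSE) and one ranking-based metric (NDCG@k). The helper names and the toy inputs are assumptions for illustration; the survey does not single out particular metrics in this summary.

```python
import math
import random

def train_test_split(ratings, test_frac=0.2, seed=0):
    """Randomly split observed ratings into training and test sets."""
    rng = random.Random(seed)
    shuffled = ratings[:]
    rng.shuffle(shuffled)
    cut = len(shuffled) - int(round(len(shuffled) * test_frac))
    return shuffled[:cut], shuffled[cut:]

def rmse(pairs):
    """Root mean squared error over (true_rating, predicted_rating) pairs."""
    return math.sqrt(sum((t - p) ** 2 for t, p in pairs) / len(pairs))

def ndcg_at_k(ranked_relevances, k):
    """NDCG@k for relevance labels listed in the order the model ranked the items."""
    def dcg(rels):
        return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(rels[:k]))
    ideal = dcg(sorted(ranked_relevances, reverse=True))
    return dcg(ranked_relevances) / ideal if ideal > 0 else 0.0

ratings = [(u, i, float((u + i) % 5 + 1)) for u in range(2) for i in range(5)]
train, test = train_test_split(ratings)
example_rmse = rmse([(3.0, 3.0), (5.0, 4.0)])
example_ndcg = ndcg_at_k([1, 0, 1], k=3)
```

Both metrics are computed against held-out observed ratings, so they inherit whatever biases shaped which ratings were observed in the first place.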
Challenges for the evaluation of causal RSs
- The traditional evaluation strategy is not directly applicable to causal RSs because ratings in the test set R_test may carry the same spurious correlations and biases as ratings in the training set R_train
- To evaluate the effectiveness of causal RSs without bias, the ideal setup has a biased/entangled training set R_train and an unbiased/disentangled test set R_test
- An unbiased/disentangled test set R_test is difficult to acquire and expensive to establish
- Introduce common data simulation strategies for causal RS evaluation
- Discuss how real-world datasets can be used to promote credibility of causal RS research
Evaluation based on simulated datasets
- Dataset simulation strategy should have clear, credible design and be adjustable
- Utilize real-world information as much as possible
- Deep generative models can be used to simulate exposure bias
- Training phase uses VAEs to generate item exposures and user ratings
- Generation phase uses confounder to simulate exposure bias
- Test set intervention can be used to evaluate popularity bias and disentanglement of user interests and conformity
Evaluation based on real-world datasets
- Establishing bias-free real-world datasets is expensive and user-unfriendly
- Industry has increased interest in causal RS research
- Coat dataset: 290 users, 300 items; each user rates 24 self-selected coats, plus 16 random coats for the unbiased test set
- Yahoo! R3 dataset: 300,000 ratings from 15,400 users to 1,000 items, 5,400 users rate 10 random items for unbiased test set
- KuaiRec dataset: 7,176 users, 10,728 items, 1,411 users rate 3,327 items for unbiased test set
- Criteo Ads and Open Bandit datasets for related topics
- Coat dataset is small-scale
- Yahoo! R3 dataset has large training set but small-scale randomized experiment
- KuaiRec dataset has large-scale experiment for unbiased test set
- Case studies for qualitative model evaluations
Future directions
- Causal RS research is still in its emerging stage
- Existing causal RSs may rely on assumptions that don’t hold in reality
- Lack of universal causal model for RSs
- Positive side of biases in RSs is seldom investigated
- Real-world datasets and online tests needed to demonstrate practical utility of causal RSs
Summary
- Traditional RSs rely on correlations in observed user behaviors and user/item features
- Rubin’s RCM and Pearl’s SCM provide deeper insights into issues of traditional RSs and the foundation for moving traditional RSs to the upper rungs of the Ladder of Causality
- State-of-the-art causal RS models lead to enhanced robustness to various biases and improved explainability
- Causal RSs can base recommendations on causal relationships that are stable and invariant, leading to improved generalization abilities
- Evaluation strategies for causal RSs focus on how to reliably estimate the model performance based on biased real-world data
- Growing attention to causal RSs from the industry
- Open problems in causal RSs need to be addressed
- Fig. 1 provides an overview of the structure of the survey