Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

  • Recommender systems (RSs) are used to estimate user interests and predict their future behaviors.
  • Traditional RSs do not consider the causal reasons that lead to observed user behaviors, leading to biases in generated recommendations.
  • Recent years have seen an upsurge of interest in enhancing traditional RSs with causal inference techniques.
  • This survey provides an overview of causal RSs and discusses how different causal inference techniques can be introduced to address challenges.

Paper Content

Introduction

  • Recommender systems (RSs) are used to deliver items to users based on their personalized interests
  • Traditional RSs can be categorized into three classes: collaborative filtering-based, content-based, and hybrid methods
  • Traditional RSs can only estimate user interests and predict future behaviors based on correlations in observed historical user behaviors and user/item features
  • Causal questions ask about the effects of an intervention or a counterfactual outcome, rather than mere associations in the observational data
  • Failing to address causal questions may incur bias in recommendations
  • Understanding the cause of user activities can help improve the explainability of recommendations
  • Causal inference allows us to identify and base recommendations on causal relations that are stable and invariant
  • This survey provides an overview of recent advances in causal RS research

Recommender system basics

  • RSs have users and items
  • Data for RSs is represented by a user-item rating matrix
  • A non-zero element in the matrix denotes a user’s rating of an item
  • A zero element indicates a missing rating
  • RSs are trained on observed ratings
  • RSs have access to user and item side information
  • Main purpose of RSs is to predict users’ ratings for uninteracted items

Collaborative filtering

  • Collaborative filtering-based RSs use user ratings to recommend new items
  • Three widely-used collaborative filtering-based RSs are Matrix Factorization, Deep Matrix Factorization and Auto-encoder-based RSs
  • Ratings are assumed to be generated from user and item latent variables
  • Models learn latent variables and associated functions by fitting on observed ratings
  • Ratings are predicted for previously uninteracted items
  • Models capture co-occurrence patterns in users’ past behaviors, not causal influence
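
The fitting procedure described above can be sketched with plain matrix factorization trained by stochastic gradient descent. This is a minimal illustration, not the survey's implementation: the function names, the toy ratings, and the hyperparameters (`k`, `lr`, `reg`, `epochs`) are all assumptions chosen for the example.

```python
import math
import random

def train_mf(ratings, n_users, n_items, k=2, lr=0.02, reg=0.01, epochs=2000, seed=0):
    """Fit user/item latent vectors to observed (user, item, rating) triples by SGD."""
    rng = random.Random(seed)
    U = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    V = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(a * b for a, b in zip(U[u], V[i]))
            for f in range(k):
                uf, vf = U[u][f], V[i][f]
                # Gradient step on squared error with L2 regularization.
                U[u][f] += lr * (err * vf - reg * uf)
                V[i][f] += lr * (err * uf - reg * vf)
    return U, V

def predict(U, V, u, i):
    """Predicted rating is the inner product of the latent vectors."""
    return sum(a * b for a, b in zip(U[u], V[i]))

# Toy observed ratings (hypothetical data): 3 users, 3 items, 6 of 9 cells observed.
observed = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
            (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
U, V = train_mf(observed, n_users=3, n_items=3)

train_rmse = math.sqrt(
    sum((r - predict(U, V, u, i)) ** 2 for u, i, r in observed) / len(observed)
)
# Prediction for a previously uninteracted (user, item) pair.
unseen_prediction = predict(U, V, 0, 2)
```

The model only ever sees which ratings co-occur, so `unseen_prediction` reflects correlation patterns in the observed cells, not the effect of exposing user 0 to item 2.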

Content-based recommender systems

  • Personalized content-based RSs estimate user interests based on the features of the items they have interacted with.
  • Ratings are generated by matching user interests with item content.
  • Training of personalized CBRSs follows similar steps as collaborative filtering.
  • Key step of building a CBRS is to create item features that can best reflect user interests.
  • Factors other than users’ interests in the item content can create an undesirable association between item content and user ratings.

Hybrid recommendation

  • Hybrid RSs combine user/item side information with collaborative filtering to enhance recommendations.
  • Hybrid strategies adjust user and item latent variables to make them compatible in the model.
  • Factorization machines and extensions can be viewed as learning a bi-linear function.
  • Hybrid strategies cannot break correlational reasoning limitation of collaborative filtering and content-based RSs.
  • Side information combined with domain knowledge can help form causal relations among variables of interest.
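
To make the bi-linear view of factorization machines concrete, here is a small sketch of the standard second-order factorization machine score. The feature layout and all weights below are made-up illustration values, not taken from the survey.

```python
def fm_predict(x, w0, w, V):
    """Second-order factorization machine score for one feature vector x.

    x: n feature values; w0: bias; w: n linear weights;
    V: n rows of k latent factors, so pair (i, j) has weight <V[i], V[j]>.
    Uses the usual O(k * n) reformulation of the pairwise sum.
    """
    n, k = len(x), len(V[0])
    linear = w0 + sum(w[i] * x[i] for i in range(n))
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s_sq)
    return linear + pairwise

def fm_predict_naive(x, w0, w, V):
    """Same score computed directly over all feature pairs, for checking."""
    n = len(x)
    score = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            score += sum(a * b for a, b in zip(V[i], V[j])) * x[i] * x[j]
    return score

# Hypothetical layout: one-hot user (2 slots), one-hot item (2 slots), one side feature.
x = [0.0, 1.0, 1.0, 0.0, 0.5]
w0 = 0.1
w = [0.0, 0.2, -0.1, 0.0, 0.3]
V = [[0.1, 0.2], [0.3, -0.1], [0.5, 0.4], [-0.2, 0.1], [0.2, 0.2]]
score = fm_predict(x, w0, w, V)
```

With only the one-hot user and item features active, the pairwise term reduces to the inner product of one user factor row and one item factor row, which is the matrix factorization score. The side feature adds further pairwise terms of the same form.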

Causal recommender systems: preliminaries

  • Traditional RSs have limitations due to correlational reasoning on observational user behaviors
  • Two causal inference frameworks, RCM and SCM, can be used to build RSs with causal reasoning ability
  • RCM and SCM are best suited for different tasks and questions

Rubin’s potential outcome framework

  • Traditional RSs can only answer the question “what the rating would be if we observe an item was exposed to the user”
  • Item exposure is not randomized in the collected dataset
  • RCM-based RSs draw inspiration from clinical trials
  • RCM-based RSs aim to estimate the causal effects of the treatments (exposing a user to an item) on the outcomes (user ratings)
  • Traditional collaborative filtering-based RSs estimate the conditional distribution p(r_ij | u_i, v_j) from observed ratings
  • Causal graph 1 of Fig. 4-(a) is tacitly assumed
  • The causal path from item exposure to rating is confounded by an unobserved confounder (e.g., item popularity)
  • The conditional distribution p(r_ij | u_i, v_j) estimated from the confounded data is given by Eq. (4) of the paper
  • The classic RCM-based solution to exposure bias is to find user and item covariates that account for the confounding
  • A conditional unconfoundedness assumption is then formulated: given these covariates, item exposure is independent of the rating potential outcomes
  • The law of total probability is then used to adjust for these covariates
  • An uncontrolled confounder leaves open a backdoor path (i.e., a non-causal path) between item exposure and rating
  • RCM-based RSs need two extra assumptions to identify the causal effects of item exposures on ratings: SUTVA and positivity assumption
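
The adjustment argument in the bullets above can be written out as a short derivation. The notation is an assumption made for this sketch and may differ from the paper: $a_{ij}$ is the exposure of item $j$ to user $i$, $r_{ij}(1)$ is the rating potential outcome under exposure, and $c$ denotes the covariates that satisfy conditional unconfoundedness, $r_{ij}(1) \perp a_{ij} \mid c$.

```latex
\begin{align}
p\big(r_{ij}(1)\big)
  &= \sum_{c} p\big(r_{ij}(1) \mid c\big)\, p(c)
  && \text{(law of total probability)} \\
  &= \sum_{c} p\big(r_{ij}(1) \mid a_{ij}=1,\, c\big)\, p(c)
  && \text{(conditional unconfoundedness)} \\
  &= \sum_{c} p\big(r_{ij} \mid a_{ij}=1,\, c\big)\, p(c)
  && \text{(SUTVA / consistency)}
\end{align}
```

The positivity assumption, $p(a_{ij}=1 \mid c) > 0$, is what keeps the conditional distribution in the last line well defined for every value of $c$.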

Pearl’s structural causal model

  • RCM uses rating potential outcomes to reason with causality and attributes biases in observed user behaviors to non-randomized item exposures
  • SCM delves deep into the causal mechanism that generates the observed outcomes and biases and represents it with a causal graph
  • Causal graph nodes specify variables of interest, such as user interests, item attributes, observed ratings, and other important covariates
  • Directed edges between nodes represent causal relations determined by researchers’ domain knowledge
  • SCM is more flexible than RCM as it can represent and reason with the causal effects between any subset of nodes and along specific paths
  • SCM-based causal RSs involve user and item latent variables that are inferred alongside the estimation of structural equations
  • Causal graphs should describe the causal mechanism that generates the observed data to distinguish stable, causal relations from other undesirable correlations
  • Atomic graph structures include chains, forks, and V-structures
  • Confounders can lead to non-causal dependencies among variables in the observational dataset
  • SCM allows for interventions to calculate the causal effect of two variables on an outcome
  • SCM-based causal inference methods can estimate causal effects with unknown confounders
  • Causal graphs allow for debiasing, causal disentanglement, and causal generalization

Causal recommender systems: the state-of-the-art

  • Bias mitigation, explainability promotion, and generalization improvement are topics of focus in state-of-the-art causal RSs
  • Traditional RSs have limitations due to correlational reasoning that can be addressed by these topics

Causal debiasing for recommendations

  • Traditional RSs can inherit multiple types of biases in the observational user behaviors.
  • These biases can lead to reduced recommendation quality, offensive recommendations, etc.
  • Causal inference can distinguish stable causal relations from spurious correlations and biases.
  • Exposure bias in RSs broadly refers to the bias in observed ratings due to nonrandomized item exposures.
  • The Balancing Property of Propensity Scores can be used to prove the unbiasedness of IPW-based RSs.
  • IPW-based RSs reweight the biased observational dataset to create a pseudo randomized dataset.
  • Confounder adjustment-based methods estimate confounders and adjust their effects in the rating prediction model.
  • Poisson factorization can be used to crudely approximate propensity scores.
  • Logistic regression can be used to estimate propensity scores if user/item features are available.
  • Substitute confounder estimation-based causal RSs can address exposure bias with weaker assumptions.
  • Deep neural networks can be used to infer user-specific substitute confounders from bundle treatment.
  • Popularity bias can be addressed with IPW-based methods or the structural causal model.
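
The reweighting idea behind IPW can be illustrated with a simulation in which the true propensities are known. Everything here is an assumption for the example: the data-generating process, the two popularity groups, and the propensity values. In practice propensities have to be estimated, for instance with the methods listed above.

```python
import random

rng = random.Random(0)
n_pairs = 100_000

true_sum = 0.0                          # ratings of all user-item pairs (never fully observed)
observed_sum, observed_count = 0.0, 0   # what a biased log would contain
ipw_sum = 0.0                           # observed ratings reweighted by 1 / propensity

for _ in range(n_pairs):
    popular = rng.random() < 0.2
    # Popularity raises both the rating and the chance of exposure (confounding).
    rating = rng.gauss(4.0 if popular else 2.0, 0.5)
    propensity = 0.8 if popular else 0.1
    exposed = rng.random() < propensity

    true_sum += rating
    if exposed:
        observed_sum += rating
        observed_count += 1
        ipw_sum += rating / propensity

true_mean = true_sum / n_pairs
naive_mean = observed_sum / observed_count   # biased toward popular items
ipw_mean = ipw_sum / n_pairs                 # approximately unbiased for true_mean
```

The naive average over exposed pairs overstates the mean rating because popular items are overrepresented, while the inverse-propensity weights make the exposed pairs stand in for the unexposed ones from the same group.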

Causal explanation in recommendations

  • Causality is used to explain user decision process
  • Question is to disentangle user intent from past behaviors
  • Popularity of each item is calculated
  • Causal relation between user interests, user conformity and observed ratings is represented as a V-structure
  • DICE exploits colliding effect to achieve disentanglement
  • Ratings can be decomposed into conformity part and user interests part
  • Triplets in dataset are split into two parts based on popularity of positive and negative items
  • Inequalities are formed to disentangle user interests from conformity
  • Embeddings and match functions are learned from dataset
  • Alternative explanations can be used for other recommendation tasks
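
The popularity-based split of triplets can be sketched as follows. This is a simplified illustration of the idea in the bullets above, not the DICE implementation: the function names are hypothetical, popularity is a raw interaction count, and ties are placed in the conformity group for simplicity.

```python
from collections import Counter

def split_triplets_by_popularity(interactions, triplets):
    """Split (user, positive_item, negative_item) triplets by relative popularity."""
    popularity = Counter(item for _user, item in interactions)
    conformity_cases, interest_cases = [], []
    for user, pos, neg in triplets:
        if popularity[pos] < popularity[neg]:
            # The user chose the less popular item, so conformity cannot explain
            # the choice: interest(pos) > interest(neg) is a justified inequality.
            interest_cases.append((user, pos, neg))
        else:
            # The chosen item is at least as popular, so only an inequality on
            # the conformity part can be asserted; interest is left unconstrained.
            conformity_cases.append((user, pos, neg))
    return conformity_cases, interest_cases

def total_score(interest_score, conformity_score):
    """Additive decomposition of a predicted rating into its two causes."""
    return interest_score + conformity_score

# Hypothetical interaction log of (user, item) pairs and sampled triplets.
interactions = [(0, "a"), (1, "a"), (2, "a"), (0, "b"), (1, "c")]
triplets = [(0, "a", "c"), (0, "b", "a"), (1, "c", "b")]
conformity_cases, interest_cases = split_triplets_by_popularity(interactions, triplets)
```

In both groups the total score of the positive item should still exceed that of the negative item; the split only decides which component-level inequality is added on top.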

Causal generalization of recommendations

  • RSs can be improved with causal intervention and disentanglement
  • PD algorithm can be used to improve generalization of RSs in dynamic environment
  • PD disentangles causal influences of user interests and item popularity on ratings
  • RSs can mistakenly capture influence of current popularity level of items on ratings
  • Causal disentanglement can promote generalization of RSs by identifying and basing recommendations on causes that are more robust to potential changes in the environment
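
A toy example shows why separating interest from popularity matters when the environment changes. The multiplicative form `interest * popularity ** gamma` and all numbers below are assumptions made for illustration; this is a sketch of the general idea, not the PD algorithm as published.

```python
def training_score(interest, popularity, gamma=0.5):
    """Score fitted to observed data: interest modulated by current popularity."""
    return interest * popularity ** gamma

def deconfounded_score(interest):
    """Score used at recommendation time once the popularity factor is removed."""
    return interest

# Hypothetical per-item values for a single user.
interest = {"niche_film": 0.9, "blockbuster": 0.6}
popularity = {"niche_film": 0.05, "blockbuster": 0.60}

ranked_with_popularity = sorted(
    interest,
    key=lambda item: training_score(interest[item], popularity[item]),
    reverse=True,
)
ranked_by_interest = sorted(
    interest,
    key=lambda item: deconfounded_score(interest[item]),
    reverse=True,
)
```

A model that absorbs the current popularity level ranks the blockbuster first, and that ranking goes stale once popularity shifts. Ranking on the interest component alone puts the niche film first and does not depend on the popularity snapshot.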

Evaluation strategies for causal RSs

  • Causal inference techniques can address multiple types of biases, entanglement, and generalization problems in traditional RSs.
  • Evaluating causal models is difficult because the ground truth is usually infeasible to obtain.
  • Strategies exist to reliably evaluate causal RSs with biased real-world data.
  • Some real-world datasets include randomized item exposures, which provide test sets free of exposure bias.

Evaluation strategies for traditional RSs

  • Split observed ratings into training and test sets
  • Train RS on ratings in training set
  • Predict missing ratings in test set
  • Evaluate model performance with accuracy-based and ranking-based metrics
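
The steps above can be sketched with a random split plus one accuracy-based metric (RMSE) and one ranking-based metric (NDCG@k). The helper names and the example numbers are assumptions for illustration; NDCG is computed here with linear gains and a log2 discount, which is one common convention.

```python
import math
import random

def split_ratings(ratings, test_fraction=0.2, seed=0):
    """Randomly split observed ratings into (train, test) lists."""
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def rmse(predicted, actual):
    """Root mean squared error between predicted and held-out ratings."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

def dcg(relevances):
    """Discounted cumulative gain of relevances listed in ranked order."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))

def ndcg_at_k(ranked_relevances, k):
    """NDCG@k: DCG of the model's top-k divided by the DCG of the ideal top-k."""
    ideal_dcg = dcg(sorted(ranked_relevances, reverse=True)[:k])
    return dcg(ranked_relevances[:k]) / ideal_dcg if ideal_dcg > 0 else 0.0

# Hypothetical data: ten (user, item, rating) triples and one ranked list.
ratings = [(u, i, float((u + i) % 5 + 1)) for u in range(5) for i in range(2)]
train, test = split_ratings(ratings)

example_rmse = rmse([1.0, 4.0], [1.0, 2.0])
example_ndcg = ndcg_at_k([3, 2, 3, 0, 1, 2], k=6)
```

Because the test ratings here are drawn from the same observed data as the training ratings, a good score under this protocol says nothing about whether the model has removed the biases in that data.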

Challenges for the evaluation of causal RSs

  • The traditional evaluation strategy is not directly applicable to causal RSs, because held-out test ratings may carry the same spurious correlations and biases as the training ratings
  • To evaluate causal RSs without bias, the ideal setup is a biased/entangled training set and an unbiased/disentangled test set
  • An unbiased/disentangled test set is difficult to acquire and expensive to establish
  • Introduce common data simulation strategies for causal RS evaluation
  • Discuss how real-world datasets can be used to promote credibility of causal RS research

Evaluation based on simulated datasets

  • Dataset simulation strategy should have clear, credible design and be adjustable
  • Utilize real-world information as much as possible
  • Deep generative models can be used to simulate exposure bias
  • Training phase uses VAEs to generate item exposures and user ratings
  • Generation phase uses confounder to simulate exposure bias
  • Test set intervention can be used to evaluate popularity bias and disentanglement of user interests and conformity

Evaluation based on real-world datasets

  • Establishing bias-free real-world datasets is expensive and user-unfriendly
  • Industry has increased interest in causal RS research
  • Coat dataset: 290 users, 300 items; each user rates 24 self-selected coats (training set) and 16 randomly selected coats (unbiased test set)
  • Yahoo! R3 dataset: 300,000 ratings from 15,400 users to 1,000 items, 5,400 users rate 10 random items for unbiased test set
  • KuaiRec dataset: 7,176 users, 10,728 items, 1,411 users rate 3,327 items for unbiased test set
  • Criteo Ads and Open Bandit datasets for related topics
  • Coat dataset is small-scale
  • Yahoo! R3 dataset has large training set but small-scale randomized experiment
  • KuaiRec dataset has large-scale experiment for unbiased test set
  • Case studies for qualitative model evaluations

Future directions

  • Causal RS research is still in its emerging stage
  • Existing causal RSs may rely on assumptions that don’t hold in reality
  • Lack of universal causal model for RSs
  • Positive side of biases in RSs is seldom investigated
  • Real-world datasets and online tests needed to demonstrate practical utility of causal RSs

Summary

  • Traditional RSs rely on correlations in observed user behaviors and user/item features
  • Rubin’s RCM and Pearl’s SCM provide deeper insights into issues of traditional RSs and the foundation for moving traditional RSs to the upper rungs of the Ladder of Causality
  • State-of-the-art causal RS models lead to enhanced robustness to various biases and improved explainability
  • Causal RSs can base recommendations on causal relationships that are stable and invariant, leading to improved generalization abilities
  • Evaluation strategies for causal RSs focus on how to reliably estimate the model performance based on biased real-world data
  • Growing attention to causal RSs from the industry
  • Open problems in causal RSs need to be addressed
  • Fig. 1 provides an overview of the structure of the survey