Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

Recommender systems (RSs) are used to estimate user interests and predict their future behaviors.
Traditional RSs do not consider the causal reasons that lead to observed user behaviors, leading to biases in generated recommendations.
Recent years have seen an upsurge of interest in enhancing traditional RSs with causal inference techniques.
This survey provides an overview of causal RSs and discusses how different causal inference techniques can be introduced to address challenges.

Paper Content

Introduction

Recommender systems (RSs) are used to deliver items to users based on their personalized interests
Traditional RSs can be categorized into three classes: collaborative filtering-based, content-based, and hybrid methods
Traditional RSs can only estimate user interests and predict future behaviors based on correlations in users' observed historical behaviors and user/item features
Causal questions ask about the effects of an intervention or a counterfactual outcome, rather than mere associations in the observational data
Failing to address causal questions may incur bias in recommendations
Understanding the cause of user activities can help improve the explainability of recommendations
Causal inference allows us to identify and base recommendations on causal relations that are stable and invariant
This survey provides an overview of recent advances in causal RS research

Recommender system basics

RSs have users and items
Data for RSs is represented by a user-item rating matrix
A non-zero element in the matrix denotes a user's rating of an item
Zero elements indicate missing ratings
RSs are trained on observed ratings
RSs have access to user and item side information
Main purpose of RSs is to predict users’ ratings for uninteracted items
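The rating-matrix setup above can be sketched as follows. This is a minimal illustration with a made-up 3-user by 4-item matrix; the values and variable names are assumptions, not data from the paper.

```python
import numpy as np

# Hypothetical 3-user x 4-item rating matrix; 0 marks a missing rating.
R = np.array([
    [5, 0, 3, 0],
    [0, 4, 0, 1],
    [2, 0, 0, 5],
], dtype=float)

observed_mask = R != 0                        # True where a rating was observed
observed_pairs = np.argwhere(observed_mask)   # (user, item) pairs available for training
missing_pairs = np.argwhere(~observed_mask)   # uninteracted pairs whose ratings must be predicted

print(len(observed_pairs), "observed ratings,", len(missing_pairs), "to predict")
```

Treating zero as "missing" rather than "disliked" is what makes the training objective run only over the observed entries.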

Collaborative filtering

Collaborative filtering-based RSs use user ratings to recommend new items
Three widely-used collaborative filtering-based RSs are Matrix Factorization, Deep Matrix Factorization and Auto-encoder-based RSs
Ratings are assumed to be generated from user and item latent variables
Models learn latent variables and associated functions by fitting on observed ratings
Ratings are predicted for previously uninteracted items
Models capture co-occurrence patterns in users’ past behaviors, not causal influence
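As a rough sketch of the latent-variable idea above, the snippet below fits a plain matrix factorization by full-batch gradient descent on the observed entries only. The function name `fit_mf`, the hyperparameters, and the toy matrix are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def fit_mf(R, k=3, lr=0.01, reg=0.01, epochs=3000, seed=0):
    """Fit user/item latent factors on observed (non-zero) ratings."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = 0.1 * rng.standard_normal((n_users, k))   # user latent variables
    V = 0.1 * rng.standard_normal((n_items, k))   # item latent variables
    mask = (R != 0).astype(float)
    for _ in range(epochs):
        err = mask * (R - U @ V.T)                # residuals on observed entries only
        U_step = err @ V - reg * U                # descent direction for squared error + L2
        V_step = err.T @ U - reg * V
        U += lr * U_step
        V += lr * V_step
    return U, V

# Hypothetical toy matrix; 0 marks a missing rating.
R = np.array([
    [5, 0, 3, 0],
    [0, 4, 0, 1],
    [2, 0, 0, 5],
], dtype=float)

U, V = fit_mf(R)
R_hat = U @ V.T                                   # predictions for every pair, including missing ones
mask = R != 0
train_rmse = float(np.sqrt(np.mean((R[mask] - R_hat[mask]) ** 2)))
print("training RMSE:", round(train_rmse, 3))
```

Nothing in this objective distinguishes why an entry was observed, which is the sense in which such a model captures co-occurrence patterns rather than causal influence.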

Content-based recommender systems

Personalized content-based RSs estimate user interests based on the features of the items they have interacted with.
Ratings are generated by matching user interests with item content.
Training of personalized CBRSs follows similar steps as collaborative filtering.
Key step of building a CBRS is to create item features that can best reflect user interests.
Factors other than users’ interests in the item content can create an undesirable association between item content and user ratings.

Hybrid recommendation

Hybrid RSs combine user/item side information with collaborative filtering to enhance recommendations.
Hybrid strategies adjust user and item latent variables to make them compatible in the model.
Factorization machines and extensions can be viewed as learning a bi-linear function.
Hybrid strategies cannot overcome the correlational-reasoning limitation of collaborative filtering and content-based RSs.
Side information combined with domain knowledge can help form causal relations among variables of interest.
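To make the bi-linear view of factorization machines concrete, here is a small sketch of the standard second-order factorization machine score. The feature layout (a concatenation of user, item, and side-information features) and all names are assumptions for illustration.

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """Second-order factorization machine score for one feature vector.

    x: (d,) features, w0: scalar bias, w: (d,) linear weights, V: (d, k) factors.
    """
    linear = w0 + w @ x
    xv = x @ V                                             # (k,) sum_i V[i] * x[i]
    pairwise = 0.5 * np.sum(xv ** 2 - (x ** 2) @ (V ** 2)) # sum_{i<j} <V[i], V[j]> x[i] x[j]
    return float(linear + pairwise)

def fm_predict_naive(x, w0, w, V):
    """Same score with the pairwise interactions written out explicitly."""
    total = w0 + w @ x
    d = len(x)
    for i in range(d):
        for j in range(i + 1, d):
            total += (V[i] @ V[j]) * x[i] * x[j]
    return float(total)

rng = np.random.default_rng(0)
x = rng.standard_normal(6)          # hypothetical user + item + side-information features
w0, w, V = 0.5, rng.standard_normal(6), rng.standard_normal((6, 3))
fast, slow = fm_predict(x, w0, w, V), fm_predict_naive(x, w0, w, V)
print(fast, slow)
```

The two functions compute the same value; the first uses the usual algebraic rewrite that avoids the double loop over feature pairs.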

Causal recommender systems: preliminaries

Traditional RSs have limitations due to correlational reasoning on observational user behaviors
Two causal inference frameworks, RCM and SCM, can be used to build RSs with causal reasoning ability
RCM and SCM are best suited for different tasks and questions

Rubin’s potential outcome framework

Traditional RSs can only answer the question “what the rating would be if we observe an item was exposed to the user”
Item exposure is not randomized in the collected dataset
RCM-based RSs draw inspiration from clinical trials
RCM-based RSs aim to estimate the causal effects of the treatments (exposing a user to an item) on the outcomes (user ratings)
Traditional collaborative filtering-based RSs estimate the conditional distribution $p(r_{ij} \mid u_i, v_j)$ from observed ratings
Causal graph 1 of Fig. 4-(a) is tacitly assumed
The causal path from item exposure to rating is confounded (e.g., by item popularity)
The conditional distribution $p(r_{ij} \mid u_i, v_j)$ estimated from the confounded data is given in Eq. (4) of the paper
The classic RCM-based solution to exposure bias is to find user and item covariates that account for the confounding
The conditional unconfoundedness assumption states that, given these covariates, item exposure is independent of the potential ratings
The law of total probability is then applied over the covariates to recover the deconfounded rating distribution
An uncontrolled confounder leaves open a backdoor path (i.e., a non-causal path) between item exposure and rating
RCM-based RSs need two extra assumptions to identify the causal effects of item exposures on ratings: SUTVA and positivity assumption
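The adjustment argument outlined above can be written out as a short derivation. The notation here is an assumption made for illustration: $a_{ij}$ is the exposure indicator, $r_{ij}(a_{ij}=1)$ is the potential rating under exposure, and $c$ stands for the user and item covariates; the paper's own symbols may differ.

```latex
\begin{align}
p\big(r_{ij}(a_{ij}=1)\big)
  &= \sum_{c} p\big(r_{ij}(a_{ij}=1) \mid c\big)\, p(c)
     && \text{(law of total probability)} \\
  &= \sum_{c} p\big(r_{ij}(a_{ij}=1) \mid a_{ij}=1,\, c\big)\, p(c)
     && \text{(conditional unconfoundedness: } r_{ij}(a_{ij}) \perp a_{ij} \mid c\text{)} \\
  &= \sum_{c} p\big(r_{ij} \mid a_{ij}=1,\, c\big)\, p(c)
     && \text{(consistency, part of SUTVA)}
\end{align}
```

Positivity, $p(a_{ij}=1 \mid c) > 0$ for every $c$ with $p(c) > 0$, is what keeps each conditional term well defined.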

Pearl’s structural causal model

RCM uses rating potential outcomes to reason with causality and attributes biases in observed user behaviors to non-randomized item exposures
SCM delves deep into the causal mechanism that generates the observed outcomes and biases and represents it with a causal graph
Causal graph nodes specify variables of interest, such as user interests, item attributes, observed ratings, and other important covariates
Directed edges between nodes represent causal relations determined by researchers’ domain knowledge
SCM is more flexible than RCM as it can represent and reason with the causal effects between any subset of nodes and along specific paths
SCM-based causal RSs involve user and item latent variables that are inferred alongside the estimation of structural equations
Causal graphs should describe the causal mechanism that generates the observed data to distinguish stable, causal relations from other undesirable correlations
Atomic graph structures include chains, forks, and V-structures
Confounders can lead to non-causal dependencies among variables in the observational dataset
SCM allows for interventions to calculate the causal effect of two variables on an outcome
SCM-based causal inference methods can estimate causal effects with unknown confounders
Causal graphs allow for debiasing, causal disentanglement, and causal generalization
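A small simulation can illustrate how a fork (confounder) creates a non-causal dependency and how adjusting for it removes the dependency. The data-generating process below is entirely hypothetical: a binary confounder `c` drives both the treatment `a` and the outcome `y`, while `a` has no effect on `y`.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

c = rng.binomial(1, 0.5, n)                          # confounder (fork node), e.g. "item is popular"
a = rng.binomial(1, np.where(c == 1, 0.8, 0.2))      # treatment depends on the confounder
y = 1.0 * c + rng.standard_normal(n)                 # outcome depends on c only; true effect of a is 0

# Naive association: compares groups that differ in c, so it picks up the backdoor path.
naive = y[a == 1].mean() - y[a == 0].mean()

# Backdoor adjustment: compare within each stratum of c, then average over p(c).
adjusted = sum(
    (y[(a == 1) & (c == v)].mean() - y[(a == 0) & (c == v)].mean()) * (c == v).mean()
    for v in (0, 1)
)

print("naive difference:   ", round(float(naive), 3))
print("adjusted difference:", round(float(adjusted), 3))
```

The naive difference is far from zero even though the treatment does nothing, while the stratified estimate is close to zero. This only works here because the confounder is observed; the unknown-confounder case mentioned above needs other techniques.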

Causal recommender systems: the state-of-the-art

Bias mitigation, explainability promotion, and generalization improvement are topics of focus in state-of-the-art causal RSs
These topics address the limitations that correlational reasoning imposes on traditional RSs

Causal debiasing for recommendations

Traditional RSs can inherit multiple types of biases in the observational user behaviors.
These biases can lead to reduced recommendation quality, offensive recommendations, etc.
Causal inference can distinguish stable causal relations from spurious correlations and biases.
Exposure bias in RSs broadly refers to the bias in observed ratings due to nonrandomized item exposures.
The Balancing Property of Propensity Scores can be used to prove the unbiasedness of IPW-based RSs.
IPW-based RSs reweight the biased observational dataset to create a pseudo randomized dataset.
Confounder adjustment-based methods estimate confounders and adjust their effects in the rating prediction model.
Poisson factorization can be used to crudely approximate propensity scores.
Logistic regression can be used to estimate propensity scores if user/item features are available.
Substitute confounder estimation-based causal RSs can address exposure bias with weaker assumptions.
Deep neural networks can be used to infer user-specific substitute confounders from bundle treatment.
Popularity bias can be addressed with IPW-based methods or the structural causal model.
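The reweighting idea behind IPW can be sketched on synthetic data. In this hypothetical setup, popular items receive higher ratings and are exposed more often, and the propensity scores are assumed known; in practice they would be estimated, for example with the logistic regression or Poisson factorization approaches mentioned above.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000                                            # hypothetical user-item pairs

popular = rng.binomial(1, 0.3, n)
rating = 3.0 + 1.0 * popular + 0.5 * rng.standard_normal(n)   # rating every pair would give
propensity = np.where(popular == 1, 0.6, 0.1)          # exposure probability (assumed known)
exposed = rng.binomial(1, propensity).astype(bool)     # non-randomized item exposure

true_mean = rating.mean()                              # target: average over all pairs
naive_mean = rating[exposed].mean()                    # biased toward popular items

# Self-normalized IPW: each observed rating is weighted by 1 / propensity,
# which mimics a dataset collected under randomized exposure.
weights = 1.0 / propensity[exposed]
ipw_mean = np.sum(weights * rating[exposed]) / np.sum(weights)

print("true:", round(float(true_mean), 3),
      "naive:", round(float(naive_mean), 3),
      "ipw:", round(float(ipw_mean), 3))
```

The same weights can multiply the per-rating loss when training a rating model; small propensities produce large weights, so clipping is a common practical safeguard.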

Causal explanation in recommendations

Causality is used to explain user decision process
Question is to disentangle user intent from past behaviors
Popularity of each item is calculated
Causal relation between user interests, user conformity and observed ratings is represented as a V-structure
DICE exploits colliding effect to achieve disentanglement
Ratings can be decomposed into conformity part and user interests part
Triplets in dataset are split into two parts based on popularity of positive and negative items
Inequalities are formed to disentangle user interests from conformity
Embeddings and match functions are learned from dataset
Alternative explanations can be used for other recommendation tasks
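The popularity-based split and the two-part score described above might be sketched as follows. This is a loose illustration, not the DICE implementation: the additive form of the score, the function names, and the toy interactions are all assumptions.

```python
import numpy as np

def item_popularity(interactions, n_items):
    """Count how many observed interactions each item has."""
    items = np.array([i for _, i in interactions])
    return np.bincount(items, minlength=n_items)

def split_triplets(triplets, popularity):
    """Split (user, positive item, negative item) triplets by relative popularity."""
    pos_more_popular, neg_more_popular = [], []
    for u, i, j in triplets:
        if popularity[i] >= popularity[j]:
            pos_more_popular.append((u, i, j))   # the choice could be driven by conformity
        else:
            neg_more_popular.append((u, i, j))   # chosen despite lower popularity: signals interest
    return pos_more_popular, neg_more_popular

def score(user_int, item_int, user_con, item_con, u, i):
    """Assumed additive decomposition: matching score = interest part + conformity part."""
    interest = float(user_int[u] @ item_int[i])
    conformity = float(user_con[u] @ item_con[i])
    return interest + conformity, interest, conformity

interactions = [(0, 0), (1, 0), (2, 0), (0, 1), (1, 2)]   # hypothetical (user, item) pairs
popularity = item_popularity(interactions, n_items=3)
easy, informative = split_triplets([(0, 0, 1), (0, 1, 0)], popularity)

rng = np.random.default_rng(0)
user_int, user_con = rng.standard_normal((3, 4)), rng.standard_normal((3, 4))
item_int, item_con = rng.standard_normal((3, 4)), rng.standard_normal((3, 4))
total, interest, conformity = score(user_int, item_int, user_con, item_con, u=0, i=1)
```

The split matters because each group supports different inequalities between the interest and conformity parts, which is what lets the two sets of embeddings be learned separately.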

Causal generalization of recommendations

RSs can be improved with causal intervention and disentanglement
The PD algorithm can be used to improve the generalization of RSs in dynamic environments
PD disentangles causal influences of user interests and item popularity on ratings
RSs can mistakenly capture influence of current popularity level of items on ratings
Causal disentanglement can promote generalization of RSs by identifying and basing recommendations on causes that are more robust to potential changes in the environment
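One way to picture the disentanglement of interest and popularity is a score that multiplies an interest term by a popularity term during training and drops the popularity term at recommendation time. The multiplicative form, the exponent `gamma`, and the numbers below are assumptions for illustration and should not be read as the exact PD formulation.

```python
import numpy as np

def training_score(interest, popularity, gamma=0.5):
    """Score fitted to observed behavior: interest modulated by current popularity."""
    return interest * popularity ** gamma

def deconfounded_score(interest):
    """Score used for ranking once the influence of current popularity is removed."""
    return interest

interest = np.array([0.6, 0.8])      # hypothetical user interest in items A and B
popularity = np.array([0.9, 0.1])    # item A is currently much more popular

with_popularity = training_score(interest, popularity)
without_popularity = deconfounded_score(interest)

print("ranked with popularity:   ", np.argsort(-with_popularity))
print("ranked without popularity:", np.argsort(-without_popularity))
```

In this toy case item A wins only because of its current popularity; ranking on the interest term alone prefers item B, which is the more stable signal if popularity shifts.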

Evaluation strategies for causal RSs

Causal inference techniques can address multiple types of biases, entanglement, and generalization problems in traditional RSs.
Evaluating causal models is difficult because the ground truth is usually infeasible to obtain.
Strategies exist to reliably evaluate causal RSs with biased real-world data.
Real-world datasets with randomized item exposures are available, which removes exposure bias from the test set.

Evaluation strategies for traditional RSs

Split observed ratings into training and test sets
Train RS on ratings in training set
Predict missing ratings in test set
Evaluate model performance with accuracy-based and ranking-based metrics
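The steps above can be sketched with a random split of the observed ratings plus one accuracy-based metric (RMSE) and one ranking-based metric (NDCG@k). The helper names and the toy matrix are assumptions; model training is left out so that the split and the metrics stay in focus.

```python
import numpy as np

def split_observed(R, test_frac=0.2, seed=0):
    """Randomly split the observed (non-zero) entries into train and test index pairs."""
    pairs = np.argwhere(R != 0)
    pairs = np.random.default_rng(seed).permutation(pairs)   # shuffles rows
    n_test = int(round(test_frac * len(pairs)))
    return pairs[n_test:], pairs[:n_test]

def rmse(y_true, y_pred):
    """Accuracy-based metric: root mean squared error."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def ndcg_at_k(relevance_in_ranked_order, k):
    """Ranking-based metric: relevance values listed in the order the model ranked them."""
    rel_all = np.asarray(relevance_in_ranked_order, dtype=float)
    rel = rel_all[:k]
    discounts = 1.0 / np.log2(np.arange(2, rel.size + 2))
    dcg = np.sum(rel * discounts)
    ideal = np.sort(rel_all)[::-1][:k]                       # best possible ordering
    idcg = np.sum(ideal * discounts)
    return float(dcg / idcg) if idcg > 0 else 0.0

# Hypothetical matrix with 10 observed ratings.
R = np.array([
    [5, 3, 0, 1, 4],
    [0, 4, 2, 0, 5],
    [1, 0, 3, 4, 0],
], dtype=float)
train_pairs, test_pairs = split_observed(R)
print(len(train_pairs), "train ratings,", len(test_pairs), "test ratings")
```

Because the test pairs are drawn from the same observed entries as the training pairs, they share whatever exposure pattern produced those observations.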

Challenges for the evaluation of causal RSs

The traditional evaluation strategy is not directly applicable to causal RSs because ratings in the test set may carry the same spurious correlations and biases as ratings in the training set
To evaluate the effectiveness of causal RSs without bias, it is ideal to have a biased/entangled training set and an unbiased/disentangled test set
An unbiased/disentangled test set is difficult to acquire and expensive to establish
Introduce common data simulation strategies for causal RS evaluation
Discuss how real-world datasets can be used to promote credibility of causal RS research

Evaluation based on simulated datasets

Dataset simulation strategy should have clear, credible design and be adjustable
Utilize real-world information as much as possible
Deep generative models can be used to simulate exposure bias
Training phase uses VAEs to generate item exposures and user ratings
Generation phase uses confounder to simulate exposure bias
Test set intervention can be used to evaluate popularity bias and disentanglement of user interests and conformity
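A much simpler stand-in for the generative-model approach above is a hand-written simulator in which a confounder drives exposure in the training data while test exposure is uniform. Everything here (the function name, the Beta-distributed popularity, the linear rating model) is a hypothetical design chosen to be transparent and adjustable, not the VAE-based procedure.

```python
import numpy as np

def simulate_biased_dataset(n_users=200, n_items=100, dim=4, seed=0):
    """Simulate full ground-truth ratings plus biased-train and unbiased-test exposure masks."""
    rng = np.random.default_rng(seed)
    popularity = rng.beta(1.0, 5.0, n_items)              # item-level confounder in (0, 1)
    U = rng.standard_normal((n_users, dim))                # user interests
    V = rng.standard_normal((n_items, dim))                # item attributes
    true_ratings = U @ V.T / np.sqrt(dim) + 2.0 * popularity   # confounder also raises ratings
    train_mask = rng.random((n_users, n_items)) < popularity   # exposure driven by the confounder
    test_mask = rng.random((n_users, n_items)) < 0.05          # uniformly random exposure
    return true_ratings, train_mask, test_mask, popularity

true_ratings, train_mask, test_mask, popularity = simulate_biased_dataset()

# Average popularity of the items that appear in each split.
train_pop = float(popularity[np.nonzero(train_mask)[1]].mean())
test_pop = float(popularity[np.nonzero(test_mask)[1]].mean())
print("mean item popularity, train:", round(train_pop, 3), "test:", round(test_pop, 3))
```

Because the ground-truth ratings exist for every pair, a model trained on the biased split can be scored against the uniformly exposed split, and the strength of the bias can be tuned through the exposure rule.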

Evaluation based on real-world datasets

Establishing bias-free real-world datasets is expensive and user-unfriendly
Industry has increased interest in causal RS research
Coat dataset: 290 users, 300 items, 24 self-selected coats rated by each user, 16 random coats rated by each user for the unbiased test set
Yahoo! R3 dataset: 300,000 ratings from 15,400 users to 1,000 items, 5,400 users rate 10 random items for unbiased test set
KuaiRec dataset: 7,176 users, 10,728 items, 1,411 users rate 3,327 items for unbiased test set
Criteo Ads and Open Bandit datasets for related topics
Coat dataset is small-scale
Yahoo! R3 dataset has large training set but small-scale randomized experiment
KuaiRec dataset has large-scale experiment for unbiased test set
Case studies for qualitative model evaluations

Future directions

Causal RS research is still in its emerging stage
Existing causal RSs may rely on assumptions that don’t hold in reality
Lack of universal causal model for RSs
Positive side of biases in RSs is seldom investigated
Real-world datasets and online tests needed to demonstrate practical utility of causal RSs

Summary

Traditional RSs rely on correlations in observed user behaviors and user/item features
Rubin’s RCM and Pearl’s SCM provide deeper insights into issues of traditional RSs and the foundation for moving traditional RSs to the upper rungs of the Ladder of Causality
State-of-the-art causal RS models lead to enhanced robustness to various biases and improved explainability
Causal RSs can base recommendations on causal relationships that are stable and invariant, leading to improved generalization abilities
Evaluation strategies for causal RSs focus on how to reliably estimate the model performance based on biased real-world data
Growing attention to causal RSs from the industry
Open problems in causal RSs need to be addressed
Fig. 1 provides an overview of the structure of the survey
