Link to paper
The full paper is available here.
You can also find the paper on PapersWithCode here.
Abstract
- AI models often lack an understanding of cause-effect relationships in the real world.
- As a result, they can fail to generalize to unseen data, produce unfair results, and be difficult to interpret.
- Causal modeling and inference methods have been developed to improve the trustworthiness of AI models.
Paper Content
Causality and robustness
- Pre-processing methods use causality to create data augmentations
- Adversarial examples are artificially perturbed input values that can fool machine learning models
- Data augmentation methods use causal graphs to motivate which augmentations to apply
- Problem abstraction methods simplify the problem for machine learning agents
- In-processing methods use causality-aware optimization objectives or architecture design choices
- Post-processing methods alter predictions or enable causality-informed model selection
- Causal models can help prevent attacks that expose users’ private information
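To make the notion of an adversarial example from this section concrete, the following is a minimal sketch of a one-step sign-gradient perturbation (in the style of the fast gradient sign method) against a tiny logistic model. The weights, input, and step size `eps` are hypothetical values chosen for illustration; the summarized paper does not prescribe this particular attack.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, b, x):
    """Probability of class 1 under a logistic model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def sign_gradient_perturb(w, b, x, y, eps):
    """Shift each input feature by eps in the direction that increases the log-loss.

    For a logistic model with log-loss, d(loss)/dx_i = (p - y) * w_i.
    """
    p = predict_proba(w, b, x)
    grad = [(p - y) * wi for wi in w]
    sign = lambda g: (g > 0) - (g < 0)
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]

# Hypothetical model and input, chosen only for illustration.
w, b = [2.0, -1.0], 0.0
x, y = [0.3, 0.2], 1

x_adv = sign_gradient_perturb(w, b, x, y, eps=0.25)

print("clean prediction:", predict_proba(w, b, x) > 0.5)
print("adversarial prediction:", predict_proba(w, b, x_adv) > 0.5)
```

In this toy setting, a perturbation of at most 0.25 per feature is enough to move the input across the decision boundary, which is the behavior that robustness methods aim to prevent.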
Causality and privacy
- Pre-processing: data augmentation reduces heterogeneity across users' data distributions
- In-processing: invariant risk minimization defends against membership inference and property inference attacks
- Post-processing: test-data-specific normalization improves generalization and provides better privacy guarantees
- The success of a membership inference attack is measured in terms of accuracy and advantage
- Causal models are used to improve privacy by reducing overfitting and providing better differential privacy guarantees
- Ex-ante impact assessment identifies potential negative effects before deployment
- Ex-post impact assessment identifies potential negative effects after deployment
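The two membership inference metrics mentioned in the privacy bullets above, accuracy and advantage, can be computed as in the sketch below. It assumes one common definition of advantage, the attack's true positive rate minus its false positive rate; the works covered by the paper may define it differently. The labels and guesses are made-up illustrative data.

```python
def attack_metrics(is_member, guessed_member):
    """Return (accuracy, advantage) of a membership inference attack.

    is_member:      1 if the record was in the training set, else 0
    guessed_member: 1 if the attacker guessed "member", else 0
    Advantage is assumed here to be TPR - FPR.
    """
    pairs = list(zip(is_member, guessed_member))
    tp = sum(1 for m, g in pairs if m == 1 and g == 1)
    fp = sum(1 for m, g in pairs if m == 0 and g == 1)
    correct = sum(1 for m, g in pairs if m == g)
    n_members = sum(is_member)
    n_non_members = len(is_member) - n_members

    accuracy = correct / len(pairs)
    advantage = tp / n_members - fp / n_non_members
    return accuracy, advantage

# Illustrative data: 4 training members followed by 4 non-members.
is_member = [1, 1, 1, 1, 0, 0, 0, 0]
guessed = [1, 1, 1, 0, 1, 0, 0, 0]

accuracy, advantage = attack_metrics(is_member, guessed)
print(accuracy, advantage)
```

Under this definition, an attacker who guesses at random has an advantage near zero, so lower values indicate that the model leaks less about who was in its training set.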
Ex-ante impact assessment
- Ex-ante impact assessment predicts risks and impacts of proposed systems
- Used to assess environmental, financial, social, and human rights ramifications
- Environmental impact assessment uses deductive and inductive causal inference
- Social and fiscal impact assessment uses deductive and inductive methods to discover causal relationships
- Economic impact assessment looks at effects of introducing new economic policies or changing existing ones
Ex-post impact assessment
- Ex-ante impact assessments are limited and often cannot identify all risks and impacts
- Ex-post impact assessment is used to detect risks and impacts while the system is in operation
- Ex-ante assessments have clear guidelines and metrics for specific types of impacts
- Ex-post assessments are broader and need to define what constitutes an impact in real time
- Causal inference is used to tackle various categories of risks and impacts
- Temporal and long-term effects can be seen in real-world systems
- Causality can help assess systems in real time and identify the elements responsible
- Causality can help identify the root cause of system failure and system misuse
- Strategic risks and effects can be analyzed with causality
- Causal methods have been demonstrated to be advantageous in healthcare
Causality in healthcare through the SCM framework
- Structural causal models (SCMs) are used in healthcare and personalized medicine
- Causality is used to explain outcomes of medical models
- Causality is used to discover causal relationships in medical imaging
- Causality is used to repurpose drugs for new diseases
- Causality is used to identify causal factors of clinical conditions
- Causality is used to make AI algorithms fair and robust
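As a rough illustration of what an SCM provides, the sketch below defines a toy model with a confounder `z`, a treatment `x`, and an outcome `y`, then compares the observational association between `x` and `y` with the effect of intervening on `x`. The structural equations and coefficients are hypothetical and are not taken from any of the healthcare studies summarized here.

```python
from itertools import product

def run_scm(u_z, u_x, do_x=None):
    """Evaluate the toy SCM for one setting of the exogenous noise.

    Structural equations (hypothetical):
        z := u_z
        x := 1 if z + u_x >= 1 else 0   (replaced by do_x under intervention)
        y := 2 * x + 3 * z
    """
    z = u_z
    x = do_x if do_x is not None else int(z + u_x >= 1)
    y = 2 * x + 3 * z
    return {"z": z, "x": x, "y": y}

# All four noise settings are treated as equally likely.
WORLDS = list(product([0, 1], repeat=2))

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def observational_difference():
    """E[y | x=1] - E[y | x=0], which is confounded by z."""
    rows = [run_scm(u_z, u_x) for u_z, u_x in WORLDS]
    return (mean(r["y"] for r in rows if r["x"] == 1)
            - mean(r["y"] for r in rows if r["x"] == 0))

def interventional_difference():
    """E[y | do(x=1)] - E[y | do(x=0)], the causal effect of x on y."""
    return (mean(run_scm(u_z, u_x, do_x=1)["y"] for u_z, u_x in WORLDS)
            - mean(run_scm(u_z, u_x, do_x=0)["y"] for u_z, u_x in WORLDS))

print(observational_difference(), interventional_difference())
```

The observational difference overstates the effect because `z` drives both treatment and outcome. Intervening on `x` cuts the link from `z` to `x` and recovers the coefficient of 2 from the structural equation.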
Causality in healthcare through the PO framework
- The potential outcomes (PO) framework is commonly used in the medical field
- It provides methods to conduct causal analysis from a statistical perspective
- It helps remove selection bias in historical data
- Shi and Norgeot review research on estimating treatment effects
- It is used to test whether a drug is beneficial or harmful
- Graham et al. used propensity score matching to examine the risk of death in elderly patients
- Ozer et al. used propensity score matching to investigate the benefits of chemotherapy
- Friedrich and Friede used several propensity score-based methods and other approaches
- Ziff et al. used PO framework-based analysis to evaluate the safety and efficacy of a drug
- Privacy is a key foundation of a trustworthy AI system
- A new large synthetic dataset was generated to imitate real-world data distributions and preserve individual patients’ privacy
- An identifiability metric estimates the probability that an individual is identifiable
- The paper surveys causal modeling and reasoning tools for enhancing the trustworthy aspects of AI models
- It provides a curated list of datasets used in recent Causal ML publications
- It gives an overview of useful causal and non-causal tools and packages
- It gives an overview of publicly available real-world datasets
- It covers benchmarks and packages for Causal Machine Learning
- It also lists well-established tools for comparison with non-causal machine learning
- CANDLE, MIND, MovieLens, Netflix Prize, WMT 14, OpenSubtitles, LAMA, ImageNet, Adult (Census Income), Human Activity Recognition, Yelp, Amazon (Product) Data, Sangiovese Grapes, WikiText-2, Jigsaw Toxicity Detection, RTGender, CrowS-Pairs, Professions, WinoBias, Winogender Schemas, English UD Treebank, Gender-Neutral GloVe Word Embeddings, Biographies
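The propensity score matching used in the PO framework studies above (for example by Graham et al. and Ozer et al.) can be illustrated with a minimal sketch. It assumes a single binary covariate, estimates the propensity score as the treated fraction within each covariate group instead of with the logistic regression usually used in practice, and matches each treated unit to the control with the nearest score. The data are invented for illustration and do not come from any of the cited studies.

```python
from collections import defaultdict

# Invented records: (covariate, treated, outcome).
# Within each covariate group the true treatment effect is +1.
DATA = [
    (1, 1, 5.0), (1, 1, 5.0), (1, 1, 5.0), (1, 0, 4.0),
    (0, 1, 2.0), (0, 0, 1.0), (0, 0, 1.0), (0, 0, 1.0),
]

def estimate_propensity(data):
    """Estimate P(treated | covariate) as the treated fraction per group."""
    counts = defaultdict(lambda: [0, 0])  # covariate -> [n_treated, n_total]
    for x, t, _ in data:
        counts[x][0] += t
        counts[x][1] += 1
    return {x: n_treated / n for x, (n_treated, n) in counts.items()}

def naive_difference(data):
    """Mean outcome of treated minus mean outcome of controls (biased)."""
    treated = [y for _, t, y in data if t == 1]
    control = [y for _, t, y in data if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

def matched_att(data):
    """Average effect on the treated via 1-nearest-neighbor score matching."""
    score = estimate_propensity(data)
    controls = [(score[x], y) for x, t, y in data if t == 0]
    diffs = []
    for x, t, y in data:
        if t == 1:
            # Match with replacement to the control with the closest score.
            _, y_match = min(controls, key=lambda c: abs(c[0] - score[x]))
            diffs.append(y - y_match)
    return sum(diffs) / len(diffs)

print(naive_difference(DATA), matched_att(DATA))
```

Because treatment is more common in the group with higher baseline outcomes, the naive comparison overstates the effect. Matching on the estimated score compares like with like and recovers the within-group effect in this toy data.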