Link to paper
The full paper is available here.
You can also find the paper on PapersWithCode here.
Abstract
- Models trained from real-world data can replicate and increase social biases.
- There are methods to reduce biases, but they require knowledge of the types of biases and the social groups associated with the data.
- This paper proposes a debiasing method that does not need prior knowledge of the demographics in the dataset.
- The method detects biased examples and reduces their weight during training.
- Results show that it is possible to reduce social biases without costly demographic annotation.
Paper Content
Introduction
- Neural NLP models suffer from social biases
- Numerous debiasing methods have been proposed
- These methods require knowledge of the biases and often require manual annotations
- Proposed new debiasing method does not require demographic attribute
- Method relies on Debiased Focal Loss
- Method uses a success detector to down-weight examples
- Method is demonstrated on two NLP tasks
- Method is effective in bias mitigation
Methodology
Problem formulation
- Consider general multiclass-classification problems
- Dataset consists of triples of input, label, and protected attribute
- Protected attribute corresponds to demographic group
- Goal is to learn a mapping that is robust to differences in demographics
- Robustness of model is measured using fairness metrics
Fairness metrics
- Fairness metric is a mapping from model predictions and protected attributes to a numerical measure of bias
- Aim to have absolute value of metric as close to 0 as possible
- Practical fairness metrics described in Section 3.2
Debiased focal loss for social bias
- Model fM is composed of two functions: g (feature extractor) and h (classifier)
- Loss term is DFL (Karimi Mahabadi et al., 2020)
- Separate model (fB) acts as bias detector
- Loss on parameters of main model and biased model (θM and θB) is defined by equation 1
Debiasing without demographic annotations
- Debiasing without demographic attributes is less effective than with them.
- Debiasing still reduces bias while maintaining small reduction in performance.
- Control model results are statistically indistinguishable from baseline.
Debiasing with demographic annotations
- Propose to define bias as amount of demographic information extractable from example’s internal representation
- Bias detector receives features extracted from g and classifies it for demographic attributes
- DFL uses biased model to re-weight loss of main model, not reverse gradients from biased model
- Examples with high success in predicting demographics are down-weighted, examples with low success are up-weighted
- Encourages model to focus on examples with less demographic information to prevent learning demographics-task correlations
Experiments
Tasks and models
- Experiment with two classification tasks and bias types: Occupation Prediction and Gender, and Sentiment Analysis and Race
- Dataset contains 400K biographies scraped from the internet
- Task is predicting one’s occupation based on a subset of their biography
- Protected attribute is gender, each instance assigned binary genders based on pronouns in text
- Subset of 100k examples used for Sentiment Analysis and Race
- Proxy for writer’s racial identity is prediction of whether tweet is written in African American English or Standard American English
Metrics
- Performance gap metrics measure the difference in performance between two demographic groups
- Statistical metrics evaluate the statistical dependence between variables such as model predictions, gold labels, and protected attributes
Training and evaluating
- Used BERT and DeBERTa V1 architectures
- Added temperature hyperparameter to softmax function
- Grid searched γ and t using validation set
- Used ‘distance to optimum’ to balance task accuracy and fairness
- Tested models on balanced dataset to assess bias
Baselines and competitive systems
- Finetuned: Model architecture optimized to solve task without debiasing
- RLACE: Method to linearly remove information from neural representations
- DFL-demog: DFL trained with demographic annotations
- DFL-no demog: DFL trained without demographic annotations
- DFL-no demog+: DFL trained without demographic annotations, tuned with validation set with demographic annotations
Results
- Reported accuracy
- Reported fairness metrics
Sentiment classification
- Vanilla fine-tuning baseline yields best accuracy but worst bias
- DFL with demographic attributes leads to significant reduction of bias with minor drop in accuracy
- DFL without demographic attributes leads to significant reduction of bias in BERT
- DFL with demographic information for hyper-parameter choice leads to significant bias reduction with minimal accuracy drop
- Control model not significantly different from vanilla fine-tuned model
Occupation prediction
- Fine-tuned model performs better or as well as other methods in terms of accuracy, but has high bias
- DFL leads to a statistically significant reduction of bias with minor drop in accuracy
- INLP and RLACE are much less effective in reducing bias
- As γ increases, bias metrics and accuracy both tend to decrease
- Softmax temperature mostly affects the stability of training
Effect of debiasing on internal model representations
- DFL is a method of debiasing models
- A probe model is used to measure the amount of gender or racial information embedded in the model’s internal representations
- MDL probes are used to measure the compression of the internal representations
- Results show that the amount of demographic information decreases as γ increases
- Debiasing with or without demographic attributes both cause the models to encode less information on these demographics
Previous work
- NLP models can be debiased from social biases in various ways
- Methods require defining the bias to be addressed
- Focal Loss and Debiased Focal Loss are proposed methods for addressing class imbalances
- DFL is used to improve NLU models on out-of-distribution data
- Focal Loss is used to improve NLI robustness
Discussion
- Demonstrated reduction of racial and gender biases without prior knowledge
- Applied method to any dataset, making bias reduction feasible
- Weakness is not possible to debias without annotated validation set
- Less effective than other methods that used demographic attributes
- Logical next step to investigate intersectional biases
A bias metrics
- DTO is measured using the L2-distance from a utopia point
- Accuracy and fairness are candidate points for the utopia point
- Definition of candidate points and utopia points should reflect application’s needs and priorities
F.2 full probing results
- Proposed debiasing methods
- Chosen hyper-parameters based on accuracy only
- Improved robustness in NLP models without prior knowledge of bias issues
- Weak learners to identify biased examples