Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

Models trained from real-world data can replicate and increase social biases.
There are methods to reduce biases, but they require knowledge of the types of biases and the social groups associated with the data.
This paper proposes a debiasing method that does not need prior knowledge of the demographics in the dataset.
The method detects biased examples and reduces their weight during training.
Results show that it is possible to reduce social biases without costly demographic annotation.

Paper Content

Introduction

Neural NLP models suffer from social biases
Numerous debiasing methods have been proposed
These methods require knowledge of the biases and often require manual annotations
Proposed new debiasing method does not require demographic attribute
Method relies on Debiased Focal Loss
Method uses a success detector to down-weight examples
Method is demonstrated on two NLP tasks
Method is effective in bias mitigation

Methodology

Problem formulation

Consider general multiclass-classification problems
Dataset consists of triples of input, label, and protected attribute
Protected attribute corresponds to demographic group
Goal is to learn a mapping that is robust to differences in demographics
Robustness of model is measured using fairness metrics

Fairness metrics

Fairness metric is a mapping from model predictions and protected attributes to a numerical measure of bias
Aim to have absolute value of metric as close to 0 as possible
Practical fairness metrics described in Section 3.2

Model fM is composed of two functions: g (feature extractor) and h (classifier)
Loss term is DFL (Karimi Mahabadi et al., 2020)
Separate model (fB) acts as bias detector
Loss on parameters of main model and biased model (θM and θB) is defined by equation 1

Debiasing without demographic annotations

Debiasing without demographic attributes is less effective than with them.
Debiasing still reduces bias while maintaining small reduction in performance.
Control model results are statistically indistinguishable from baseline.

Debiasing with demographic annotations

Propose to define bias as amount of demographic information extractable from example’s internal representation
Bias detector receives features extracted from g and classifies it for demographic attributes
DFL uses biased model to re-weight loss of main model, not reverse gradients from biased model
Examples with high success in predicting demographics are down-weighted, examples with low success are up-weighted
Encourages model to focus on examples with less demographic information to prevent learning demographics-task correlations

Experiments

Tasks and models

Experiment with two classification tasks and bias types: Occupation Prediction and Gender, and Sentiment Analysis and Race
Dataset contains 400K biographies scraped from the internet
Task is predicting one’s occupation based on a subset of their biography
Protected attribute is gender, each instance assigned binary genders based on pronouns in text
Subset of 100k examples used for Sentiment Analysis and Race
Proxy for writer’s racial identity is prediction of whether tweet is written in African American English or Standard American English

Metrics

Performance gap metrics measure the difference in performance between two demographic groups
Statistical metrics evaluate the statistical dependence between variables such as model predictions, gold labels, and protected attributes

Training and evaluating

Used BERT and DeBERTa V1 architectures
Added temperature hyperparameter to softmax function
Grid searched γ and t using validation set
Used ‘distance to optimum’ to balance task accuracy and fairness
Tested models on balanced dataset to assess bias

Baselines and competitive systems

Finetuned: Model architecture optimized to solve task without debiasing
RLACE: Method to linearly remove information from neural representations
DFL-demog: DFL trained with demographic annotations
DFL-no demog: DFL trained without demographic annotations
DFL-no demog+: DFL trained without demographic annotations, tuned with validation set with demographic annotations

Results

Reported accuracy
Reported fairness metrics

Sentiment classification

Vanilla fine-tuning baseline yields best accuracy but worst bias
DFL with demographic attributes leads to significant reduction of bias with minor drop in accuracy
DFL without demographic attributes leads to significant reduction of bias in BERT
DFL with demographic information for hyper-parameter choice leads to significant bias reduction with minimal accuracy drop
Control model not significantly different from vanilla fine-tuned model

Occupation prediction

Fine-tuned model performs better or as well as other methods in terms of accuracy, but has high bias
DFL leads to a statistically significant reduction of bias with minor drop in accuracy
INLP and RLACE are much less effective in reducing bias
As γ increases, bias metrics and accuracy both tend to decrease
Softmax temperature mostly affects the stability of training

Effect of debiasing on internal model representations

DFL is a method of debiasing models
A probe model is used to measure the amount of gender or racial information embedded in the model’s internal representations
MDL probes are used to measure the compression of the internal representations
Results show that the amount of demographic information decreases as γ increases
Debiasing with or without demographic attributes both cause the models to encode less information on these demographics

Previous work

NLP models can be debiased from social biases in various ways
Methods require defining the bias to be addressed
Focal Loss and Debiased Focal Loss are proposed methods for addressing class imbalances
DFL is used to improve NLU models on out-of-distribution data
Focal Loss is used to improve NLI robustness

Discussion

Demonstrated reduction of racial and gender biases without prior knowledge
Applied method to any dataset, making bias reduction feasible
Weakness is not possible to debias without annotated validation set
Less effective than other methods that used demographic attributes
Logical next step to investigate intersectional biases

A bias metrics

DTO is measured using the L2-distance from a utopia point
Accuracy and fairness are candidate points for the utopia point
Definition of candidate points and utopia points should reflect application’s needs and priorities

F.2 full probing results

Proposed debiasing methods
Chosen hyper-parameters based on accuracy only
Improved robustness in NLP models without prior knowledge of bias issues
Weak learners to identify biased examples

Link to paper#

Abstract#

Paper Content#

Introduction#

Methodology#

Problem formulation#

Fairness metrics#

Debiased focal loss for social bias#

Debiasing without demographic annotations#

Debiasing with demographic annotations#

Experiments#

Tasks and models#

Metrics#

Training and evaluating#

Baselines and competitive systems#

Results#

Sentiment classification#

Occupation prediction#

Effect of debiasing on internal model representations#

Previous work#

Discussion#

A bias metrics#

F.2 full probing results#

Link to paper

Abstract

Paper Content

Introduction

Methodology

Problem formulation

Fairness metrics

Debiased focal loss for social bias

Debiasing without demographic annotations

Debiasing with demographic annotations

Experiments

Tasks and models

Metrics

Training and evaluating

Baselines and competitive systems

Results

Sentiment classification

Occupation prediction

Effect of debiasing on internal model representations

Previous work

Discussion

A bias metrics

F.2 full probing results