Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

  • Models trained on real-world data can replicate and amplify social biases.
  • There are methods to reduce biases, but they require knowledge of the types of biases and the social groups associated with the data.
  • This paper proposes a debiasing method that does not need prior knowledge of the demographics in the dataset.
  • The method detects biased examples and reduces their weight during training.
  • Results show that it is possible to reduce social biases without costly demographic annotation.

Paper Content

Introduction

  • Neural NLP models suffer from social biases
  • Numerous debiasing methods have been proposed
  • These methods require knowledge of the biases and often require manual annotations
  • Proposed new debiasing method does not require demographic attributes
  • Method relies on Debiased Focal Loss
  • Method uses a success detector to down-weight examples
  • Method is demonstrated on two NLP tasks
  • Method is effective in bias mitigation

Methodology

Problem formulation

  • Consider general multiclass-classification problems
  • Dataset consists of triples of input, label, and protected attribute
  • Protected attribute corresponds to demographic group
  • Goal is to learn a mapping that is robust to differences in demographics
  • Robustness of model is measured using fairness metrics

Fairness metrics

  • Fairness metric is a mapping from model predictions and protected attributes to a numerical measure of bias
  • Aim to have absolute value of metric as close to 0 as possible
  • Practical fairness metrics described in Section 3.2

Debiased focal loss for social bias

  • Model fM is composed of two functions: g (feature extractor) and h (classifier)
  • Loss term is DFL (Karimi Mahabadi et al., 2020)
  • Separate model (fB) acts as bias detector
  • Loss on parameters of main model and biased model (θM and θB) is defined by equation 1

Debiasing without demographic annotations

  • Debiasing without demographic attributes is less effective than with them.
  • Debiasing still reduces bias, with only a small reduction in task performance.
  • Control model results are statistically indistinguishable from baseline.

Debiasing with demographic annotations

  • Propose to define bias as amount of demographic information extractable from example’s internal representation
  • Bias detector receives the features extracted by g and classifies them by demographic attribute
  • DFL uses biased model to re-weight loss of main model, not reverse gradients from biased model
  • Examples with high success in predicting demographics are down-weighted, examples with low success are up-weighted
  • Encourages model to focus on examples with less demographic information to prevent learning demographics-task correlations
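The re-weighting described above can be sketched in a few lines. This is a minimal illustration, not the paper's Equation 1: it assumes a focal-style weight of (1 - p)^γ, where p is the probability the bias detector assigns to the correct demographic attribute, applied to the main model's cross-entropy. The function names and the example logits are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dfl_weight(detector_success_prob, gamma):
    """Assumed focal-style weight: small when the detector succeeds, close to 1 when it fails."""
    return (1.0 - detector_success_prob) ** gamma


def dfl_example_loss(main_logits, label, detector_logits, attribute, gamma=2.0):
    """Cross-entropy of the main model, scaled by the detector-based weight."""
    p_task = softmax(main_logits)[label]             # main model's probability of the gold label
    p_detector = softmax(detector_logits)[attribute]  # detector's probability of the true attribute
    return -dfl_weight(p_detector, gamma) * math.log(p_task)


# Same task prediction, different amounts of demographic signal (made-up logits).
leaky = dfl_example_loss([2.0, 0.0], 0, [3.0, -3.0], 0)   # detector is confident: down-weighted
neutral = dfl_example_loss([2.0, 0.0], 0, [0.0, 0.0], 0)  # detector is at chance: weighted more
```

With γ = 0 the weight is 1 for every example and the loss reduces to ordinary cross-entropy. For the variant without demographic annotations, the same weighting could plausibly take the success detector's probability in place of `p_detector`; that substitution is an assumption of this sketch.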

Experiments

Tasks and models

  • Experiment with two classification tasks and bias types: Occupation Prediction and Gender, and Sentiment Analysis and Race
  • Dataset contains 400K biographies scraped from the internet
  • Task is predicting one’s occupation based on a subset of their biography
  • Protected attribute is gender; each instance is assigned a binary gender based on the pronouns in its text
  • Subset of 100k examples used for Sentiment Analysis and Race
  • Proxy for writer’s racial identity is prediction of whether tweet is written in African American English or Standard American English

Metrics

  • Performance gap metrics measure the difference in performance between two demographic groups
  • Statistical metrics evaluate the statistical dependence between variables such as model predictions, gold labels, and protected attributes
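As one concrete example of a performance gap metric, the sketch below computes the difference in true positive rate (TPR) between two demographic groups for a single class. The summary does not say which gap metrics the paper uses, so the choice of TPR, the function names, and the toy data are assumptions for illustration only.

```python
def true_positive_rate(preds, labels, groups, target_label, group):
    """Fraction of `group` examples with gold label `target_label` that are predicted correctly."""
    relevant = [p for p, y, g in zip(preds, labels, groups)
                if y == target_label and g == group]
    if not relevant:
        return float("nan")  # the group has no examples of this class
    return sum(p == target_label for p in relevant) / len(relevant)


def tpr_gap(preds, labels, groups, target_label, group_a, group_b):
    """Signed TPR difference between two groups; 0 means equal treatment on this class."""
    return (true_positive_rate(preds, labels, groups, target_label, group_a)
            - true_positive_rate(preds, labels, groups, target_label, group_b))


# Toy data: four biographies whose gold occupation is "nurse".
preds = ["nurse", "nurse", "nurse", "doctor"]
labels = ["nurse", "nurse", "nurse", "nurse"]
groups = ["F", "F", "M", "M"]
gap = tpr_gap(preds, labels, groups, "nurse", "F", "M")
```

A gap near 0 indicates the model recognizes the class equally well for both groups, which matches the stated aim of keeping the absolute value of a fairness metric close to 0.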

Training and evaluating

  • Used BERT and DeBERTa V1 architectures
  • Added temperature hyperparameter to softmax function
  • Grid searched γ and t using validation set
  • Used ‘distance to optimum’ to balance task accuracy and fairness
  • Tested models on balanced dataset to assess bias
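The temperature hyperparameter t can be illustrated with a standard temperature-scaled softmax. The summary does not state which softmax the temperature is applied to, so this sketch only shows the general mechanism; the function name and inputs are hypothetical.

```python
import math


def softmax_with_temperature(logits, t=1.0):
    """Softmax over logits divided by temperature t; larger t gives a flatter distribution."""
    if t <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / t for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


sharp = softmax_with_temperature([2.0, 0.0], t=1.0)  # more peaked
soft = softmax_with_temperature([2.0, 0.0], t=4.0)   # closer to uniform
```

Because a focal-style weight depends on a softmax probability, flattening or sharpening that probability changes how aggressively examples are re-weighted, which is one reason t would be tuned jointly with γ.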

Baselines and competitive systems

  • Finetuned: Model architecture optimized to solve task without debiasing
  • RLACE: Method to linearly remove information from neural representations
  • DFL-demog: DFL trained with demographic annotations
  • DFL-no demog: DFL trained without demographic annotations
  • DFL-no demog+: DFL trained without demographic annotations, tuned with validation set with demographic annotations

Results

  • Reported accuracy
  • Reported fairness metrics

Sentiment classification

  • Vanilla fine-tuning baseline yields best accuracy but worst bias
  • DFL with demographic attributes leads to significant reduction of bias with minor drop in accuracy
  • DFL without demographic attributes leads to significant reduction of bias in BERT
  • DFL with demographic information for hyper-parameter choice leads to significant bias reduction with minimal accuracy drop
  • Control model not significantly different from vanilla fine-tuned model

Occupation prediction

  • Fine-tuned model performs as well as or better than the other methods in terms of accuracy, but has high bias
  • DFL leads to a statistically significant reduction of bias with minor drop in accuracy
  • INLP and RLACE are much less effective in reducing bias
  • As γ increases, bias metrics and accuracy both tend to decrease
  • Softmax temperature mostly affects the stability of training

Effect of debiasing on internal model representations

  • DFL is a method of debiasing models
  • A probe model is used to measure the amount of gender or racial information embedded in the model’s internal representations
  • MDL probes are used to measure the compression of the internal representations
  • Results show that the amount of demographic information decreases as γ increases
  • Debiasing with or without demographic attributes both cause the models to encode less information on these demographics

Previous work

  • NLP models can be debiased from social biases in various ways
  • Methods require defining the bias to be addressed
  • Focal Loss was originally proposed to address class imbalance; Debiased Focal Loss (DFL) adapts it to down-weight biased examples
  • DFL is used to improve NLU models on out-of-distribution data
  • Focal Loss is used to improve NLI robustness

Discussion

  • Demonstrated reduction of racial and gender biases without prior knowledge of demographics
  • Method can be applied to any dataset, making bias reduction feasible where demographic annotations are unavailable
  • Weakness: debiasing is not possible without an annotated validation set
  • Less effective than other methods that used demographic attributes
  • Logical next step to investigate intersectional biases

Appendix A: Bias metrics

  • DTO is measured using the L2-distance from a utopia point
  • Each model is a candidate point in accuracy-fairness space; the utopia point combines the best accuracy and the best fairness among the candidates
  • Definition of candidate points and utopia points should reflect application’s needs and priorities
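A minimal sketch of DTO-based model selection follows. It assumes each candidate is an (accuracy, fairness) pair where higher is better on both axes, for instance with fairness expressed as 1 minus an absolute bias score, and that the utopia point takes the best value on each axis across candidates. The function names and all numbers are made up for illustration and are not results from the paper.

```python
import math


def dto(candidate, utopia):
    """L2 distance from a candidate (accuracy, fairness) point to the utopia point."""
    return math.sqrt(sum((u - c) ** 2 for c, u in zip(candidate, utopia)))


def select_by_dto(candidates):
    """Return the name of the candidate closest to the per-axis best (utopia) point."""
    utopia = tuple(max(axis) for axis in zip(*candidates.values()))
    return min(candidates, key=lambda name: dto(candidates[name], utopia))


# Hypothetical (accuracy, fairness) pairs for three settings of gamma.
candidates = {
    "gamma=1": (0.80, 0.70),
    "gamma=4": (0.78, 0.90),
    "gamma=16": (0.60, 0.95),
}
best = select_by_dto(candidates)
```

Here the utopia point is (0.80, 0.95), and the middle setting is chosen because it gives up little accuracy for a large fairness gain. Rescaling an axis or picking a different utopia point changes the trade-off, which is why the definition should reflect the application's priorities.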

F.2 full probing results

  • Proposed debiasing methods
  • Chosen hyper-parameters based on accuracy only
  • Improved robustness in NLP models without prior knowledge of bias issues
  • Weak learners to identify biased examples