Abstract

  • The goal is to design measures of social bias that can be trusted.
  • Prior work has introduced several measures, but none has gained widespread trust.
  • The cross-disciplinary theory of measurement modeling is used to design the bias measures.
  • Social bias is defined explicitly, grounded in principles from social science research.
  • A general bias measurement framework, DivDist, is proposed along with 5 concrete bias measures.
  • A rigorous testing protocol with 8 testing criteria is proposed to validate the measures.
  • The testing provides evidence to trust the measures, which overcome deficiencies present in prior measures.

Paper Content

Introduction

  • Language technologies are increasingly important and have direct, immediate, and significant impact.
  • Social bias is a central consideration and can cause harm.
  • Measurement is essential to reducing bias.
  • Many works have proposed bias measures, but no standard exists for trusting them.
  • Measurement modeling is used to design and validate measures of social constructs.
  • Measurement modeling requires defining the theoretical construct of social bias.
  • DivDist is a measurement framework that is compatible with this definition of social bias and allows for multi-group measurement.
  • DivDist makes explicit that bias is a relative phenomenon.
  • DivDist is tested against 8 desiderata to trust the measures.

Principles for social bias

  • Social bias is defined in terms of social groups and a target concept
  • Social science theories define bias as the target concept’s differential association with each group
  • Bias is a systematic asymmetry that pertains to broader social groups

Bias is relative

  • Bias in machine translation models is relative to the training data and a societal reference.
  • Many bias measures in NLP portray bias as absolute.
  • Bias is an inherently relative construct that requires a reference to be specified.
  • Effective bias measurement could quantify the relative contribution of sources such as data selection, data annotation, and model training.

Defining social bias

  • Social bias is the divergence in the observed associations between a target concept and a set of social groups from corresponding reference associations.
  • DivDist is a two-stage measurement framework that yields a bias measure.
  • Three functions are specified to map from the abstract framework to a concrete bias measure: SoA, normalize, and D.
  • Inputs for DivDist are the target concept, social groups, and reference.
  • DivDist is proven to be a general framework for prior bias measures.
  • DivDist is instantiated for text, word embeddings, and contextualized representations.
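
The two-stage structure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that the first stage (SoA) has already produced one strength-of-association score per social group, and that the second stage compares the normalized observed scores with the normalized reference scores using D. The function names `divdist`, `sum_normalize`, and `l1_distance`, and all numbers, are hypothetical.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def divdist(
    observed: Vector,
    reference: Vector,
    normalize: Callable[[Vector], List[float]],
    divergence: Callable[[Vector, Vector], float],
) -> float:
    """Divergence between normalized observed and reference associations.

    `observed` and `reference` hold one association score per social group,
    in the same group order (assumed to come from some SoA function).
    """
    if len(observed) != len(reference):
        raise ValueError("observed and reference must cover the same groups")
    return divergence(normalize(observed), normalize(reference))


def sum_normalize(v: Vector) -> List[float]:
    """One possible normalization: divide each entry by the vector's sum."""
    total = sum(v)
    if total == 0:
        raise ValueError("cannot normalize a vector that sums to zero")
    return [x / total for x in v]


def l1_distance(p: Vector, q: Vector) -> float:
    """One possible divergence: sum of absolute differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


# Hypothetical association scores for one target concept and three groups.
observed = [6.0, 3.0, 1.0]
# Hypothetical reference: the concept is equally associated with each group.
reference = [1.0, 1.0, 1.0]

bias = divdist(observed, reference, sum_normalize, l1_distance)
no_bias = divdist(reference, reference, sum_normalize, l1_distance)
```

Passing `normalize` and `divergence` as arguments mirrors the idea that a concrete measure is obtained only once these functions are specified, and keeping the reference explicit reflects that bias is measured relative to something.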

Text

  • Social bias manifests in distributional statistics in text
  • Quantify associations in text based on these statistics
  • Contexts are three-sentence spans in the corpus
  • Whether a context mentions the target concept is judged by human domain experts, or automated by requiring a word from a word list
  • Whether a context is associated with a group is judged by humans, or automated by requiring a word from that group's word list
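
The automated, word-list variant described above might look like the sketch below. It rests on several assumptions that the summary does not settle: contexts are taken as non-overlapping three-sentence spans, matching is done on lowercased whole words, and the association score for a group is a simple count of matching contexts. The word lists and sentences are invented for illustration.

```python
import re
from typing import Dict, List, Set


def split_contexts(sentences: List[str], span: int = 3) -> List[str]:
    """Group consecutive sentences into non-overlapping spans (an assumption)."""
    return [" ".join(sentences[i:i + span]) for i in range(0, len(sentences), span)]


def tokens(text: str) -> Set[str]:
    """Lowercased whole-word tokens, so 'the' does not match 'he'."""
    return set(re.findall(r"[a-z']+", text.lower()))


def text_associations(
    contexts: List[str],
    target_words: Set[str],
    group_words: Dict[str, Set[str]],
) -> Dict[str, int]:
    """Count, per group, the contexts containing a target word and a group word."""
    counts = {group: 0 for group in group_words}
    for context in contexts:
        toks = tokens(context)
        if not toks & target_words:
            continue  # the context does not mention the target concept
        for group, words in group_words.items():
            if toks & words:
                counts[group] += 1
    return counts


# Hypothetical corpus and word lists.
sentences = [
    "The doctor arrived early.", "She reviewed the charts.", "Then rounds began.",
    "A nurse called the doctor.", "He answered quickly.", "They discussed the case.",
    "The doctor smiled.", "His shift was over.", "It was late.",
]
target_words = {"doctor"}
group_words = {"female": {"she", "her"}, "male": {"he", "him", "his"}}

contexts = split_contexts(sentences)
counts = text_associations(contexts, target_words, group_words)
```

The resulting per-group counts play the role of the observed associations that a later normalization and divergence step would compare against a reference.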

Word embeddings

  • Cosine similarity is the standard metric for word embeddings
  • Association for word embeddings is quantified as the cosine similarity between the average target word embedding and average group word embedding

Contextualized representations

  • Most prior measures compute a single bias value for contextualized representations
  • Contextualized representations are highly context-sensitive
  • Bias in contextualized representations will depend on the context in which they are used
  • Two context-sensitive approaches quantify the strength of association in contextualized representations: a reduction approach and a probing approach

Normalization and divergence parameters

  • Normalization and divergence are required to fully instantiate bias measures
  • By default, normalization divides a vector by its sum, and the divergence is the ℓ1 distance
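
Written out, and reading the default divergence as the ℓ1 distance, the default settings amount to the following. The symbols a, r, p, and q and the two-group numbers are illustrative assumptions, not notation or values taken from the paper.

```latex
% Default normalization and divergence, for association vectors indexed by group i.
\[
\mathrm{normalize}(a)_i = \frac{a_i}{\sum_j a_j},
\qquad
D(p, q) = \| p - q \|_1 = \sum_i | p_i - q_i |
\]
% Hypothetical two-group example: observed a = (3, 1), reference r = (1, 1).
\[
p = \mathrm{normalize}(a) = (0.75,\ 0.25), \quad
q = \mathrm{normalize}(r) = (0.5,\ 0.5), \quad
D(p, q) = 0.25 + 0.25 = 0.5
\]
```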

Testing protocol

  • Proposed a new bias measure
  • Measurement modeling, following Messick (1987) and Jackman (2008), is used to build trust in measures of complex social constructs like bias

Testing protocol for validity

  • Measure passes basic sanity checks
  • Measure reflects theoretical understanding of construct
  • Measure correlates with other credible measures of same construct
  • Measure predicts other credible measures of related constructs
  • Measure enables scientific inquiry related to construct
  • Measure’s eventual usage amounts to desirable social impact
  • Inter-annotator agreement
  • Measurements are stable up to difference in annotators
  • Measurements are stable up to difference in (hyper)parameters
  • Measure reflects theoretical understanding of underlying construct
  • Measure patterns similarly to other measures of same construct
  • Measure is predictive of measures of related constructs
  • Measure is useful for addressing scientific hypotheses
  • Measure is implemented as default metrics in HELM benchmark
  • Measure has been used to evaluate language models to understand model biases

Testing protocol for reliability

  • Inter-annotator agreement is required for reliable measures.
  • 5 NLP researchers were recruited to annotate 40 contexts for binary gender.
  • Fleiss’ κ was reported as 0.79.
  • Measures are stable to variations in word lists, normalization function, and distance function.
  • Social bias has been qualitatively characterized in social sciences.
  • Several quantitative measures have been proposed to measure bias in NLP datasets.
  • These measures have not been adopted to facilitate social science research.
  • Text corpora have been instrumental to the rise of language models.
  • Growing interest in dataset documentation and governance.
  • Applied measures to bias measurement on both sides of language modeling.
  • Bolukbasi et al. (2016) initiated the study of bias measurement for word embeddings.
  • Measures adapted to measure bias in contextualized representations.
  • Unified framework for text and representation bias measurement.
  • Measures permit multiclass bias measurement.
  • Measures for language models via probabilities assigned to words or sequences.
  • Future work should investigate predictive validity of upstream bias measures.
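
For readers unfamiliar with the inter-annotator agreement statistic reported in the reliability test above, the sketch below computes Fleiss' κ from a table of rating counts using the standard formula. The ratings are a toy example with 5 annotators and 4 items; they are not the paper's annotation data and do not reproduce the reported 0.79.

```python
from typing import List


def fleiss_kappa(ratings: List[List[int]]) -> float:
    """Fleiss' kappa, where ratings[i][j] is the number of annotators who
    assigned item i to category j. Every item needs the same number of raters.
    """
    n_items = len(ratings)
    n_raters = sum(ratings[0])
    if any(sum(row) != n_raters for row in ratings):
        raise ValueError("every item must be rated by the same number of annotators")
    n_categories = len(ratings[0])

    # Share of all assignments that fall into each category.
    p_cat = [
        sum(row[j] for row in ratings) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement per item: fraction of annotator pairs that agree.
    p_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in ratings
    ]
    p_observed = sum(p_item) / n_items
    p_expected = sum(p * p for p in p_cat)
    # Undefined (division by zero) if all ratings fall in a single category.
    return (p_observed - p_expected) / (1 - p_expected)


# Toy data: 4 items, 5 annotators, 2 categories (columns are category counts).
toy_ratings = [[5, 0], [0, 5], [4, 1], [1, 4]]
kappa = fleiss_kappa(toy_ratings)
perfect = fleiss_kappa([[5, 0], [0, 5]])
```

Values near 1 indicate agreement well above what the category frequencies alone would produce by chance, and values near 0 indicate agreement no better than chance.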

Discussion of measurement modeling

  • Measurement modeling is an interdisciplinary theory with a long history
  • Recent works use measurement modeling to identify failures in the validity and reliability of existing bias measures
  • Our work is the first to argue for the trustworthiness of social bias measures based on testing via measurement modeling
  • Measurement modeling can be a powerful general-purpose method in NLP
  • Measurement modeling provides a battle-tested set of well-studied desiderata for evaluating measures in NLP

Conclusion

  • Trustworthy bias measures are necessary for making progress on broader goals
  • DivDist is a general framework for measuring bias
  • Testing protocol based on measurement modeling
  • Code available at https://github.com/rishibommasani/BiasMeasures
  • Measurement modeling criteria tested for in testing protocol
  • Face, convergent, predictive and hypothesis validity experiments conducted
  • Results stable when single parameters/inputs are perturbed