Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

Designing measures of social bias that can be trusted.
Prior work has introduced several measures, but none have gained widespread trust.
Cross-disciplinary theory of measurement modeling used to design bias measures.
Explicitly define social bias, grounded in principles from social science research.
Proposed a general bias measurement framework DivDist, with 5 concrete bias measures.
Rigorous testing protocol with 8 testing criteria proposed to validate measures.
Evidence to trust measures, overcoming deficiencies present in prior measures.

Paper Content

Introduction

Language technologies are increasingly important and have direct, immediate, and significant impact.
Social bias is a central consideration and can cause harm.
Measurement is essential to reducing bias.
Many works have proposed bias measures, but no standard exists for trusting them.
Measurement modeling is used to design and validate measures of social constructs.
Measurement modeling requires defining the theoretical construct of social bias.
DivDist is a measurement framework that is compatible with both text and representations and allows for multi-group measurement.
DivDist makes explicit that bias is a relative phenomenon.
DivDist is tested against 8 desiderata to trust the measures.

Principles for social bias

Social bias is defined in terms of social groups and a target concept
Social science theories define bias as the target concept’s differential association with each group
Bias is a systematic asymmetry that pertains to broader social groups

Bias is relative

Bias in machine translation models is relative to the training data and a societal reference.
Many bias measures in NLP portray bias as absolute.
Bias is an inherently relative construct that requires a reference to be specified.
Effective bias measurement could quantify the relative contribution of sources such as data selection, data annotation, and model training.

Defining social bias

Social bias is the divergence in the observed associations between a target concept and a set of social groups from corresponding reference associations.
DivDist is a two-stage measurement framework that yields a bias measure.
Three functions are specified to map from the abstract framework to a concrete bias measure: SoA, normalize, and D.
Inputs for DivDist are the target concept, social groups, and reference.
DivDist is proven to be a general framework for prior bias measures.
DivDist is instantiated for text, word embeddings, and contextualized representations.
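
The two-stage structure described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `divdist`, `normalize_by_sum`, and `l1_distance` are hypothetical, the toy counts are invented for the example, and the sketch assumes the reference is supplied on the same scale as the normalized observations.

```python
from typing import Callable, Sequence


def normalize_by_sum(values: Sequence[float]) -> list[float]:
    """Divide each entry by the vector's sum so the entries add up to 1."""
    total = sum(values)
    if total == 0:
        raise ValueError("cannot normalize a vector that sums to zero")
    return [v / total for v in values]


def l1_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Sum of absolute coordinate-wise differences."""
    if len(p) != len(q):
        raise ValueError("vectors must have the same length")
    return sum(abs(a - b) for a, b in zip(p, q))


def divdist(
    target: str,
    groups: Sequence[str],
    reference: Sequence[float],
    soa: Callable[[str, str], float],
    normalize: Callable[[Sequence[float]], list[float]] = normalize_by_sum,
    divergence: Callable[[Sequence[float], Sequence[float]], float] = l1_distance,
) -> float:
    # Stage 1: strength of association (SoA) between the target and each group.
    observed = [soa(target, group) for group in groups]
    # Stage 2: normalize the observed associations and measure their
    # divergence D from the reference associations.
    # Assumption: `reference` is already normalized.
    return divergence(normalize(observed), reference)


# Hypothetical toy associations, e.g. co-occurrence counts.
toy_counts = {("nurse", "female"): 30, ("nurse", "male"): 10}
bias = divdist(
    target="nurse",
    groups=["female", "male"],
    reference=[0.5, 0.5],
    soa=lambda t, g: toy_counts[(t, g)],
)
print(bias)  # 0.5
```

Passing `soa`, `normalize`, and `divergence` as arguments mirrors the idea that a concrete measure is obtained by fixing those three functions; swapping the reference changes what the bias is measured relative to.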

Text

Social bias manifests in distributional statistics in text
Quantify associations in text based on these statistics
Contexts are three-sentence spans in the corpus
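Mentions of the target concept in a context are judged by human domain experts, or detected automatically by requiring a word from the target's word list
Associations with a group in a context are judged by humans, or detected automatically by requiring a word from that group's word list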
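
The automated, word-list variant of this procedure might look like the sketch below. Several details are assumptions made for illustration only: the naive regex sentence splitter, the choice of non-overlapping three-sentence contexts, the helper names, and the tiny word lists.

```python
import re


def split_sentences(text: str) -> list[str]:
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def three_sentence_contexts(text: str) -> list[str]:
    # Assumption: contexts are non-overlapping spans of three sentences.
    sentences = split_sentences(text)
    return [" ".join(sentences[i:i + 3]) for i in range(0, len(sentences), 3)]


def tokens(context: str) -> set[str]:
    return set(re.findall(r"[a-z']+", context.lower()))


def text_associations(
    text: str,
    target_words: set[str],
    group_word_lists: dict[str, set[str]],
) -> dict[str, int]:
    """Count contexts that mention the target and contain a group's word."""
    counts = {group: 0 for group in group_word_lists}
    for context in three_sentence_contexts(text):
        toks = tokens(context)
        if not toks & target_words:
            continue  # the context does not mention the target concept
        for group, words in group_word_lists.items():
            if toks & words:
                counts[group] += 1
    return counts


corpus = (
    "The nurse arrived early. She checked the charts. The ward was quiet. "
    "A nurse called the doctor. He answered quickly. She thanked him. "
    "The engineer left. He was tired. It rained."
)
counts = text_associations(
    corpus,
    target_words={"nurse"},
    group_word_lists={"female": {"she", "her"}, "male": {"he", "him", "his"}},
)
print(counts)  # {'female': 2, 'male': 1}
```

The third context is skipped because it never mentions the target, which is why the pronoun "He" in it does not contribute to the male count.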

Word embeddings

Cosine similarity is the standard metric for word embeddings
Association for word embeddings is quantified as the cosine similarity between the average target word embedding and average group word embedding
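
As a rough sketch of this computation, the snippet below averages the target word vectors and the group word vectors and takes the cosine similarity of the two means. The two-dimensional embeddings and the function names are invented for illustration; real embeddings would come from a trained model.

```python
import math


def mean_vector(vectors: list[list[float]]) -> list[float]:
    """Coordinate-wise average of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def embedding_association(
    embeddings: dict[str, list[float]],
    target_words: list[str],
    group_words: list[str],
) -> float:
    # Average the target embeddings and the group embeddings, then compare.
    target_mean = mean_vector([embeddings[w] for w in target_words])
    group_mean = mean_vector([embeddings[w] for w in group_words])
    return cosine(target_mean, group_mean)


# Hypothetical 2-d embeddings chosen so the result is easy to check by hand.
toy_embeddings = {
    "nurse": [1.0, 0.0],
    "she": [1.0, 1.0],
    "her": [1.0, -1.0],
    "he": [0.0, 1.0],
    "him": [0.0, 1.0],
}
female_assoc = embedding_association(toy_embeddings, ["nurse"], ["she", "her"])
male_assoc = embedding_association(toy_embeddings, ["nurse"], ["he", "him"])
print(female_assoc, male_assoc)  # 1.0 0.0
```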

Contextualized representations

Most prior measures compute a single bias value for contextualized representations
Contextualized representations are highly context-sensitive
Bias in contextualized representations will depend on the context in which they are used
Two context-sensitive approaches to quantify strength of association in contextualized representations
Reduction approach and Probing approach

Normalization and divergence parameters

Normalization and divergence are required to fully instantiate bias measures
The default settings are: normalization divides a vector by its sum, and divergence is the ℓ1 distance
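
Written out, and assuming the default divergence is the ℓ1 distance, the default measure for an association vector over k groups takes the form below. The symbols a, r, and k are notation introduced here for illustration, and the numbers in the worked example are made up.

```latex
% a = (a_1, \dots, a_k): observed associations between the target and k groups
% r = (r_1, \dots, r_k): reference associations (assumed already normalized)
\[
  \mathrm{normalize}(a)_i = \frac{a_i}{\sum_{j=1}^{k} a_j},
  \qquad
  D(p, r) = \sum_{i=1}^{k} \lvert p_i - r_i \rvert
\]
% Worked example with k = 3, a = (6, 3, 1), and a uniform reference:
\[
  \mathrm{normalize}(a) = (0.6,\ 0.3,\ 0.1),
  \qquad
  r = \left(\tfrac{1}{3},\ \tfrac{1}{3},\ \tfrac{1}{3}\right)
\]
\[
  D = \left|0.6 - \tfrac{1}{3}\right| + \left|0.3 - \tfrac{1}{3}\right|
    + \left|0.1 - \tfrac{1}{3}\right|
    = \tfrac{4}{15} + \tfrac{1}{30} + \tfrac{7}{30}
    = \tfrac{8}{15} \approx 0.533
\]
```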

Testing protocol

Proposed a new bias measure
Measurement modeling used to build trust in measures of complex social constructs like bias
Following Messick (1987) and Jackman (2008)

Testing protocol for validity

Measure passes basic sanity checks
Measure reflects theoretical understanding of construct
Measure correlates with other credible measures of same construct
Measure predicts other credible measures of related constructs
Measure enables scientific inquiry related to construct
Measure’s eventual usage amounts to desirable social impact
Inter-annotator agreement
Measurements are stable up to difference in annotators
Measurements are stable up to difference in (hyper)parameters
Measure reflects theoretical understanding of underlying construct
Measure patterns similarly to other measures of same construct
Measure is predictive of measures of related constructs
Measure is useful for addressing scientific hypotheses
Measures are implemented as default metrics in the HELM benchmark
Measures have been used to evaluate language models to understand model biases

Testing protocol for reliability

Inter-annotator agreement is required for reliable measures.
5 NLP researchers were recruited to annotate 40 contexts for binary gender.
Fleiss’ κ was reported as 0.79.
Measures are stable to variations in word lists, normalization function, and distance function.
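
For readers unfamiliar with the agreement statistic reported above, Fleiss' κ can be computed from a table of per-item category counts as in this sketch. The toy ratings are hypothetical (4 items, 5 annotators, 2 categories) and are not the study's annotation data.

```python
def fleiss_kappa(ratings: list[list[int]]) -> float:
    """ratings[i][j] = number of annotators who assigned item i to category j.

    Every item must be rated by the same number of annotators.
    """
    n_items = len(ratings)
    n_raters = sum(ratings[0])
    if any(sum(row) != n_raters for row in ratings):
        raise ValueError("each item needs the same number of ratings")

    # Per-item agreement: fraction of annotator pairs that agree on the item.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in ratings
    ]
    observed = sum(per_item) / n_items

    # Chance agreement from the overall share of ratings in each category.
    n_categories = len(ratings[0])
    shares = [
        sum(row[j] for row in ratings) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    expected = sum(s * s for s in shares)
    if expected == 1:
        raise ValueError("kappa is undefined when only one category is used")

    return (observed - expected) / (1 - expected)


# Hypothetical ratings: two unanimous items and two 4-vs-1 splits.
toy_ratings = [[5, 0], [0, 5], [4, 1], [1, 4]]
kappa = fleiss_kappa(toy_ratings)
print(round(kappa, 3))  # 0.6
```

Here the observed agreement is 0.8 and the chance agreement is 0.5, giving κ = (0.8 - 0.5) / (1 - 0.5) = 0.6.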

Related work

Social bias has been qualitatively characterized in social sciences.
Several quantitative measures have been proposed to measure bias in NLP datasets.
These measures have not been adopted to facilitate social science research.
Text corpora have been instrumental to the rise of language models.
Growing interest in dataset documentation and governance.
Applied measures to bias measurement on both sides of language modeling.
Bolukbasi et al. (2016) initiated the study of bias measurement for word embeddings.
Measures adapted to measure bias in contextualized representations.
Unified framework for text and representation bias measurement.
Measures permit multiclass bias measurement.
Measures for language models via probabilities assigned to words or sequences.
Future work should investigate predictive validity of upstream bias measures.

Discussion of measurement modeling

Measurement modeling is an interdisciplinary theory with a long history
Recent works use measurement modeling to identify failures in the validity and reliability of existing bias measures
Our work is the first to argue for the trustworthiness of social bias measures based on testing via measurement modeling
Measurement modeling can be a powerful general-purpose method in NLP
Measurement modeling provides a battle-tested set of well-studied desiderata for evaluating measures in NLP

Conclusion

Trustworthy bias measures are necessary for making progress on broader goals
DivDist is a general measurement framework to measure bias
Testing protocol based on measurement modeling
Code available at https://github.com/rishibommasani/BiasMeasures
Measurement modeling criteria tested for in testing protocol
Face, convergent, predictive and hypothesis validity experiments conducted
Results stable when single parameters/inputs are perturbed
