Link to paper
The full paper is available here.
You can also find the paper on PapersWithCode here.
Abstract
- Designing measures of social bias that can be trusted.
- Prior work has introduced several measures, but none have gained widespread trust.
- Cross-disciplinary theory of measurement modeling used to design bias measures.
- Explicitly define social bias, grounded in principles from social science research.
- Proposed a general bias measurement framework DivDist, with 5 concrete bias measures.
- Rigorous testing protocol with 8 testing criteria proposed to validate measures.
- Evidence to trust measures, overcoming deficiencies present in prior measures.
Paper Content
Introduction
- Language technologies are increasingly important and have direct, immediate, and significant impact.
- Social bias is a central consideration and can cause harm.
- Measurement is essential to reducing bias.
- Many works have proposed bias measures, but no standard exists for trusting them.
- Measurement modeling is used to design and validate measures of social constructs.
- Measurement modeling requires defining the theoretical construct of social bias.
- DivDist is a measurement framework that is compatible and allows for multi-group measurement.
- DivDist makes explicit that bias is a relative phenomenon.
- DivDist is tested against 8 desiderata to trust the measures.
Principles for social bias
- Social bias is defined in terms of social groups and a target concept
- Social science theories define bias as the target concept’s differential association with each group
- Bias is a systematic asymmetry that pertains to broader social groups
Bias is relative
- Bias in machine translation models is relative to the training data and a societal reference.
- Many bias measures in NLP portray bias as absolute.
- Bias is an inherently relative construct that requires a reference to be specified.
- Effective bias measurement could quantify the relative contribution of sources such as data selection, data annotation, and model training.
Defining social bias
- Social bias is the divergence in the observed associations between a target concept and a set of social groups from corresponding reference associations.
- DivDist is a two-stage measurement framework that yields a bias measure.
- Three functions are specified to map from the abstract framework to a concrete bias measure: SoA, normalize, and D.
- Inputs for DivDist are the target concept, social groups, and reference.
- DivDist is proven to be a general framework for prior bias measures.
- DivDist is instantiated for text, word embeddings, and contextualized representations.
Text
- Social bias manifests in distributional statistics in text
- Quantify associations in text based on these statistics
- Contexts are three-sentence spans in the corpus
- Mentions are judged by human domain-experts or automated by requiring a word list
- Associations in a context are judged by humans or automated by requiring a word from a group’s word list
Word embeddings
- Cosine similarity is the standard metric for word embeddings
- Association for word embeddings is quantified as the cosine similarity between the average target word embedding and average group word embedding
Contextualized representations
- Most prior measures compute a single bias value for contextualized representations
- Contextualized representations are highly context-sensitive
- Bias in contextualized representations will depend on the context in which they are used
- Two context-sensitive approaches to quantify strength of association in contextualized representations
- Reduction approach and Probing approach
Normalization and divergence parameters
- Normalization and divergence are required to fully instantiate bias measures
- Default settings for normalization is dividing a vector by its sum and divergence is the 1 distance
Testing protocol
- Proposed a new bias measure
- Measurement modeling used to build trust in measures of complex social constructs like bias
- Following Messick (1987) and Jackman (2008)
Testing protocol for validity
- Measure passes basic sanity checks
- Measure reflects theoretical understanding of construct
- Measure correlates with other credible measures of same construct
- Measure predicts other credible measures of related constructs
- Measure enables scientific inquiry related to construct
- Measure’s eventual usage amounts to desirable social impact
- Inter-annotator agreement
- Measurements are stable up to difference in annotators
- Measurements are stable up to difference in (hyper)parameters
- Measure reflects theoretical understanding of underlying construct
- Measure patterns similarly to other measures of same construct
- Measure is predictive of measures of related constructs
- Measure is useful for addressing scientific hypotheses
- Measure is implemented as default metrics in HELM benchmark
- Measure has been used to evaluate language models to understand model biases
Testing protocol for reliability
- Inter-annotator agreement is required for reliable measures.
- 5 NLP researchers were recruited to annotate 40 contexts for binary gender.
- Fleiss’ κ was reported as 0.79.
- Measures are stable to variations in word lists, normalization function, and distance function.
Related work
- Social bias has been qualitatively characterized in social sciences.
- Several quantitative measures have been proposed to measure bias in NLP datasets.
- These measures have not been adopted to facilitate social science research.
- Text corpora have been instrumental to the rise of language models.
- Growing interest in dataset documentation and governance.
- Applied measures to bias measurement on both sides of language modeling.
- Bolukbasi et al. (2016) initiated the study of bias measurement for word embeddings.
- Measures adapted to measure bias in contextualized representations.
- Unified framework for text and representation bias measurement.
- Measures permit multiclass bias measurement.
- Measures for language models via probabilities assigned to words or sequences.
- Future work should investigate predictive validity of upstream bias measures.
Discussion of measurement modeling
- Measurement modeling is an interdisciplinary theory with a long history
- Recent works use measurement modeling to identify failures in the validity and reliability of existing bias measures
- Our work is the first to argue for the trustworthiness of social bias measures based on testing via measurement modeling
- Measurement modeling can be a powerful general-purpose method in NLP
- Measurement modeling provides a battle-tested set of well-studied desiderata for evaluating measures in NLP
Conclusion
- Trustworthy bias measures are necessary for making progress on broader goals
- DivDist is a general measurement framework to measure bias
- Testing protocol based on measurement modeling
- Code available at https://github.com/rishibommasani/BiasMeasures
- Measurement modeling criteria tested for in testing protocol
- Face, convergent, predictive and hypothesis validity experiments conducted
- Results stable when single parameters/inputs are perturbed