Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

  • Gender inclusivity in language is a topic of debate and research
  • Gender inclusivity in translation is largely unexplored
  • Gender-Neutral Translation (GNT) is a form of gender inclusivity in translation
  • MT models have been found to perpetuate gender bias and discrimination
  • Relevant institutional guidelines for Gender-Inclusive Language (GIL) are reviewed
  • GNT scenarios of use are discussed and a list of desiderata is devised
  • Main technical challenges to the implementation of GNT in MT are identified
  • Focus is on translation from English into Italian, since the two languages follow different rules for gender marking

Paper Content

Introduction

  • Gender bias and discrimination perpetuated through language is a topic of discussion
  • Language can reflect the perceived value, power and status associated with genders in society
  • Psycholinguists have investigated the influence of gendered forms on cognition
  • Demand for Gender-Inclusive Language has grown
  • Responses have been disparate
  • Debates have assumed a binary approach
  • Two main linguistic strategies have been employed: innovative linguistic elements and neutral language
  • English is leading the change towards gender-inclusive language
  • The situation is more complicated for other languages because of less timely discussions and their grammatical structures
  • The paper proposes a list of desiderata for Gender-Neutral Translation
  • It discusses the challenges of implementing these desiderata in the context of Machine Translation

Background

  • Gender expression is socially relevant
  • Language reflects social change
  • Language interacts with perception and representation of individuals
  • Appropriate use of gender expressions is critical for human and automatically generated language

Gender and language

  • Gender is a complex concept that encompasses both social and individual aspects
  • Language expresses gender through personal pronouns, possessive adjectives, lexically gendered forms, and compounds
  • Gender representation in language can be discriminatory and reinforces social asymmetries
  • Androcentric normativity promotes the masculine gender as the human prototype
  • Stereotypes are reiterated and reinforced through associations of professional nouns and gender
  • Gender-Inclusive Language is a form of verbal hygiene to regulate language
  • Two strands of gender-related linguistic policies: non-sexist and non-heteronormative
  • Innovative approaches from grassroots efforts focus on direct forms of inclusive language
  • Gender-inclusive innovations are inconsistent across different languages
  • Gender-neutralization strategies are an actionable and acceptable form of GIL

Gender (bias) and machine translation

  • Language technologies can amplify biased behaviors
  • Gender bias is a tendency to discriminate against certain individuals or groups
  • Gender bias is more evident in cross-lingual scenarios
  • Gender bias has both technical and societal implications
  • Recent works have focused on non-binary identities in NLP
  • Neutral translation is a path towards avoiding gendered inferences

Review of guidelines for gender-inclusive language

  • Gender inclusivity is conceptualized differently in English and Italian guidelines
  • English guidelines adopt a non-heteronormative outlook
  • Italian guidelines address women and men only
  • Strategies to address discrimination at the linguistic level vary between English and Italian
  • Strategies to implement GIL are systematized in a multilingual perspective
  • Focus is largely on masculine generics
  • Androcentric forms are discouraged
  • Stereotypical associations should be avoided
  • Focus on neutralization of pronouns in English and nouns in Italian
  • Neutralization strategies range from omissions to replacements of single words
  • Neutralization of short segments is preferable
  • Trade-off between neutrality and acceptability of text
  • Reformulation process resembles a rewriting process
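
The simplest end of the neutralization range described above, replacing single gendered words, can be sketched in a few lines of Python. This is only an illustration: the replacement table `NEUTRAL_REPLACEMENTS` and the function `neutralize` are hypothetical names, and the word pairs are common textbook examples for English rather than entries taken from the reviewed guidelines.

```python
import re

# Hypothetical replacement table: illustrative word pairs, not from the paper.
NEUTRAL_REPLACEMENTS = {
    "chairman": "chairperson",
    "policeman": "police officer",
    "fireman": "firefighter",
    "mankind": "humankind",
}

# Match any listed word as a whole word, ignoring case.
_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, NEUTRAL_REPLACEMENTS)) + r")\b",
    re.IGNORECASE,
)


def neutralize(text: str) -> str:
    """Replace single gendered words with neutral alternatives."""

    def _swap(match: re.Match) -> str:
        word = match.group(0)
        neutral = NEUTRAL_REPLACEMENTS[word.lower()]
        # Keep an initial capital if the original word had one.
        return neutral.capitalize() if word[0].isupper() else neutral

    return _PATTERN.sub(_swap, text)


print(neutralize("The chairman thanked the fireman."))
```

Word-level replacement of this kind only covers short segments. Longer reformulations, which the guidelines compare to rewriting, are outside what a lookup table can do.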

Desiderata for a gender-inclusive translation

  • Monolingual guidelines exist for Gender-Inclusive Language (GIL)
  • Search queries for gender-inclusive translation only return a few tips-and-tricks blog posts
  • Gender-Neutral Translation (GNT) is a form of GIL
  • GNT does not mark gender of human referents if not assigned in source text
  • Trade-off between neutrality and linguistic acceptability
  • Desiderata to guide GNT: avoid expressing gender if not in source, use proper expressions when gender is in source, avoid propagating masculine generics
  • Respect speaker’s choice of gender expression when translating 1st person singular referent
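
Read as a decision rule, the first three desiderata could be sketched as follows. This is a minimal illustration under stated assumptions: the names `SourceGender`, `TargetStrategy` and `choose_strategy` are hypothetical, and the sketch assumes that the gender status of a referent in the source has already been determined, which is itself one of the hard parts.

```python
from enum import Enum


class SourceGender(Enum):
    UNMARKED = "unmarked"                    # source gives no gender information
    MARKED = "marked"                        # source explicitly assigns a gender
    MASCULINE_GENERIC = "masculine_generic"  # masculine form used generically


class TargetStrategy(Enum):
    NEUTRAL = "neutral"    # translate without gender marking
    GENDERED = "gendered"  # translate with the gender given in the source


def choose_strategy(source_gender: SourceGender) -> TargetStrategy:
    """Pick a target-side strategy for one human referent."""
    if source_gender is SourceGender.MARKED:
        # Gender is expressed in the source: use the proper gendered form.
        return TargetStrategy.GENDERED
    # Unmarked referents and masculine generics are both rendered neutrally.
    return TargetStrategy.NEUTRAL


for case in SourceGender:
    print(case.value, "->", choose_strategy(case).value)
```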

Challenges and insights for a gender-neutral machine translation

  • Neutralization strategies were systematized and converted into GNT desiderata
  • Technical challenges include dedicated data, metrics and architectures
  • Creation of dedicated benchmarks to determine advancements towards GNT
  • Benchmarks should comprise source sentences that require GNT, each aligned with a GNT counterpart in the target language
  • Evaluation protocol needs to be designed
  • Creation of multiple reference translations or GNT-oriented quality estimation metric
  • Training models in the absence of GNT examples is an open challenge
  • Neutrally-constrained MT could rely on bilingual dictionaries
  • Training methodology to reward highly-probable and low-cost outputs
  • Disambiguating gender through wider context
  • Disambiguating gender through external knowledge
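
The benchmark and multiple-reference ideas above can be made concrete with a small sketch. Everything here is an assumption made for illustration: `BenchmarkEntry`, `matches_any_reference` and `neutral_match_rate` are hypothetical names, the Italian sentences are invented examples rather than data from the paper, and exact string matching is a crude stand-in for a real evaluation metric.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BenchmarkEntry:
    source: str                    # English sentence that requires GNT
    neutral_references: List[str]  # acceptable gender-neutral Italian translations


def normalize(text: str) -> str:
    """Lowercase, collapse whitespace and drop a trailing period."""
    return " ".join(text.lower().split()).rstrip(".")


def matches_any_reference(hypothesis: str, entry: BenchmarkEntry) -> bool:
    """True if the hypothesis equals one of the neutral references."""
    hyp = normalize(hypothesis)
    return any(hyp == normalize(ref) for ref in entry.neutral_references)


def neutral_match_rate(hypotheses: List[str], entries: List[BenchmarkEntry]) -> float:
    """Share of hypotheses that match a neutral reference of their entry."""
    if len(hypotheses) != len(entries):
        raise ValueError("hypotheses and entries must have the same length")
    if not entries:
        return 0.0
    hits = sum(matches_any_reference(h, e) for h, e in zip(hypotheses, entries))
    return hits / len(entries)


# Invented example: two neutral renderings of the same source sentence.
entry = BenchmarkEntry(
    source="The teachers met yesterday.",
    neutral_references=[
        "Il corpo docente si è riunito ieri.",
        "Il personale docente si è riunito ieri.",
    ],
)
print(matches_any_reference("Il personale docente si è riunito ieri.", entry))
print(matches_any_reference("I professori si sono riuniti ieri.", entry))
```

Several references per entry are kept because a neutral rendering is rarely unique. A practical protocol would likely replace the exact-match check with a softer similarity measure or a dedicated quality-estimation metric.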

Conclusions

  • Gender bias and discrimination in language are a rising concern in automatic translation
  • MT models have been found to amplify male visibility and stereotypes
  • This work focuses on the use of neutral forms devoid of gender marking for an English-Italian setting
  • An extensive review of gender neutralization strategies was conducted
  • A definition of gender-neutral translation suitable for cross-lingual contexts was outlined
  • Technical challenges for the implementation of a gender-neutral translation in MT were discussed