Link to paper
The full paper is available here.
You can also find the paper on PapersWithCode here.
Abstract
- Gender inclusivity in language is a topic of debate and research
- Gender inclusivity in translation is largely unexplored
- Gender-Neutral Translation (GNT) is a form of gender inclusivity in translation
- MT models have been found to perpetuate gender bias and discrimination
- Relevant institutional guidelines for Gender-Inclusive Language (GIL) are reviewed
- GNT scenarios of use are discussed and a list of desiderata is devised
- Main technical challenges to the implementation of GNT in MT are identified
- Focus is on translation from English into Italian due to different rules for gender marking
Paper Content
Introduction
- Gender bias and discrimination perpetuated through language is a topic of discussion
- Language can reflect the perceived value, power and status associated with genders in society
- Psycholinguists have investigated the influence of gendered forms on cognition
- Demand for Gender-Inclusive Language has grown
- Responses have been disparate
- Debates have assumed a binary approach
- Two main linguistic strategies have been employed: innovative linguistic elements and neutral language
- English is a leader of change towards gender-inclusive language
- Situation is more complicated for other languages due to less timely discussions and grammatical structures
- Proposing a list of desiderata for Gender-Neutral Translation
- Challenges of implementing desiderata in the context of Machine Translation
Background
- Gender expression is socially relevant
- Language reflects social change
- Language interacts with perception and representation of individuals
- Appropriate use of gender expressions is critical for human and automatically generated language
Gender and language
- Gender is a complex concept that encompasses both social and individual aspects
- Language expresses gender through personal pronouns, possessive adjectives, lexically gendered forms, and compounds
- Gender representation in language can be discriminatory and reinforces social asymmetries
- Androcentric normativity promotes the masculine gender as the human prototype
- Stereotypes are reiterated and reinforced through associations of professional nouns and gender
- Gender-Inclusive Language is a form of verbal hygiene to regulate language
- Two strands of gender-related linguistic policies: non-sexist and non-heteronormative
- Innovative approaches from grassroots efforts focus on direct forms of inclusive language
- Gender-inclusive innovations are inconsistent across different languages
- Gender-neutralization strategies are an actionable and acceptable form of GIL
Gender (bias) and machine translation
- Language technologies can amplify biased behaviors
- Gender bias is a tendency to discriminate against certain individuals or groups
- Gender bias is more evident in cross-lingual scenarios
- Gender bias has both technical and societal implications
- Recent works have focused on non-binary identities in NLP
- Neutral translation is a path towards avoiding gendered inferences
Review of guidelines for gender-inclusive language
- Gender inclusivity is conceptualized differently in English and Italian guidelines
- English guidelines adopt a non-heteronormative outlook
- Italian guidelines address women and men only
- Strategies to address discrimination at the linguistic level vary between English and Italian
- Strategies to implement GIL are systematized in a multilingual perspective
- Focus is largely on masculine generics
- Discouraged androcentric forms
- Avoid stereotypical associations
- Focus on neutralization of pronouns in English and nouns in Italian
- Neutralization strategies range from omissions to replacements of single words
- Neutralization of short segments is preferable
- Trade-off between neutrality and acceptability of text
- Reformulation process resembles a rewriting process
Desiderata for a gender-inclusive translation
- Monolingual guidelines exist for Gender-Inclusive Translation (GIL)
- Search queries for GIL only provide a few tips and tricks blog posts
- Gender-Neutral Translation (GNT) is a form of GIL
- GNT does not mark gender of human referents if not assigned in source text
- Trade-off between neutrality and linguistic acceptability
- Desiderata to guide GNT: avoid expressing gender if not in source, use proper expressions when gender is in source, avoid propagating masculine generics
- Respect speaker’s choice of gender expression when translating 1st person singular referent
Challenges and insights for a gender-neutral machine translation
- Neutralization strategies systematized and converted into GNT desiderata
- Technical challenges include dedicated data, metrics and architectures
- Creation of dedicated benchmarks to determine advancements towards GNT
- Benchmarks should comprise source sentences requiring GNT and aligned with GNT counterpart in target language
- Evaluation protocol needs to be designed
- Creation of multiple reference translations or GNT-oriented quality estimation metric
- Training models without GNT examples
- Neutrally-constrained MT could rely on bilingual dictionaries
- Training methodology to reward highly-probable and low-cost outputs
- Disambiguating gender through wider context
- Disambiguating gender through external knowledge
Conclusions
- Gender bias and discrimination in language is a rising concern in automatic translation
- MT models have been found to amplify male visibility and stereotypes
- This work focuses on the use of neutral forms devoid of gender marking for an English-Italian setting
- An extensive review of gender neutralization strategies was conducted
- A definition of gender-neutral translation suitable for cross-lingual contexts was outlined
- Technical challenges for the implementation of a gender-neutral translation in MT were discussed