Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

  • Neural Machine Translation (NMT) is the standard for machine translation applications.
  • NMT models can produce severely pathological translations, known as hallucinations.
  • It is important to implement strategies to guarantee proper functioning.
  • This paper addresses the problem of hallucination detection in NMT.
  • Hallucinations exhibit encoder-decoder attention patterns that are different from good quality translations.
  • An optimal transport formulation is proposed to detect hallucinations.
  • The detector outperforms previous model-based detectors and is competitive with detectors that use large models.

Paper Content

Introduction

  • Neural machine translation (NMT) is the mainstream method for automatic translation
  • Hallucinations are severely pathological translations that are detached from the source sequence content
  • There is a need to develop security mechanisms to address this issue
  • Leveraging the encoder-decoder attention mechanism to develop an on-the-fly hallucination detector
  • Anomaly detection with an optimal transport (OT) formulation to find translations with source attention mass distributions that are highly distant from those of good translations

Background

Neural machine translation with transformer models

  • Autoregressive NMT model M defines a probability distribution over an output space of hypotheses conditioned on a source sequence.
  • Model is parameterized by an encoder-decoder transformer model with a set of learned weights.
  • Multi-head encoder-decoder attention mechanism computes a distribution over all source sentence words for each generation step.
  • Scaled dot-product attention is computed using queries, keys, and values matrices.
  • Multi-head attention is used, with a representation for each head computed by invoking Equation 1 in parallel.
  • Attention maps are obtained by averaging across all heads of the last layer of the decoder module.

Optimal transport problem and wasserstein distance

  • The Wasserstein distance is a measure of the distance between two probability distributions.
  • The Wasserstein-1 distance, also known as Earth Mover’s Distance, is obtained with the choice c(u, v) = u − v 1.
  • The EMD represents the minimum amount of “work” required to transform one pile into the other, where the work is defined as the amount of mass moved multiplied by the distance it is moved.

Hallucinations in nmt

  • Hallucinations are extreme end of NMT pathologies
  • Hallucinations contain content detached from source sentence
  • Hallucinations can be categorized as largely fluent detached or oscillatory
  • Detached hallucinations can be split according to severity of detachment
  • Oscillatory hallucinations contain erroneous repetitions of words and phrases

Detection of hallucinations in nmt

Categorization of on-the-fly detectors

  • On-the-fly hallucinations detectors detect hallucinations without reference translations
  • Previous work on on-the-fly detection of hallucinations in NMT has focused on two categories of detectors: external and model-based
  • External detectors use large language models trained on millions or billions of samples
  • Model-based detectors only require access to the NMT model and leverage internal features
  • Model-based detectors can be predictive of hallucinations and outperform quality estimation models
  • This work proposes a new model-based detector that achieves greater improvements over all previously proposed detectors

Problem statement

  • Model-based detectors require obtaining internal features from a model M.
  • A scoring function s M and a threshold τ are used to build a binary rule g M .
  • If s M is an anomaly score, g M (x) = 0 implies a ’normal’ translation and g M (x) = 1 implies a ‘hallucination’ translation.

Unsupervised hallucination detection with optimal transport

  • Anomalous attention maps have been connected to the hallucinatory mode in several works
  • Our method uses the Wasserstein distance to estimate the cost of transforming a translation source mass distribution into a reference distribution
  • The higher the cost of transformation, the more distant and anomalous the attention of the translation is
  • We rely on the generated translation and its source mass distribution to decide whether the translation is an hallucination or not
  • We compute an anomaly score by measuring the Wasserstein distance between the source mass distribution and a reference distribution
  • The reference distribution is the uniform distribution
  • The cost function is the 0/1 cost function
  • We use a held-out dataset to construct a set of held-out source attention distributions
  • We apply a length filter to construct the sample reference set
  • We compute pairwise Wasserstein-1 distances between the source mass distribution and each element of the reference set
  • We obtain the anomaly score by averaging the bottom-k distances
  • We evaluate how anomalous a given translation is compared to the data distribution
  • We combine Wass-to-Unif and Wass-to-Data into a single detector

Model and data

  • Dataset of 3415 translations for WMT'18 DE-EN news translation data released by Guerreiro et al. (2022)
  • Structured annotations on critical errors and hallucinations
  • Dataset is the only available with real hallucinations produced naturally by a NMT model
  • Experiments use same Transformer model that generated translations in dataset

Baseline detectors

  • Attn-ign-SRC is a method that computes the proportion of source words with a total incoming attention mass lower than a threshold
  • Seq-Logprob is a method that computes the length-normalised sequence log-probability of the generated translation
  • CometKiwi is a reference-free model trained on nearly one million direct assessment annotations
  • LaBSE is a method that leverages cross-lingual sentence representations for the source sequence and translation

Evaluation metrics

  • Use AUROC and FPR@90TPR to evaluate performance of detectors
  • Threshold-independent evaluation metrics provide comprehensive view of performance without being influenced by threshold choice

Implementation details

  • Used library POT: Python Optimal Transport (Flamary et al., 2021)
  • 6 Results

Performance on on-the-fly detection

  • Wass-Combo is the best model-based detector
  • Wass-Combo outperforms most other methods
  • Data proximity is helpful to detect hallucinations
  • Combining model-based and data-driven methods brings further performance improvements
  • LaBSE outperforms the state-of-the-art quality estimation system CometKiwi

Do detectors specialize in different types of hallucinations?

  • Performance of different detectors for different types of hallucinations is analyzed
  • Fully detached hallucinations are easy to detect
  • Combining methods can be helpful
  • Strongly detached hallucinations are the hardest to detect
  • Oscillatory hallucinations are difficult to detect with model-based detectors
  • Combining methods can detect oscillatory hallucinations
  • Constructing reference set with translations of any quality improves performance
  • Length-filtering the distributions in reference set boosts performance significantly

Conclusions

  • Propose a novel plug-in model-based detector for hallucinations in NMT
  • Detector aims to find translations with source attention mass distribution that are distant from good quality translations
  • Outperforms all previous model-based detectors
  • Competitive with detectors that use large, pre-trained models
  • Does not require any training data
  • Can be easily deployed in real-world scenarios
  • Training quality estimation models with more negative examples can improve ability to penalize hallucinations
  • Ablations on Wass-to-Data and Wass-Combo for all relevant hyperparameters
  • Qualitative analysis on fixed-threshold scenario
  • Not able to detect fully detached hallucinations
  • Struggles with oscillatory hallucinations
  • Implemented binary heuristic to detect oscillatory hallucinations