Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

  • Many information retrieval tasks require large labeled datasets for fine-tuning.
  • Large language models can be used to generate large numbers of synthetic queries cheaply.
  • The synthetic queries are used to fine-tune a family of reranker models, which are then distilled into a single efficient retriever.
  • Boosts zero-shot accuracy in long-tail domains.
  • Lower latency than standard reranking methods.

Paper Content

Introduction

  • Neural IR has led to performance improvements on document and passage retrieval tasks
  • Neural retrievers benefit from fine-tuning on large labeled datasets
  • IR models can experience significant drops in accuracy due to distribution shifts
  • UDAPDR is an efficient strategy for using LLMs to facilitate unsupervised domain adaptation of neural retriever models
  • UDAPDR leads to large gains in zero-shot settings on a diverse range of domains
  • UDAPDR uses a powerful and expensive LLM to create a small initial set of synthetic queries
  • These queries serve as demonstrations in prompts for a smaller LLM, which generates a much larger set of synthetic queries used to train separate rerankers; the rerankers are then distilled into a single ColBERTv2 retriever
  • UDAPDR requires only thousands of synthetic queries to be effective
  • Code and synthetic datasets for UDAPDR will be publicly available

Data augmentation for neural IR

  • Generated datasets support domain adaptation in Transformer-based architectures
  • LLMs used to improve IR accuracy in new domains via synthetic datasets
  • Domain shift is the most pressing challenge for effective domain transfer
  • Different types of domain shifts can be addressed with synthetic data and indexing strategies

Pretraining objectives for IR

  • Pretraining objectives can help neural IR systems adapt to new domains without annotations
  • Masked language modeling (MLM) and the Inverse Cloze Task (ICT) are unsupervised approaches for helping retrieval models adapt to new domains
  • Body First Selection (BFS) and Wiki Link Prediction (WLP) are unsupervised pretraining tasks that use sampled in-domain sentences and passages
  • The Neural Vector Space Model (NVSM) is an unsupervised pretraining approach for news article retrieval
  • A contrastive learning objective has been used for unsupervised training of dense retrievers
  • ICT has been paired with synthetic query data for domain adaptation
  • A contrastive learning objective has also been paired with the unsupervised Promptagator strategy
  • By contrast, UDAPDR's unsupervised domain adaptation approach does not require any further pretraining
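For readers unfamiliar with ICT, a minimal sketch of how it turns unlabeled text into training pairs may help. It assumes the passage is already split into sentences; the `ict_pairs` helper, its `remove_prob` default (modeled on the common ICT setup in which the pseudo-query is usually removed from its context), and the sample sentences are illustrative assumptions, not taken from the paper.

```python
import random


def ict_pairs(passage_sentences, remove_prob=0.9, seed=0):
    """Build Inverse Cloze Task (ICT) pairs from one passage (illustrative helper).

    Each sentence in turn becomes a pseudo-query, and the rest of the
    passage becomes its positive context. With probability `remove_prob`
    the query sentence is dropped from the context; otherwise it is kept.
    """
    rng = random.Random(seed)
    pairs = []
    for i, query in enumerate(passage_sentences):
        if rng.random() < remove_prob:
            # Usual case: the pseudo-query is removed from its own context.
            context = list(passage_sentences[:i]) + list(passage_sentences[i + 1:])
        else:
            # Occasionally keep the sentence, so some pairs share exact wording.
            context = list(passage_sentences)
        pairs.append((query, " ".join(context)))
    return pairs


sentences = ["Paris is in France.", "It is the capital.", "It hosts the Louvre."]
for query, context in ict_pairs(sentences, remove_prob=1.0):
    print(f"{query!r} -> {context!r}")
```

No labels are needed: the structure of the passage itself supplies the positive pairs, which is what makes ICT usable for unsupervised adaptation to a new corpus.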

Methodology

  • UDAPDR strategy requires access to in-domain passages, but not queries or labels
  • The goal is to generate large numbers of synthetic queries for these passages
  • Stage 1: Sample X in-domain passages from the target domain and use GPT-3 with 5 prompting strategies to generate 5X synthetic queries
  • Stage 2: Create Y corpus-adapted prompts that differ in which Stage 1 passage-query pairs they use as demonstrations
  • Stage 3: Use the corpus-adapted prompts with Flan-T5 XXL to generate Z synthetic queries, then apply a quality filter
  • Stage 4: Train Y passage rerankers from scratch on the filtered synthetic queries and select the N best
  • Stage 5: Distill the N selected rerankers into a single ColBERTv2 retriever through multi-teacher distillation
  • Stage 6: Evaluate the domain-adapted ColBERTv2 retriever on the target domain's evaluation set
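Stage 5 can be illustrated with a minimal sketch of score-distribution distillation for a single query. ColBERTv2 is itself trained by minimizing a KL divergence between a cross-encoder's score distribution and the retriever's; the sketch extends that idea to several teachers by averaging their softmax distributions. The averaging rule, the function names, and the scores are assumptions for illustration, not UDAPDR's exact recipe.

```python
import math


def softmax(scores):
    """Numerically stable softmax over one query's candidate scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multi_teacher_target(teacher_scores):
    """Average the teachers' softmax distributions (assumed combination rule).

    `teacher_scores` holds one list of reranker scores per teacher, all over
    the same candidate passages for a single query.
    """
    dists = [softmax(scores) for scores in teacher_scores]
    num_teachers = len(dists)
    return [sum(d[j] for d in dists) / num_teachers for j in range(len(dists[0]))]


def kl_distillation_loss(teacher_scores, student_scores):
    """KL(target || student) for one query; the student is the retriever."""
    target = multi_teacher_target(teacher_scores)
    student = softmax(student_scores)
    return sum(t * math.log(t / s) for t, s in zip(target, student) if t > 0)


# Three hypothetical rerankers score the same four candidate passages.
teachers = [[4.0, 1.0, 0.5, 0.0], [3.5, 2.0, 0.0, 0.0], [5.0, 0.5, 1.0, -1.0]]
# An untrained retriever that cannot tell the candidates apart.
student = [1.0, 1.0, 1.0, 1.0]
print(round(kl_distillation_loss(teachers, student), 4))
```

In a real training loop the student scores would come from ColBERTv2's late-interaction scoring and the loss would be back-propagated through the model; plain Python floats are used here only to keep the arithmetic visible.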

Experiments

Models

  • Leveraged the Demonstrate-Search-Predict (DSP) codebase for experiments
  • Used DeBERTaV3-Large as the cross-encoder reranker, chosen after comparison experiments
  • Used ColBERTv2 as the retriever

Datasets

  • Used LoTTE, NQ, and SQuAD for experiments
  • NQ and SQuAD were part of Flan-T5's instruction-tuning data
  • The Wikipedia passages used in NQ and SQuAD were part of DeBERTaV3's and GPT-3's pretraining data

Multi-reranker domain adaptation

  • UDAPDR accuracy is compared to two baselines in Table 1
  • Baseline 1 is a Zero-shot ColBERTv2 retriever with no distillation
  • Baseline 2 is a Zero-shot ColBERTv2 retriever paired with a single non-distilled passage reranker trained on 100K synthetic queries
  • UDAPDR substantially outperforms Zero-shot ColBERTv2 across all domains
  • Two settings of UDAPDR are competitive with or superior to Baseline 2

Query latency

  • Table 1 shows that UDAPDR is highly accurate
  • Table 1 does not take query latency into account
  • Table 2 reports latency evaluations
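  • Zero-shot ColBERTv2 combines low retrieval latency with state-of-the-art zero-shot accuracy
  • UDAPDR achieves the best accuracy with the same latency as Zero-shot ColBERTv2, because its output is itself a single ColBERTv2 retriever
  • Zero-shot ColBERTv2 + Reranker models come close in accuracy, but at higher latency because every query also requires a reranking pass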

Impact of pretrained components

  • UDAPDR uses 3 pretrained components: GPT-3, Flan-T5 XXL, and DeBERTaV3-Large
  • Variants of UDAPDR were explored and results are summarized in Table 4
  • Primary setting of UDAPDR performs best
  • Very competitive performance can be obtained without GPT-3
  • Flan-T5 XL can be used instead of Flan-T5 XXL
  • DeBERTaV3-Base is still effective, but results in a 4.1 point drop in Success@5 compared to DeBERTaV3-Large
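Success@5, the metric quoted above, can be computed in a few lines of Python. A minimal sketch, assuming each query comes with a ranked list of passage IDs and a set of relevant IDs, and that scores are reported as percentages (so a "4.1 point drop" means 4.1 percentage points); `success_at_k` is a hypothetical helper name.

```python
def success_at_k(rankings, relevant, k=5):
    """Percentage of queries with at least one relevant passage in the top k.

    `rankings[i]` is the ranked list of passage IDs returned for query i,
    and `relevant[i]` is the set of passage IDs judged relevant to it.
    """
    if not rankings:
        raise ValueError("need at least one query")
    hits = sum(
        1
        for ranked, gold in zip(rankings, relevant)
        if any(pid in gold for pid in ranked[:k])
    )
    return 100.0 * hits / len(rankings)


rankings = [[11, 12, 13, 14, 15, 16], [21, 22, 23, 24, 25, 26]]
relevant = [{13}, {26}]  # query 2's only relevant passage sits at rank 6
print(success_at_k(rankings, relevant, k=5))  # → 50.0
```

Because the metric only asks whether any relevant passage appears in the top k, it rewards recall at a shallow cutoff rather than the exact position of the first hit.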

Different prompting strategies

  • Tested whether a simpler few-shot prompting strategy outperforms the corpus-adapted prompting approach for domain adaptation
  • Compared the InPars few-shot prompt with the corpus-adapted prompting approach for synthetic query generation and passage reranker distillation
  • Evaluated using query generation with Flan-T5 XXL and GPT-3
  • Found that the multi-reranker, corpus-adapted prompting strategy is more successful for domain adaptation, yielding a 3.5 point increase in Success@5 after ColBERTv2 distillation
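The difference between the two prompting strategies can be sketched as follows: a fixed few-shot prompt reuses the same generic demonstrations for every passage, while corpus-adapted prompts draw their demonstrations from in-domain passage-query pairs, with each of the Y prompts using a different sample. The prompt template, helper names, and example passages are hypothetical; neither InPars's nor UDAPDR's exact prompt wording is reproduced here.

```python
import random


def build_prompt(demonstrations, passage):
    """Format a few-shot query-generation prompt (hypothetical template)."""
    blocks = [f"Passage: {p}\nQuery: {q}" for p, q in demonstrations]
    blocks.append(f"Passage: {passage}\nQuery:")
    return "\n\n".join(blocks)


def corpus_adapted_demo_sets(in_domain_pairs, num_prompts, shots=3, seed=0):
    """Draw `num_prompts` demonstration sets from in-domain (passage, query) pairs.

    Each set is an independent random sample without replacement, so the
    resulting prompts differ in which demonstrations they contain.
    """
    rng = random.Random(seed)
    return [rng.sample(in_domain_pairs, shots) for _ in range(num_prompts)]


# Fixed, InPars-style prompt: one generic demonstration set for every passage.
generic_demos = [("A generic web passage.", "a generic web query")]

# Corpus-adapted prompts: demonstrations drawn from the target corpus itself
# (in UDAPDR, these would be the GPT-3 queries from Stage 1).
in_domain = [
    ("Passage about topic A.", "query about A"),
    ("Passage about topic B.", "query about B"),
    ("Passage about topic C.", "query about C"),
    ("Passage about topic D.", "query about D"),
]

target = "A new passage from the target corpus."
fixed_prompt = build_prompt(generic_demos, target)
adapted_prompts = [
    build_prompt(demos, target)
    for demos in corpus_adapted_demo_sets(in_domain, num_prompts=3, shots=2)
]
print(fixed_prompt)
print(len(adapted_prompts))
```

In UDAPDR, each corpus-adapted prompt would then be fed to Flan-T5 XXL to generate synthetic queries for many target-domain passages.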

Additional results

  • Tables 1 and 2 explore a limited range of potential uses for UDAPDR.
  • Increasing Z beyond 100K does not improve accuracy and can even hurt it.

Discussion & future work

  • The domain adaptation strategy is effective for the ColBERTv2 model
  • Explore the efficacy of the strategy with larger encoders
  • Explore distillation strategies for shrinking the rerankers
  • Develop a systematic approach for generating the initial prompts

Conclusion

  • UDAPDR is a novel strategy for adapting retrieval models to new domains
  • UDAPDR uses synthetic queries created with generative models to train passage rerankers
  • The rerankers are distilled into a single ColBERTv2 retriever, boosting retrieval accuracy while keeping query latency competitive
  • UDAPDR is validated across LoTTE, NQ, and SQuAD datasets
  • UDAPDR can boost zero-shot retrieval accuracy without labeled training examples
  • The experiments relied on GPU-based hardware and used only English passages