Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

Information retrieval tasks require large labeled datasets for fine-tuning.
Large language models can be used to generate large numbers of synthetic queries cheaply.
The synthetic queries are used to fine-tune reranker models, which are then distilled into a single retriever.
The resulting retriever boosts zero-shot accuracy in long-tail domains.
It has lower latency than standard reranking methods.

Paper Content

Introduction

Neural IR has led to performance improvements on document and passage retrieval tasks
Neural retrievers benefit from fine-tuning on large labeled datasets
IR models can experience significant drops in accuracy due to distribution shifts
UDAPDR is an efficient strategy for using LLMs to facilitate unsupervised domain adaptation of neural retriever models
UDAPDR leads to large gains in zero-shot settings on a diverse range of domains
UDAPDR uses a powerful and expensive LLM to create an initial set of synthetic queries
These queries seed corpus-adapted prompts for a cheaper LLM, whose generated queries are used to train separate rerankers; the rerankers are distilled into a single ColBERTv2 retriever
UDAPDR requires only thousands of synthetic queries to prove effective
Code and synthetic datasets for UDAPDR will be publicly available

Data augmentation for neural IR

Generated datasets support domain adaptation in Transformer-based architectures
LLMs used to improve IR accuracy in new domains via synthetic datasets
Domain shift is the most pressing challenge for effective domain transfer
Different types of domain shifts can be addressed with synthetic data and indexing strategies

Pretraining objectives for IR

Pretraining objectives can help neural IR systems adapt to new domains without annotations
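Masked language modeling (MLM) and the Inverse Cloze Task (ICT) are unsupervised approaches for helping retrieval models adapt to new domains
Body First Selection (BFS) and Wiki Link Prediction (WLP) are unsupervised pretraining tasks that use sampled in-domain sentences and passages
The Neural Vector Space Model (NVSM) is an unsupervised pretraining task for news article retrieval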
Contrastive learning objective for unsupervised training of dense retrievers
ICT paired with synthetic query data for domain adaptation
Contrastive learning objective paired with unsupervised Promptagator strategy
By contrast, the UDAPDR unsupervised domain adaptation approach does not require any further pretraining

Methodology

UDAPDR strategy requires access to in-domain passages, but not queries or labels
Goal is to generate large numbers of synthetic queries for passages
Stage 1: X in-domain passages sampled from target domain, 5X synthetic queries generated using GPT-3 and 5 prompting strategies
Stage 2: Y corpus-adapted prompts created, varying according to demonstrations
Stage 3: Z queries generated with Flan-T5 XXL, quality filter applied
Stage 4: Y rerankers trained from scratch, N best rerankers selected
Stage 5: Multi-teacher distillation process used to create single ColBERTv2 retriever
Stage 6: Domain-adapted ColBERTv2 retriever tested on evaluation set for target domain
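
Stage 2 can be illustrated with a minimal sketch, assuming each corpus-adapted prompt is a few-shot prompt whose demonstrations are (passage, query) pairs drawn from the Stage 1 output. The function names, the "Passage:/Query:" template, and the number of demonstrations per prompt are hypothetical choices for illustration, not taken from the paper.

```python
import random


def build_corpus_adapted_prompts(seed_pairs, num_prompts, demos_per_prompt, rng_seed=0):
    """Build `num_prompts` few-shot prompt prefixes that differ in their demonstrations.

    seed_pairs: list of (passage, synthetic_query) tuples, e.g. the Stage 1 output.
    The template below is an assumption made for this sketch.
    """
    if demos_per_prompt > len(seed_pairs):
        raise ValueError("not enough seed pairs for the requested demonstrations")
    rng = random.Random(rng_seed)  # fixed seed so the prompts are reproducible
    prefixes = []
    for _ in range(num_prompts):
        # Each prompt gets its own random sample of demonstrations.
        demos = rng.sample(seed_pairs, demos_per_prompt)
        blocks = [f"Passage: {passage}\nQuery: {query}" for passage, query in demos]
        prefixes.append("\n\n".join(blocks))
    return prefixes


def complete_prompt(prefix, passage):
    """Append the passage that still needs a query, leaving the query slot open."""
    return f"{prefix}\n\nPassage: {passage}\nQuery:"
```

Each of the Y prefixes would then be completed with every target passage and sent to the cheaper generator (Flan-T5 XXL in the source), giving one query set, and later one reranker, per prompt.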

Experiments

Models

Leveraged Demonstrate-Search-Predict (DSP) codebase for experiments
Used DeBERTaV3-Large as crossencoder after comparison experiments
Used ColBERTv2 retriever for IR system

Datasets

Used LoTTE, NQ, and SQuAD for experiments
NQ and SQuAD were part of Flan-T5’s pretraining datasets
Wikipedia passages used in NQ and SQuAD were part of DeBERTaV3 and GPT-3’s pretraining datasets

Multi-reranker domain adaptation

UDAPDR accuracy is compared to two baselines in Table 1
Baseline 1 is a Zero-shot ColBERTv2 retriever with no distillation
Baseline 2 is a Zero-shot ColBERTv2 retriever paired with a single non-distilled passage reranker, trained on 100K synthetic queries
UDAPDR is far superior to Zero-shot ColBERTv2 across all domains
Two settings of UDAPDR are competitive with or superior to Baseline 2
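
The idea of distilling several rerankers into one retriever can be sketched as follows. This summary does not say how the teachers' scores are combined or which loss is used, so the sketch assumes the teachers' softmax distributions over a shared candidate list are averaged and the student is trained with a KL-divergence loss against that average. All function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def average_teacher_distribution(teacher_scores):
    """Average the teachers' softmax distributions (an assumed combination rule).

    teacher_scores: one score list per teacher, all over the same candidates.
    """
    dists = [softmax(scores) for scores in teacher_scores]
    return [sum(column) / len(dists) for column in zip(*dists)]


def kl_divergence(target, predicted, eps=1e-12):
    """KL(target || predicted) for two discrete distributions."""
    return sum(
        t * math.log((t + eps) / (p + eps))
        for t, p in zip(target, predicted)
        if t > 0
    )


def distillation_loss(teacher_scores, student_scores):
    """Loss for one query: student distribution versus averaged teachers."""
    target = average_teacher_distribution(teacher_scores)
    return kl_divergence(target, softmax(student_scores))
```

The loss is zero when the student ranks candidates with the same distribution as the averaged teachers and grows as the two diverge. In a real training loop the same computation would be done on tensors so that gradients flow into the student.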

Query latency

UDAPDR is highly effective
Table 1 does not take query latency into account
Table 2 reports latency evaluations
Zero-shot ColBERTv2 has low retrieval latency
Zero-shot ColBERTv2 has state-of-the-art accuracy
UDAPDR has the best accuracy and same latency as Zero-shot ColBERTv2
Zero-shot ColBERTv2 + Reranker models come close, but with higher latency
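
The latency gap has a simple cause: a retrieve-then-rerank pipeline runs the reranker once per candidate passage at query time, while a distilled retriever pays that cost only during training. The sketch below is a hypothetical timing harness, not the paper's benchmark setup. `retrieve_fn` and `score_fn` are placeholders for a real retriever and cross-encoder, and `top_k=20` is an arbitrary choice.

```python
import statistics
import time


def median_latency_ms(search_fn, queries, warmup=1):
    """Median wall-clock latency of `search_fn` per query, in milliseconds."""
    for query in queries[:warmup]:
        search_fn(query)  # warm-up calls are not timed
    timings = []
    for query in queries:
        start = time.perf_counter()
        search_fn(query)
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)


def with_reranker(retrieve_fn, score_fn, top_k=20):
    """Wrap a retriever so its top_k candidates are re-sorted by a reranker."""
    def search(query):
        candidates = retrieve_fn(query)[:top_k]
        # One reranker call per candidate: this is the extra query-time cost.
        return sorted(candidates, key=lambda p: score_fn(query, p), reverse=True)
    return search
```

Comparing `median_latency_ms(retrieve_fn, queries)` with `median_latency_ms(with_reranker(retrieve_fn, score_fn), queries)` on the same hardware gives the kind of comparison that Table 2 reports.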

Impact of pretrained components

UDAPDR uses 3 pretrained components: GPT-3, Flan-T5 XXL, and DeBERTaV3-Large
Variants of UDAPDR were explored and results are summarized in Table 4
Primary setting of UDAPDR performs best
Very competitive performance can be obtained without GPT-3
Flan-T5 XL can be used instead of Flan-T5 XXL
DeBERTaV3-Base is still effective, but results in a 4.1 point drop in Success@5 compared to DeBERTaV3-Large
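
Success@5 is the percentage of queries for which at least one relevant passage appears among the top 5 retrieved results. A minimal sketch of the metric follows; the function names and the dictionary-based input format are assumptions made for illustration.

```python
def success_at_k(ranked_ids, relevant_ids, k=5):
    """Return 1.0 if any of the top-k ranked passages is relevant, else 0.0."""
    return 1.0 if any(pid in relevant_ids for pid in ranked_ids[:k]) else 0.0


def mean_success_at_k(rankings, relevance, k=5):
    """Success@k over a query set, as a percentage.

    rankings:  {query_id: [passage_id, ...]} in ranked order.
    relevance: {query_id: set of relevant passage ids}.
    """
    if not rankings:
        raise ValueError("rankings must contain at least one query")
    hits = sum(
        success_at_k(ranked, relevance.get(query_id, set()), k)
        for query_id, ranked in rankings.items()
    )
    return 100.0 * hits / len(rankings)
```

Under this definition, a 4.1 point drop means that 4.1% fewer of the evaluation queries have a relevant passage in their top 5.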

Different prompting strategies

Tested if simpler few-shot prompting strategy is better than corpus-adapted prompting approach for domain adaptation
Compared InPars few-shot prompt to corpus-adapted prompt approach for synthetic query generation and passage reranker distillation
Evaluated using query generation with Flan-T5 XXL and GPT-3
Found that multi-reranker, corpus-adapted prompting strategy is more successful for domain adaptation, leading to 3.5 point increase in Success@5 after ColBERTv2 distillation

Additional results

Tables 1 and 2 explore a limited range of potential uses for UDAPDR.
Increasing the value of Z from 100K does not lead to improvements and can hurt performance.

Discussion & future work

Domain adaptation strategy effective for ColBERTv2 model
Explore efficacy of strategy with larger encoders
Distillation strategies for shrinking reranker
Systematic approach for generating initial prompts

Conclusion

UDAPDR is a novel strategy for adapting retrieval models to new domains
UDAPDR uses synthetic queries created with generative models to train passage rerankers
ColBERTv2 is used to boost retrieval accuracy while keeping query latency competitive
UDAPDR is validated across LoTTE, NQ, and SQuAD datasets
UDAPDR can boost zero-shot retrieval accuracy without labeled training examples
Limitations: UDAPDR requires GPU-based hardware and has been evaluated only on English passages
