Link to paper
The full paper is available here.
You can also find the paper on PapersWithCode here.
Abstract
- Information retrieval tasks require large labeled datasets for fine-tuning.
- Large language models can be used to generate large numbers of synthetic queries cheaply.
- Reranker models are used to fine-tune the synthetic queries.
- Boosts zero-shot accuracy in long-tail domains.
- Lower latency than standard reranking methods.
Paper Content
Introduction
- Neural IR has led to performance improvements on document and passage retrieval tasks
- Neural retrievers benefit from fine-tuning on large labeled datasets
- IR models can experience significant drops in accuracy due to distribution shifts
- UDAPDR is an efficient strategy for using LLMs to facilitate unsupervised domain adaptation of neural retriever models
- UDAPDR leads to large gains in zero-shot settings on a diverse range of domains
- UDAPDR uses a powerful and expensive LLM to create an initial set of synthetic queries
- These queries are used to train separate rerankers, which are distilled into a single Col-BERTv2 retriever
- UDAPDR only requires 1000s of synthetic queries to prove effective
- Code and synthetic datasets for UDAPDR will be publicly available
Data augmentation for neural ir
- Generated datasets support domain adaptation in Transformer-based architectures
- LLMs used to improve IR accuracy in new domains via synthetic datasets
- Domain shift is the most pressing challenge for effective domain transfer
- Different types of domain shifts can be addressed with synthetic data and indexing strategies
Pretraining objectives for ir
- Pretraining objectives can help neural IR systems adapt to new domains without annotations
- MLM and ICT are unsupervised approaches for helping retrieval models adapt to new domains
- BFS and WLP are unsupervised pretraining tasks that use sampled in-domain sentences and passages
- NVSM is an unsupervised pretraining task for news article retrieval
- Contrastive learning objective for unsupervised training of dense retrievers
- ICT paired with synthetic query data for domain adaptation
- Contrastive learning objective paired with unsupervised Promptagator strategy
- Unsupervised domain adaptation approach does not require any further pretraining
Methodology
- UDAPDR strategy requires access to in-domain passages, but not queries or labels
- Goal is to generate large numbers of synthetic queries for passages
- Stage 1: X in-domain passages sampled from target domain, 5X synthetic queries generated using GPT-3 and 5 prompting strategies
- Stage 2: Y corpus-adapted prompts created, varying according to demonstrations
- Stage 3: Z queries generated with Flan-T5 XXL, quality filter applied
- Stage 4: Y rerankers trained from scratch, N best rerankers selected
- Stage 5: Multi-teacher distillation process used to create single ColBERTv2 retriever
- Stage 6: Domain-adapted ColBERTv2 retriever tested on evaluation set for target domain
Experiments
Models
- Leveraged Demonstrate-Search-Predict (DSP) codebase for experiments
- Used DeBERTaV3-Large as crossencoder after comparison experiments
- Used ColBERTv2 retriever for IR system
Datasets
- Used LoTTE, NQ, and SQuAD for experiments
- NQ and SQuAD were part of Flan-T5’s pretraining datasets
- Wikipedia passages used in NQ and SQuAD were part of DeBERTaV3 and GPT-3’s pretraining datasets
Multi-reranker domain adaptation
- UDAPDR accuracy is compared to two baselines in Table 1
- Baseline 1 is a Zero-shot ColBERTv2 retriever with no distillation
- Baseline 2 is a Zero-shot ColBERTv2 retriever paired with a single non-distilled passage reranker, trained on 100K synthetic queries
- UDAPDR is far superior to Zero-shot ColBERTv2 across all domains
- Two settings of UDAPDR are competitive with or superior to Baseline 2
Query latency
- UDAPDR is highly effective
- Table 1 does not take query latency into account
- Table 2 reports latency evaluations
- Zero-shot ColBERTv2 has low retrieval latency
- Zero-shot ColBERTv2 has state-of-the-art accuracy
- UDAPDR has the best accuracy and same latency as Zero-shot ColBERTv2
- Zero-shot ColBERTv2 + Reranker models come close, but with higher latency
Impact of pretrained components
- UDAPDR uses 3 pretrained components: GPT-3, Flan-T5 XXL, and DeBERTaV3-Large
- Variants of UDAPDR were explored and results are summarized in Table 4
- Primary setting of UDAPDR performs best
- Very competitive performance can be obtained without GPT-3
- Flan-T5 XL can be used instead of Flan-T5 XXL
- DeBERTaV3-Base is still effective, but results in a 4.1 point drop in Success@5 compared to DeBERTaV3-Large
Different prompting strategies
- Tested if simpler few-shot prompting strategy is better than corpus-adapted prompting approach for domain adaptation
- Compared InPars few-shot prompt to corpus-adapted prompt approach for synthetic query generation and passage reranker distillation
- Evaluated using query generation with FLAN XXL and GPT-3
- Found that multi-reranker, corpus-adapted prompting strategy is more successful for domain adaptation, leading to 3.5 point increase in Success@5 after ColBERTv2 distillation
Additional results
- Tables 1 and 2 explore a limited range of potential uses for UDAPDR.
- Increasing the value of Z from 100K does not lead to improvements and can hurt performance.
Discussion & future work
- Domain adaptation strategy effective for Col-BERTv2 model
- Explore efficacy of strategy with larger encoders
- Distillation strategies for shrinking reranker
- Systematic approach for generating initial prompts
Conclusion
- UDAPDR is a novel strategy for adapting retrieval models to new domains
- UDAPDR uses synthetic queries created with generative models to train passage rerankers
- ColBERTv2 is used to boost retrieval accuracy while keeping query latency competitive
- UDAPDR is validated across LoTTE, NQ, and SQuAD datasets
- UDAPDR can boost zeroshot retrieval accuracy without labeled training examples
- UDAPDR uses GPU-based hardware and English passages