Link to paper
The full paper is available here.
You can also find the paper on PapersWithCode here.
Abstract
- InPars introduced a method for using LLMs in information retrieval: few-shot prompting induces an LLM to generate relevant queries for documents
- InPars-v2 uses open-source LLMs and existing rerankers to generate synthetic query-document pairs
- A BM25 retrieval pipeline followed by a monoT5 reranker finetuned on InPars-v2 data achieves new state-of-the-art results on the BEIR benchmark
- Code, synthetic data, and finetuned models are open-sourced
Paper Content
Introduction and background
- Data augmentation is a tool to improve AI models when there is not enough in-domain training data
- Previous work used LLMs to generate synthetic training data for information retrieval models
- Bonifacio et al. proposed InPars to generate queries from documents in the corpus using LLMs
- Promptagator uses dataset-specific prompts, a larger LLM, and a fully trainable retrieval pipeline
- This work extends Bonifacio et al. by using a reranker to select the best synthetically generated examples
- An open-source query generator is used, and source code and data are provided to reproduce the results on TPUs
Methodology
- Used GPT-J with 6B parameters to generate synthetic queries
- Sampled 100k documents from the corpus of each BEIR dataset and generated one synthetic query per document
- Queries generated with greedy decoding and the “gbq” prompt template from InPars-v1
- Filtering step selects which synthetic query-document pairs are kept for training
- Used monoT5-3B to estimate a relevance score for each query-document pair; the highest-scoring pairs are kept
- For negatives, randomly sampled one document per synthetic query from the top 1000 retrieved by BM25
- Finetuned monoT5-3B first on MS MARCO and then on the synthetic data
- Evaluated using Pyserini’s flat indexes and BM25
- Finetuning on each synthetic dataset takes 10 minutes on TPU v3-8
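
The filtering and negative-sampling steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: `score_fn` stands in for the monoT5-3B relevance scorer and `retrieve_fn` for BM25 retrieval, and both are replaced here by a toy word-overlap function over a three-document corpus. The function names, the `keep` parameter, and the choice to exclude the positive document from the negative candidates are all hypothetical.

```python
import random
from typing import Callable, List, Optional, Tuple

Pair = Tuple[str, str]            # (synthetic query, id of the document it was generated from)
Example = Tuple[str, str, int]    # (query, document id, label: 1 = positive, 0 = negative)


def select_top_pairs(pairs: List[Pair],
                     score_fn: Callable[[str, str], float],
                     keep: int) -> List[Pair]:
    """Keep the `keep` pairs with the highest relevance score."""
    ranked = sorted(pairs, key=lambda p: score_fn(p[0], p[1]), reverse=True)
    return ranked[:keep]


def sample_negative(query: str,
                    positive_id: str,
                    retrieve_fn: Callable[[str, int], List[str]],
                    rng: random.Random,
                    depth: int = 1000) -> Optional[str]:
    """Pick one random document from the top `depth` retrieved for the query.

    Assumption: the positive document is excluded from the candidates.
    """
    candidates = [d for d in retrieve_fn(query, depth) if d != positive_id]
    return rng.choice(candidates) if candidates else None


def build_training_set(pairs: List[Pair],
                       score_fn: Callable[[str, str], float],
                       retrieve_fn: Callable[[str, int], List[str]],
                       keep: int,
                       seed: int = 0) -> List[Example]:
    """Filter the pairs, then attach one sampled negative to each kept query."""
    rng = random.Random(seed)
    examples: List[Example] = []
    for query, pos_id in select_top_pairs(pairs, score_fn, keep):
        examples.append((query, pos_id, 1))
        neg_id = sample_negative(query, pos_id, retrieve_fn, rng)
        if neg_id is not None:
            examples.append((query, neg_id, 0))
    return examples


# Toy stand-ins (hypothetical): word overlap replaces both monoT5 and BM25.
corpus = {
    "d1": "bm25 is a lexical ranking function",
    "d2": "gpt-j is an open-source language model",
    "d3": "tpus accelerate neural network training",
}


def toy_score(query: str, doc_id: str) -> float:
    return float(len(set(query.split()) & set(corpus[doc_id].split())))


def toy_retrieve(query: str, k: int) -> List[str]:
    return sorted(corpus, key=lambda d: toy_score(query, d), reverse=True)[:k]


pairs = [
    ("what is bm25 ranking", "d1"),
    ("unrelated words only", "d2"),
    ("what is gpt-j language model", "d2"),
]
training = build_training_set(pairs, toy_score, toy_retrieve, keep=2)
```

In a real run, `score_fn` would wrap a reranker forward pass and `retrieve_fn` a BM25 index lookup; the control flow stays the same.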
Results
- Four systems are compared: BM25; monoT5-3B finetuned on MS MARCO; monoT5-3B finetuned on MS MARCO and further finetuned on InPars-v1 data; and monoT5-3B finetuned on MS MARCO and further finetuned on InPars-v2 data.
- Results show that InPars-v2 is substantially better than InPars-v1 on TREC-News, Climate-FEVER, Robust and Touché.
- Averaged over all BEIR datasets, results are better than those of Promptagator and RankT5.
Conclusion
- Improved version of InPars (InPars-v2) uses an open-source language model to generate queries
- Query-document pair selection is improved by using a reranker to filter the generated pairs
- Results show effectiveness on par with state of the art on BEIR
- Synthetic data and finetuned models released publicly