Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

  • InPars introduced a method to use LLMs in information retrieval tasks
  • InPars-v2 uses open-source LLMs and existing rerankers to generate synthetic query-document pairs
  • BM25 retrieval pipeline and monoT5 reranker finetuned on InPars-v2 data achieves new state-of-the-art results on BEIR benchmark
  • Code, synthetic data, and finetuned models open sourced

Paper Content

Introduction and background

  • Data augmentation is a tool to improve AI models when there is not enough in-domain training data
  • Previous work used LLMs to generate synthetic training data for information retrieval models
  • Bonifacio et al. proposed InPars to generate queries from documents in the corpus using LLMs
  • Promptagator model uses dataset-specific prompts, a larger LLM and a fully trainable retrieval pipeline
  • This work extends Bonifacio et al. by using a reranker to select the best synthetically generated examples
  • Open-source query generator is used and source code and data is provided to reproduce results on TPUs

Methodology

  • Used GPT-J with 6B parameters to generate synthetic queries
  • Sampled 100k documents from BEIR benchmark corpus
  • Greedy decoding and “gbq” prompt template from InPars-v1
  • Filtering step to select query-document pairs
  • Used monoT5-3B to estimate relevancy score for each query-document pair
  • Randomly sampled one document from top 1000 retrieved by BM25 for negatives
  • Finetuned monoT5-3B on MS MARCO and synthetic data
  • Evaluated using Pyserini’s flat indexes and BM25
  • Finetuning on each synthetic dataset takes 10 minutes on TPU v3-8

Results

  • BM25, monoT5-3B finetuned on MS MARCO, monoT5-3B finetuned on MS MARCO and further finetuned on InPars-v1, and monoT5-3B finetuned on MS MARCO and then finetuned on InPars-v2 data are compared.
  • Results show that InPars-v2 is substantially better than InPars-v1 on TREC-News, Climate-FEVER, Robust and Touche.
  • Results are better than Promptagator and RankT5 on average of all BEIR datasets.

Conclusion

  • Improved version of InPars (InPars-v2) uses language model to generate queries
  • Query-document pair selection process is better
  • Results show effectiveness on par with state of the art on BEIR
  • Synthetic data and finetuned models released publicly