Link to paper
The full paper is available here.
You can also find the paper on PapersWithCode here.
Abstract
- In-context Learning (ICL) is a new paradigm for large language model (LLM) evaluation
- ICL adapts pre-trained models to unseen tasks without parameter updates
- OpenICL is an open-source toolkit for ICL and LLM evaluation
- OpenICL is research-friendly and provides various state-of-the-art retrieval and inference methods
- OpenICL has been validated on a wide range of NLP tasks
Paper Content
Introduction
- Large language models (LLMs) have shown an impressive emergent In-Context Learning (ICL) ability
- ICL can perform inference with model parameters frozen
- ICL yields comparable results to fine-tuned models in specific tasks
- OpenICL is an easy-to-use and extensible ICL framework for zero-/few-shot evaluation of language models
- OpenICL provides a wide range of ICL methods, LLMs, and tasks
Related work
- In-context Learning is a new paradigm that uses pre-trained language models to perform new tasks without any gradient-based training
- Chain-of-thought (CoT) prompting is a method that surpasses the previous state-of-the-art methods on many reasoning tasks
- ICL has been criticized for being sensitive to the choice and ordering of in-context examples
- Prompt Learning is a special case of ICL without any in-context examples
OpenICL
- OpenICL has three design principles: modularity, efficiency, and generality.
- OpenICL has two major components: Retriever and Inferencer.
Design principles
- Modularity: OpenICL should be designed to be easily modified and combined.
- Efficiency: OpenICL should be optimized for efficient parallel inference.
- Generality: OpenICL should have a flexible interface to work with various LLMs, tasks, retrieval methods, and inference approaches.
Architecture overview
- Retriever retrieves (x, y) pairs from an index set (X, Y)
- The retrieved examples are formatted according to a user-defined prompt template and concatenated to form a text sequence
- Inferencer digests sequences and feeds them into LLMs to obtain model prediction Ŷ
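The flow above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not OpenICL's API: the names `retrieve`, `format_example`, `build_prompt`, `infer`, and the stub `toy_lm` are hypothetical, the retriever simply takes the first k pairs, and the "model" is a majority vote over the demonstration labels.

```python
from collections import Counter
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # an (x, y) pair from the index set (X, Y)


def retrieve(index_set: List[Example], query: str, k: int) -> List[Example]:
    """Placeholder retriever: returns the first k pairs and ignores the query."""
    return index_set[:k]


def format_example(x: str, y: str = "") -> str:
    """Hypothetical user-defined template; the label slot is empty for the test input."""
    return f"Review: {x}\nSentiment: {y}".rstrip()


def build_prompt(index_set: List[Example], query: str, k: int) -> str:
    """Format the retrieved pairs plus the test input and concatenate them."""
    demos = retrieve(index_set, query, k)
    parts = [format_example(x, y) for x, y in demos]
    parts.append(format_example(query))
    return "\n\n".join(parts)


def toy_lm(prompt: str) -> str:
    """Stand-in for an LLM call: returns the most frequent demonstration label."""
    labels = [
        line[len("Sentiment: "):]
        for line in prompt.splitlines()
        if line.startswith("Sentiment: ")
    ]
    return Counter(labels).most_common(1)[0][0]


def infer(index_set: List[Example], queries: List[str], k: int,
          lm: Callable[[str], str] = toy_lm) -> List[str]:
    """Inferencer: build one sequence per test input and feed it to the model."""
    return [lm(build_prompt(index_set, q, k)) for q in queries]
```

Swapping `toy_lm` for a real model call would leave the rest of the sketch unchanged, which is the separation between retrieval and inference that the overview describes.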
Modularity
- OpenICL adopts a loosely-coupled design between components
- Random selection of examples is popular when there are few demonstrations available
- Heuristic methods use semantic similarity to select examples
- Model-based methods use models’ confidence to select examples
- Inferencer invokes pretrained language model to generate predictions
- Direct, Perplexity and Channel methods are supported by Inferencer
- Inferencer can be used for multi-stage ICL methods
- Inferencer can be augmented with a scorer
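To illustrate the loosely coupled design, the sketch below defines two interchangeable retrievers behind one interface: random selection and a heuristic top-k selection. All names (`Retriever`, `RandomRetriever`, `BowTopKRetriever`) are hypothetical and not taken from the toolkit, and bag-of-words cosine similarity is used here only as a dependency-free stand-in for whatever semantic similarity measure a real heuristic retriever would use.

```python
import math
import random
from collections import Counter
from typing import List, Protocol, Tuple

Example = Tuple[str, str]  # (input text, label)


class Retriever(Protocol):
    """Shared interface: any retriever can be plugged into the same pipeline."""
    def retrieve(self, query: str, k: int) -> List[Example]: ...


class RandomRetriever:
    """Selects k demonstrations uniformly at random (seeded for repeatability)."""
    def __init__(self, index_set: List[Example], seed: int = 0) -> None:
        self.index_set = list(index_set)
        self.rng = random.Random(seed)

    def retrieve(self, query: str, k: int) -> List[Example]:
        return self.rng.sample(self.index_set, min(k, len(self.index_set)))


def _bow(text: str) -> Counter:
    """Lowercased bag-of-words vector."""
    return Counter(text.lower().split())


def _cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)  # a Counter returns 0 for missing keys
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class BowTopKRetriever:
    """Selects the k demonstrations most similar to the query."""
    def __init__(self, index_set: List[Example]) -> None:
        self.index_set = list(index_set)

    def retrieve(self, query: str, k: int) -> List[Example]:
        q = _bow(query)
        ranked = sorted(self.index_set,
                        key=lambda ex: _cosine(q, _bow(ex[0])),
                        reverse=True)
        return ranked[:k]
```

Because both classes expose the same `retrieve(query, k)` method, an inferencer written against that method would not need to know which selection strategy is in use.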
Efficiency
- OpenICL implements data parallelism to improve the performance of both the retrieval and inference steps.
- OpenICL supports model parallelism to enable efficient inference for large-scale models.
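Data parallelism, as mentioned above, amounts to splitting the test inputs into shards, processing each shard independently, and merging the results in the original order. The sketch below shows only that idea; it is not how OpenICL implements it. The helper names `shard` and `parallel_map` are hypothetical, and threads stand in for what would normally be separate processes or devices, each holding its own model replica.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def shard(items: Sequence[T], num_shards: int) -> List[List[T]]:
    """Split items into num_shards contiguous chunks of near-equal size."""
    size, extra = divmod(len(items), num_shards)
    shards, start = [], 0
    for i in range(num_shards):
        end = start + size + (1 if i < extra else 0)
        shards.append(list(items[start:end]))
        start = end
    return shards


def parallel_map(fn: Callable[[T], R], items: Sequence[T],
                 num_workers: int) -> List[R]:
    """Apply fn to every item, one shard per worker, preserving input order."""
    shards = shard(items, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # pool.map yields results in the order the shards were submitted
        results = list(pool.map(lambda chunk: [fn(x) for x in chunk], shards))
    return [y for chunk in results for y in chunk]
```

In this sketch `fn` could be either the retrieval step or the inference step for a single test input, since both can be sharded the same way.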
Generality
- OpenICL supports decoder-only and encoder-decoder-based language models.
- OpenICL provides access to models via checkpoints or API.
- OpenICL supports classification and generation tasks.
- OpenICL provides broad support for retrieval and inference methods.
Toolkit walkthrough
- OpenICL can be used to develop a language classification pipeline with a few lines of code.
- The pipeline begins with a DatasetReader which loads the dataset from HuggingFace Dataset Hub or a local file path.
- A PromptTemplate is instantiated with a dictionary that defines the prompts for each class label.
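The idea of a template dictionary keyed by class label can be sketched as follows. This is a hypothetical stand-in, not the toolkit's `PromptTemplate` class: the `LabelTemplate` class, the `{text}` placeholder, the `predict` helper, and the `toy_score` function are all assumptions made for illustration. Each label gets its own prompt, and the predicted label is the one whose rendered prompt receives the lowest score, which is the role a language model's perplexity would play in a perplexity-based inference method.

```python
from typing import Callable, Dict, List


class LabelTemplate:
    """Maps each class label to a prompt string containing a text placeholder."""
    def __init__(self, templates: Dict[int, str], placeholder: str = "{text}") -> None:
        self.templates = dict(templates)
        self.placeholder = placeholder

    @property
    def labels(self) -> List[int]:
        return list(self.templates)

    def render(self, label: int, text: str) -> str:
        """Fill the input text into the prompt for the given label."""
        return self.templates[label].replace(self.placeholder, text)


def predict(template: LabelTemplate, text: str,
            score: Callable[[str], float]) -> int:
    """Return the label whose rendered prompt has the lowest score."""
    return min(template.labels, key=lambda lab: score(template.render(lab, text)))


def toy_score(prompt: str) -> float:
    """Stand-in for LM perplexity: lower when label word and review words agree."""
    positive_words = {"great", "loved"}
    negative_words = {"boring", "awful"}
    words = set(prompt.lower().replace(":", " ").split())
    if "positive" in words:
        return 1.0 if words & positive_words else 2.0
    return 1.0 if words & negative_words else 2.0


template = LabelTemplate({
    0: "Negative review: {text}",
    1: "Positive review: {text}",
})
```

With these definitions, `predict(template, "I loved it", toy_score)` selects label 1; replacing `toy_score` with a real perplexity computation is the only change needed to use an actual model.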
Evaluation
- Conducted experiments on diverse datasets, LLMs, and ICL methods
- Tested GPT-Neo, GPT-3, and XGLM
- Results shown in Figure 5
Conclusion
- OpenICL is an open-source toolkit for In-context learning.
- OpenICL supports a wide range of LLMs, tasks, and ICL methods.
- OpenICL is highly extensible and will be updated to keep pace with the latest research.