Link to paper
The full paper is available here.
You can also find the paper on PapersWithCode here.
Abstract
- LLMs enable zero-shot task generalization
- Instruction learning has been approached as a fine-tuning problem
- In-Context Instruction Learning (ICIL) improves zero-shot task generalization
- ICIL uses a single fixed prompt to evaluate all tasks
Paper Content
Introduction
- LLMs can adapt to target tasks during inference
- LLMs have emergent capabilities, including the ability to generalize to unseen tasks by following instructions
- Instruction learning methods have been proposed to improve this ability
- In-Context Instruction Learning (ICIL) involves learning to follow instructions during inference
- ICIL uses a prompt that consists of multiple cross-task demonstrations
- ICIL is a zero-shot learning method
- ICIL significantly enhances the zero-shot task generalization performance of various pretrained LLMs
- ICIL improves the zero-shot instruction-following ability of LLMs
- LLMs learn the correspondence between the answer choice included in the instruction and output of each demonstration during inference
In-context instruction learning
- ICIL consists of cross-task demonstrations
- Demonstrations are a concatenation of instruction, input, and output instance
- Fixed demonstration set is constructed to evaluate various tasks in a zero-shot manner
- Advantages of applying ICIL during inference of LLMs mentioned
Demonstration set construction
- Filter tasks using heuristics
- Sample K tasks from N tasks
- Heuristics include task type, answer choice overlap, demonstration length, and demonstration ordering
In-context instruction learning during inference
- ICIL uses a single fixed prompt to adapt to different tasks
- ICIL improves zero-shot task generalization performance for various LLMs
- ICIL also assists LLMs for zero-shot generalization after instruction tuning or RLHF
- Model-generated demonstration set is effective for ICIL
Experiments
Experimental setup
- Constructed demonstrations for ICIL from English training tasks of SUPER-NATURALINSTRUCTIONS (SUPERNI) benchmark
- Used held-out tasks from SUPERNI for testing, consisting of 119 tasks across 12 different categories
- Selected SUPERNI as evaluation benchmark because it offers diverse set of tasks with varying levels of complexity
- Evaluated 4 LLMs with various model sizes, including GPT-3, OPT, GPT-NeoX, and GPT-J
Results
- Pretrained LLMs benefit from ICIL
- ICIL increases performance of pretrained LLMs by over 50%
- ICIL outperforms LLMs with much larger parameters
- ICIL gain is comparable to instruction tuning
- ICIL improves performance of LLMs fine-tuned through instruction tuning or RLHF
- Irrelevant ICIL does not harm performance much
Analysis
- ICIL significantly improves the zero-shot task generalization performance of both pretrained and instruction-fine-tuned LLMs
- Constructing the demonstration set with classification tasks is important for ICIL
- LLMs learn the correspondence between answer choice in the instruction and the label of the demonstrations during ICIL
- ICIL reinforces the correspondence between the instruction and the label of the demonstrations during inference
- ICIL does not require any backpropagation and uses the pretrained model checkpoint without any gradient update
- Increasing the number of demonstrations improves the performance
- Ordering the demonstrations by the number of answer choices reduces the variance
- Answer choice overlap between demonstrations harms the performance
- ICIL is effective for machine-generated demonstrations
- Performance of ICIL is comparable to adaptive in-context learning methods
- There is still a large gap between ICIL and few-shot in-context learning