Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

LLMs can adapt to target tasks during inference
LLMs have emergent capabilities, including the ability to generalize to unseen tasks by following instructions
Instruction learning methods have been proposed to improve this ability
In-Context Instruction Learning (ICIL) involves learning to follow instructions during inference
ICIL uses a prompt that consists of multiple cross-task demonstrations
ICIL is a zero-shot learning method
ICIL significantly enhances the zero-shot task generalization performance of various pretrained LLMs
ICIL improves the zero-shot instruction-following ability of LLMs
LLMs learn the correspondence between the answer choices included in the instruction and the output of each demonstration during inference

An ICIL prompt consists of cross-task demonstrations
Each demonstration is a concatenation of a task instruction and an input-output instance
A fixed demonstration set is constructed once and reused to evaluate all target tasks in a zero-shot manner
The paper discusses the advantages of applying ICIL at inference time, such as using a single fixed prompt with no gradient updates
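A minimal sketch of how an ICIL prompt could be assembled from a fixed set of cross-task demonstrations. It assumes a plain-text template: the `Definition:`/`Input:`/`Output:` labels, the blank-line separator, and the names `Demonstration`, `format_demonstration`, and `build_icil_prompt` are hypothetical illustrations, not the paper's exact format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Demonstration:
    """One cross-task demonstration: a task instruction plus one input/output instance."""
    instruction: str
    input_text: str
    output_text: str


def format_demonstration(demo: Demonstration) -> str:
    # Concatenate instruction, input, and output (field labels are an assumed template).
    return f"Definition: {demo.instruction}\nInput: {demo.input_text}\nOutput: {demo.output_text}"


def build_icil_prompt(demos: list[Demonstration], instruction: str, task_input: str) -> str:
    # The same fixed demonstrations are prepended for every target task;
    # only the target instruction and input change. The output slot is left
    # open so the model generates the answer after "Output:".
    parts = [format_demonstration(d) for d in demos]
    parts.append(f"Definition: {instruction}\nInput: {task_input}\nOutput:")
    return "\n\n".join(parts)
```

Because the demonstration prefix never changes, switching to a new target task only swaps the final instruction and input, which is what lets one prompt serve every evaluation task without any gradient update.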

Filter candidate tasks using heuristics
Sample K tasks from the N candidate tasks
Heuristics for constructing the set cover task type, answer choice overlap, demonstration length, and demonstration ordering
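The filtering and sampling steps could be sketched as follows. Everything here is an assumption for illustration, not the paper's exact procedure: the `Task` record, the whitespace word count used as a length measure, the `max_words` threshold, rejecting overlapping answer choices during sampling, and sorting in ascending order of answer-choice count.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """A hypothetical training-task record with one sampled instance."""
    name: str
    instruction: str
    answer_choices: tuple[str, ...]  # empty for non-classification tasks
    example_input: str
    example_output: str


def demo_length(task: Task) -> int:
    # Whitespace word count stands in for model tokens (an assumption).
    return len(f"{task.instruction} {task.example_input} {task.example_output}".split())


def select_demonstrations(tasks: list[Task], k: int, max_words: int = 256,
                          seed: int = 0) -> list[Task]:
    # Task type: keep classification tasks whose gold output is one of the
    # answer choices stated in the instruction.
    candidates = [
        t for t in tasks
        if t.answer_choices
        and t.example_output.lower() in {c.lower() for c in t.answer_choices}
    ]
    # Demonstration length: drop overly long demonstrations.
    candidates = [t for t in candidates if demo_length(t) <= max_words]

    # Sample K of the remaining tasks, skipping any task whose answer
    # choices overlap with those of an already chosen task.
    rng = random.Random(seed)
    rng.shuffle(candidates)
    chosen: list[Task] = []
    used: set[str] = set()
    for t in candidates:
        choices = {c.lower() for c in t.answer_choices}
        if choices & used:
            continue
        chosen.append(t)
        used |= choices
        if len(chosen) == k:
            break
    if len(chosen) < k:
        raise ValueError(f"only {len(chosen)} non-overlapping tasks available, need {k}")

    # Demonstration ordering: sort by number of answer choices
    # (ascending order is an assumption here).
    chosen.sort(key=lambda t: len(t.answer_choices))
    return chosen
```

Here `k` is the number of demonstrations; the findings below note that increasing it improves performance, while answer-choice overlap between demonstrations hurts it, which is why overlapping tasks are skipped rather than admitted.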

ICIL uses a single fixed prompt to adapt to different tasks
ICIL improves zero-shot task generalization performance for various LLMs
ICIL also improves the zero-shot generalization of LLMs that have already undergone instruction tuning or RLHF
A model-generated demonstration set is also effective for ICIL

Constructed demonstrations for ICIL from the English training tasks of the SUPER-NATURALINSTRUCTIONS (SUPERNI) benchmark
Used held-out tasks from SUPERNI for testing: 119 tasks across 12 different categories
Selected SUPERNI as the evaluation benchmark because it offers a diverse set of tasks with varying levels of complexity
Evaluated 4 LLMs with various model sizes, including GPT-3, OPT, GPT-NeoX, and GPT-J

ICIL significantly improves the zero-shot task generalization performance of both pretrained and instruction-fine-tuned LLMs
Constructing the demonstration set with classification tasks is important for ICIL
LLMs learn the correspondence between the answer choices in the instruction and the labels of the demonstrations during ICIL
ICIL reinforces the correspondence between the instruction and the labels of the demonstrations during inference
ICIL does not require any backpropagation and uses the pretrained model checkpoint without any gradient update
Increasing the number of demonstrations improves performance
Ordering the demonstrations by the number of answer choices reduces the variance
Answer choice overlap between demonstrations harms the performance
ICIL is also effective with machine-generated demonstrations
Performance of ICIL is comparable to adaptive in-context learning methods
There is still a large gap between ICIL and few-shot in-context learning