Link to paper
The full paper is available here.
You can also find the paper on PapersWithCode here.
Abstract
- Large “instruction-tuned” language models can generalize zero-shot to new tasks.
- Human-written instruction data is limited in quantity, diversity, and creativity.
- Self-Instruct is a framework for improving the instruction-following capabilities of pretrained language models.
- Applying Self-Instruct to GPT3 results in a 33% absolute improvement over the original model.
- Human evaluation shows Self-Instruct outperforms existing public instruction datasets by a large margin.
Paper Content
Introduction
- Recent NLP literature has seen a lot of activity in building models that can follow natural language instructions
- Comparisons are made with the text-davinci-001 engine
- Code and data will be available at a specified website
- Two key components are large pre-trained language models and human-written instruction data
- SELF-INSTRUCT is a semi-automated process for instruction-tuning a pretrained LM
- Iterative bootstrapping algorithm is used
- Evaluated with various other models on NLP tasks and novel usage of instruction-following models
- Contributions are SELF-INSTRUCT, effectiveness demonstrated, and a large synthetic dataset released
Related work
- Vanilla language models can be effective at following general language instructions if tuned with annotated “instructional” data
- There is a direct correlation between the size and diversity of the “instructional” data and the generalizability of resulting models
- Human annotated “instructional” data poses a bottleneck for progress toward more generalizable models
- Our work aims to reduce the dependence on human annotators
- Language models have been used for data generation and augmentation
- Knowledge distillation involves the transfer of knowledge from larger models to smaller ones
- SELF-INSTRUCT is a form of knowledge distillation, but with the same source and target
- Specialized methods have been used to bootstrap some inferences using language models
- Recent works generate instructions of a task given a few examples
- SELF-INSTRUCT generates new tasks from scratch, task-agnostically
Method
- Generating tasks with a vanilla pretrained language model
- Tuning language model to follow instructions better
Defining instruction data
- Generate instruction data with tasks defined in natural language
- Each task has one or more input-output instances
- Model expected to produce output given task instruction and instance input
- Instruction and instance input do not have strict boundary in many cases
- Allow instructions that do not require additional input
Automatic instruction data generation
- Pipeline for generating instruction data consists of 4 steps
- 175 tasks written by authors used as seed
- 8 tasks sampled from pool for each step
- Identify if instruction is a classification task
- Generate instances for each instruction independently
- Input-first approach used for non-classification tasks
- Output-first approach used for classification tasks
- Filtering and postprocessing to encourage diversity
Finetuning the lm to follow instructions
- Used instruction data to finetune a language model
- Used multiple templates to encode instruction and instance input together
Diversity
- Identified verb-noun structure in generated instructions
- 26,559 out of 52,445 instructions contain verb-noun structure
- Top 20 most common root verbs and their top 4 direct noun objects account for 14% of entire set
- Generated instructions differ from seed instructions used to prompt generation
Quality
- 200 instructions were randomly sampled and 1 instance per instruction was randomly selected
- An expert annotator labeled each instance as correct or not
- Most of the generated instructions are meaningful
- Generated instances may contain more noise
- Most of the generations are in the correct format or partially correct
Experimental results
- Conducted experiments to measure and compare quality of models under various instruction tuning setups
- Used instruction-generated instruction data to conduct instruction tuning for GPT3 model
- Used various templates to concatenate instruction and input
- Trained model to generate output using OpenAI finetuning API
- Used default hyper-parameters, except prompt loss weight set to 0 and trained model for 2 epochs
Baselines
- Off-the-shelf language models (T5-LM and GPT3) are evaluated as baselines
- InstructGPT (OpenAI) is evaluated to follow human instructions better
- GPT3 is finetuned with T0 and SUPERNI training data
- Results show SELF-INSTRUCT boosts GPT3’s instruction-following ability
- GPT3 SELF-INST nearly matches performance of InstructGPT 001
- SELF-INSTRUCT brings additional gains when combined with SUPERNI training set
Experiment 2: generalization to user-oriented instructions on novel tasks
- Curated a new set of instructions for user-oriented applications
- Instructions are diverse in style and format
- 252 instructions with 1 instance per instruction
- Four-level rating system for categorizing the quality of the models’ outputs
- Human feedback is an optional aspect of instruction-tuning
Broader impact
- Introduce SELF-INSTRUCT, a task-agnostic method to improve instruction-following capabilities of language models
- Performance of SELF-INSTRUCT on par with InstructGPT 001
- SELF-INSTRUCT outperforms existing public instruction datasets by a large margin
- Limitations of SELF-INSTRUCT discussed
- Generated data statistics and data quality review presented
- Evaluation results on unseen tasks from SUPER-NATURALINSTRUCTIONS
- SELF-INSTRUCT boosts GPT3 performance by 33.1% and nearly matches performance of InstructGPT 001