Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

Large “instruction-tuned” language models can generalize zero-shot to new tasks.
Human-written instruction data is limited in quantity, diversity, and creativity.
Self-Instruct is a framework for improving the instruction-following capabilities of pretrained language models.
Applying Self-Instruct to GPT3 results in a 33% absolute improvement over the original model.
Human evaluation shows Self-Instruct outperforms existing public instruction datasets by a large margin.

Paper Content

Introduction

Recent NLP literature has seen a lot of activity in building models that can follow natural language instructions
Comparisons are made with the text-davinci-001 engine
Code and data will be available at a specified website
Two key components are large pre-trained language models and human-written instruction data
SELF-INSTRUCT is a semi-automated process for instruction-tuning a pretrained LM
Iterative bootstrapping algorithm is used
Evaluated with various other models on NLP tasks and novel usage of instruction-following models
Contributions are SELF-INSTRUCT, effectiveness demonstrated, and a large synthetic dataset released

Vanilla language models can be effective at following general language instructions if tuned with annotated “instructional” data
There is a direct correlation between the size and diversity of the “instructional” data and the generalizability of resulting models
Human annotated “instructional” data poses a bottleneck for progress toward more generalizable models
Our work aims to reduce the dependence on human annotators
Language models have been used for data generation and augmentation
Knowledge distillation involves the transfer of knowledge from larger models to smaller ones
SELF-INSTRUCT is a form of knowledge distillation, but with the same source and target
Specialized methods have been used to bootstrap some inferences using language models
Recent works generate instructions of a task given a few examples
SELF-INSTRUCT generates new tasks from scratch, task-agnostically

Method

Generating tasks with a vanilla pretrained language model
Tuning language model to follow instructions better

Defining instruction data

Generate instruction data with tasks defined in natural language
Each task has one or more input-output instances
Model expected to produce output given task instruction and instance input
Instruction and instance input do not have strict boundary in many cases
Allow instructions that do not require additional input

Automatic instruction data generation

Pipeline for generating instruction data consists of 4 steps
175 tasks written by authors used as seed
8 tasks sampled from pool for each step
Identify if instruction is a classification task
Generate instances for each instruction independently
Input-first approach used for non-classification tasks
Output-first approach used for classification tasks
Filtering and postprocessing to encourage diversity

Finetuning the lm to follow instructions

Used instruction data to finetune a language model
Used multiple templates to encode instruction and instance input together

Diversity

Identified verb-noun structure in generated instructions
26,559 out of 52,445 instructions contain verb-noun structure
Top 20 most common root verbs and their top 4 direct noun objects account for 14% of entire set
Generated instructions differ from seed instructions used to prompt generation

Quality

200 instructions were randomly sampled and 1 instance per instruction was randomly selected
An expert annotator labeled each instance as correct or not
Most of the generated instructions are meaningful
Generated instances may contain more noise
Most of the generations are in the correct format or partially correct

Experimental results

Conducted experiments to measure and compare quality of models under various instruction tuning setups
Used instruction-generated instruction data to conduct instruction tuning for GPT3 model
Used various templates to concatenate instruction and input
Trained model to generate output using OpenAI finetuning API
Used default hyper-parameters, except prompt loss weight set to 0 and trained model for 2 epochs

Baselines

Off-the-shelf language models (T5-LM and GPT3) are evaluated as baselines
InstructGPT (OpenAI) is evaluated to follow human instructions better
GPT3 is finetuned with T0 and SUPERNI training data
Results show SELF-INSTRUCT boosts GPT3’s instruction-following ability
GPT3 SELF-INST nearly matches performance of InstructGPT 001
SELF-INSTRUCT brings additional gains when combined with SUPERNI training set

Experiment 2: generalization to user-oriented instructions on novel tasks

Curated a new set of instructions for user-oriented applications
Instructions are diverse in style and format
252 instructions with 1 instance per instruction
Four-level rating system for categorizing the quality of the models’ outputs
Human feedback is an optional aspect of instruction-tuning

Broader impact

Introduce SELF-INSTRUCT, a task-agnostic method to improve instruction-following capabilities of language models
Performance of SELF-INSTRUCT on par with InstructGPT 001
SELF-INSTRUCT outperforms existing public instruction datasets by a large margin
Limitations of SELF-INSTRUCT discussed
Generated data statistics and data quality review presented
Evaluation results on unseen tasks from SUPER-NATURALINSTRUCTIONS
SELF-INSTRUCT boosts GPT3 performance by 33.1% and nearly matches performance of InstructGPT 001

Link to paper#

Abstract#

Paper Content#

Introduction#

Related work#

Method#

Defining instruction data#

Automatic instruction data generation#

Finetuning the lm to follow instructions#

Diversity#

Quality#

Experimental results#

Baselines#

Experiment 2: generalization to user-oriented instructions on novel tasks#

Broader impact#

Link to paper

Abstract

Paper Content

Introduction

Related work

Method

Defining instruction data

Automatic instruction data generation

Finetuning the lm to follow instructions

Diversity

Quality

Experimental results

Baselines

Experiment 2: generalization to user-oriented instructions on novel tasks

Broader impact