Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

Task of text generation with constraints specified in natural language
Created benchmark COGNAC with knowledge-intensive constraints drawn from databases
State-of-the-art language models fail on this task
Proposed solution COGNACGEN to leverage the language model’s internal knowledge
Three forms of guidance, plus prefix-tuning approaches to distill that guidance
Empirical evaluations demonstrate COGNACGEN can generalize to unseen instructions and outperform baselines

Paper Content

Introduction

Language models are becoming increasingly good at generating text
A key question is how to control them to produce what is required while preventing unwanted generations
This is especially important for reducing issues of toxicity, bias, and misinformation
Prior work has used special control codes or classifiers to modify the LM’s probability distribution
This paper considers the problem of controlling generation in LMs with constraints specified in natural language
A new benchmark called COGNAC is created with two datasets based on WordNet and Wikidata
State-of-the-art LMs fail to follow simple language constraints
A new language model generation method called COGNACGEN is developed to follow linguistic guidance without retraining
COGNACGEN outperforms prior methods and other strong baselines by a significant margin
It is able to improve generation even with imperfect guidance and can generalize to unseen instructions

Task setup

Problem of conditional text generation with topics and constraints provided in natural language
Input includes a topic, example generations, and a constraint
Goal is to train LMs to generate fluent on-topic content while respecting the constraint
Model has to understand the topic and constraint specified in natural language
Topics and constraints are knowledge-intensive
Model has to respect both the topic and the constraint simultaneously

Dataset collection

Two new datasets based on WordNet and Wikidata created for COGNAC benchmark
WordNet dataset constructed using five root nodes and their leaf nodes as topics and constraints
Wikidata dataset constructed using five properties and their values as topics and constraints
35 unique templates created to reflect diverse nature of instructions
107,555 and 18,900 unique instructions created for WordNet and Wikidata respectively
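
As a rough illustration of how templates and topic/constraint pairs could combine into instructions, here is a minimal sketch. The template strings and the entity pairs are invented for illustration; they are not the benchmark's actual 35 templates or its data.

```python
from itertools import product

# Hypothetical templates (assumption): not taken from the benchmark.
TEMPLATES = [
    "Talk about {topic}, but do not mention {constraint}.",
    "Write about {topic} without referring to any {constraint}.",
]

# Hypothetical (topic, constraint) pairs (assumption).
PAIRS = [("animal", "dog"), ("vehicle", "car")]


def build_instructions(templates, pairs):
    """Fill every template with every (topic, constraint) pair."""
    return [
        template.format(topic=topic, constraint=constraint)
        for template, (topic, constraint) in product(templates, pairs)
    ]


instructions = build_instructions(TEMPLATES, PAIRS)
```

The number of instructions grows as templates times pairs, which is how a few dozen templates can yield many thousands of unique instructions.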

Evaluation metrics

Evaluated different generation methods of LMs using metrics that test for correctness and fluency
Main metric is Instruction Conformance (IC): whether the generation conforms to the constraint while staying on topic
Also reported: on-topic score, constraint violation score, Copy-BLEU score, repetition, and perplexity
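
The correctness metrics above can be sketched as simple set checks. This is a simplified reading, assuming entities have already been extracted from each generation and that topic and constraint are represented as sets of entity strings; the function names and example sets are hypothetical.

```python
def score_generation(entities, topic_set, constraint_set):
    """Return (on_topic, violation, conformant) flags for one generation.

    entities: entity strings found in the generated text (extraction assumed done).
    """
    on_topic = any(e in topic_set for e in entities)
    violation = any(e in constraint_set for e in entities)
    # Conformant: stays on topic and does not violate the constraint.
    return on_topic, violation, on_topic and not violation


def aggregate(results):
    """Average each flag over a list of (on_topic, violation, conformant) tuples."""
    n = len(results)
    return tuple(sum(r[i] for r in results) / n for i in range(3))


# Hypothetical example: topic "animals", constraint "dogs".
topic_set = {"dog", "cat", "horse"}
constraint_set = {"dog"}
results = [
    score_generation(["cat"], topic_set, constraint_set),
    score_generation(["dog"], topic_set, constraint_set),
    score_generation(["table"], topic_set, constraint_set),
]
on_topic_rate, violation_rate, conformance_rate = aggregate(results)
```

Only the first generation counts as conformant here: the second is on topic but violates the constraint, and the third is off topic.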

Overview

COGNACGEN rests on the idea that an LM can benefit from querying itself for guidance
Conditional probability of the next token is explicitly factorized into three terms
First term: probability of the next token conditioned on the demonstrations
Second term: probability that the task is performed
Third term: probability that the constraint is conformed to
The generation term can be modeled with a pre-trained LM
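
One way to write out the factorization outlined above is shown here. The notation is an assumption: d stands for the demonstrations, a for the event that the task is performed, and c for the event that the constraint is conformed to. The sketch also assumes a and c are conditionally independent given the text; the paper's exact form may differ.

```latex
p(x_t \mid x_{<t}, d, a, c) \;\propto\;
  \underbrace{p(x_t \mid x_{<t}, d)}_{\text{generation model}}
  \cdot \underbrace{p(a \mid x_{\le t})}_{\text{task performed}}
  \cdot \underbrace{p(c \mid x_{\le t})}_{\text{constraint conformed}}
```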

Guided generation

COGNACGEN updates the next token prediction probability from the generation model by modifying the logits using guidance.
Greedy decoding is used to generate from the modified probability.
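
A minimal sketch of one guided decoding step, under stated assumptions: guidance is represented as sets of encouraged and discouraged token ids, and it is applied as additive offsets `alpha` and `beta` on the logits. The source says only that logits are modified using guidance, so the additive form, the parameter names, and the toy vocabulary are all hypothetical.

```python
import math


def apply_guidance(logits, encouraged, discouraged, alpha=5.0, beta=5.0):
    """Add alpha to encouraged token ids and subtract beta from discouraged ones."""
    guided = list(logits)
    for i in encouraged:
        guided[i] += alpha
    for i in discouraged:
        guided[i] -= beta
    return guided


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_token(logits):
    """Greedy decoding: index of the highest logit."""
    return max(range(len(logits)), key=lambda i: logits[i])


# Toy example (assumption): topic-related token "cat" encouraged, constrained "dog" discouraged.
vocab = ["dog", "cat", "table"]
logits = [2.0, 1.0, 0.5]
guided = apply_guidance(logits, encouraged={1}, discouraged={0})
probs = softmax(guided)
chosen = vocab[greedy_token(guided)]
```

Without guidance the greedy choice in this toy example is "dog"; after the offsets are applied it becomes "cat".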

Guidance model

Construct a guidance model to modify the next-token probabilities of the generation model
Three variations of guidance model: binary verifier, top-k token, and textual example
Binary verifier evaluates the probability that a token belongs to the category named by a given constraint
Top-k token uses the next token probability distribution from the guidance model
Textual example uses a trie-based approach to decide what guidance to apply at each step
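
The trie idea behind textual example guidance can be sketched as follows. This is an illustrative assumption about how such a scheme might work, not the paper's implementation: examples are stored as word-level token sequences (a real system would use subword ids), and at each step the trie is queried for the tokens that would start or continue a stored example, which can then be encouraged or discouraged.

```python
class Trie:
    """Prefix tree over token sequences."""

    def __init__(self):
        self.children = {}
        self.is_end = False

    def insert(self, tokens):
        node = self
        for tok in tokens:
            node = node.children.setdefault(tok, Trie())
        node.is_end = True


def guided_tokens(root, generated):
    """Tokens that would start or continue a stored example after `generated`."""
    candidates = set(root.children)  # a new example can always start here
    # Any suffix of the generated text that is a path in the trie can be continued.
    for start in range(len(generated)):
        node = root
        for tok in generated[start:]:
            node = node.children.get(tok)
            if node is None:
                break
        else:
            candidates |= set(node.children)
    return candidates


# Hypothetical textual examples produced by the guidance model.
root = Trie()
for example in (["golden", "retriever"], ["german", "shepherd"], ["golden", "gate"]):
    root.insert(example)

after_golden = guided_tokens(root, ["the", "golden"])
at_start = guided_tokens(root, [])
```

Tracking the position in the trie lets guidance act on multi-token entities: after "golden", both "retriever" and "gate" become candidates, while a flat token list would miss that context.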

Tackling diverse natural language instructions

The guidance method above assumes the topic and constraint are given explicitly
In the full task, the model takes a natural language instruction and demonstrations as input
It must therefore infer the topic and constraint entities from the full input
To do so, the model is fine-tuned using prefix-tuning
The model’s own knowledge is thereby distilled and generalizes to unseen instructions

Experimental setup

Evaluations performed under two settings for both datasets in COGNAC
Control code setting: binary verifier, top-k token, and textual example guidance
NL instruction setting: unseen instruction templates
Baselines: fine-tuned model, self-debiasing technique
Model details: GPT-2 XL with top-k token and textual example guidance; GPT-3 and InstructGPT also evaluated

Results

COGNACGEN (textual example) outperforms self-debiasing baseline by 12 points on WordNet and 16 points on Wikidata
Fine-tuned baseline achieves lowest IC scores across both datasets
Textual example guidance performs better than top-k token guidance and binary verifier
COGNACGEN textual example achieves higher performance than GPT-3 (legacy) by 10.0 points on WordNet and 11.6 points on Wikidata
COGNACGEN struggles to avoid violating constraints for ‘Art’ category
Struggles to stay on topic for knowledge-heavy categories
Oracle model has access to knowledge base and achieves IC of 73 compared to COGNACGEN’s 35.8
Ablating away trie-based generation causes drastic performance degradation
Existing approaches limited to high-level concepts and predetermined binary or categorical codes

Conclusion

Introduced a new task for controllable generation in language models with constraints specified in natural language
Developed COGNAC, a new benchmark containing knowledge-based constraints
State-of-the-art language models like GPT-3 fail to conform to the provided instructions
Developed COGNACGEN, a method to use knowledge internal to a language model to guide its own generations
Innovations include guidance self-distillation using prefix-tuning and a trie-based decoding scheme driven by textual example guidance
Generates on-topic text that violates constraints less frequently than several baselines
Method requires training only the prefix parameters and can easily be scaled to larger models
Room remains to improve performance on COGNAC
Broader aim is reducing undesirable generations in LMs while promoting desirable text
Benchmark is limited by the comprehensiveness of the underlying knowledge bases
Implicit biases might cause unfairness to the end users of the model
