Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

LLMs can generate fluent text when the output follows natural language patterns.
LLMs struggle when the output is confined to a limited ontology.
Mixture of Soft Prompts (MSP) is a parameter-efficient procedure for generating data in a controlled manner.
MSP produces diverse and natural text while preserving label semantics.
MSP achieves state-of-the-art results on three benchmarks.

Paper Content

Introduction

Complex NLU systems require large amounts of labeled data to be useful
Low resource settings are common when expanding a system into a new domain
Domain-adaptive few-shot learning is the task of learning a target domain from limited data
Large language models are effective classifiers in low resource settings
Data augmentation techniques can be used to tackle limited data issues
LLMs can be used as a tool for controlled data generation
Mixture of Soft Prompts (MSP) is a novel method that combines soft prompts to generate diverse, class-conditioned training data
MSP outperforms a model of the same size by up to 30%

Task formulation

User needs to (re)train model when expanding product or adding new feature
Few-shot natural language understanding can take many forms
NLU tasks in real life are complex
Given dataset with n training examples from group of s source domains
Each training example has natural language input and structured output label
Goal is to expand into target domain t with m examples, where m ≪ n

Few-shot direct prediction

Pre-training a large neural network can help with low-resource scenarios.
LLMs have shown good performance in few-shot tasks.

Data-centered alternative

Use data augmentation to produce additional training examples
Combine original seed data with synthesized data to train a downstream model
Benefits of using LLMs as a data augmentation tool: inspectable, flexible, faster inference, transferable across model types

Prompt construction

Soft prompt tuning is a method to leverage the power of LLMs without the onerous computation requirements of training from scratch.
Soft prompts are initialized with the name and description of an attribute.
The full input contains four parts: instruction prefix, soft prompts, meta-data, and exemplars.
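The four-part input described above can be pictured as a concatenation of embedding blocks along the sequence axis. The following is a minimal sketch under stated assumptions, not the paper's implementation: the function name build_input, the hidden size of 16, and the lengths chosen for meta-data and exemplars are all hypothetical, while the 100-token prefix and 20-token attribute prompt follow the implementation details listed later in this summary.

```python
import numpy as np

def build_input(prefix, soft_prompts, metadata, exemplars):
    """Concatenate the four input parts along the sequence axis.

    Each argument is a (length, hidden) array of embeddings.
    Hypothetical helper, for illustration only.
    """
    parts = [prefix, soft_prompts, metadata, exemplars]
    hidden = prefix.shape[1]
    if any(p.ndim != 2 or p.shape[1] != hidden for p in parts):
        raise ValueError("all parts must be (length, hidden) with the same hidden size")
    return np.concatenate(parts, axis=0)

rng = np.random.default_rng(0)
hidden = 16                                      # assumed toy hidden size
prefix = rng.normal(size=(100, hidden))          # instruction prefix, 100 tokens
soft_prompts = rng.normal(size=(20, hidden))     # one attribute prompt, 20 tokens
metadata = rng.normal(size=(5, hidden))          # assumed length
exemplars = rng.normal(size=(30, hidden))        # assumed length

full_input = build_input(prefix, soft_prompts, metadata, exemplars)
print(full_input.shape)  # (155, 16)
```

In a real setup only the soft prompt blocks would be trainable, while the language model's own weights stay frozen.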

Attribute mixing

Model is conditioned on desired attributes during generation
Prior works focus on single attribute constraint
Task contains multiple attributes
Five different methods of composing attributes for data generation
Methods include Concat, Pooling, Attention, Bottleneck, and CNN Mixture
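To make the difference between mixing methods concrete, here is a rough sketch of three of the five (Concat, Pooling, Attention) operating on a stack of attribute prompts. It is an illustration under assumptions rather than the paper's formulation: the function names, the array shapes, and in particular the dot-product scoring against a query vector in the attention variant are hypothetical. Bottleneck and CNN mixing are omitted.

```python
import numpy as np

def mix_concat(prompts):
    """Concat: place the k attribute prompts one after another."""
    k, length, hidden = prompts.shape
    return prompts.reshape(k * length, hidden)

def mix_pooling(prompts):
    """Pooling: average the k attribute prompts position by position."""
    return prompts.mean(axis=0)

def mix_attention(prompts, query):
    """Attention: softmax-weighted sum of the k attribute prompts.

    Scoring each attribute by the dot product of its mean embedding
    with a query vector is an assumption made for this sketch.
    """
    hidden = prompts.shape[2]
    summaries = prompts.mean(axis=1)                 # (k, hidden)
    scores = summaries @ query / np.sqrt(hidden)     # (k,)
    scores = scores - scores.max()                   # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    return np.tensordot(weights, prompts, axes=1)    # (length, hidden)

rng = np.random.default_rng(0)
prompts = rng.normal(size=(3, 20, 16))   # 3 attributes, 20 tokens each, hidden size 16
query = rng.normal(size=16)

concat_out = mix_concat(prompts)
pooled_out = mix_pooling(prompts)
attended_out = mix_attention(prompts, query)
print(concat_out.shape, pooled_out.shape, attended_out.shape)
```

Concat grows the input length with the number of attributes, whereas Pooling and Attention keep it fixed at the length of a single attribute prompt.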

Data denoising

Generate 20% more data and filter to reduce noise
Set keep-rate inversely proportional to how often attributes occur to balance data
Weight examples according to distance from seed example to improve label preservation
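The keep-rate idea can be sketched as follows: attributes that appear often in the generated pool are kept with lower probability, so the filtered set is more balanced. This is a simplified illustration, not the paper's procedure. It assumes one attribute per example, normalizes the keep-rate so the rarest attribute is always kept, and uses hypothetical function names; the 20% over-generation and the distance-based weighting are not shown.

```python
from collections import Counter
import random

def keep_rates(labels):
    """Keep-rate per attribute, inversely proportional to its frequency.

    The rarest attribute gets rate 1.0 (an assumed normalization).
    """
    counts = Counter(labels)
    rarest = min(counts.values())
    return {attr: rarest / n for attr, n in counts.items()}

def filter_balanced(examples, seed=0):
    """Randomly drop examples of over-represented attributes.

    examples is a list of (text, attribute) pairs.
    """
    rng = random.Random(seed)
    rates = keep_rates([attr for _, attr in examples])
    return [(text, attr) for text, attr in examples if rng.random() < rates[attr]]

# Toy pool: attribute "a" occurs twice as often as "b".
generated = [(f"utterance {i}", "a") for i in range(6)]
generated += [(f"utterance {i}", "b") for i in range(6, 9)]

rates = keep_rates([attr for _, attr in generated])
print(rates)  # {'a': 0.5, 'b': 1.0}

balanced = filter_balanced(generated)
print(Counter(attr for _, attr in balanced))
```

Because the filter is random, the exact number of retained "a" examples varies with the seed; only the expected proportion is controlled.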

Experimental setup

Datasets and tasks

Tested on 3 diverse, multi-attribute natural language understanding datasets
Task 1: Multi-aspect intent detection, measured by F1 score
Task 2: Cross-domain named entity recognition
Generated utterances typically preserve desired semantic attributes and lexical entities
Under-sampling over-represented attributes balances generated data

Baseline methods

FLAN-T5 XXL is the base model used for generating data
GODEL is the smaller downstream LLM used
Data augmentation techniques used include EDA, masked in-filling, BART-large, RTT, CLM, DExperts, and CVAE

Automatic evaluation

Evaluated synthesized data quantitatively with three metrics
Distinct@K measures diversity of text based on unique n-grams
Perplexity measures text fluency with GPT2-large
Correctness checks how well synthetic data preserves attribute labels
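As an illustration of the diversity metric, a common way to compute a distinct n-gram score is to divide the number of unique n-grams by the total number of n-grams across the generated texts. The sketch below follows that common definition and assumes lowercased whitespace tokenization; the paper's exact tokenization and aggregation may differ, and the function name distinct_k is hypothetical.

```python
def distinct_k(texts, k):
    """Ratio of unique k-grams to total k-grams over a list of texts."""
    ngrams = []
    for text in texts:
        tokens = text.lower().split()
        ngrams.extend(tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1))
    if not ngrams:
        return 0.0
    return len(set(ngrams)) / len(ngrams)

samples = ["book a table for two", "book a flight for two"]
print(distinct_k(samples, 1))  # 0.6
print(distinct_k(samples, 2))  # 0.75
```

Higher values indicate less repetition across the synthesized utterances; a score of 1.0 means no n-gram is ever repeated.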

Implementation details

Instruction prefix set to 100 tokens
Attribute token length set to 20
Learning rate for teacher model set to 3e-2
Learning rate for student set to 3e-5
Augmentation methods generate 4 new datapoints per seed example

Main results

MSP achieves state-of-the-art results across all three datasets
MSP leverages LLMs for data augmentation
MSP outperforms data synthesis baselines on 8 out of 9 domains
Meta-learning and data augmentation can be combined for better results
All DA and CTG methods outperform naive GODEL baseline
RTT leads to drop in performance for CrossNER
Problems with RTT persist in TOPv2
MSP is able to reliably handle lexical, semantic and structural constraints

Synthesized data quality

LLMs used as intermediate data augmentation tool for few-shot learning
Interpretability, flexibility and modularity of MSP
MSP yields higher quality data than other data augmentation methods
MSP yields better performance in downstream tasks
MSP leverages LLMs to generate data rather than direct predictions
MSP related to techniques that combine multiple prompts
MSP controls generation as a means to an end
MSP uses BLEU score as a proxy for measuring model convergence
Oracle attribute classifier based on DeBERTa-XLarge used for automatic evaluation
Oracle attribute classifier reaches over 90% accuracy
K=2 works best for number of exemplars
Number of generations per seed example set to 4
Learning rate set to 0.3
Batch size set to 8 and gradient accumulation set to 3 steps
Downstream task learning rate set to 3e-5
Data augmentation promotes diversity and label preservation
Temperature parameter of LLM can be increased to increase diversity
Exemplars can be shuffled or excluded to minimize copying behavior
Novel attribute combinations can be composed
