Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

Prompt-based learning methods in semi-supervised learning settings have been effective on NLU datasets and tasks.
Designing multiple prompts and verbalizers requires domain knowledge and human effort, making it difficult to scale.
Two methods proposed to automatically design multiple prompts and integrate automatic verbalizer without sacrificing performance.
Best average accuracy of 73.2% obtained with proposed methods.

Paper Content

Introduction

Pre-training large language models with text corpora and fine-tuning on downstream tasks has shown superior performance
Discrepancy between pre-training and fine-tuning tasks can lead to unexpected behaviors
Prompt-tuning transforms NLU tasks into cloze tasks to mimic pre-training objective
Prompt-based learning predicts tokens at masked position and verbalizer maps them to classes
Few-shot learning environment works well with prompt-based learning
Limitation of prompt-based learning: handcrafting prompts and verbalizers is expensive and not scalable
Continuous prompt-based learning eliminates need for human intervention
Two methods: search for discrete prompt tokens or learn numerical prompt embeddings
Automatic selection of label words, soft verbalizer, and prototypical verbalizer reduce human efforts
Propose methods to generate various prompts with continuous prompt tokens for SSL settings
Eliminate human involvement in designing multiple prompts and verbalizers in SSL settings
Automatic verbalizer with manual prompts can achieve similar performance to manual verbalizers

Methodology

PET (Pattern-Exploiting Training) is a semi-supervised learning method
PET transforms input sequence to cloze question with single MASK token
PLM fills in value of MASK token and verbalizers map output tokens to class labels
Semi-supervised framework produces soft labels on unlabeled data
PET fine-tunes multiple PLMs with different prompts
This paper uses continuous and automatic prompts and verbalizers, eliminating need for human involvement
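As a rough illustration of the cloze reformulation and verbalizer mapping described above, the sketch below wraps an input in a hand-written pattern with a single mask token, then turns mask-position token scores into class probabilities. The pattern, the label words, and the scores are hypothetical and not taken from the paper; in practice a PLM would supply the scores.

```python
import math

MASK = "<mask>"  # RoBERTa-style mask token

def to_cloze(text, pattern="{text} This topic is about {mask}."):
    """Wrap the input in a manual pattern containing one mask slot (hypothetical pattern)."""
    return pattern.format(text=text, mask=MASK)

def verbalize(token_scores, verbalizer):
    """Map mask-position token scores to class probabilities.

    token_scores: dict of token -> raw score (logit) at the mask position.
    verbalizer:   dict of class label -> label word (a manual verbalizer).
    """
    logits = {label: token_scores[word] for label, word in verbalizer.items()}
    max_logit = max(logits.values())  # subtract the max for numerical stability
    exps = {label: math.exp(v - max_logit) for label, v in logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}

# Hypothetical label words and made-up scores standing in for PLM output.
verbalizer = {"sports": "sports", "business": "business"}
prompt = to_cloze("The team won the final.")
fake_scores = {"sports": 4.0, "business": 1.0, "the": 0.5}
probs = verbalize(fake_scores, verbalizer)
prediction = max(probs, key=probs.get)
```

Only the scores of the label words enter the softmax; tokens outside the verbalizer (such as "the" here) are ignored.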

Overall pipeline

Proposed pipeline uses automatic prompts and verbalizers
Labeler models are trained with labeled dataset in few-shot settings
Probability of label is calculated for each trained model
Average of probabilities from each model is taken as ground-truth probability
Final classifier is fine-tuned with KL divergence loss
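The pipeline steps above can be sketched for a single unlabeled example: average the class distributions from several labeler models into a soft label, then score the final classifier against it with KL divergence. The probabilities are made up, and the direction KL(soft label || classifier) is an assumption, since the summary does not state it.

```python
import math

def average_soft_labels(per_model_probs):
    """Average class distributions from several labeler models for one example."""
    n_models = len(per_model_probs)
    n_classes = len(per_model_probs[0])
    return [sum(p[c] for p in per_model_probs) / n_models for c in range(n_classes)]

def kl_divergence(target, predicted, eps=1e-12):
    """KL(target || predicted) for two discrete distributions (assumed direction)."""
    return sum(
        t * math.log((t + eps) / (q + eps))
        for t, q in zip(target, predicted)
        if t > 0
    )

# Made-up outputs of three labeler models on one unlabeled example.
labeler_outputs = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.8, 0.1, 0.1]]
soft_label = average_soft_labels(labeler_outputs)

# Made-up prediction of the final classifier on the same example.
classifier_probs = [0.5, 0.3, 0.2]
loss = kl_divergence(soft_label, classifier_probs)
```

The loss is zero when the classifier reproduces the averaged distribution exactly and grows as the two distributions diverge.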

Automatic verbalizers

Automatic verbalizers eliminate need for human intervention
Experiments conducted with 3 types of automatic verbalizers
Prototypical verbalizer performs better than other two in SSL settings
Prototype vectors for each class learned using contrastive learning
Probability distribution of MASK token for each class calculated by cosine similarity
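To make the cosine-similarity step concrete, here is a minimal sketch that scores a mask-position vector against one prototype per class and normalizes the similarities with a softmax. The vectors and class names are toy values, the softmax normalization is an assumption, and the contrastive training of the prototypes is not shown.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def prototype_probs(mask_vec, prototypes):
    """Softmax over cosine similarities between the mask vector and class prototypes."""
    sims = {label: cosine(mask_vec, proto) for label, proto in prototypes.items()}
    max_sim = max(sims.values())  # subtract the max for numerical stability
    exps = {label: math.exp(s - max_sim) for label, s in sims.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}

# Toy prototypes; in the paper's setting these would be learned vectors.
prototypes = {
    "entailment": [1.0, 0.0, 0.0],
    "contradiction": [0.0, 1.0, 0.0],
    "neutral": [0.0, 0.0, 1.0],
}
mask_vec = [0.9, 0.1, 0.2]  # stand-in for the projected MASK representation
probs = prototype_probs(mask_vec, prototypes)
prediction = max(probs, key=probs.get)
```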

Training and inference strategy

Parameters in the model are randomly initialized
Parameters in the continuous prompts and PLMs are updated with the loss Lc
Parameters in the verbalizers are optimized with the losses Lins and Lproto
Training strategy is to first freeze parameters in the prototypical verbalizer and then train parameters in the reparameterization block and PLM with the cross-entropy loss Lc
Parameters in the prototypical verbalizers are trained with instance-instance loss Lins and instance-prototype loss Lproto
Final language model classifier is fine-tuned with Ldiv
During inference, the final fine-tuned language model F is used to predict on the test dataset

Experiments

Conducted semi-supervised learning experiments
Compared to several strong baseline frameworks
Used NLU benchmarks

Dataset collection

Experimented with 5 datasets
Performed multiple experiments in few-shot settings
Used 1-20 examples per class for the other datasets, 32 examples for CB and RTE
Reported average accuracy across 3 runs of each experiment with 3 random seeds

Proposed models

Replace manual verbalizer with prototypical verbalizer and manual prompts with demonstration examples and continuous prompt tokens
Introduce diversity by varying number of continuous prompt tokens and use prototypical verbalizer across multiple labeler models

Models for comparison

Design several strong baseline experiments and perform an ablation study
Fine-tune RoBERTa-large PLM with training examples in different few-shot settings
Prototypical Verbalizer PET semi-supervised learning method
Manual PET semi-supervised learning method
UDA and MixText data augmentation methods not chosen for comparison

Implementation details

Used RoBERTa-Large model as PLM
AdamW as optimizer with learning rate of 1e-5 and weight decay of 0.01
Reparameterization block contains 2-layer bidirectional LSTM and 2 linear layers with ReLU activation function
Prototypical verbalizer based on PyTorch, Hugging Face Transformers, and OpenPrompt frameworks
Demo+Soft Tokens PET: each labeler model learns 5 soft tokens with different demonstrations
Vary Soft Tokens PET: 5 prompts with number of soft tokens ranging from 1 to 5
Experiments with 3 automatic verbalizers: soft, search, and prototypical
Prototypical verbalizer performs best on 3 out of 5 datasets
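The reparameterization block and the varying number of soft tokens can be sketched in PyTorch as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the class name `PromptEncoder`, the toy hidden size, and the placement of the ReLU between the two linear layers are all assumptions; only the layer types, the token counts of 1 to 5, and the AdamW settings come from the details above.

```python
import torch
import torch.nn as nn

class PromptEncoder(nn.Module):
    """Hypothetical reparameterization block: 2-layer BiLSTM, then two linear layers with ReLU."""

    def __init__(self, num_soft_tokens, hidden_size):
        super().__init__()
        # Raw trainable embeddings, one per soft prompt token.
        self.raw_embeddings = nn.Embedding(num_soft_tokens, hidden_size)
        # Bidirectional LSTM: two directions of hidden_size // 2 give hidden_size outputs.
        self.lstm = nn.LSTM(hidden_size, hidden_size // 2, num_layers=2,
                            bidirectional=True, batch_first=True)
        # ReLU placed between the two linear layers (assumed placement).
        self.mlp = nn.Sequential(
            nn.Linear(hidden_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, hidden_size),
        )
        self.register_buffer("token_ids", torch.arange(num_soft_tokens))

    def forward(self):
        raw = self.raw_embeddings(self.token_ids).unsqueeze(0)  # (1, n_tokens, hidden)
        lstm_out, _ = self.lstm(raw)                            # (1, n_tokens, hidden)
        return self.mlp(lstm_out).squeeze(0)                    # (n_tokens, hidden)

HIDDEN = 16  # toy size for illustration; RoBERTa-large uses 1024

# One encoder per labeler model, with 1 to 5 soft tokens.
encoders = [PromptEncoder(n, HIDDEN) for n in range(1, 6)]
shapes = [tuple(enc().shape) for enc in encoders]

# Optimizer settings as listed above: AdamW, learning rate 1e-5, weight decay 0.01.
optimizer = torch.optim.AdamW(encoders[0].parameters(), lr=1e-5, weight_decay=0.01)
```

Each encoder outputs one embedding per soft token, which would then take the place of prompt token embeddings in the PLM input.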

Comparison with manual PET

Automatic verbalizer can replace manual verbalizer with only a small performance sacrifice
Automatic prompt design methods can achieve better performance than manual PET method
Vary Soft PET and Demo+Soft Tokens PET methods achieve better performance than Manual PET method
Randomly sampled demonstration examples can result in high-variance performance

Ablation study

Semi-supervised learning methods perform better than supervised learning methods
Traditional fine-tuning methods perform the worst
Demo+Soft in SL method performs better than fine-tuning method
SSL prompting models perform better than supervised learning methods

Automatic prompts and verbalizers

Shin et al. (2020a) used a gradient-guided search to find tokens for prompts
Li and Liang (2021) and Lester et al. (2021) attached prefix vectors and kept LM parameters frozen
Liu et al. (2021c) proposed P-tuning to replace input embeddings with differentiable output embeddings
Vu et al. (2021) proposed to learn soft prompt embeddings and transfer them to target task
Several automatic verbalizers have been proposed to automate design of verbalizer mapping function

Conclusions

Automatic prompts and verbalizers can be used in semi-supervised learning settings
Performance is similar to or better than the SoTA Manual PET method
Methods are scalable with multiple tasks and datasets
Semi-supervised learning methods take advantage of large amounts of unlabeled data
Plan to investigate freezing PLMs’ parameters and tuning verbalizer and prompt parameters
Plan to combine two proposed methods Demo+Soft PET and Vary Soft PET
Experiments cover only the English language
