Link to paper
The full paper is available here.
You can also find the paper on PapersWithCode here.
Abstract
- Prompt-based learning methods in semi-supervised learning settings have been effective on NLU datasets and tasks.
- Designing multiple prompts and verbalizers requires domain knowledge and human effort, making it difficult to scale.
- Two methods proposed to automatically design multiple prompts and integrate automatic verbalizer without sacrificing performance.
- Best average accuracy of 73.2% obtained with proposed methods.
Paper Content
Introduction
- Pre-training large language models with text corpora and fine-tuning on downstream tasks has shown superior performance
- Discrepancy between pre-training and fine-tuning tasks can lead to unexpected behaviors
- Prompt-tuning transforms NLU tasks into cloze tasks to mimic pre-training objective
- Prompt-based learning predicts tokens at masked position and verbalizer maps them to classes
- Few-shot learning environment works well with prompt-based learning
- Limitation of prompt-based learning is handcrafting work is expensive and not scalable
- Continuous prompt-based learning eliminates need for human intervention
- Two methods: search for discrete prompt tokens or learn numerical prompt embeddings
- Automatic selection of label words, soft verbalizer, and prototypical verbalizer reduce human efforts
- Propose methods to generate various prompts with continuous prompt tokens for SSL settings
- Eliminate human involvement in designing multiple prompts and verbalizers in SSL settings
- Automatic verbalizer with manual prompts can achieve similar performance to manual verbalizers
Methodology
- PET is a semi-supervised learning setting
- PET transforms input sequence to cloze question with single MASK token
- PLM fills in value of MASK token and verbalizers map output tokens to class labels
- Semisupervised framework produces soft labels on unlabeled data
- PET fine-tunes multiple PLMs with different prompts
- This paper uses continuous and automatic prompts and verbalizers, eliminating need for human involvement
Overall pipeline
- Proposed pipeline uses automatic prompts and verbalizers
- Labeler models are trained with labeled dataset in few-shot settings
- Probability of label is calculated for each trained model
- Average of probabilities from each model is taken as ground-truth probability
- Final classifier is finetuned with KL divergence loss
Automatic verbalizers
- Automatic verbalizers eliminate need for human intervention
- Experiments conducted with 3 types of automatic verbalizers
- Prototypical verbalizer performs better than other two in SSL settings
- Prototype vectors for each class learned using contrastive learning
- Probability distribution of MASK token for each class calculated by cosine similarity
Training and inference strategy
- Parameters in the model are randomly initialized
- Parameters in the continuous prompts and PLMs are updated with the loss Lc
- Parameters in the verbalizers are optimized with the losses Lins and Lproto
- Training strategy is to first freeze parameters in the prototypical verbalizer and then train parameters in the reparameterization block and PLM with the cross-entropy loss Lc
- Parameters in the prototypical verbalizers are trained with instance-instance loss Lins and instance-prototype loss Lproto
- Final language model classifier is fine-tuned with Ldiv
- During inference, the final fine-tuned language model F is used to predict on the test dataset
Experiments
- Conducted semi-supervised learning experiments
- Compared to several strong baseline frameworks
- Used NLU benchmarks
Dataset collection
- Experimented with 5 datasets
- Performed multiple experiments in few-shot settings
- Used 1-20 examples per class for datasets, 32 examples for CB and RTE
- Reported average accuracy across 3 runs of each experiment with 3 random seeds
Proposed models
- Replace manual verbalizer with prototypical verbalizer and manual prompts with demonstration examples and continuous prompt tokens
- Introduce diversity by varying number of continuous prompt tokens and use prototypical verbalizer across multiple labeler models
Models for comparison
- Design several strong baseline experiments and perform an ablation study
- Fine-tune RoBERTa-large PLM with training examples in different few-shot settings
- Prototypical Verbalizer PET semi-supervised learning method
- Manual PET semi-supervised learning method
- UDA and MixText data augmentation methods not chosen for comparison
Implementation details
- Used RoBERTa-Large model as PLM
- AdamW as optimizer with learning rate of 1e-5 and weight decay of 0.01
- Reparameterization block contains 2-layer bidirectional LSTM and 2 linear layers with ReLU activation function
- Prototypical verbalizer based on Pytorch2, Huggingface transformer3, and OpenPrompt4 frameworks
- Demo+Soft Tokens PET: each labeler model learns 5 soft tokens with different demonstrations
- Vary Soft Tokens PET: 5 prompts with number of soft tokens ranging from 1 to 5
- Experiments with 3 automatic verbalizers: soft, search, and prototypical
- Prototypical verbalizer performs best on 3 out of 5 datasets
Comparison with manual pet
- Automatic verbalizer can replace manual verbalizer with only a small performance sacrifice
- Automatic prompt design methods can achieve better performance than manual PET method
- Vary Soft PET and Demo+Soft Tokens PET methods achieve better performance than Manual PET method
- Randomly sampled demonstration examples can result in high-variance performance
Ablation study
- Semi-supervised learning methods perform better than supervised learning methods
- Traditional fine-tuning methods perform the worst
- Demo+Soft in SL method performs better than fine-tuning method
- SSL prompting models perform better than supervised learning methods
Automatic prompts and verbalizers
- Shin et al. (2020a) used a gradient-guided search to find tokens for prompts
- Li and Liang (2021) and Lester et al. (2021) attached prefix vectors and kept LM model parameters frozen
- Liu et al. (2021c) proposed P-tuning to replace input embeddings with differentiable output embeddings
- Vu et al. (2021) proposed to learn soft prompt embeddings and transfer them to target task
- Several automatic verbalizers have been proposed to automate design of verbalizer mapping function
Conclusions
- Automatic prompts and verbalizers can be used in semi-supervised learning settings
- Performance is similar or better than SoTA Manual PET method
- Methods are scalable with multiple tasks and datasets
- Semi-supervised learning methods take advantage of large amounts of unlabeled data
- Plan to investigate freezing PLMs’ parameters and tuning verbalizer and prompt parameters
- Plan to combine two proposed methods Demo+Soft PET and Vary Soft PET
- Experiments are only in English language