Link to paper
The full paper is available here.
You can also find the paper on PapersWithCode here.
Abstract
- Proposed a new paradigm for zero-shot learners that is format agnostic
- Zero-shot learning aims to train a model on a given task such that it can address new learning tasks without additional training
- Converted zero-shot learning into multiple-choice tasks
- Added generalization ability to models and reduced number of parameters
- Achieved state-of-the-art performance on several benchmarks
- Model has 235M parameters, substantially smaller than state-of-the-art models
Paper Content
Introduction
- Remarkable advances in large-scale language models have improved a variety of tasks
- Zero-Shot Learning (ZSL) aims to predict labels on datasets from novel domains
- Most solutions use the prompt tuning framework
- Existing frameworks have a large number of parameters and require manual processing
- Proposed Unified Multiple Choice model (UniMC) has advantages of parameter updating and deployment
- Option-mask tokens are used to predict “yes” or “no” before each option
- Option MLM and Option Prediction methods are used to output desired options
- Performance of UniMC outperforms state-of-the-art baselines with a smaller model size
Related work
- NLP tasks have diverse formats due to the emergence of datasets
- Recent research shows the need to unify formats
- T0 builds an application to map datasets into target templates
- FLAN groups datasets into 12 task clusters and designs 10 instruction templates
- Label-based tasks need to be unified, so MC formats are developed
Label information
- Label semantic is an important information source for few-shot tasks.
- L-TapNet framework integrates label information with manually designed prompts.
- LSAP introduces label semantics into pre-training and fine-tuning phases of PLMs.
Zero-shot learning
- GPT-3 has impressive performance on few-shot tasks, but limited competence on zero-shot tasks.
- Recent efforts try to mitigate this issue by designing instruction templates, collecting prompt templates, and applying supervised datasets.
- These methods require significant laborious efforts and computational resources.
- UniMC is light-weighted and requires few manual input text transformations, making it suitable for more general scenarios.
Approaches
- Proposed framework is called UniMC
- Training and inference approaches outlined in detail
Mc tuning
- Backbones are pre-trained models that capture commonsense knowledge
- Two-stage tuning paradigm is used to train models with MC tasks
- O-MLM and OP methods are used to predict the answer in unseen zero-shot datasets
- OP computes the most confident option with the OP
- MC training stage and zero-shot stage have consistent processing objectives
Experiments
Experimental setup
- Collected publicly available label-based NLP datasets
- Applied accuracy to measure performance
- Compared method with state-of-the-art baselines
- Used ALBERT-xxlarge-V2 as backbone model
- Set maximum length token to 512
- Ran one epoch in training
- Set number of samples for each task to 20K
- Repeated experiment 5 times with different seeds
Main results
- UniMC achieves best performance in all NLI datasets with few parameters
- Bi-directional structure in UniMC strengthens its ability to capture information
- UniMC outperforms previous SOTA models in zeroshot text classification
- UniMC has built-in advantage in dealing with multiple classes
- UniMC achieves better performance than FLAN in NLI task
- UniMC shows better performance on datasets that are closer to understanding style
- MC training improves UniMC zero-shot performance
- Question prompts are necessary for UniMC
- Option prompts are robust to UniMC
- Large model size results in better performance
- UniMC enhances generalization ability of NLP
- Ethical influence of NLP should be discussed