Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

Proposed a new, format-agnostic paradigm for zero-shot learners
Zero-shot learning aims to train a model on a given task such that it can address new learning tasks without additional training
Converted zero-shot learning into multiple-choice tasks
Improved the generalization ability of models while reducing the number of parameters
Achieved state-of-the-art performance on several benchmarks
Model has 235M parameters, substantially smaller than state-of-the-art models
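
To make the multiple-choice conversion concrete, here is a minimal sketch of how a text-classification input might be recast as a multiple-choice instance. The names MCExample, classification_to_mc and option_template are hypothetical and chosen for illustration; they are not taken from the paper or its code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MCExample:
    """One multiple-choice instance: options, a question prompt, and a passage."""
    options: List[str]
    question: str
    passage: str


def classification_to_mc(text: str, label_names: List[str], question: str,
                         option_template: str = "{}") -> MCExample:
    """Turn a text-classification input into a multiple-choice instance.

    Each label name becomes one option. option_template is a hypothetical
    hook for an option prompt such as "It's {}".
    """
    options = [option_template.format(name) for name in label_names]
    return MCExample(options=options, question=question, passage=text)


example = classification_to_mc(
    text="The film was a delight from start to finish.",
    label_names=["great", "terrible"],
    question="What is the sentiment of the review?",
    option_template="It's {}",
)
print(example.options)  # ["It's great", "It's terrible"]
```

Under this framing, a task with more classes simply yields more options, so the same model input format can cover different tasks.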

Remarkable advances in large-scale language models have improved a variety of tasks
Zero-Shot Learning (ZSL) aims to predict labels on datasets from novel domains
Most solutions use the prompt tuning framework
Existing frameworks have a large number of parameters and require manual processing
The proposed Unified Multiple Choice model (UniMC) has advantages in parameter updating and deployment
An option-mask token placed before each option is used to predict “yes” or “no”
Option MLM and Option Prediction methods are used to output desired options
UniMC outperforms state-of-the-art baselines with a smaller model size
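
The option-mask idea can be sketched as follows. This is only an illustration under stated assumptions: the token name "[O-MASK]", the segment order (options, then question, then passage), and whitespace tokenization are all stand-ins, and build_input and option_mask_positions are hypothetical helpers rather than the authors' implementation.

```python
from typing import List

O_MASK = "[O-MASK]"  # assumed name for the option-mask token


def build_input(options: List[str], question: str, passage: str) -> List[str]:
    """Assemble a token sequence with one option-mask token before each option.

    Whitespace splitting stands in for a real subword tokenizer.
    """
    tokens = ["[CLS]"]
    for option in options:
        tokens += [O_MASK] + option.split()
    tokens += ["[SEP]"] + question.split()
    tokens += ["[SEP]"] + passage.split() + ["[SEP]"]
    return tokens


def option_mask_positions(tokens: List[str]) -> List[int]:
    """Return the index of every option-mask token, one per option."""
    return [i for i, tok in enumerate(tokens) if tok == O_MASK]


tokens = build_input(["good", "bad"], "What is the sentiment ?", "Nice movie")
positions = option_mask_positions(tokens)
```

The positions returned here are where a model would be asked to predict “yes” or “no” for the corresponding option.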

Label semantics are an important information source for few-shot tasks.
L-TapNet framework integrates label information with manually designed prompts.
LSAP introduces label semantics into pre-training and fine-tuning phases of PLMs.

GPT-3 has impressive performance on few-shot tasks, but limited competence on zero-shot tasks.
Recent efforts try to mitigate this issue by designing instruction templates, collecting prompt templates, and applying supervised datasets.
These methods require significant manual effort and computational resources.
UniMC is lightweight and requires few manual input text transformations, making it suitable for more general scenarios.

Backbones are pre-trained models that capture commonsense knowledge
Two-stage tuning paradigm is used to train models with MC tasks
O-MLM and OP methods are used to predict the answer on unseen zero-shot datasets
OP selects the most confident option by comparing the “yes” predictions at the option-mask tokens
MC training stage and zero-shot stage have consistent processing objectives
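
One plausible reading of the OP step is sketched below: take the “yes” score at each option-mask token, normalize across options, and pick the highest. The use of a softmax and the function names softmax and option_prediction are assumptions made for this sketch; the logits are made-up inputs, not model outputs.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def option_prediction(yes_logits: List[float]) -> int:
    """Return the index of the option with the highest normalized "yes" score.

    yes_logits[i] is the hypothetical "yes" logit read at the option-mask
    token that precedes option i.
    """
    probs = softmax(yes_logits)
    return max(range(len(probs)), key=probs.__getitem__)


# Illustrative logits for three options; the second option scores highest.
chosen = option_prediction([0.1, 2.3, -1.0])
```

Because the same selection rule applies in both stages, this matches the note that MC training and zero-shot inference share a consistent objective.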

UniMC achieves the best performance on all NLI datasets with few parameters
Bi-directional structure in UniMC strengthens its ability to capture information
UniMC outperforms previous SOTA models in zero-shot text classification
UniMC has built-in advantage in dealing with multiple classes
UniMC achieves better performance than FLAN on NLI tasks
UniMC shows better performance on datasets that are closer to understanding style
MC training improves UniMC zero-shot performance
Question prompts are necessary for UniMC
UniMC is robust to the choice of option prompts
Larger model size results in better performance
UniMC enhances generalization ability of NLP
Ethical influence of NLP should be discussed