Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

  • Proposed a new paradigm for zero-shot learners that is format agnostic
  • Zero-shot learning aims to train a model on a given task such that it can address new learning tasks without additional training
  • Converted zero-shot learning into multiple-choice tasks
  • The approach improves model generalization while reducing the number of parameters
  • Achieved state-of-the-art performance on several benchmarks
  • Model has 235M parameters, substantially smaller than state-of-the-art models
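
To make the conversion into multiple-choice tasks concrete, here is a minimal sketch of how a label-based example might be recast as a multiple-choice sample. The `MCSample` class, the `to_multiple_choice` helper, and the question wording are all hypothetical names chosen for illustration; the summary does not specify the authors' actual data format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MCSample:
    """A multiple-choice view of a label-based example (hypothetical structure)."""
    passage: str        # the original input text
    question: str       # a task-level question prompt
    options: List[str]  # the label names, used as answer options
    answer: int         # index of the correct option


def to_multiple_choice(text: str, label_names: List[str],
                       gold_label: str, question: str) -> MCSample:
    """Turn a classification example into a multiple-choice sample.

    Every label name becomes one option, so the same format can cover
    tasks with different label sets.
    """
    return MCSample(
        passage=text,
        question=question,
        options=list(label_names),
        answer=label_names.index(gold_label),
    )


sample = to_multiple_choice(
    text="The team won the final in extra time.",
    label_names=["sports", "politics", "business"],
    gold_label="sports",
    question="What is this text about?",  # assumed wording
)
```

Because the label names themselves become the options, a new task with unseen labels only needs a new option list, not a new output head.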

Paper Content

Introduction

  • Remarkable advances in large-scale language models have improved a variety of tasks
  • Zero-Shot Learning (ZSL) aims to predict labels on datasets from novel domains
  • Most solutions use the prompt tuning framework
  • Existing frameworks have a large number of parameters and require manual processing
  • The proposed Unified Multiple Choice model (UniMC) offers advantages in parameter updating and deployment
  • An option-mask token placed before each option is used to predict “yes” or “no” for that option
  • Option MLM (O-MLM) and Option Prediction (OP) are used to output the desired option
  • UniMC outperforms state-of-the-art baselines with a smaller model size
  • NLP tasks come in diverse formats because new datasets keep emerging
  • Recent research shows the need to unify formats
  • T0 builds an application to map datasets into target templates
  • FLAN groups datasets into 12 task clusters and designs 10 instruction templates
  • Label-based tasks also need a unified format, so a multiple-choice (MC) format is developed
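
The option-mask idea from the list above can be sketched as follows. This is only an illustration: the token string `[O-MASK]`, the ordering of options, question, and passage, the whitespace tokenization, and the function name `build_input_tokens` are assumptions, not details confirmed by this summary.

```python
from typing import List, Tuple

O_MASK = "[O-MASK]"  # assumed spelling of the option-mask token


def build_input_tokens(options: List[str], question: str,
                       passage: str) -> Tuple[List[str], List[int]]:
    """Build a token sequence with one option-mask token before each option.

    Returns the tokens and the positions of the option-mask tokens, which
    are the positions where "yes" or "no" would be predicted.
    Whitespace splitting stands in for a real subword tokenizer.
    """
    tokens = ["[CLS]"]
    mask_positions = []
    for option in options:
        mask_positions.append(len(tokens))  # index of this option's mask
        tokens.append(O_MASK)
        tokens.extend(option.split())
    tokens.append("[SEP]")
    tokens.extend(question.split())
    tokens.append("[SEP]")
    tokens.extend(passage.split())
    tokens.append("[SEP]")
    return tokens, mask_positions


tokens, mask_positions = build_input_tokens(
    options=["good", "bad"],
    question="What is the sentiment?",
    passage="I loved this film.",
)
```

Keeping the mask positions alongside the tokens makes it easy to read off one prediction per option later.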

Label information

  • Label semantics are an important source of information for few-shot tasks.
  • L-TapNet framework integrates label information with manually designed prompts.
  • LSAP introduces label semantics into pre-training and fine-tuning phases of PLMs.

Zero-shot learning

  • GPT-3 has impressive performance on few-shot tasks, but limited competence on zero-shot tasks.
  • Recent efforts try to mitigate this issue by designing instruction templates, collecting prompt templates, and applying supervised datasets.
  • These methods require significant manual effort and computational resources.
  • UniMC is lightweight and requires few manual transformations of the input text, making it suitable for more general scenarios.

Approaches

  • Proposed framework is called UniMC
  • Training and inference approaches outlined in detail

MC tuning

  • Backbones are pre-trained models that capture commonsense knowledge
  • Two-stage tuning paradigm is used to train models with MC tasks
  • Option MLM (O-MLM) and Option Prediction (OP) are used to predict the answer on unseen zero-shot datasets
  • OP selects the option in which the model is most confident
  • MC training stage and zero-shot stage have consistent processing objectives
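
One way to picture how OP could pick the most confident option: take one "yes" score per option, normalize the scores across options, and return the highest. This is a sketch under that assumption only; the summary does not give the exact scoring rule, and `option_prediction` and its input are hypothetical.

```python
import math
from typing import List, Tuple


def option_prediction(yes_logits: List[float]) -> Tuple[int, List[float]]:
    """Pick the most confident option from per-option "yes" logits.

    Assumes one logit per option, read at that option's mask position.
    A softmax across options turns the logits into probabilities.
    """
    if not yes_logits:
        raise ValueError("need at least one option")
    peak = max(yes_logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in yes_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs


best, probs = option_prediction([0.2, 2.5, -1.0])
```

Since the same scoring works for any number of options, the training stage and the zero-shot stage can share one objective, which matches the consistency noted above.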

Experiments

Experimental setup

  • Collected publicly available label-based NLP datasets
  • Applied accuracy to measure performance
  • Compared method with state-of-the-art baselines
  • Used ALBERT-xxlarge-V2 as backbone model
  • Set the maximum token length to 512
  • Ran one epoch in training
  • Set number of samples for each task to 20K
  • Repeated experiment 5 times with different seeds
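
The setup above can be captured in a small configuration and evaluation sketch. The numeric settings come from the list; the seed values, the reading of 20K as a per-task cap, and all function names are assumptions made for illustration.

```python
import random
import statistics
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class ExperimentConfig:
    backbone: str = "ALBERT-xxlarge-V2"
    max_length: int = 512            # maximum token length
    epochs: int = 1
    samples_per_task: int = 20_000
    seeds: Tuple[int, ...] = (0, 1, 2, 3, 4)  # hypothetical seed values


def cap_samples(examples: Sequence, limit: int, seed: int) -> List:
    """Randomly keep at most `limit` examples (assumes 20K is a cap)."""
    if len(examples) <= limit:
        return list(examples)
    return random.Random(seed).sample(list(examples), limit)


def accuracy(predictions: Sequence[int], golds: Sequence[int]) -> float:
    """Fraction of predictions that match the gold labels."""
    if len(predictions) != len(golds) or not golds:
        raise ValueError("predictions and golds must be non-empty and aligned")
    return sum(p == g for p, g in zip(predictions, golds)) / len(golds)


def summarize(accuracies: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation over repeated runs."""
    return statistics.mean(accuracies), statistics.stdev(accuracies)


config = ExperimentConfig()
subset = cap_samples(list(range(100)), limit=10, seed=config.seeds[0])
score = accuracy([0, 1, 1], [0, 1, 0])
```

Reporting the mean and spread over the five seeds gives a fairer comparison than a single run.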

Main results

  • UniMC achieves the best performance on all NLI datasets while using fewer parameters
  • Bi-directional structure in UniMC strengthens its ability to capture information
  • UniMC outperforms previous SOTA models in zero-shot text classification
  • UniMC has a built-in advantage in dealing with multiple classes
  • UniMC achieves better performance than FLAN on NLI tasks
  • UniMC performs better on datasets that are closer in style to language understanding tasks
  • MC training improves UniMC zero-shot performance
  • Question prompts are necessary for UniMC
  • UniMC is robust to the choice of option prompts
  • Larger model size results in better performance
  • UniMC enhances the generalization ability of NLP models
  • Ethical influence of NLP should be discussed