Abstract

  • Pre-trained language models are effective for natural language processing tasks, but less so in low-resource domains because of the domain gap.
  • SwitchPrompt is a novel and lightweight prompting methodology to bridge the domain gap.
  • SwitchPrompt uses domain-specific keywords with a trainable gated prompt to offer domain-oriented prompting.
  • Few-shot experiments on three text classification benchmarks demonstrate the efficacy of general-domain pre-trained language models when used with SwitchPrompt.
  • SwitchPrompt can increase accuracy by up to 10.7%, reducing the need for domain-specific language model pre-training.

Paper Content

Introduction

  • Pre-trained language models (LMs) have been shown to be effective for natural language processing tasks, especially in low-resource settings
  • Most publicly available LMs are trained on general-domain corpora, which can lead to a domain gap when applied to tasks from a special domain
  • Pre-training deep language models requires large amounts of text data, which may not be available in low-resource domains
  • Traditional prompting techniques may not be effective in low-resource settings
  • SwitchPrompt is a novel and lightweight method to effectively retrieve domain-specific knowledge from pre-trained LMs
  • SwitchPrompt outperforms several state-of-the-art prompting methods and reduces domain gaps
  • SwitchPrompt is especially suitable for low-resource settings as it does not require pre-training domain-specific LMs or fine-tuning LMs for the downstream task

Method

  • Introduces SwitchPrompt
  • Gives an example of an architecture in which it can be applied
  • Underlying pre-trained language model is fixed

Domain-specific soft prompts

  • Proposed prompts allow model to switch between general-domain and domain-specific prompts
  • Sigmoid-based gating function used to control switching
  • General-domain prompt is a sequence of randomly initialized vectors
  • Domain-specific prompt incorporates sequence of vectors representing domain-specific keywords
  • Second gate used to control order of concatenation of general and domain-specific prompts
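
The two gates described above can be illustrated with a small sketch. This is not the paper's exact formulation: it assumes that the first gate forms a convex combination of keyword vectors and general vectors, that the second gate blends the two possible concatenation orders, and that both sequences have the same length. The names `switch_prompt`, `blend`, `switch_gate`, and `order_gate` are hypothetical, and plain Python lists stand in for trainable tensors.

```python
import math


def sigmoid(x):
    """Logistic function, mapping a real-valued gate parameter to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def blend(seq_a, seq_b, weight):
    """Element-wise weight * seq_a + (1 - weight) * seq_b for equally shaped sequences."""
    return [
        [weight * a + (1.0 - weight) * b for a, b in zip(vec_a, vec_b)]
        for vec_a, vec_b in zip(seq_a, seq_b)
    ]


def switch_prompt(general, keywords, switch_gate, order_gate):
    """Build a gated prompt from general vectors and keyword vectors."""
    # Gate 1 (assumed form): soft switch between keyword and general vectors.
    s = sigmoid(switch_gate)
    domain = blend(keywords, general, s)
    # Gate 2 (assumed form): soft choice between the two concatenation orders.
    o = sigmoid(order_gate)
    return blend(general + domain, domain + general, o)


general = [[0.0, 0.0]]    # stands in for randomly initialized vectors
keywords = [[2.0, 4.0]]   # stands in for keyword embeddings
prompt = switch_prompt(general, keywords, switch_gate=0.0, order_gate=0.0)
print(prompt)
```

With both gate parameters at zero, each sigmoid evaluates to 0.5, so neither prompt type and neither order is preferred. In training, the two gate parameters would be learned together with the prompt vectors.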

Prompting architecture

  • Proposed method is a new definition of soft prompts that can be integrated into any existing model.
  • Experiments use P-Tuning v2 architecture due to its high efficacy.
  • P-Tuning v2 is an adaptation of deep prompt tuning.
  • Soft prompts are injected at every layer of the pre-trained LM.
  • During training, the prompts are tuned but the LM stays fixed.
  • Classification head is added on top of the pre-trained LM.
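
The data flow described above (a separate prompt at every layer, a frozen LM, and a classification head on top) can be sketched with a toy model. This is a simplified illustration, not P-Tuning v2 itself: `frozen_layer` is a hypothetical stand-in for a transformer layer, prompts are prepended to the hidden states rather than injected into attention, and mean pooling before the head is an assumption.

```python
def frozen_layer(hidden):
    """Stand-in for a frozen LM layer: adds the sequence mean to every position."""
    dim = len(hidden[0])
    mean = [sum(vec[i] for vec in hidden) / len(hidden) for i in range(dim)]
    return [[v + m for v, m in zip(vec, mean)] for vec in hidden]


def forward_with_deep_prompts(token_states, layer_prompts):
    """Prepend each layer's own prompt vectors before applying that frozen layer."""
    hidden = token_states
    for prompt in layer_prompts:           # one trainable prompt per layer
        mixed = frozen_layer(prompt + hidden)
        hidden = mixed[len(prompt):]       # keep only the token positions
    return hidden


def classify(hidden, head_weights):
    """Mean-pool the token states and return the index of the highest class score."""
    dim = len(hidden[0])
    pooled = [sum(vec[i] for vec in hidden) / len(hidden) for i in range(dim)]
    scores = [sum(w * x for w, x in zip(row, pooled)) for row in head_weights]
    return scores.index(max(scores))


token_states = [[1.0, 0.0]]                     # one token, two dimensions
layer_prompts = [[[0.0, 2.0]], [[0.0, 0.0]]]    # two layers, one prompt vector each
hidden = forward_with_deep_prompts(token_states, layer_prompts)
label = classify(hidden, head_weights=[[1.0, 0.0], [0.0, 1.0]])
print(hidden, label)
```

In this sketch only `layer_prompts` and `head_weights` would receive gradient updates. `frozen_layer` has no trainable parameters, which mirrors the LM staying fixed during training.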

Experiments

  • Describes the setup of the datasets, training details, and baselines
  • Presents the results of the experiments

Datasets

  • Used classification benchmark datasets from different domains: TREC, GARD, SOFC-Exp
  • Constructed few-shot datasets by randomly sampling N shots per class
  • Created few-shot development sets with the same number of shots per class as the training sets
  • Used accuracy (%) as evaluation metric
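
The few-shot construction above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `sample_few_shot`, the fixed seed, and the choice to draw the development set without overlap with the training set are not taken from the paper.

```python
import random
from collections import defaultdict


def sample_few_shot(examples, n_shots, seed=0):
    """Return (train, dev) with n_shots examples per class in each split."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))

    train, dev = [], []
    for label in sorted(by_label):
        pool = by_label[label]
        if len(pool) < 2 * n_shots:
            raise ValueError(f"class {label!r} has fewer than {2 * n_shots} examples")
        picked = rng.sample(pool, 2 * n_shots)   # sampled without replacement
        train.extend(picked[:n_shots])
        dev.extend(picked[n_shots:])             # same shot count, disjoint (assumed)
    return train, dev


# Toy data: ten examples for each of two classes.
examples = [(f"question {i}", "A") for i in range(10)]
examples += [(f"report {i}", "B") for i in range(10)]
train, dev = sample_few_shot(examples, n_shots=2)
print(len(train), len(dev))
```

Using a different seed for each run would give a different few-shot sample each time.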

Training details

  • Used open-sourced HuggingFace language models
  • Trained models with batch size of 32, max sequence length of 128, dropout rate of 0.1
  • Used ExponentialLR learning rate scheduler with gamma value of 0.95 and Adam optimizer
  • Performed experiments on a V100 GPU
  • Reported results are average of five runs
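
To make the scheduler setting concrete, the sketch below computes the learning rate that an exponential decay with gamma 0.95 would produce. It assumes the decay is applied once per epoch, which is how PyTorch's ExponentialLR is documented to behave. The base learning rate of 1e-2 and the number of epochs are hypothetical values, since the summary does not state them.

```python
def exponential_lr(base_lr, gamma, num_epochs):
    """Learning rate used in each epoch: base_lr * gamma ** epoch."""
    return [base_lr * gamma ** epoch for epoch in range(num_epochs)]


# gamma = 0.95 comes from the setup above; base_lr and num_epochs are hypothetical.
schedule = exponential_lr(base_lr=1e-2, gamma=0.95, num_epochs=5)
for epoch, lr in enumerate(schedule):
    print(f"epoch {epoch}: lr = {lr:.6f}")
```

Each epoch keeps 95% of the previous learning rate, so the step size shrinks gradually.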

Baselines

  • Compared method to different baselines
  • Used general-domain and domain-specific language models

Results

  • Prompting methods outperform fine-tuning in low-resource domains
  • Domain-specific LMs outperform general-domain LMs
  • SwitchPrompt outperforms other prompting methods
  • SwitchPrompt reduces the performance gap between general-domain and domain-specific LMs
  • P-tuning outperforms SwitchPrompt in very-few-shot settings
  • SwitchPrompt outperforms fine-tuning and other prompting methods in the general domain

Analysis

  • Ablation study shows importance of components of prompting function
  • Domain-specific keywords are automatically computed
  • Training time is reduced compared to alternative approaches
  • Qualitative error analysis shows errors when input sentences convey little domain-specific information

Conclusion

  • Proposed a new methodology called SwitchPrompt
  • Uses domain-specific keywords and gates to retrieve domain-specific knowledge
  • Outperforms baseline methods in few-shot and all-data settings
  • Reduces performance gap between general-domain and domain-specific language models