Link to paper
The full paper is available here.
You can also find the paper on PapersWithCode here.
Abstract
- Pre-trained language models are effective for natural language processing tasks, but not for low-resource domains due to the domain gap.
- SwitchPrompt is a novel and lightweight prompting methodology to bridge the domain gap.
- SwitchPrompt uses domain-specific keywords with a trainable gated prompt to offer domain-oriented prompting.
- Few-shot experiments on three text classification benchmarks demonstrate the efficacy of the general-domain pre-trained language models when used with SwitchPrompt.
- SwitchPrompt can increase accuracy by up to 10.7%, reducing the need for domain-specific language model pre-training.
Paper Content
Introduction
- Pre-trained language models (LMs) have been shown to be effective for natural language processing tasks, especially in low-resource settings
- Most publicly available LMs are trained on general-domain corpora, which can lead to a domain gap when applied to tasks from a special domain
- Pre-training deep language models requires large amounts of text data, which may not be available in low-resource domains
- Traditional prompting techniques may not be effective in low-resource settings
- SwitchPrompt is a novel and lightweight method to effectively retrieve domain-specific knowledge from pre-trained LMs
- SwitchPrompt outperforms different state-of-the-art prompting methods and reduces domain gaps
- SwitchPrompt is especially suitable for low-resource settings as it does not require pre-training domain-specific LMs or fine-tuning LMs for the downstream task
Method
- Introduces SwitchPrompt
- Example of architecture in which it can be applied
- Underlying pre-trained language model is fixed
Domain-specific soft prompts
- Proposed prompts allow model to switch between general-domain and domain-specific prompts
- Sigmoid-based gating function used to control switching
- General-domain prompt is a sequence of randomly initialized vectors
- Domain-specific prompt incorporates sequence of vectors representing domain-specific keywords
- Second gate used to control order of concatenation of general and domain-specific prompts
Prompting architecture
- Proposed method is a new definition of soft prompts that can be integrated into any existing model.
- Experiments use P-Tuning v2 architecture due to its high efficacy.
- P-Tuning v2 is an adaptation of deep prompt tuning.
- Soft prompts are injected at every layer of the pre-trained LM.
- During training, the prompts are tuned but the LM stays fixed.
- Classification head is added on top of the pre-trained LM.
Experiments
- Described setup of datasets, training details and baselines
- Presented results of experiments
Datasets
- Used classification benchmark datasets from different domains: TREC, GARD, SOFC-Exp
- Constructed few-shot datasets by randomly sampling N shots per class
- Created few-shot development sets by keeping the number of shots in the training and development sets in sync
- Used accuracy (%) as evaluation metric
Training details
- Used open-sourced HuggingFace language models
- Trained models with batch size of 32, max sequence length of 128, dropout rate of 0.1
- Used ExponentialLR learning rate scheduler with gamma value of 0.95 and Adam optimizer
- Performed experiments on V 100 GPU
- Reported results are average of five runs
Baselines
- Compared method to different baselines
- Used general-domain and domain-specific language models
Results
- Prompting methods outperform fine-tuning in low-resource domains
- Domain-specific LMs outperform general-domain LMs
- SwitchPrompt outperforms other prompting methods
- SwitchPrompt reduces the performance gap between general-domain and domain-specific LMs
- P-tuning outperforms SwitchPrompt in very-few-shot settings
- SwitchPrompt outperforms fine-tuning and other prompting methods in general domain
Analysis
- Ablation study shows importance of components of prompting function
- Domain-specific keywords are automatically computed
- Training time is reduced compared to alternative approaches
- Qualitative error analysis shows errors when input sentences convey little domain-specific information
Conclusion
- Proposed a new methodology called SwitchPrompt
- Domain-specific keywords and gates to retrieve domain-specific knowledge
- Outperforms baseline methods in few-shot and all-data settings
- Reduces performance gap between general-domain and domain-specific language models