Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

Pre-trained language models have been successful in NLG tasks.
Various decoding methods have been employed, but often produce suboptimal results.
A novel method, PairReranker, was proposed to improve reranking for NLG tasks.
Experiments on three NLG tasks demonstrated the effectiveness and flexibility of PairReranker.
PairReranker can generalize to improve GPT-3 results.

Paper Content

Introduction

Pre-trained encoder-decoder language models (LMs) are effective for natural language generation (NLG) tasks.
BART and T5 are examples of pre-trained models.
Task-specific pre-trained models, such as PEGASUS, have also been used.
Various decoding approaches, such as beam search, diverse beam search, top-k sampling, and top-p sampling, are used during inference.
These methods often produce suboptimal results.
Selecting the best output from the results of multiple decoding methods can improve performance.
For example, with the PEGASUS model on the CNN/DailyMail (CNNDM) dataset, choosing the best candidate instead of the top beam search output can raise the ROUGE-2 score by 57%.
Likewise, the T5-large model on the CommonGen dataset can gain 93% in CIDEr score.
The Opus-MT model can gain 79.7% in BLEU on the WMT18 (zh-en) translation task.
Re-ranking candidates after decoding is a way to mitigate the gap between oracle selections and top-ranked outputs.
SimCLS and SummaReranker are examples of re-ranking approaches.
PairReranker is a novel re-ranking method that uses a single encoder and a pairwise loss function.
Experiments demonstrate that PairReranker outperforms the baseline methods and is compatible with large language models.
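
The gap between the top-ranked output and the oracle selection described above can be illustrated with a minimal sketch. The metric below is a toy unigram-overlap F1 that stands in for ROUGE, CIDEr, or BLEU; the function names and the example strings are hypothetical and do not come from the paper.

```python
def unigram_overlap_f1(candidate: str, reference: str) -> float:
    """Toy stand-in for a real metric such as ROUGE, CIDEr, or BLEU."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    # Count each shared word at most as often as it appears in both texts.
    common = sum(min(cand.count(w), ref.count(w)) for w in set(cand))
    if common == 0:
        return 0.0
    precision = common / len(cand)
    recall = common / len(ref)
    return 2 * precision * recall / (precision + recall)


def oracle_gain(candidates: list[str], reference: str) -> tuple[float, float, float]:
    """Return (top-1 score, oracle score, relative gain) for one example.

    Assumes candidates[0] is the decoder's top-ranked output.
    """
    scores = [unigram_overlap_f1(c, reference) for c in candidates]
    top1 = scores[0]
    oracle = max(scores)
    if top1 > 0:
        gain = (oracle - top1) / top1
    elif oracle > 0:
        gain = float("inf")
    else:
        gain = 0.0
    return top1, oracle, gain
```

A relative gain of 0.57 from this function would correspond to the kind of 57% improvement quoted above; a reranker tries to recover as much of that gap as possible without access to the reference.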

Problem formulation

Methods

The methods of previous baselines are formulated first.
These baselines have limitations.
A pair-based reranker is proposed to address those limitations.

Baseline methods

SimCLS (Liu and Liu, 2021) treats reranking as a learning-to-rank problem and uses the cosine similarity between source and candidate as the predicted score.
SummaReranker (Ravaut et al., 2022) frames the problem as a binary classification task and uses a mixture-of-experts layer for optimization.
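
As a rough illustration of scoring candidates by cosine similarity to the source, the sketch below ranks candidates from precomputed vectors. It assumes the source and candidate embeddings already exist (in SimCLS they would come from a trained encoder, which is not shown here); the function names are hypothetical.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_by_source_similarity(
    source_vec: Sequence[float],
    candidate_vecs: Sequence[Sequence[float]],
) -> tuple[list[int], list[float]]:
    """Return candidate indices sorted best-first, plus the raw scores."""
    scores = [cosine(source_vec, c) for c in candidate_vecs]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order, scores
```

Each candidate is scored independently of the others here, which is the property the pairwise approach in the next section sets out to change.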

Pairwise reranking

The candidates are highly homogeneous, which makes it difficult for a model to learn their differences.
Traditional document retrieval, by contrast, draws on a rich and heterogeneous document corpus.
Here the search space contains only dozens of generated candidates from a single language model.
The goal is to train a reranker that captures the subtle differences among the candidates.
The method follows the two-stage training paradigm.
The reranker is expected to output two scores for a given pair of candidates.
Training is framed as a multi-task binary classification problem.
In-context attention is used to capture the differences within the highly homogeneous candidate groups.
A subsampling strategy is used during training and inference.
A single bubble-sort-style run of comparisons is used to select the best candidate.
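
The single bubble run can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `score_pair` is a hypothetical callable standing in for the trained reranker, taking two candidates and returning one score for each.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def bubble_select(
    candidates: Sequence[T],
    score_pair: Callable[[T, T], tuple[float, float]],
) -> T:
    """Pick one candidate with a single pass of pairwise comparisons.

    The current best is compared with each remaining candidate in turn,
    so N candidates need exactly N - 1 calls to score_pair.
    """
    if not candidates:
        raise ValueError("candidates must not be empty")
    best = candidates[0]
    for challenger in candidates[1:]:
        best_score, challenger_score = score_pair(best, challenger)
        # Replace the current best only when the challenger scores strictly higher.
        if challenger_score > best_score:
            best = challenger
    return best
```

A single pass costs a number of comparisons linear in the number of candidates, whereas comparing every pair would be quadratic; the trade-off is that the result depends on the comparator behaving consistently across the pass.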

Base models

Used PEGASUS-large and BART-large for summarization task on CNN/DailyMail dataset
Used T5-large for generative commonsense reasoning task on CommonGen dataset
Used opus-mt checkpoint for Chinese-English translation task on WMT2018 dataset

Evaluation setups

Construct training dataset for reranker
Ensure base model used to generate candidates on training dataset has not seen those training examples
Generate candidates on training dataset using decoding method
Use public checkpoints for inference
Use beam search and diverse beam search for experiments
Generate 15 candidates for each decoding method for training and inference
Train reranker for 5 epochs
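
One way to guarantee that candidates for a training example come from a model that never saw it is a fold-based plan, sketched below. The summary above does not say how the paper enforces this, so the two-fold split, the function names, and the plan format are all assumptions made for illustration.

```python
import random
from typing import Iterable


def split_into_folds(example_ids: Iterable[int], num_folds: int = 2, seed: int = 0) -> list[list[int]]:
    """Shuffle the example ids and deal them into num_folds disjoint folds."""
    if num_folds < 2:
        raise ValueError("num_folds must be at least 2")
    ids = list(example_ids)
    random.Random(seed).shuffle(ids)
    return [ids[k::num_folds] for k in range(num_folds)]


def held_out_generation_plan(example_ids: Iterable[int], num_folds: int = 2, seed: int = 0) -> list[dict]:
    """For each fold, list which ids to fine-tune on and which to generate for.

    Candidates for a fold are always generated by a base model that was
    fine-tuned only on the other folds.
    """
    folds = split_into_folds(example_ids, num_folds, seed)
    plan = []
    for k, held_out in enumerate(folds):
        train_ids = [i for j, fold in enumerate(folds) if j != k for i in fold]
        plan.append({"train_on": train_ids, "generate_for": held_out})
    return plan
```

With the settings listed above, each step of such a plan would generate 15 candidates per decoding method for every id in `generate_for`.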

Main results

Overall summarization performance improved by 6.12% in ROUGE-1
Generalizing to other tasks, results improved by 2.90% in CIDEr on CommonGen dataset and 3.87% in BLEU on WMT2018 dataset
Transferring re-rankers to GPT-3 improved results by 6.45% on CNN/DM dataset and 24.55% on CommonGen dataset
Reranker is consistent with itself more than 90% of the time
Shuffling order of candidates before inference removes bias of initial order
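
The self-consistency figure above can be read as: how often does the reranker prefer the same candidate when a pair is presented in both orders? The sketch below measures that rate. The paper's exact definition is not given in this summary, so this is one plausible reading; `score_pair` is a hypothetical stand-in for the trained reranker.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def consistency_rate(
    pairs: Sequence[tuple[T, T]],
    score_pair: Callable[[T, T], tuple[float, float]],
) -> float:
    """Fraction of pairs whose preferred candidate is the same in both orders."""
    if not pairs:
        return 1.0
    agree = 0
    for a, b in pairs:
        score_a, score_b = score_pair(a, b)          # a shown first
        swapped_b, swapped_a = score_pair(b, a)      # b shown first
        prefers_a_forward = score_a >= score_b
        prefers_a_swapped = swapped_a >= swapped_b
        if prefers_a_forward == prefers_a_swapped:
            agree += 1
    return agree / len(pairs)
```

A comparator that always favors whichever candidate comes first scores 0.0 on this measure, which is the kind of position bias that shuffling the candidates before inference is meant to neutralize.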

Related work

Exploiting encoder-decoder LMs for NLG tasks is a focus in NLG research
Decoding methods like beam search are used to generate high quality candidates
Top-beam candidate is not always the best one
Reranking methods are used to improve quality of NLG
Pairwise ranking has shown great performance on NLP tasks
Attention mechanism captures difference between a pair of data
Self-critic algorithm used to train language model and reranker jointly

Conclusion

Pre-trained encoder-decoder language models can be used for NLG tasks.
Decoding methods like beam search and top-k sampling can produce suboptimal results.
Re-ranking candidate outputs after decoding can improve performance.
PairReranker is a novel reranking method that outperforms baseline methods.
