Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

Progress has been made in unifying table-to-text tasks using a single encoder-decoder model.
Existing methods use a simple dataset name as a prefix to the encoder, limiting effectiveness and hindering generalization.
We propose compositional task configurations to improve cross-task generalization.
Task configurations explicitly specify task type, input and output types.
Our method outperforms the UnifiedSKG baseline in both in-domain and zero-shot settings.

Paper Content

Introduction

NLP tasks have traditionally been studied individually
Recently, pre-trained transformer models have been used to unify multiple NLP tasks with a single encoder-decoder model
UnifiedSKG extended this paradigm to table-to-text tasks
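Existing work encodes task information with a simple trick: the dataset name is used as a prefix to the encoder input
This design has two major limitations: it limits effectiveness and hinders generalization
We propose the use of compositional task configurations to improve cross-task generalizability
We evaluate the model on 5 table-to-text datasets used for training and 5 new datasets used for testing only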
Our method outperforms the baseline consistently and demonstrates strong cross-task generalization
Human evaluation of the generated supporting cells reveals high relevance to the task

Method & tasks

Prompting is a way to control pre-trained language models
Task configurations contain 4 aspects: task type, input type, output type, and dataset name
Task type is the end goal of a task (e.g. QA and summarization)
Input and output types specify the inputs and outputs of the model
Dataset name specifies the dataset used for training
The method only requires a short input prefix, which lets it generalize to more tasks and datasets
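
The four aspects above can be sketched as a small data structure that is rendered into an input prefix. This is only an illustration: the `TaskConfig` class, the marker strings (`[TASK]`, `[INPUT]`, `[OUTPUT]`, `[DATASET]`), and the type names are assumptions made for this example, not the paper's exact vocabulary.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TaskConfig:
    """Hypothetical container for the four aspects of a task configuration."""
    task_type: str
    input_types: List[str]
    output_types: List[str]
    dataset: Optional[str] = None  # may be left out for an unseen dataset

    def to_prefix(self) -> str:
        # The marker strings are assumptions made for this sketch.
        parts = [
            f"[TASK] {self.task_type}",
            f"[INPUT] {', '.join(self.input_types)}",
            f"[OUTPUT] {', '.join(self.output_types)}",
        ]
        if self.dataset is not None:
            parts.append(f"[DATASET] {self.dataset}")
        return " ".join(parts)


def build_encoder_input(config: TaskConfig, body: str) -> str:
    """Prepend the configuration prefix to the linearized task input."""
    return f"{config.to_prefix()} {body}"


# A fact verification configuration tied to a training dataset.
verify = TaskConfig("fact verification", ["statement", "table"], ["label"], "TABFACT")

# Recombining known parts describes a task variant without a new dataset name.
verify_with_cells = TaskConfig(
    "fact verification", ["statement", "table"], ["label", "cell"]
)
```

Because each aspect is a separate field, a new task can be described by recombining values seen during training, which is the property that a single dataset-name prefix lacks.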

Datasets and task configurations

5 datasets are used for both training and testing
5 datasets are used for testing only
The test-only evaluation setup assesses the effectiveness of the method
The model is tested on unseen tasks and new datasets
Special tokens and markups used to separate parts of inputs and outputs
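
As a rough illustration of how markup can separate the parts of an input, the sketch below flattens a table into a single string. The tokens `<query>`, `<table>`, `<header>`, `<row N>`, and the `|` cell separator are hypothetical choices for this example; the summary does not state which tokens the paper uses.

```python
from typing import List, Sequence


def linearize_table(header: Sequence[str], rows: Sequence[Sequence[str]]) -> str:
    """Flatten a table into one string using hypothetical markup tokens."""
    pieces: List[str] = ["<header> " + " | ".join(header)]
    for index, row in enumerate(rows, start=1):
        # Each row gets its own numbered marker so cells can be located later.
        pieces.append(f"<row {index}> " + " | ".join(row))
    return " ".join(pieces)


def build_input(query: str, header: Sequence[str], rows: Sequence[Sequence[str]]) -> str:
    # "<query>" and "<table>" are assumed separator tokens for this sketch.
    return f"<query> {query} <table> {linearize_table(header, rows)}"


example = build_input(
    "Which city is largest?",
    ["city", "population"],
    [["Springfield", "30000"], ["Shelbyville", "20000"]],
)
```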

Experiments

Evaluated method using experimental setup in Table 1
Used T5 as backbone of table-to-text model
Used temperature up-sampling method with temperature set to 2
Used batch size of 128 and AdamW as optimizer with initial learning rate of 5e-5
Limited input length to 1024 sentence-piece tokens
Compared method against UnifiedSKG
Trained and evaluated in-domain tasks on their training and test sets
Evaluated test-only tasks in zero-shot setting
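
Temperature up-sampling is commonly defined so that a dataset with n examples is sampled with probability proportional to n^(1/T). The sketch below assumes the paper follows this common form; the function name and the dataset sizes are made up for illustration.

```python
from typing import Dict


def sampling_probabilities(sizes: Dict[str, int], temperature: float = 2.0) -> Dict[str, float]:
    """Return per-dataset sampling probabilities proportional to size ** (1 / T)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    weights = {name: count ** (1.0 / temperature) for name, count in sizes.items()}
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


# Illustrative sizes only, not the real dataset statistics.
sizes = {"large": 10000, "small": 100}
proportional = sampling_probabilities(sizes, temperature=1.0)
upsampled = sampling_probabilities(sizes, temperature=2.0)
```

With a temperature of 1 the small dataset would be drawn in proportion to its size; raising the temperature to 2 flattens the distribution, so smaller datasets are seen more often during multi-task training.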

Results

Main results

Using compositional task configs improves zero-shot performance on unseen datasets
Using compositional task configs improves in-domain performance over baseline and single-task training
Few-shot evaluation shows improved performance with task configurations
Performance gap diminishes as number of supervised examples increases

Ablation of task configs at training time

Removing the output type caused the largest performance drop
Removing the input type had the least impact on performance

Ablation of task configs at test time

Method demonstrates strong zero-shot task performance
Removing input configurations results in performance drop
Removing output configurations also results in performance drop

Human evaluation of generated cells

The proposed task configurations can be modified to output additional results for improved explainability.
For example, on the table-based fact verification task TABFACT, the model can output a binary label together with cells as supporting evidence.
A human study over 50 randomly sampled outputs from the TABFACT dataset found that the model generates cells with high relevance but struggles with full completeness.

Related work

Proposed framework is capable of generalizing to a broader range of tasks and datasets
Task unification using encoder-decoder models
Cross-task generalization with pretrained models
Exploration scope limited to table-to-text tasks

Conclusion

Introduced compositional task configurations for unified table-to-text tasks
Statistics of datasets shown in Table 6
Maximum length of each cell limited to 15 sentence-piece tokens
Table size reduced if the input is longer than 1024 sentence-piece tokens
Relevant cells extracted by executing SQL query annotations
Final answer annotations obtained by aggregations or numerical operations
NQ-TABLES dataset derived from NaturalQuestions dataset
Customized version of NQ-TABLES used with only unique examples
Selected cells fed to decoder as input
Reversed version of TOTTO dataset created
Both versions of TOTTO used at training time
Task configurations applied to all datasets
Model able to establish correspondence between task configurations and input/output format
Few-shot performance better than baseline
Model learns ability to generate relevant table cells according to question
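
The cell and input length limits mentioned above can be sketched as follows. The whitespace tokenizer stands in for a sentence-piece tokenizer, and dropping trailing rows is an assumed reduction strategy; the summary does not say how the table size is actually reduced.

```python
from typing import List

MAX_CELL_TOKENS = 15
MAX_INPUT_TOKENS = 1024


def tokenize(text: str) -> List[str]:
    # Stand-in for a sentence-piece tokenizer; whitespace splitting is an assumption.
    return text.split()


def truncate_cell(cell: str, limit: int = MAX_CELL_TOKENS) -> str:
    """Keep at most `limit` tokens of a single cell."""
    return " ".join(tokenize(cell)[:limit])


def table_length(rows: List[List[str]]) -> int:
    """Count the tokens across all cells of the table."""
    return sum(len(tokenize(cell)) for row in rows for cell in row)


def fit_table(rows: List[List[str]], budget: int = MAX_INPUT_TOKENS,
              cell_limit: int = MAX_CELL_TOKENS) -> List[List[str]]:
    """Truncate every cell, then drop trailing rows until the table fits the budget."""
    kept = [[truncate_cell(cell, cell_limit) for cell in row] for row in rows]
    while kept and table_length(kept) > budget:
        kept.pop()  # assumed strategy: remove rows from the end
    return kept


small = fit_table([["a b", "c"], ["d e f", "g"]], budget=4)
```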
