Link to paper
The full paper is available here.
You can also find the paper on PapersWithCode here.
Abstract
- Progress has been made in unifying table-to-text tasks using a single encoder-decoder model.
- Existing methods use a simple dataset name as a prefix to the encoder, limiting effectiveness and hindering generalization.
- We propose compositional task configurations to improve cross-task generalization.
- Task configurations explicitly specify task type, input and output types.
- Our method outperforms the UnifiedSKG baseline in both in-domain and zero-shot settings.
Paper Content
Introduction
- NLP tasks have traditionally been studied individually
- Recently, pre-trained transformer models have been used to unify multiple NLP tasks with a single encoder-decoder model
- UnifiedSKG extended this paradigm to table-to-text tasks
- Existing work relies on a simple trick to encode task information
- This design has two major limitations
- We propose the use of compositional task configurations to improve cross-task generalizability
- We evaluate the model on 5 table-to-text datasets and 5 new datasets
- Our method outperforms the baseline consistently and demonstrates strong cross-task generalization
- Human evaluation of the generated supporting cells reveals high relevance to the task
Method & tasks
- Prompting is a way to control pre-trained language models
- Task configurations contain 4 aspects: task type, input type, output type, and dataset name
- Task type is the end goal of a task (e.g. QA and summarization)
- Input and output types specify the inputs and outputs of the model
- Dataset name specifies the dataset used for training
- Method requires a small input prefix for generalization to more tasks and datasets
Datasets and task configurations
- 5 datasets used for training and testing
- 5 datasets used for testing only
- Test-only evaluation setup to assess effectiveness of method
- Model tested on unseen tasks and new datasets
- Special tokens and markups used to separate parts of inputs and outputs
Experiments
- Evaluated method using experimental setup in Table 1
- Used T5 as backbone of table-to-text model
- Used temperature up-sampling method with temperature set to 2
- Used batch size of 128 and AdamW as optimizer with initial learning rate of 5e-5
- Limited input length to 1024 sentence-piece tokens
- Compared method against UnifiedSKG
- Evaluated in-domain tasks on training and test sets
- Evaluated test-only tasks in zero-shot setting
Results
Main results
- Using compositional task configs improves zero-shot performance on unseen datasets
- Using compositional task configs improves in-domain performance over baseline and single-task training
- Few-shot evaluation shows improved performance with task configurations
- Performance gap diminishes as number of supervised examples increases
Ablation of task configs at training time
- Removing output type had largest performance drop
- Removing input type had least impact on performance
Ablation of task configs at test time
- Method demonstrates strong zero-shot task performance
- Removing input configurations results in performance drop
- Removing output configurations also results in performance drop
Human evaluation of generated cells
- Proposed task configurations can be modified to output more results for improved explainability.
- Example of this is for the table-based fact verification task, TABFACT, which can output a binary label and a cell component as supporting evidence.
- Human study conducted over 50 randomly sampled outputs from the TABFACT dataset found that the model is able to generate cells with high relevance but struggles with full completeness.
Related work
- Proposed framework is capable of generalizing to a broader range of tasks and datasets
- Task unification using encoder-decoder models
- Cross-task generalization with pretrained models
- Exploration scope limited to table-to-text tasks
Conclusion
- Introduced compositional task configurations for unified table-to-text tasks
- Statistics of datasets shown in Table 6
- Max length of each cell limited to 15 sentence-piece tokens
- Table size reduced if length is longer than 1024
- Relevant cells extracted by executing SQL query annotations
- Final answer annotations obtained by aggregations or numerical operations
- NQ-TABLES dataset derived from NaturalQuestions dataset
- Customized version of NQ-TABLES used with only unique examples
- Selected cells fed to decoder as input
- Reversed version of TOTTO dataset created
- Both versions of TOTTO used at training time
- Task configurations applied to all datasets
- Model able to establish correspondence between task configurations and input/output format
- Few-shot performance better than baseline
- Model learns ability to generate relevant table cells according to question