Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

  • Progress has been made in unifying table-to-text tasks using a single encoder-decoder model.
  • Existing methods use a simple dataset name as a prefix to the encoder, limiting effectiveness and hindering generalization.
  • We propose compositional task configurations to improve cross-task generalization.
  • Task configurations explicitly specify task type, input and output types.
  • Our method outperforms the UnifiedSKG baseline in both in-domain and zero-shot settings.

Paper Content

Introduction

  • NLP tasks have traditionally been studied individually
  • Recently, pre-trained transformer models have been used to unify multiple NLP tasks with a single encoder-decoder model
  • UnifiedSKG extended this paradigm to table-to-text tasks
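  • Existing work encodes task information with a simple trick: the dataset name is used as a prefix to the encoder input
  • This design has two major limitations: it limits effectiveness and hinders generalization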
  • We propose the use of compositional task configurations to improve cross-task generalizability
  • We evaluate the model on 5 table-to-text datasets used for both training and testing, and on 5 new test-only datasets
  • Our method outperforms the baseline consistently and demonstrates strong cross-task generalization
  • Human evaluation of the generated supporting cells reveals high relevance to the task

Method & tasks

  • Prompting is a way to control pre-trained language models
  • Task configurations contain 4 aspects: task type, input type, output type, and dataset name
  • Task type is the end goal of a task (e.g. QA and summarization)
  • Input and output types specify the inputs and outputs of the model
  • Dataset name specifies the dataset used for training
  • The method only requires a short input prefix, so it can be generalized to more tasks and datasets
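
The four aspects listed above can be pictured as a short prefix prepended to the encoder input. The sketch below is a minimal illustration only: the `TaskConfig` class, the `build_encoder_input` helper, the bracketed markup, and the dataset name `toy_dataset` are all hypothetical, since this summary does not give the paper's actual prefix format.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class TaskConfig:
    """Hypothetical container for the four aspects of a task configuration."""
    task_type: str                 # end goal, e.g. "QA" or "summarization"
    input_types: Tuple[str, ...]   # what the encoder receives
    output_types: Tuple[str, ...]  # what the decoder should produce
    dataset: str                   # dataset name

    def to_prefix(self) -> str:
        # The bracketed markup is an assumption made for illustration.
        return (
            f"[task] {self.task_type} "
            f"[input] {' '.join(self.input_types)} "
            f"[output] {' '.join(self.output_types)} "
            f"[dataset] {self.dataset}"
        )


def build_encoder_input(config: TaskConfig, body: str) -> str:
    """Prepend the configuration prefix to the task-specific input text."""
    return f"{config.to_prefix()} {body}"


if __name__ == "__main__":
    qa = TaskConfig("QA", ("query", "table"), ("answer",), "toy_dataset")
    # Recombining aspects yields a configuration for a different output format.
    qa_with_cells = replace(qa, output_types=("cell", "answer"))
    print(build_encoder_input(qa, "who won?"))
    print(qa_with_cells.to_prefix())
```

Because each aspect is a separate field, a configuration that never appeared in training can be assembled from aspects that did, which is the intuition behind the compositional design.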

Datasets and task configurations

  • 5 datasets used for training and testing
  • 5 datasets used for testing only
  • Test-only evaluation setup to assess effectiveness of method
  • Model tested on unseen tasks and new datasets
  • Special tokens and markups used to separate parts of inputs and outputs
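
As a rough illustration of how special tokens and markup can separate the parts of an input, the sketch below flattens a table into a single string. The markers `<query>`, `<table>`, `<header>`, and `<row i>`, and both function names, are assumptions chosen for the example; the summary does not list the tokens the authors actually use.

```python
from typing import Sequence


def linearize_table(header: Sequence[str], rows: Sequence[Sequence[str]]) -> str:
    """Flatten a table into one string, marking the header and each row."""
    parts = ["<header> " + " | ".join(header)]
    for i, row in enumerate(rows, start=1):
        # Each row gets its own marker so cells can be traced back to a row.
        parts.append(f"<row {i}> " + " | ".join(row))
    return " ".join(parts)


def build_model_input(query: str, header: Sequence[str],
                      rows: Sequence[Sequence[str]]) -> str:
    """Join the query and the linearized table with hypothetical separators."""
    return f"<query> {query} <table> {linearize_table(header, rows)}"
```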

Experiments

  • Evaluated method using experimental setup in Table 1
  • Used T5 as backbone of table-to-text model
  • Used temperature up-sampling method with temperature set to 2
  • Used batch size of 128 and AdamW as optimizer with initial learning rate of 5e-5
  • Limited input length to 1024 sentence-piece tokens
  • Compared method against UnifiedSKG
  • For in-domain tasks, trained on the training sets and evaluated on the corresponding test sets
  • Evaluated test-only tasks in zero-shot setting
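
The temperature up-sampling mentioned above can be sketched as follows, assuming the common formulation in which a dataset with n examples is sampled with probability proportional to n^(1/T). The summary does not confirm that the authors use exactly this formula, and the function name and dataset names are hypothetical.

```python
from typing import Dict


def temperature_sampling_probs(sizes: Dict[str, int],
                               temperature: float = 2.0) -> Dict[str, float]:
    """Return per-dataset sampling probabilities proportional to n ** (1 / T)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # T = 1 reproduces proportional sampling; larger T flattens the
    # distribution and so up-samples the smaller datasets.
    weights = {name: n ** (1.0 / temperature) for name, n in sizes.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


if __name__ == "__main__":
    print(temperature_sampling_probs({"small": 100, "large": 10_000}))
```

With a temperature of 2, a dataset 100 times larger than another is sampled only 10 times as often, so small datasets are not drowned out during multi-task training.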

Results

Main results

  • Using compositional task configs improves zero-shot performance on unseen datasets
  • Using compositional task configs improves in-domain performance over baseline and single-task training
  • Few-shot evaluation shows improved performance with task configurations
  • Performance gap diminishes as number of supervised examples increases

Ablation of task configs at training time

  • Removing the output type caused the largest performance drop
  • Removing input type had least impact on performance

Ablation of task configs at test time

  • Method demonstrates strong zero-shot task performance
  • Removing input configurations results in performance drop
  • Removing output configurations also results in performance drop

Human evaluation of generated cells

  • The proposed task configurations can be modified so that the model outputs additional results, improving explainability.
  • For example, on the table-based fact verification task TABFACT, the model can output a binary label together with a cell component as supporting evidence.
  • A human study over 50 randomly sampled outputs from TABFACT found that the model generates cells with high relevance but struggles with full completeness.
  • Proposed framework is capable of generalizing to a broader range of tasks and datasets
  • Related work: task unification using encoder-decoder models
  • Related work: cross-task generalization with pretrained models
  • Limitation: the exploration scope is limited to table-to-text tasks

Conclusion

  • Introduced compositional task configurations for unified table-to-text tasks
  • Statistics of datasets shown in Table 6
  • Max length of each cell limited to 15 sentence-piece tokens
  • Table size is reduced if the input length exceeds 1024 sentence-piece tokens
  • Relevant cells extracted by executing SQL query annotations
  • Final answer annotations obtained by aggregations or numerical operations
  • NQ-TABLES dataset derived from NaturalQuestions dataset
  • Customized version of NQ-TABLES used with only unique examples
  • Selected cells fed to decoder as input
  • Reversed version of TOTTO dataset created
  • Both versions of TOTTO used at training time
  • Task configurations applied to all datasets
  • Model able to establish correspondence between task configurations and input/output format
  • Few-shot performance better than baseline
  • Model learns ability to generate relevant table cells according to question
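
The preprocessing limits noted above (cells capped at 15 sentence-piece tokens, tables reduced when the input exceeds 1024 tokens) could be implemented along the following lines. This is a sketch under stated assumptions: `str.split` stands in for a sentence-piece tokenizer, the table is shrunk by dropping trailing rows (the summary does not say how the authors reduce it), and the budget counts cell tokens only, ignoring the prefix and markup.

```python
from typing import Callable, List

Tokenizer = Callable[[str], List[str]]


def truncate_cell(cell: str, tokenize: Tokenizer, max_cell_tokens: int = 15) -> str:
    """Keep at most max_cell_tokens tokens of a single cell."""
    return " ".join(tokenize(cell)[:max_cell_tokens])


def fit_table(rows: List[List[str]], tokenize: Tokenizer,
              max_cell_tokens: int = 15,
              max_total_tokens: int = 1024) -> List[List[str]]:
    """Clip every cell, then drop trailing rows until the table fits the budget."""
    clipped = [[truncate_cell(c, tokenize, max_cell_tokens) for c in row]
               for row in rows]

    def total_tokens(table: List[List[str]]) -> int:
        return sum(len(tokenize(c)) for row in table for c in row)

    # Assumption: rows are removed from the end; other strategies are possible.
    while clipped and total_tokens(clipped) > max_total_tokens:
        clipped.pop()
    return clipped
```

A real pipeline would pass the model's own tokenizer in place of `str.split` so that the counts match what the encoder sees.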