Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

End-to-end neural approaches lack interpretability and robustness
Binder is a training-free neural-symbolic framework that maps task input to a program
Unified API of language model functionalities is used to extend grammar coverage
GPT-3 Codex is used as the language model
Few in-context exemplar annotations are used
Binder achieves state-of-the-art results on WikiTableQuestions and TabFact datasets
No training required, only uses dozens of annotations as in-context exemplars

Paper Content

Introduction

Performance on natural language processing tasks is dominated by neural end-to-end systems
Symbolic approaches produce explicit intermediate representations
Symbolic approaches are interpretable and robust
Coverage is limited by the grammar of the symbolic language
Neural-symbolic approaches combine neural modules and symbolic languages
Neural-symbolic approaches require human design and large training data
BINDER is a training-free neural-symbolic framework that maps task inputs to an executable program
BINDER requires few annotations and is more interpretable, scalable, and robust than end-to-end approaches

Approach

Binder framework

BINDER framework is used to solve NLP tasks
BINDER program is generated from natural language input and optional context
Output answer is derived by executing BINDER program with interpreter

Binder parsing

Input natural language is parsed into a BINDER program
BINDER program is an expression in a symbolic language that includes API calls
API call is a function that accepts a question and context to be queried
Output of API call is the answer to the question
Output is represented as a variable compatible with the symbolic language grammar
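As a rough illustration of the points above, the sketch below locates API calls inside a program string. The surface syntax `f_col("question"; column)`, the `ApiCall` type, and the example table are assumptions made for this sketch, not the syntax defined in the paper.

```python
import re
from typing import List, NamedTuple


class ApiCall(NamedTuple):
    name: str      # which API is invoked, e.g. "f_col"
    question: str  # the natural-language question Q
    context: str   # the context D to be queried (here: a column name)


# Hypothetical surface syntax for an API call: f_col("question"; column)
API_CALL = re.compile(r'(f_col|f_val)\(\s*"([^"]*)"\s*;\s*([^)]+?)\s*\)')


def find_api_calls(program: str) -> List[ApiCall]:
    """Return every API call embedded in the symbolic program."""
    return [ApiCall(*m.groups()) for m in API_CALL.finditer(program)]


# A SQL expression extended with one API call (illustrative only).
program = (
    'SELECT name FROM t '
    'WHERE f_col("is this country in Asia?"; country) = \'yes\''
)
calls = find_api_calls(program)
print(calls[0].question)  # is this country in Asia?
```

Everything outside the API call is ordinary SQL, so the symbolic part of the program stays readable and checkable.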

Binder execution

Program Z is executed by a BINDER interpreter to derive the answer A.
BINDER interpreter consists of a standard symbolic language interpreter and the model(s) realizing the API calls.
Lexical and syntax analysis includes adding f(Q; D) as a new identifier in the grammar.
Program evaluation involves evaluating the API calls by calling the underlying neural models.
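The two-stage execution described above can be sketched with SQLite as the standard interpreter. This is a minimal sketch under stated assumptions: `toy_model` stands in for the neural model, the `f_col("question"; column)` syntax is hypothetical, and materializing each API output as a new column is one possible way to make it compatible with SQL.

```python
import re
import sqlite3

# Hypothetical surface syntax for an API call: f_col("question"; column)
API_CALL = re.compile(r'f_col\(\s*"([^"]*)"\s*;\s*(\w+)\s*\)')


def toy_model(question: str, value: str) -> str:
    """Stand-in for the neural model; a real system would query an LM here."""
    asian_countries = {"Japan", "China", "India"}
    return "yes" if value in asian_countries else "no"


def execute(program: str, conn: sqlite3.Connection, table: str) -> list:
    # Step 1: evaluate each API call row by row and store the answers
    # in a new column so that SQL can use them like any other column.
    for i, match in enumerate(list(API_CALL.finditer(program))):
        question, column = match.groups()
        new_col = f"api_{i}"
        conn.execute(f"ALTER TABLE {table} ADD COLUMN {new_col} TEXT")
        rows = conn.execute(f"SELECT rowid, {column} FROM {table}").fetchall()
        for rowid, value in rows:
            conn.execute(
                f"UPDATE {table} SET {new_col} = ? WHERE rowid = ?",
                (toy_model(question, value), rowid),
            )
        # Step 2: rewrite the program so the call becomes a column reference.
        program = program.replace(match.group(0), new_col)
    # Step 3: the program is now plain SQL; run it with the SQL interpreter.
    return conn.execute(program).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("Aiko", "Japan"), ("Bob", "Canada"), ("Chen", "China")],
)
program = (
    'SELECT name FROM t '
    'WHERE f_col("is this country in Asia?"; country) = \'yes\' '
    'ORDER BY name'
)
result = execute(program, conn, "t")
print(result)  # [('Aiko',), ('Chen',)]
```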

In-context learning for binder

Uses large language models for in-context learning
Only takes a few annotations/demonstrations as a prompt
Performs inference without training model parameters
Uses Codex as both semantic parser and model to perform API call functionalities
Takes advantage of few-shot generalization ability of Codex
Applies in-context learning for BINDER with k in-context exemplars
Outputs n candidate BINDER programs
Programs are executed by BINDER interpreter
Output answer is derived via majority voting strategy
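The final voting step can be sketched as follows. The helper names `majority_vote` and `toy_run` are hypothetical, and skipping candidates that fail to execute is an assumption of this sketch rather than a detail stated in these notes.

```python
from collections import Counter
from typing import Callable, Hashable, Optional, Sequence


def majority_vote(
    programs: Sequence[str],
    run: Callable[[str], Hashable],
) -> Optional[Hashable]:
    """Execute each candidate program and return the most common answer."""
    answers = []
    for program in programs:
        try:
            answers.append(run(program))
        except Exception:
            # Assumption: a candidate that fails to execute casts no vote.
            continue
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]


def toy_run(program: str) -> str:
    """Stand-in for the interpreter; unknown programs raise KeyError."""
    outputs = {"p1": "Japan", "p2": "Japan", "p3": "China"}
    return outputs[program]


answer = majority_vote(["p1", "p2", "p3", "broken"], toy_run)
print(answer)  # Japan
```

Voting over executed answers rather than over program text lets differently written programs that agree on the result reinforce each other.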

Binder implementation

BINDER is designed to be extensible to various programming languages and API call functionalities.
Two APIs are implemented: f_col and f_val.
f_col calls a language model to answer questions based on column data.
f_val is used for more complex questions and outputs a value as the answer.
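To make the difference between the two APIs concrete, here is a minimal sketch of their shapes, written as `f_col` and `f_val`. The signatures, the prompt strings, and `fake_lm` are assumptions for illustration; the paper uses Codex as the underlying model.

```python
from typing import Callable, List, Sequence

# Hypothetical model interface: prompt text in, answer text out.
LM = Callable[[str], str]


def f_col(question: str, column: Sequence[str], lm: LM) -> List[str]:
    """Ask the question once per cell and return a new column of answers."""
    return [lm(f"{question}\nValue: {value}") for value in column]


def f_val(question: str, context: Sequence[str], lm: LM) -> str:
    """Ask the question once over the whole context and return one value."""
    return lm(f"{question}\nContext: {' | '.join(context)}")


def fake_lm(prompt: str) -> str:
    """Deterministic stand-in: answers "yes" when the prompt mentions Japan."""
    return "yes" if "Japan" in prompt else "no"


countries = ["Japan", "Canada"]
col = f_col("Is this country in Asia?", countries, fake_lm)
val = f_val("Does the table mention an Asian country?", countries, fake_lm)
print(col)  # ['yes', 'no']
print(val)  # yes
```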

Experiments

Experiment setup

Evaluated method on three knowledge grounding datasets
WIKITQ requires complex table reasoning skills
20% of WIKITQ questions not answerable by pure SQL
TABFACT is a binary fact verification benchmark
Evaluation metrics are execution accuracy for WIKITQ and TABFACT
Pre-matching check for semantically correct cases in WIKITQ
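Execution accuracy is the share of examples whose executed answer matches the gold answer. The sketch below assumes plain exact-match comparison; the official evaluators for these datasets apply their own answer normalization, which is not reproduced here.

```python
from typing import Hashable, Sequence


def execution_accuracy(
    predicted: Sequence[Hashable],
    gold: Sequence[Hashable],
) -> float:
    """Fraction of examples where the executed answer equals the gold answer."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


accuracy = execution_accuracy(["a", "b", "c", "d"], ["a", "b", "x", "d"])
print(accuracy)  # 0.75
```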

Method

Compared with other strong published methods, Codex BINDER achieves 85.1 accuracy on the official small test set of TABFACT without fine-tuning.
Codex was also evaluated with additional inference modes, including end-to-end QA and semantic parsing with standard SQL.

Implementation details

Used OpenAI Codex API model for experiments
Annotated 14 in-context exemplars with BINDER programs
Prompt format follows Rajkumar et al., 2022
Benefits of BINDER include interpretability and robustness

Analysis

Ablation study

Binding neural module API calls into a programming language can help solve queries that are unsolvable in that language alone.
Codex BINDER outperforms Codex SQL by 10.1% on program-unsolvable questions.
BINDER has a much lower spurious rate than SQL (12% vs. 33%).

Interpretability

BINDER improves interpretability over end-to-end approaches
BINDER enables finding the source of errors and provides a way to fix them

Robustness

Scalability

BINDER is more scalable than end-to-end QA
BINDER can handle large knowledge sources, while end-to-end QA fails or degrades

Noisy content

End-to-end methods are more brittle to noisy inputs.
A noisy WIKITQ development subset was built with distractors.
BINDER is stable confronting distractors, while end-to-end QA is more likely to be confused.

Binder with python

BINDER is designed to be extensible to various programming languages
Python (with the Pandas package) is used as the BINDER language on WIKITQ
Neural API is incorporated into BINDER with Python
Evaluated on program-unsolvable subset of WIKITQ to test if method improves Python’s capability
BINDER with Python effectively improves Python coverage on difficult subset

Multimodal application

BINDER is applied to the multi-modal dataset MULTIMODALQA (MMQA) across text, tables, and images.
Images are converted into textual image captions with a vision-text pretrained model OFA.
BINDER achieves better performance than end-to-end QA and the fine-tuned baseline Implicit-Decomp.
With the oracle retriever, BINDER can achieve comparable performance with the state-of-the-art.

Related work

Semantic parsing is a symbolic method used to produce executable programs from natural language input
Neural-Symbolic methods integrate neural modules with symbolic languages
BINDER is a training-free method that requires only dozens of annotations and is expressive and flexible to handle real-world diverse questions

Conclusion

Propose BINDER, a training-free neural-symbolic framework
Combines strengths of end-to-end and symbolic approaches
State-of-the-art performance on WIKITQ and TABFACT with only a few in-context demonstrations
No additional training required
Language model-focused attempt to integrate two widely-adopted paradigms in NLP
Can be extended to many more scenarios with the appropriate programming language and functionalities
Codex used as the LM for all neural modules
Image captions used for images
Majority vote used to ensemble multiple candidate answers
BINDER grammar adapted to SQL
Performance increases with more generations of BINDER programs
Annotation interface allows real-time execution with Hugging Face Spaces
Source code to be released
