Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

Automated reasoning with unstructured natural text has made progress with the help of large language models.
Forward direction reasoning suffers from a combinatorial explosion of the search space.
Backward direction reasoning is more efficient for proof-finding problems.
LAMBADA is a backward chaining algorithm that decomposes reasoning into four sub-modules.
LAMBADA achieves accuracy boosts over forward reasoning methods on two challenging logical reasoning datasets.

Paper Content

Introduction

Automated reasoning has been a goal for AI since its early days
Recent advancements in natural language understanding have been made with pre-trained language models
Performance of these models for logical reasoning still lags behind
Logical reasoning with unstructured, natural text is a fundamental building block for automated knowledge discovery
Scale has been observed to provide limited benefit for solving complex reasoning problems
Finetuning initially seemed to enable logical reasoning in LMs, but mostly exploits spurious correlations
Prompting strategies have found some success, but struggle with proof planning for more complex logical reasoning problems
Integrating the strength and reliability of classical AI models in logical reasoning with LMs is one solution
Two major approaches to logical reasoning are Forward Chaining and Backward Chaining
Previous approaches to reasoning with LMs mostly incorporate elements of Forward Chaining
Backward Chaining is better suited for text-based deductive logical reasoning

Deep learning models can be used to solve text-based reasoning tasks
Pretraining an LM on relevant data can lead to improvements
Implicit reasoning approaches finetune LMs to produce desired output
Explicit reasoning approaches generate intermediate reasoning steps
Verifiers judge the quality of reasoning chains
Length generalization is possible with certain models
Modular reasoning breaks problem into smaller modules
Natural Language Inference (NLI) is related to logical reasoning
Example of theory with facts and rules given

Backward chaining

Backward chaining (BC) is a strategy for reasoning that starts from the goal and recursively breaks it down into sub-goals.
BC uses an operation called unification in logic to determine if a rule applies to a goal.
BC has four LM-based modules: Fact Check, Rule Selection, Goal Decomposition, and Sign Agreement.
Fact Check verifies if a goal can be proved or disproved from any of the facts.
Rule Selection identifies rules whose consequent unifies with the goal.
Goal Decomposition identifies sub-goals that need to be proved in order for the goal to be proved or disproved.
Sign Agreement verifies if the sign of the rule consequent agrees or disagrees with the sign of the goal.
Algorithm 1 integrates the four LM modules with BC to enable text-based logical reasoning.
LAMBADA is a depth-first search algorithm over the facts and the rules.
Rules are ranked based on their lengths with shorter rules being ranked higher.

Experimental setup

Compared two main baselines for logical reasoning: Chain of Thought (CoT) and Selection-Inference (SI)
Experimented with two challenging logical reasoning datasets: ProofWriter and PrOntoQA
ProofWriter has two subsets: open-world assumption (OWA) and closed-world assumption (CWA)
ProofWriter has three labels: PROVED, DISPROVED, UNKNOWN
PrOntoQA has multiple variants; used fictional characters version

Experimental results

LAMBADA outperforms two baselines on two ProofWriter datasets
Backward chaining (LAMBADA) better than forward chaining (SI)
CoT has short-coming in dealing with UNKNOWN labels
SI over-predicts DISPROVED and UNKNOWN for higher depths

Proof accuracy

CoT predicted the result correctly for 50 examples from depth-5 of the dataset
Manually verified the proofs generated by LAMBADA
CoT only generated the correct chain for 28% of the examples
Main source of error was hallucinating rules or facts
Spurious correlations in ProofWriter-PD can be exploited by CoT
Rule Selection and Goal Decomposition modules are responsible for majority of failure cases

The role of scale

Experiment from Section 5.2 repeated with PaLM 62B and 8B to see role of LM scale on LAMBADA
Performance of Goal Decomposition and Sign Agreement modules remain comparable with PaLM 62B
Performance of Fact Check and Rule Selection modules drop significantly with PaLM 62B
Smaller LMs require further decomposition of one-to-many comparisons in selection module

Number of inference calls

LAMBADA and SI require multiple LM inference calls per example.
LAMBADA requires significantly fewer inference calls than SI, especially at higher depths.

Qualitative analysis

LAMBADA calls Fact Check module which cannot prove or disprove goal

Conclusion

Developed LAMBADA, an algorithm for text-based deductive logical reasoning
Combines capacity of LMs to handle naturalistic text input with backward chaining algorithm for high-level reasoning
Significantly improved prediction and proof accuracy compared to existing approaches
Efficiently searches entire proof space to accurately conclude statement can neither be proved nor disproved
Insight on efficacy of goal-directed reasoning with LMs widely applicable and can be adapted to other NLP tasks
Algorithm may be called with same goal multiple times
Lexical sensitivity of LAMBADA tested with modified test set
Performance stays in same ballpark, showing robustness of LAMBADA
Overall model accuracies reported in main text
Confusion matrices reported to better understand biases of model
CoT and SI show similar behaviour as LAMBADA on ProofWriter-PUD
SI shows large tendency toward predicting DISPROVED for PrOntoQA
For CoT on PrOntoQA, wording slightly changed to make chains have better flow
Few-shot examples used/written for each dataset depth
For selecting a fact in Fact Check, prompt looks like “is equivalent to/negation of/neither equivalent nor negation of”
Prompts used for each of four components of model for ProofWriter dataset

Link to paper#

Abstract#

Paper Content#

Introduction#

Related work#

Backward chaining#

Experimental setup#

Experimental results#

Proof accuracy#

The role of scale#

Number of inference calls#

Qualitative analysis#

Conclusion#

Link to paper

Abstract

Paper Content

Introduction

Related work

Backward chaining

Experimental setup

Experimental results

Proof accuracy

The role of scale

Number of inference calls

Qualitative analysis

Conclusion