Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

Neural reasoning accuracy improves when generating intermediate steps
Source of improvement is unclear
Investigated benefit of generating intermediate steps for symbolic reasoning
Decomposed reasoning strategy in terms of step granularity and chaining strategy
Found that choice of reasoning strategies affects performance
Certain configurations lead to nearly perfect performance
Results indicate importance of exploring effective strategies for neural reasoning models

Artificial intelligence researchers have been attempting neural-symbolic integration for a long time.
Neural models perform better when generating intermediate reasoning steps in addition to the answer.
This phenomenon was seen across various reasoning tasks.
Researchers broke down the neural reasoning process into two strategies: output strategy and chaining strategy.
Iterative generation outperformed all-at-once output, and coarse-grained reasoning steps lagged behind fine-grained ones.

Evaluated models’ ability to perform arithmetic operations over given symbols
Task is to answer value of target variable
Reasoning depth is number of equations needed to reach answer
Equations define assignments and modular additions
Contexts contain distractors not necessary to calculate answer
Artificial data allows easier control of reasoning depth for generalization tests
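
The task setup above can be illustrated with a small generator. This is only a sketch under stated assumptions, not the authors' data pipeline: the modulus of 100, the "X=Y+k" equation format, and the function names make_instance and solve are all hypothetical choices made for illustration.

```python
import random

MOD = 100  # assumed modulus; the notes only say "modular additions"


def make_instance(depth, n_distractors, seed=0):
    """Build a shuffled context of equations, a target variable, and its value."""
    rng = random.Random(seed)
    names = rng.sample("ABCDEFGHIJKLMNOPQRSTUVWXYZ", depth + n_distractors)
    chain, distractors = names[:depth], names[depth:]

    values = {chain[0]: rng.randrange(MOD)}
    # The first equation of the chain is a plain assignment, e.g. "A=7".
    equations = [f"{chain[0]}={values[chain[0]]}"]
    # Each later equation adds a constant to the previous variable, mod MOD.
    for prev, cur in zip(chain, chain[1:]):
        k = rng.randrange(MOD)
        values[cur] = (values[prev] + k) % MOD
        equations.append(f"{cur}={prev}+{k}")
    # Distractors are assignments the target never depends on.
    for name in distractors:
        equations.append(f"{name}={rng.randrange(MOD)}")

    rng.shuffle(equations)
    return equations, chain[-1], values[chain[-1]]


def solve(equations, target):
    """Reference solver: keep resolving equations until the target is known."""
    known = {}
    pending = [eq.split("=") for eq in equations]
    while target not in known:
        progressed = False
        for lhs, rhs in pending:
            terms = rhs.split("+")
            if lhs not in known and all(t.isdigit() or t in known for t in terms):
                known[lhs] = sum(int(t) if t.isdigit() else known[t] for t in terms) % MOD
                progressed = True
        if not progressed:
            raise ValueError("target cannot be derived from the context")
    return known[target]


equations, target, answer = make_instance(depth=4, n_distractors=2, seed=1)
print(equations, target, answer)
```

Here depth equals the number of equations on the chain leading to the target, so raising depth at test time gives the kind of generalization test described above.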

Generating intermediate reasoning steps improves performance
Step-by-step works best, all-at-once works worst
Neural models have low symbolic reasoning ability
All-at-once strategy overfits to producing reasoning chains of similar length to those in the training data
Step-by-step has advantage over token-by-token
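
The difference between the all-at-once and step-by-step output strategies can be sketched as two decoding loops. This is a minimal illustration, not the paper's implementation: the separator " | ", the "answer:" stop marker, and the scripted_model stand-in (which replays fixed steps instead of calling a trained seq2seq model) are assumptions introduced here.

```python
from typing import Callable, List

SEP = " | "            # hypothetical separator between the context and earlier steps
STOP = "answer:"       # hypothetical marker for the final step


def all_at_once(generate: Callable[[str], str], context: str) -> str:
    """One call: the model must emit every step and the answer in one sequence."""
    return generate(context)


def step_by_step(generate_step: Callable[[str], str], context: str,
                 max_steps: int = 32) -> List[str]:
    """One call per reasoning step; earlier steps are fed back as input."""
    steps: List[str] = []
    for _ in range(max_steps):
        prompt = SEP.join([context] + steps)
        step = generate_step(prompt)
        steps.append(step)
        if step.startswith(STOP):
            break
    return steps


def scripted_model(script: List[str]) -> Callable[[str], str]:
    """Toy stand-in for a trained model: replays a fixed list of steps."""
    def generate_step(prompt: str) -> str:
        n_done = prompt.count(SEP)  # number of steps already appended to the prompt
        return script[n_done]
    return generate_step


script = ["A=1", "B=A+2=3", "answer: 3"]
print(step_by_step(scripted_model(script), "A=1, B=A+2, B?"))
```

Under the same assumptions, a token-by-token strategy would use the same loop with a single token as the unit returned per call. Because the prompt grows with every step, the loop also makes visible why iterative strategies depend on how much input the model can accept.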

Results of fixed step-by-step output strategy shown in Figure 4b and Table 1
Accuracy measured based on mathematical correctness, not exact match
Performance dropped in shortest-path setting as depth increased
With exhaustive or backward chaining, models successfully solved task even when extrapolating to depths 6-12
Models correctly generated intermediate steps and final answer
Chaining strategies that generate reasoning steps had better generalization performance than generating no reasoning steps
Appropriate output strategy improves reasoning ability of model
Accuracy higher when granularity of intermediate steps is finer
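
Scoring by mathematical correctness rather than exact match can be sketched as follows. This is one possible reading of the metric, not the authors' evaluation script: the "VAR=value" step format and the names parse_step and mathematically_correct are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def parse_step(step: str) -> Optional[Tuple[str, int]]:
    """Split 'B=3' into ('B', 3); return None if the step is malformed."""
    lhs, sep, rhs = step.partition("=")
    lhs, rhs = lhs.strip(), rhs.strip()
    if not sep or not lhs.isalpha() or not rhs.isdigit():
        return None
    return lhs, int(rhs)


def mathematically_correct(steps: List[str], reference: Dict[str, int],
                           target: str) -> bool:
    """True if every step states a correct value and the last step gives the target."""
    if not steps:
        return False
    parsed = [parse_step(s) for s in steps]
    if any(p is None for p in parsed):
        return False
    if any(reference.get(var) != val for var, val in parsed):
        return False
    return parsed[-1][0] == target


# Reference values obtained by solving the context with a trusted solver.
reference = {"A": 1, "B": 3, "C": 8}
print(mathematically_correct(["A=1", "B=3"], reference, "B"))
print(mathematically_correct(["A=1", "B=4"], reference, "B"))
```

A check of this kind accepts outputs whose steps differ from the gold sequence in order, spacing, or length as long as each stated value is right, which exact-match scoring would reject.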

Investigated and factorized reasoning strategy in symbolic numerical reasoning with neural seq2seq models
Combination of step-by-step output and finely granular reasoning leads to successful symbolic reasoning
Simple symbolic reasoning requires appropriate selection of reasoning strategy
Unclear if findings generalize to more complex symbolic reasoning and/or problems written in natural language
Iterative strategies are limited by the maximum input length of the model
Examined learning rates of 10^-3, 10^-4, and 10^-5
Used four NVIDIA V100 GPUs
Performance drops in shortest-path setting as reasoning depth increases
Exhaustive or backward successfully solves task even when extrapolating to depths 6-12
T5 outperforms BART