Link to paper
The full paper is available here.
You can also find the paper on PapersWithCode here.
Abstract
- Neural reasoning accuracy improves when generating intermediate steps
- Source of improvement is unclear
- Investigated benefit of generating intermediate steps for symbolic reasoning
- Decomposed reasoning strategy in terms of step granularity and chaining strategy
- Found that choice of reasoning strategies affects performance
- Certain configurations lead to nearly perfect performance
- Results indicate importance of exploring effective strategies for neural reasoning models
Paper Content
Introduction
- Artificial intelligence researchers have been attempting neural-symbolic integration for a long time.
- Neural models perform better when generating intermediate reasoning steps in addition to the answer.
- This phenomenon was seen across various reasoning tasks.
- Researchers broke down the neural reasoning process into two strategies: output strategy and chaining strategy.
- Iterative generation outperformed all-at-once outputting, and roughly granular reasoning steps lagged behind finely granular steps.
Experimental settings
- Evaluated models’ ability to perform arithmetic operations over given symbols
- Task is to answer value of target variable
- Reasoning depth is number of equations needed to reach answer
- Equations define assignments and modular additions
- Contexts contain distractors not necessary to calculate answer
- Artificial data allows easier control of reasoning depth for generalization tests
Output strategies
- Generating intermediate reasoning steps improves performance
- Step-by-step works best, all-at-once works worst
- Neural models have low symbolic reasoning ability
- All-at-once strategy overfits to output similar length of reasoning steps as those in the training data
- Step-by-step has advantage over token-by-token
Chaining strategies
- Results of fixed step-by-step output strategy shown in Figure 4b and Table 1
- Accuracy measured based on mathematical correctness, not exact match
- Performance dropped in shortest-path setting as depth increased
- Models successfully solved task when extrapolating to depths 6-12
- Models correctly generated intermediate steps and final answer
- Chaining strategies with no reasoning steps had better generalization performance
- Appropriate output strategy improves reasoning ability of model
- Accuracy higher when granularity of intermediate steps is finer
Results
- Used pre-trained T5-base, T5-large2, and BARTbase3
- Results of BART-base in Appendix C
- Numbers in dataset divided into digits
- Pre-trained using 10K simple dataset for 30 epochs
- Trained with 5K training set for 2000 epochs
- Experiment setting details in Appendix A
- 0.2K test instances for each reasoning depth
- Pretraining dataset contains two types of single-depth instances
- Results in paper are averages of results on three different seeds
Error analysis
- Copying errors were the most frequent (53%)
- Hasty assignment was the second type of error
Models’ scalability
- T5-large and T5-base were compared to investigate scalability.
- T5-large had lower accuracy than T5-base on all-at-once and step-by-step.
- T5-large had higher accuracy than T5-base on token-by-token.
Conclusions
- Investigated and factorized reasoning strategy in symbolic numerical reasoning with neural seq2seq models
- Combination of step-by-step output and finely granular reasoning leads to successful symbolic reasoning
- Simple symbolic reasoning requires appropriate selection of reasoning strategy
- Unclear if findings generalize to more complex symbolic reasoning and/or problems written in natural language
- Iterative strategies limited to input length of model
- Examined learning rate from 10-3, 10-4, and 10-5
- Used four NVIDIA V100 GPUs
- Performance drops in shortest path setting as reasoning depth increases
- Exhaustive or backward successfully solves task even when extrapolating to depths 6-12
- T5 outperforms BART