Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

Language models (LMs) can be augmented with reasoning skills and the ability to use tools.
Augmentations can be used separately or in combination.
Augmented LMs (ALMs) can use external modules to expand their context processing ability.
ALMs can learn to reason, use tools, and act while still performing standard natural language tasks.
ALMs have the potential to address limitations of traditional LMs.

Paper Content

Reasoning

Reasoning is the ability to make inferences using evidence and logic
Reasoning can be divided into multiple types of skills
Reasoning often involves deductions from inference chains
Previous work has shown that LLMs can solve simple reasoning problems but fail at complex reasoning
The challenge with complex reasoning problems is for the LM to obtain the solution by correctly composing its answers to the sub-problems
The survey discusses works related to three popular paradigms for eliciting reasoning in LMs

Recursive prompting

Problem decomposition can be used to solve complex tasks
The resulting sub-problems can be solved independently or sequentially
Least-to-most prompting decomposes complex problems into sub-problems
Recent works employ in-context learning to decompose problems
Different works use different methods to decompose problems
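The sequential variant of decomposition can be sketched as follows. This is a minimal illustration, not the method of any specific paper: it assumes a hypothetical `lm` callable that maps a prompt string to a completion, the prompt wording is invented, and `toy_lm` is a canned stand-in so the example runs without a real model.

```python
from typing import Callable, List


def least_to_most(question: str, lm: Callable[[str], str]) -> str:
    """Decompose a question, then solve the sub-problems one after another."""
    # Stage 1: ask the model for sub-problems, one per line (assumed format).
    plan = lm(f"Decompose into sub-problems:\n{question}")
    sub_problems: List[str] = [s.strip() for s in plan.splitlines() if s.strip()]

    # Stage 2: solve each sub-problem, feeding earlier answers back as context.
    context = question
    answer = ""
    for sub in sub_problems:
        answer = lm(f"{context}\nQ: {sub}\nA:")
        context += f"\nQ: {sub}\nA: {answer}"
    return answer  # the last sub-problem is the original question


def toy_lm(prompt: str) -> str:
    """Hypothetical stand-in for a real LM, returning canned completions."""
    if prompt.startswith("Decompose"):
        return ("How many apples does Anna have?\n"
                "How many apples do they have together?")
    if prompt.endswith("Q: How many apples does Anna have?\nA:"):
        return "5"
    return "8"


result = least_to_most(
    "Elsa has 3 apples. Anna has 2 more apples than Elsa. "
    "How many apples do they have together?",
    toy_lm,
)
```

Because each answer is appended to the context, later sub-problems can build on earlier ones; solving sub-problems independently would instead issue each call with only the original question.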

Explicitly teaching language models to reason

Prompting approaches rely on model scale and require discovering prompts that elicit multi-step computation.
With scratchpads, the LM is fine-tuned on example tasks together with their associated computation steps.
Nye et al. (2021) introduce scratchpads through fine-tuning, and Taylor et al. (2022) use a similar approach during LM pre-training.
Taylor et al. (2022) use a special token to mimic an internal working memory.
Zelikman et al. (2022), Yu et al. (2022), Ouyang et al. (2022), Chung et al. (2022), Iyer et al. (2022), and Ho et al. (2022) use instruction fine-tuning to improve reasoning skills.
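To make the scratchpad idea concrete, the sketch below builds one training example whose target spells out the intermediate computation before the final answer. The `<scratch>` delimiters, the `Input:`/`Target:` labels, and the step wording are assumptions made for illustration; they are not the exact formats used in the cited works.

```python
def addition_scratchpad(a: int, b: int) -> str:
    """Build a training example that spells out digit-by-digit addition."""
    xs, ys = str(a)[::-1], str(b)[::-1]  # least-significant digit first
    steps, digits, carry = [], [], 0
    for i in range(max(len(xs), len(ys))):
        x = int(xs[i]) if i < len(xs) else 0
        y = int(ys[i]) if i < len(ys) else 0
        total = x + y + carry
        steps.append(
            f"{x} + {y} + {carry} = {total}, write {total % 10}, carry {total // 10}"
        )
        digits.append(str(total % 10))
        carry = total // 10
    if carry:
        digits.append(str(carry))
    result = "".join(reversed(digits))
    body = "\n".join(steps)
    # Hypothetical layout: the model is trained to emit the scratch block first.
    return f"Input: {a} + {b}\n<scratch>\n{body}\n</scratch>\nTarget: {result}"


example = addition_scratchpad(29, 57)
```

Fine-tuning on many such examples teaches the model to produce the intermediate steps itself, rather than relying on a prompt to elicit them.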

Comparison and limitations of abstract reasoning

Reasoning can be seen as breaking down a problem into smaller parts.
Exploring all possible reasoning paths is difficult and there is no guarantee that the steps are valid.
A reasoning language model seeks to improve its context to increase the chance of a correct answer.
Mistakes on mathematical operations or known facts can lead to wrong output.
External tools such as search engines and calculators can be used to validate intermediate steps.
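As a rough illustration of validating intermediate steps with a calculator, the sketch below scans a reasoning chain for simple `a op b = c` statements and recomputes each one. The function name `check_steps` and the regular expression are assumptions for this example; a real system would need a far more robust parser.

```python
import re
from typing import List, Tuple

# Matches simple binary arithmetic statements such as "36 - 10 = 24".
STEP = re.compile(
    r"(-?\d+(?:\.\d+)?)\s*([+\-*/])\s*(-?\d+(?:\.\d+)?)\s*=\s*(-?\d+(?:\.\d+)?)"
)
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,
}


def check_steps(reasoning: str, tol: float = 1e-9) -> List[Tuple[str, float]]:
    """Return (claimed step, recomputed value) for every incorrect step."""
    errors = []
    for match in STEP.finditer(reasoning):
        a, op, b, claimed = match.groups()
        if op == "/" and float(b) == 0:
            continue  # skip undefined divisions instead of raising
        actual = OPS[op](float(a), float(b))
        if abs(actual - float(claimed)) > tol:
            errors.append((match.group(0), actual))
    return errors


chain = ("There are 3 boxes with 12 pens each, so 3 * 12 = 36 pens. "
         "Giving away 10 leaves 36 - 10 = 24.")
mistakes = check_steps(chain)
```

Here the second step is flagged because the recomputed value is 26, so the wrong intermediate result can be corrected before it propagates to the final answer.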

Using tools and act

A recent line of LM research allows the model to access knowledge that is not stored in its weights
The LM can query external modules such as a Python interpreter or a search engine

Calling another model

The tool can be another neural network or the LM itself
Output can be refined iteratively by repeatedly calling the model
Re3 generates stories of over two thousand words
Re3 first generates a plan, setting, and characters
It then injects information from the plan and story state into a new GPT-3 prompt
A learned detailed outliner expands a brief initial outline
The PEER model is initialized from LM-Adapted T5 and trained on Wikipedia edits
Models can be chained together to refine complex tasks
Tools can also leverage other modalities, such as vision, in addition to language
Flamingo models are trained on large-scale multimodal web corpora
Socratic Models allow models to exchange information and acquire new multimodal capabilities
Images can be incorporated to improve the reasoning capabilities of moderate-size LMs
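Iterative refinement by repeated model calls can be sketched as a simple fixed-point loop. The `editor` callable is a hypothetical stand-in for a model such as an editing LM, and `toy_editor` below only fixes one known typo per call so that the example runs on its own.

```python
from typing import Callable


def refine(draft: str, editor: Callable[[str], str], max_rounds: int = 5) -> str:
    """Feed the editor its own output until nothing changes or the budget ends."""
    for _ in range(max_rounds):
        revised = editor(draft)
        if revised == draft:  # the editor has nothing left to change
            break
        draft = revised
    return draft


# Hypothetical stand-in for a model call: one correction per invocation.
FIXES = [("teh", "the"), ("modle", "model")]


def toy_editor(text: str) -> str:
    for wrong, right in FIXES:
        if wrong in text:
            return text.replace(wrong, right, 1)
    return text


final = refine("teh modle calls teh modle", toy_editor)
partial = refine("teh modle calls teh modle", toy_editor, max_rounds=2)
```

The `max_rounds` budget matters in practice because a real model is not guaranteed to converge to a stable output.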

Information retrieval

LMs can be augmented with memory units to improve reasoning abilities
Knowledge can be offloaded from the LM by retrieving from an external source
Memory augmentation strategies help the LM avoid producing non-factual and out-of-date information
Two types of retrievers exist: dense and sparse
Various works augment LMs with a dense retriever by appending the retrieved documents to the current context
He et al. (2022) and Trivedi et al. (2022) combine a retriever with reasoning via chain-of-thoughts prompting
LaMDA and BlenderBot are agent-like LMs designed for dialogue applications
ReAct interleaves reasoning and acting
WebGPT and WebShop are LM-based agents that can interact with a custom text-based web-browsing environment
Most works on web navigation and computer-control assume the typical human interface

Computing via symbolic modules and code interpreters

LMs are prone to errors when dealing with large numbers or complex arithmetic
GPT-3 cannot perform out-of-distribution addition
In reinforcement learning, the action space can be equipped with symbolic modules
Mind’s Eye uses a physics engine to ground LMs’ physical reasoning
PAL uses CoT prompting to decompose tasks into intermediate steps expressed as Python code
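Offloading computation to an interpreter can be sketched as follows. The string `GENERATED` stands in for code that a model might write for a word problem, and the convention of reading the result from a variable named `answer` is an assumption for this example, as is the helper name `run_program`.

```python
def run_program(program: str) -> object:
    """Execute model-written code and read the conventional `answer` variable."""
    # Emptying the builtins is a crude restriction, NOT a real sandbox:
    # untrusted generated code should run in an isolated process.
    namespace: dict = {"__builtins__": {}}
    exec(program, namespace)
    return namespace["answer"]


# Hypothetical model output: reasoning steps written as Python statements.
GENERATED = """
# Roger has 5 tennis balls and buys 2 cans of 3 balls each.
tennis_balls = 5
bought = 2 * 3
answer = tennis_balls + bought
"""

value = run_program(GENERATED)
```

The model only has to write the steps correctly; the arithmetic itself is carried out by the interpreter, which avoids the calculation errors noted above.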

Acting on the virtual and physical world

LMs can be used to control virtual and physical agents in simulated and real-world environments
LMs can be used to represent goals and plans, and improve learning and generalization on tasks beyond language processing
LMs can be used to break down high-level tasks into a series of simple commands
LMs can be used to write robot policy code given natural language commands
LMs can encode common sense knowledge about the world
LMs lack contextual grounding, making it difficult to use them for decision making in the real world
NLMap-SayCan is a framework to gather and integrate contextual information into LM planners
RT-1 leverages large-scale, diverse, task-agnostic robotic datasets to learn a model that can follow natural language instructions

Learning to reason, use tools, and act

LMs can be augmented with reasoning and tools
Approaches to teach LMs reasoning and tools

Supervision

Teaching LMs to reason and act can be done by providing them with human-written demonstrations
Common ways of doing this are few-shot prompting and regular gradient-based learning
Supervised learning is usually done after pre-training with a language modeling objective
Taylor et al. (2022) propose to mix pre-training texts with human-annotated examples containing explicit reasoning
Some authors use supervised fine-tuning followed by reinforcement learning from human feedback
Few-shot prompting is common for teaching LMs to reason and act
Performance depends on the format of examples, the choice of few-shot examples, and the order in which they are presented
Bootstrapping combines data efficiency of few-shot prompting with advantages of fine-tuning
Bootstrapping can be applied to teach models to reason and use tools
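The core of bootstrapping can be sketched as a filtering loop: the model generates reasoning for each training question, and only the generations that reach the known correct answer are kept as new fine-tuning data. The `generate` callable and the `CANNED` outputs below are hypothetical stand-ins for a few-shot-prompted model.

```python
from typing import Callable, List, Tuple


def bootstrap(
    dataset: List[Tuple[str, str]],
    generate: Callable[[str], Tuple[str, str]],
) -> List[Tuple[str, str, str]]:
    """Keep model-written rationales whose final answer matches the gold label."""
    kept = []
    for question, gold in dataset:
        rationale, predicted = generate(question)
        if predicted.strip() == gold.strip():
            kept.append((question, rationale, gold))
    return kept  # candidate fine-tuning examples for the next round


# Hypothetical model outputs: (rationale, final answer) per question.
CANNED = {
    "2 + 2?": ("2 plus 2 is 4.", "4"),
    "3 * 3?": ("3 times 3 is 6.", "6"),
}


def toy_generate(question: str) -> Tuple[str, str]:
    return CANNED[question]


data = [("2 + 2?", "4"), ("3 * 3?", "9")]
kept = bootstrap(data, toy_generate)
```

Only a few human-written demonstrations are needed to seed the prompt, while the filtered generations supply the larger dataset that fine-tuning requires.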

Reinforcement learning

Supervised learning from human-created prompts is effective for teaching models to reason and act, but such data is difficult and costly to obtain.
Human preference data (rankings or likes/dislikes) is easier, faster, and cheaper to obtain than full demonstrations.
Reinforcement Learning (RL) has proven successful for learning complex behaviors through feedback-based interaction with an environment.
RL is a natural framework for training LMs to act and use tools since many of these tools are nondifferentiable.
Most existing work on RL and ALMs has focused on teaching LMs how to act rather than reason.
Hard-coded reward functions are used to update the weights of the model using a scalar reward generated by a task-dependent function.
RL has been used to teach an LM to search and fetch additional factual information, navigate a virtual shop, and interface with a graph-based knowledge base.
Human feedback can be used to improve the quality of machine-generated text.
RLHF (Reinforcement Learning from Human Feedback) uses human preferences as an evaluation metric and as an objective function to optimize the language model.
RLHF has been applied directly on top of a general-purpose LM pre-trained via self-supervised learning, and after an initial supervised fine-tuning phase.
RLHF has been used to teach an LM to use an external tool (e.g. search engine, web-browser, information retrieval module).
RLHF has also proven useful for a wide range of language generation tasks.
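One common way to turn human preference rankings into a training signal is to fit a reward model with a pairwise loss, so that the preferred output scores higher than the rejected one. The notes above do not specify a loss; the sketch below assumes the widely used form -log sigmoid(r_chosen - r_rejected), with scalar rewards passed in directly instead of being produced by a network.

```python
import math


def preference_loss(r_chosen: float, r_rejected: float) -> float:
    """Pairwise loss -log(sigmoid(r_chosen - r_rejected)), computed stably."""
    diff = r_chosen - r_rejected
    # log(1 + exp(-diff)), rearranged so exp() never overflows for large |diff|.
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))


tie = preference_loss(1.0, 1.0)        # equal rewards: loss is log(2)
agrees = preference_loss(3.0, 0.0)     # reward model agrees with the human
disagrees = preference_loss(0.0, 3.0)  # reward model contradicts the human
```

The trained reward model then supplies the scalar reward that the RL step optimizes, replacing a hard-coded task-dependent reward function.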

Limitations and future directions

RL methods have seen recent algorithmic progress and performance improvements
However, instability issues can make training difficult and slow
Supervised learning is an efficient and robust way to fine-tune language models
However, it assumes the existence of a large number of expert demonstrations
Bootstrapping methods and offline RL combine “the best of both worlds”
ILQL, an offline RL algorithm, combines the flexibility of online RL with the simplicity and robustness of supervised learning
Toolformer teaches itself to use tools in a self-supervised way
Text used for supervision can lack context
ALMs can access recent information from the external world
There is a tradeoff between memorizing information in the weights and querying tools
Generalizing the non-parametric framework
ALMs can be seen as an instantiation of the autonomous intelligent agent concept
ALMs offer potential benefits in truthfulness, uncertainty estimation and reduction, and interpretability
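The self-supervised signal behind Toolformer, mentioned above, can be sketched as a filtering rule: a sampled tool call is kept as training data only if inserting its result lowers the LM's loss on the following tokens by some margin. The function name, argument names, and the default threshold `tau` are assumptions for illustration, and the loss values would come from the LM itself.

```python
def keep_api_call(
    loss_with_result: float,
    loss_without_call: float,
    loss_call_no_result: float,
    tau: float = 1.0,
) -> bool:
    """Keep a sampled tool call only if its result helps predict what follows."""
    # Baseline: the better of "no call at all" and "call inserted without result".
    baseline = min(loss_without_call, loss_call_no_result)
    return baseline - loss_with_result >= tau


helpful = keep_api_call(loss_with_result=0.5, loss_without_call=2.0, loss_call_no_result=2.4)
useless = keep_api_call(loss_with_result=1.9, loss_without_call=2.0, loss_call_no_result=2.4)
```

Because the criterion uses only the model's own loss, no human annotation of where tools should be called is needed.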
