Link to paper
The full paper is available here.
You can also find the paper on PapersWithCode here.
Abstract
- Language models (LMs) can be augmented with reasoning skills and the ability to use tools.
- Augmentations can be used separately or in combination.
- Augmented LMs (ALMs) can use external modules to expand their context processing ability.
- ALMs can learn to reason, use tools, and act while still performing standard natural language tasks.
- ALMs have the potential to address limitations of traditional LMs.
Paper Content
Reasoning
- Reasoning is the ability to make inferences using evidence and logic
- Reasoning can be divided into multiple types of skills
- Reasoning often involves deductions from inference chains
- Previous work has shown that LLMs can solve simple reasoning problems but fail at complex reasoning
- Challenge with complex reasoning problems for LMs is to correctly obtain the solution by composing the correct answers
- Works related to three popular paradigms for eliciting reasoning in LMs discussed
Recursive prompting
- Problem decomposition can be used to solve complex tasks
- Problem decomposition can be used to solve sub-problems independently or sequentially
- Least-to-most prompting decomposes complex problems into sub-problems
- Recent works employ in-context learning to decompose problems
- Different works use different methods to decompose problems
Explicitly teaching language models to reason
- Prompting approaches require model scale and need to discover prompts that elicit multi-step computation tasks.
- Scratchpads are fine-tuned on example tasks with associated computation steps.
- Nye et al. (2021) and Taylor et al. (2022) use similar approaches for LM pre-training.
- Taylor et al. (2022) use a special token to mimic an internal working memory.
- Zelikman et al. (2022), Yu et al. (2022), Ouyang et al. (2022), Chung et al. (2022), Iyer et al. (2022), and Ho et al. (2022) use instruction fine-tuning to improve reasoning skills.
Comparison and limitations of abstract reasoning
- Reasoning can be seen as breaking down a problem into smaller parts.
- Exploring all possible reasoning paths is difficult and there is no guarantee that the steps are valid.
- A reasoning language model seeks to improve its context to increase the chance of a correct answer.
- Mistakes on mathematical operations or known facts can lead to wrong output.
- External tools such as search engines and calculators can be used to validate intermediate steps.
Using tools and act
- LM research allows access to knowledge not stored in weights
- LM can query external modules such as python interpreter or search engine
Calling another model
- Tool can be another neural network or the LM itself
- Iteratively refine output by repeatedly calling the model
- Re3 generates stories of over two thousand words
- Re3 first generates plan, setting, and characters
- Injects information from plan and story state into new GPT3 prompt
- Learned detailed outliner expands brief initial outline
- PEER model initialized from LM-Adapted T5 and trained on Wikipedia edits
- Models can be chained together to refine complex tasks
- Leveraging other modalities such as vision and language
- Flamingo models trained on large-scale multimodal web corpora
- Socratic Models allow models to exchange information and acquire new multimodal capabilities
- Images can be incorporated to improve reasoning capabilities of moderate size LMs
Information retrieval
- LMs can be augmented with memory units to improve reasoning abilities
- Knowledge can be offloaded from the LM by retrieving from an external source
- Memory augmentation strategies help the LM avoid producing non-factual and out-of-date information
- Two types of retrievers exist: dense and sparse
- Various works augment LMs with a dense retriever by appending the retrieved documents to the current context
- He et al. (2022) and Trivedi et al. (2022) combine a retriever with reasoning via chain-of-thoughts prompting
- LaMDA and BlenderBot are agent-like LMs designed for dialogue applications
- ReAct interleaves reasoning and acting
- WebGPT and WebShop are LM-based agents that can interact with a custom text-based web-browsing environment
- Most works on web navigation and computer-control assume the typical human interface
Computing via symbolic modules and code interpreters
- LMs are prone to errors when dealing with large numbers or complex arithmetics
- GPT3 cannot perform out-of-distribution addition
- Reinforcement learning action space is equipped with symbolic modules
- Mind’s Eye uses a physics engine to ground LMs physical reasoning
- PAL uses CoT prompting and python code to decompose tasks
Acting on the virtual and physical world
- LM’s can be used to control virtual and physical agents in simulated and real-world environments
- LM’s can be used to represent goals and plans, and improve learning and generalization on tasks beyond language processing
- LM’s can be used to break down high-level tasks into a series of simple commands
- LM’s can be used to write robot policy code given natural language commands
- LM’s can encode common sense knowledge about the world
- LM’s lack contextual grounding, making it difficult to use them for decision making in the real-world
- NLMap-SayCan is a framework to gather and integrate contextual information into LM planners
- RT-1 leverages large-scale, diverse, task-agnostic robotic datasets to learn a model that can follow natural language instructions
Learning to reason, use tools, and act
- LMs can be augmented with reasoning and tools
- Approaches to teach LMs reasoning and tools
Supervision
- Teaching LMs to reason and act can be done by providing them with human-written demonstrations
- Common ways of doing this are few-shot prompting and regular gradient-based learning
- Supervised learning is usually done after pre-training with a language modeling objective
- Taylor et al. (2022) propose to mix pre-training texts with human-annotated examples containing explicit reasoning
- Some authors use supervised fine-tuning followed by reinforcement learning from human feedback
- Few-shot prompting is common for teaching LMs to reason and act
- Performance depends on the format of examples, the choice of few-shot examples, and the order in which they are presented
- Bootstrapping combines data efficiency of few-shot prompting with advantages of fine-tuning
- Bootstrapping can be applied to teach models to reason and use tools
Reinforcement learning
- Supervised learning from human-created prompts is effective to teach models to reason and act, but is difficult and costly to obtain.
- Human preference data (rankings or likes/dislikes) is easier, faster, and cheaper to obtain than full demonstrations.
- Reinforcement Learning (RL) has proven successful for learning complex behaviors through feedback-based interaction with an environment.
- RL is a natural framework for training LMs to act and use tools since many of these tools are nondifferentiable.
- Most existing work on RL and ALMs has focused on teaching LMs how to act rather than reason.
- Hard-coded reward functions are used to update the weights of the model using a scalar reward generated by a task-dependent function.
- RL has been used to teach a LM to search and fetch additional factual information, navigate a virtual shop, and interface with a graph-based knowledge base.
- Human feedback can be used to improve the quality of machine-generated text.
- RLHF (Reinforcement Learning from Human Feedback) uses human preferences as an evaluation metric and as an objective function to optimize the language model.
- RLHF has been applied directly on top of a general-purpose LM pre-trained via self-supervised learning, and after an initial supervised fine-tuning phase.
- RLHF has been used to teach a LM to use an external tool (e.g. search engine, web-browser, information retrieval module).
- RLHF has also proven useful for a wide range of language generation tasks.
Limitations and future directions
- Recent algorithmic progress and performance improvements in RL methods
- Instability issues can make training difficult and slow
- Supervised learning is an efficient and robust way to fine-tune language models
- Assumes existence of a large number of expert demonstrations
- Bootstrapping methods and offline RL combine “the best of both worlds”
- ILQL combines online RL and supervised learning
- Toolformer teaches itself to use tools in a self-supervised way
- Text used for supervision can lack context
- ALMs can access recent information from the external world
- Tradeoff between memorizing and querying tools
- Generalizing the non-parametric framework
- ALMs instantiate autonomous intelligent agent concept
- ALMs offer truthfulness, estimating and reducing uncertainty, and interpretability