Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

Large language models have reignited debate about whether human cognitive capacities can emerge from them.
Of particular interest is the ability of these models to reason about novel problems without training.
A comparison was made between human reasoners and a large language model on a range of analogical tasks.
The language model displayed a strong capacity for abstract pattern induction, matching or surpassing human capabilities.

Paper Content

Introduction

Analogical reasoning is an important part of human intelligence and creativity
Tests of analogical reasoning ability measure fluid intelligence
Recent research has focused on training neural networks on analogy problems
An alternative approach uses large language models (LLMs)
LLMs can perform few-shot and zero-shot reasoning
GPT-3 was evaluated on zero-shot analogy tasks and compared to human behavior
GPT-3 performed as well or better than humans in most conditions

Results

GPT-3 is a language model with 175 billion parameters
It was trained on a web-based corpus of natural language
It was trained using two objectives
We evaluated GPT-3 on analogy tasks, letter string analogies, and four-term verbal analogies
We also performed a qualitative evaluation on a problem-solving task

Matrix reasoning problems

Digit Matrices problem set is structured similarly to Matzen et al.
Consists of digit transformations and logic problems
Transformation problems defined by 3 rules: constant, distribution-of-3, and progression
Logic problems defined by single rule involving set relation: OR, AND, or XOR
GPT-3 outperformed human participants on all problem types
Performance levels similar across problem subtypes
Problems with progression rules more difficult
Problem difficulty driven by number of unique relations
Analogical mapping easier when more constraints support correct mapping
Digit Matrices text-based format may not engage same processes as visual format
Performance similar across one-rule, two-rule, and logic problems
Performance improved when presented with problems in order of increasing complexity
GPT-3 benefited from contextual information, but also impaired by it

Letter string analogies

Presented letter string analogies to GPT-3
Evaluated GPT-3 without prompt and on alternative format
Generated multiple instances by varying letters used in target
18 basic successor relation problems
16 problems involving removal of redundant character
6 letter-to-number problems
18 grouping problems
16 problems with longer target
17 successor-to-predecessor problems
17 problems used as context to test for relational priming

Four-term verbal analogies

Evaluated GPT-3 on UCLA-VAT dataset
Used colon notation for problems
Each problem had two answer choices
GPT-3 was presented with problem and answer choice
Log probability used to select answer choice

Analogy in natural language problem-solving

Analogical reasoning is important in real-world problem-solving
GPT-3 was tested on naturalistic analogy problems based on stories
GPT-3 was able to identify the solution to the target problem when presented with the source story
GPT-3 was able to explain the analogy and identify the correspondences between the source story and target problem
GPT-3 was less successful when presented with distracting, non-analogous stories
GPT-3 was able to identify the analogy when prompted
GPT-3 was not able to use the analogy to discover an appropriate solution due to lack of physical reasoning skills

Discussion

GPT-3 displays an emergent ability to reason by analogy
GPT-3 outperformed human participants on a novel text-based problem set
GPT-3 can solve analogies based on meaningful relations
GPT-3’s performance is limited to the information provided in its local context
GPT-3 lacks long-term memory and physical understanding of the world
GPT-3 exhibits a general capacity to identify and generalize relational patterns
GPT-3’s analogical capacity is often considered the core of human intelligence
GPT-3’s analogical capacity may be due to sheer size and diversity of training data
GPT-3’s analogical capacity may be due to similarity and indirection of transformer architecture
GPT-3’s analogical capacity is fundamentally different from human intelligence
GPT-3’s analogical capacity is parasitic on natural human intelligence

Methods

Code availability

Code for simulations available on GitHub
Data from human behavioral experiments available on GitHub

Gpt-3

Queried GPT-3 through OpenAI API
Used text-davinci-003 model instance
Performed additional experiments with davinci and text-davinci-002
Temperature set to 0
Max tokens set to 10 for Digit Matrices, 20 for letter string analogies, 10 for four-term verbal analogies, and 256 for story analogies
Used log probabilities to evaluate GPT-3 on multiple-choice problems

Digit matrices

Two major problem categories: transformation and logic
Transformation problems had 1-5 rules
Logic problems had 1 rule
Transformation rules: constant, distribution-of-3, progression
Logic rules: OR, XOR, AND
Spatial position of elements aligned or permuted
6 one-rule subtypes, 6 two-rule subtypes, 10 subtypes for three-rule, four-rule, five-rule, and logic problems
100 instances of each subtype (except progression)
Distractors generated using different methods for transformation and logic problems
43 participants in first experiment, 47 in second
First experiment: 32 problem subtypes in random order
Second experiment: 42 problem subtypes in order of increasing complexity

Story analogies

Story analogy problems were tested in OpenAI playground
Materials were taken from two sources
Responses were not edited or shortened
Figure S1 shows transfer from few-rule to many-rule Digit Matrix problems
Text-davinci-002 did not show the same contextual effects as text-davinci-003
Text-davinci-002 was not able to generalize from few-rule to many-rule problems
Performance decreased significantly on four-rule and five-rule problems

S1 solutions to example matrix reasoning problems

Solution to visual matrix reasoning problem in Figure 1a is option 5
Solution to Digit Matrix problem in Figure 1b is option 7
Solutions to problems in Figure 2 are in Table S1

S2 story analogy results

Evaluated analogical reasoning in natural language problemsolving
GPT-3 responses presented in bold text

S2.1 solution to radiation problem in isolation

GPT-3 proposed a solution to a radiation problem
Solution involves using brachytherapy, which involves placing a radioactive source directly into or near the tumor

S2.2 solution to radiation problem following general story

Genie is transferring jewels from one bottle to another
Target problem involves transferring gumballs from one bowl to another
Genie uses a staff to stretch from one bottle to the other
Target problem uses materials provided to transfer gumballs
Source story and target problem both involve transferring something from one place to another

S2.3 solution to radiation problem with distracting stories

GPT-3 proposed a solution to the radiation problem when presented with a general story and two distracting stories.
GPT-3 did not identify the convergence solution in this case.
The Wine Merchants story is about two merchants competing to bring a barrel of wine to a rich man.
The General story is about a general who divides his army into small groups to capture a fortress.
The Identical Twins story is about two twins who devise a plan to win a race.
The target problem is a doctor faced with a patient suffering from an inoperable tumor.
The proposed solution is brachytherapy and stereotactic radiotherapy.
Brachytherapy involves placing a radioactive source directly into the tumor.
Stereotactic radiotherapy involves using a machine to deliver a high-intensity beam of radiation to the tumor from outside the body.

S2.4 solution to radiation problem with distracting stories and prompt

GPT-3 did not propose a solution to the radiation problem based on the analogy with the general story
A prompt was used to encourage GPT-3 to consider the previously presented stories when generating its solution
GPT-3 correctly identified the convergence solution and identified the general story as the relevant source
The doctor could use the same tactic as the general in the second story to divide the rays into small groups and send each group down a different path to the tumor

S2.5 solution to gumball problem in isolation

GPT-3 proposed a solution to the gumball problem
Solution involves using materials provided: aluminum walking cane, posterboard, cardboard tube, scissors, string, tape, paper clips, and rubber bands
Steps to solution: cut posterboard, tie string to tube and cane, place tube in bowl, use paper clips to secure tube, use rubber bands to secure tube to cane, pull cane towards you with string to transfer gumballs

S2.6 solution to gumball problem following magic staff story

GPT-3 proposed a solution to the gumball problem involving a magic staff
Children identified a solution involving an aluminum walking cane
GPT-3 identified the high-level analogy between the source story and the target problem
GPT-3 was tested on matrix reasoning problems, digit matrices, letter string analogies, and verbal analogies
GPT-3 matched or exceeded human performance on all tests
GPT-3 was able to generalize the structure inferred from few-rule problems to more complex many-rule problems
GPT-3 was more reliable at identifying the successor-to-predecessor mapping than humans
GPT-3 was more likely to produce a response involving a successor relation when primed with a context problem

Link to paper#

Abstract#

Paper Content#

Introduction#

Results#

Matrix reasoning problems#

Letter string analogies#

Four-term verbal analogies#

Analogy in natural language problem-solving#

Discussion#

Methods#

Code availability#

Gpt-3#

Digit matrices#

Story analogies#

S1 solutions to example matrix reasoning problems#

S2 story analogy results#

S2.1 solution to radiation problem in isolation#

S2.2 solution to radiation problem following general story#

S2.3 solution to radiation problem with distracting stories#

S2.4 solution to radiation problem with distracting stories and prompt#

S2.5 solution to gumball problem in isolation#

S2.6 solution to gumball problem following magic staff story#

Link to paper

Abstract

Paper Content

Introduction

Results

Matrix reasoning problems

Letter string analogies

Four-term verbal analogies

Analogy in natural language problem-solving

Discussion

Methods

Code availability

Gpt-3

Digit matrices

Story analogies

S1 solutions to example matrix reasoning problems

S2 story analogy results

S2.1 solution to radiation problem in isolation

S2.2 solution to radiation problem following general story

S2.3 solution to radiation problem with distracting stories

S2.4 solution to radiation problem with distracting stories and prompt

S2.5 solution to gumball problem in isolation

S2.6 solution to gumball problem following magic staff story