Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

Humans can show sudden improvements in task performance linked to insight
Artificial neural networks can also show insight-like behaviour
Insight-like behaviour in neural networks is caused by noise, attentional gating and regularisation

Paper Content

Introduction

Ability to learn from experience is common to animals and some artificial agents
Neural networks trained with stochastic gradient descent (SGD) are a current theory of human learning
Humans may sometimes learn in an abrupt manner
Insights occur when an agent finds a novel problem solution by restructuring an existing task representation
Insights involve unconscious processes becoming conscious
Insights can be accompanied by a feeling of relief or pleasure
Insights are related to brain regions distinct from those associated with gradual learning
Insight-like behaviour can emerge from gradual learning algorithms
Insights trigger abrupt behavioural changes
Insights occur selectively in some subjects
Insights occur “spontaneously” without external cues
Regularisation and gating can cause discontinuities in learning in neural networks

Results

99 participants and 99 neural networks performed a decision task
Task required a binary choice about circular arrays of moving dots
Dots were characterized by two features with different degrees of noise
Task provided a hidden opportunity to improve one’s decision strategy
In the initial training phase, only motion direction predicted the correct choice
In the later phase, both features could be used to determine the choice
Post-experimental questionnaire asked if participants noticed a rule, how long it took, and if they paid attention to colour
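The two-phase contingency described above can be sketched as a small trial generator. This is only an illustration: the direction labels, the direction-to-key mapping and the colour-to-key mapping are hypothetical placeholders, since the summary does not state them.

```python
import random

# Placeholder labels and mappings; the actual assignments are not given in the summary.
DIRECTIONS = ["dir0", "dir1", "dir2", "dir3"]
KEY_FOR_DIRECTION = {"dir0": "X", "dir1": "X", "dir2": "M", "dir3": "M"}  # hypothetical
COLOUR_FOR_KEY = {"X": "orange", "M": "purple"}  # hypothetical

def make_trial(phase, rng):
    """Draw one trial; colour only carries information in the later phase."""
    direction = rng.choice(DIRECTIONS)
    correct_key = KEY_FOR_DIRECTION[direction]
    if phase == "motion":
        # Colour is drawn independently, so it is uncorrelated with the correct key.
        colour = rng.choice(["orange", "purple"])
    elif phase == "motion_and_colour":
        # Colour is now fully determined by the correct key (the hidden regularity).
        colour = COLOUR_FOR_KEY[correct_key]
    else:
        raise ValueError(f"unknown phase: {phase}")
    return {"direction": direction, "colour": colour, "correct_key": correct_key}
```

A participant who only tracks motion is never penalised for ignoring colour, which is why the regularity counts as hidden.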

Human behaviour

Participants learned the response mapping for the four motion directions well
Noise was added to the motion, while the colour remained uncorrelated
Performance was heavily diminished in the conditions with the largest amounts of motion noise
Performance improvements largely beyond these low baseline levels can only be attributed to colour use
Noise level continued to influence performance in the motion and colour phase
Onset of the colour correlation triggered performance improvements across all coherence levels
57.6% of participants reported using colour; 42.4% indicated that they had not noticed or used the colour
Performance was best fit by a non-linear sigmoid function, indicating at least a subset of putative insight participants
48.5% of participants had values larger than the 100th percentile of the control distribution, suggesting abrupt insight occurred selectively
79.2% of insight participants self-reported to have used colour to make correct choices
Insight subjects started to perform significantly better in the lowest coherence trials once the motion and colour phase started
Reaction times reflected the same improvements upon switching to the colour strategy
Insight moments were defined as the time points of inflection of the fitted sigmoid function
Average delay of insight onset was 1.3 task blocks (130 trials)
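As a worked example of the delay measure, the helper below converts an inflection point into a delay relative to the onset of the colour correlation. The block length of 100 trials follows from the figures above (130 trials correspond to 1.3 blocks); the function name and the example trial numbers are hypothetical.

```python
def insight_delay(inflection_trial, colour_onset_trial, trials_per_block=100):
    """Return the insight delay as (trials, blocks) after colour onset."""
    delay_trials = inflection_trial - colour_onset_trial
    return delay_trials, delay_trials / trials_per_block

# Hypothetical numbers: a sigmoid inflecting 130 trials after colour onset.
trials, blocks = insight_delay(inflection_trial=630, colour_onset_trial=500)
```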

Neural network behaviour

99 network models were trained on a decision making task with two input nodes (colour and motion) and one output node
Each input node was sampled from a normal distribution with a mean and standard deviation
The network multiplied each input node by two parameters, a corresponding weight and a gate
L1-regularisation was added to the gate weights to introduce competitive dynamics between the input channels
The network was trained with online gradient descent with Gaussian white noise added to the gradient update and a fixed learning rate
The network was trained in a curriculum matched to the human task
The input sequences the networks received were sampled from the same ten input sequences that humans were exposed to
The networks were given a slightly longer training phase than the humans
L2-regularisation was also added to the gate weights to compare the effect of the aggressiveness of the regulariser
Non-regularised networks showed no insight-like behaviour
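A minimal sketch of such a gated network with noisy online gradient descent is given below. It assumes a linear output, a squared-error loss and an L1 penalty of strength `lam` on the gates; these choices, the function names and all numeric values are assumptions for illustration, not the paper's exact specification.

```python
import random

def sign(v):
    """Return -1, 0 or 1; 0 serves as the subgradient of |v| at v == 0."""
    return (v > 0) - (v < 0)

def forward(w, g, x):
    """Linear readout: each input is scaled by its weight and its gate."""
    return sum(gi * wi * xi for gi, wi, xi in zip(g, w, x))

def train_step(w, g, x, target, lr, lam, noise_sd, rng):
    """One online update: squared error, L1 penalty on the gates,
    and Gaussian noise added to every gradient component."""
    err = forward(w, g, x) - target
    grad_w = [err * gi * xi for gi, xi in zip(g, x)]
    grad_g = [err * wi * xi + lam * sign(gi) for wi, gi, xi in zip(w, g, x)]
    new_w = [wi - lr * (d + rng.gauss(0.0, noise_sd)) for wi, d in zip(w, grad_w)]
    new_g = [gi - lr * (d + rng.gauss(0.0, noise_sd)) for gi, d in zip(g, grad_g)]
    return new_w, new_g

# Toy motion-phase run with made-up values: input 0 (motion) is informative,
# input 1 (colour) is pure noise.
rng = random.Random(0)
w, g = [0.1, 0.1], [0.5, 0.5]
for _ in range(200):
    target = rng.choice([-1.0, 1.0])
    x = [rng.gauss(target, 0.5), rng.gauss(0.0, 0.5)]
    w, g = train_step(w, g, x, target, lr=0.05, lam=0.01, noise_sd=0.01, rng=rng)
```

Because each gate multiplies its weight, shrinking a gate shrinks the error gradient that reaches the corresponding weight, which is one way to read the competitive dynamics between input channels mentioned above.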

Origins of insight-like behaviour in neural networks

Established behavioural similarity between L1-networks and humans in an insight task
Investigated dynamics of gate weights and effects of noise in insight vs. no-insight networks
Investigated role of regularisation strength parameter λ
Colour gate weight gradients were significantly larger in insight compared to no-insight L1-networks
Motion gate weight gradients did not differ
Insight networks had larger colour gate weight gradients even before any behavioural changes were apparent
Change point analysis confirmed the onset of the motion and colour phase to be the change point of the colour gradient mean
Difference between insight and no-insight networks for colour gates around the individually fitted switch points
Insight networks’ increased use of colour inputs was particularly evident at the end of learning
Colour weights were significantly larger already at the start of learning in insight networks
Varying the level of noise added during gradient updating increased the proportion of networks exhibiting insight-like behaviour
Adding noise to only weights during the updates was sufficient to induce insight-like behaviour
Adding noise only to the colour parameter updates quickly led to substantial amounts of insight-like behavioural switches
Momentary noise fluctuations were mostly responsible for the effects
Regularisation parameter λ affects two of the key characteristics of human insight: selectivity and delay
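One way to see why λ could matter for selectivity and delay is to compare the gradients that the two penalties contribute to a gate g_i. The formulation below is a standard textbook form and an assumption here; the summary does not give the paper's exact loss. The symbols ŷ (network output), y (target), w_i (weight) and x_i (input) are introduced for illustration.

```latex
\frac{\partial}{\partial g_i}\Big[\tfrac{1}{2}(\hat{y}-y)^2 + \lambda \sum_j |g_j|\Big]
  = (\hat{y}-y)\, w_i x_i + \lambda\, \mathrm{sign}(g_i)
\qquad \text{(L1)}

\frac{\partial}{\partial g_i}\Big[\tfrac{1}{2}(\hat{y}-y)^2 + \tfrac{\lambda}{2} \sum_j g_j^2\Big]
  = (\hat{y}-y)\, w_i x_i + \lambda\, g_i
\qquad \text{(L2)}
```

Under L1 the pull towards zero has constant size λ however small the gate already is, so a weakly used colour gate stays suppressed until the error term outweighs λ. Under L2 the pull shrinks with the gate itself, which makes the penalty less aggressive for small gates.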

Discussion

Investigated insight-like learning behaviour in humans and neural networks
Binary decision making task with hidden regularity
Subset of regularised neural networks with multiplicative gates of input channels displayed spontaneous, jump-like learning
Networks exhibited all key characteristics of human insight-like behaviour
Trained with standard stochastic gradient descent
Behavioural characteristics of aha-moments can arise from gradual learning mechanisms
Factors causing insight-like behaviour in L1-networks: noise, attentional gating, regularisation
Noise, regularisation involved in generation of insights
Possible link between sleep, synaptic scaling and insight
Cognitive control as regularised optimisation
Occurrence of insight-like behaviour specific to L1-regularised networks
L1-regularisation beneficial in more complex task settings

Methods

Task

Employed perceptual decision task with binary choice about circular arrays of moving dots
Two features: motion direction (4 orthogonal directions) and colour (orange or purple)
Noise level of motion feature varied in 5 steps
Colour difficulty constant
Task coded in JavaScript
Participants restricted to use desktops, Firefox or Chrome browser
200 moving dots with a radius of 7 pixels each
Dots had a lifetime of 10 frames before they were replaced
Trial duration 2000 ms
Binary feedback symbol (happy or sad smiley)
Inter trial interval (ITI) of 400, 600, 800 or 1000 ms
First 400 trials, motion phase, correct binary choice related to stimulus motion
Two response keys, “X” and “M”
Motion coherence set to 100% in first block
Second task block introduced 3 lowest levels of motion noise
Assessed how well participants had learned to discriminate motion direction after 4th block
Motion and colour phase, colour feature predictive of correct choice in addition to motion feature
After the last task block, participants were asked whether they had noticed a rule and used the colour feature, and to reproduce the mapping between stimulus colour and motion directions
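The task parameters listed above can be collected in a small configuration object. The original task was coded in JavaScript; this Python sketch only restates the listed values. The class name, the field names and the uniform sampling of the inter trial interval are assumptions.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class TaskConfig:
    """Values taken from the task description; field names are hypothetical."""
    n_dots: int = 200
    dot_radius_px: int = 7
    dot_lifetime_frames: int = 10
    trial_duration_ms: int = 2000
    iti_options_ms: tuple = (400, 600, 800, 1000)
    response_keys: tuple = ("X", "M")
    colours: tuple = ("orange", "purple")
    n_motion_directions: int = 4
    n_noise_levels: int = 5
    motion_phase_trials: int = 400

def sample_iti(config, rng):
    """Pick one inter trial interval; uniform sampling is an assumption."""
    return rng.choice(config.iti_options_ms)

config = TaskConfig()
iti_ms = sample_iti(config, random.Random(0))
```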

Human participants

Recruited participants aged 18-30 online
Required participants to show learning of stimulus classification
Probed accuracy on the 3 easiest, least noisy coherence levels
Excluded 96 participants due to insufficient accuracy
99 participants passed accuracy criterion and completed both task phases
34 participants excluded due to technical problems or quitting
All participants gave informed consent
Protocol approved by local ethics committee
Participants received £3 for the first task phase, £7 for both task phases

Neural networks

Used same classification procedure for neural networks
Funding parties had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript

Modelling of insight-like switches

Linear model with two free parameters
Step model with three free parameters
Sigmoid function with three free parameters
Preference for sigmoid function over linear and step models
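The three candidate models can be sketched as follows. The parameter counts match the list above, but the exact parameterisations are assumptions, as is the use of the Bayesian information criterion (BIC) for comparison; the summary does not say how the preference for the sigmoid was established.

```python
import math

def linear_model(t, intercept, slope):
    """Two free parameters: gradual, constant-rate change."""
    return intercept + slope * t

def step_model(t, before, after, t_switch):
    """Three free parameters: instantaneous jump at t_switch."""
    return before if t < t_switch else after

def sigmoid_model(t, y_max, slope, t_inflect):
    """Three free parameters: smooth but possibly abrupt transition.
    Written in a numerically stable form for large |slope * (t - t_inflect)|."""
    z = slope * (t - t_inflect)
    if z >= 0:
        return y_max / (1.0 + math.exp(-z))
    e = math.exp(z)
    return y_max * e / (1.0 + e)

def sse(model, params, ts, ys):
    """Sum of squared errors of a model with given parameters."""
    return sum((model(t, *params) - y) ** 2 for t, y in zip(ts, ys))

def bic(sse_value, n_points, n_params):
    """Gaussian-error BIC; lower is better. Requires sse_value > 0."""
    return n_points * math.log(sse_value / n_points) + n_params * math.log(n_points)
```

With this parameterisation, `t_inflect` is the point where the sigmoid reaches half of `y_max` and changes fastest, which corresponds to the inflection point used to define insight moments.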

Human participants

Modeled human participants’ data using sigmoid functions
Defined criterion to assess whether subject had switched to colour strategy
Corrected for general fit of data to model
Identified insight participants whose corrected slope steepness was outside of control group’s distribution
57.6% of participants indicated they used colour to press correctly; 79.2% of participants classified as insight also self-reported using colour
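The classification criterion can be sketched as a threshold at the maximum (100th percentile) of the control distribution. The function names are hypothetical, and the corrected slope values are taken as given, since the summary does not spell out how the correction for general model fit is computed.

```python
def classify_insight(corrected_slopes, control_slopes):
    """Flag participants whose corrected slope exceeds every control value."""
    threshold = max(control_slopes)
    return [s > threshold for s in corrected_slopes]

def share_reporting_colour(insight_flags, reported_colour_flags):
    """Fraction of insight-classified participants who self-reported colour use."""
    reports = [r for i, r in zip(insight_flags, reported_colour_flags) if i]
    return sum(reports) / len(reports) if reports else float("nan")

# Toy values, not data from the study.
flags = classify_insight([0.1, 0.9, 1.5, 0.2], control_slopes=[0.3, 0.5, 0.8])
overlap = share_reporting_colour(flags, [True, True, False, False])
```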

Hidden layer model

Tested task on more complex neural network model
Applied L1-regularisation on gate weights
18.2% of networks exhibited insight-like behaviour
Applied L2-regularisation on gate weights, 51.5% of networks exhibited insight-like behaviour
Results mirror observations from the one-layer linear network

Weight and gate differences between L1- and L2-regularised networks

At correlation onset, motion and colour weights were similar between L1 and L2 networks
After learning, colour weights were higher in L2 networks than L1 networks
Gate weights were lower in L1 networks than L2 networks at correlation onset
After learning, colour gate weights were higher in L2 networks than L1 networks
After learning, motion gate weights were lower in L2 networks than L1 networks

Gaussian noise differences at weights and gates between insight and no-insight networks

Comparing Gaussian noise at weights and gates around switch points revealed no differences between insight and no-insight networks
No differences in noise at start or end of learning
Stimuli and task design shown in Figure 1
Humans: task performance and insight-like strategy switches shown in Figure 2
L1-regularised neural networks: task performance and insight-like strategy switches shown in Figure 3
Gate and weight size differences at start and end of learning and dynamics shown in Figure 4
Switch-aligned performance and switch point distributions for L1- and L2-regularised neural networks shown in Figure 5
