Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

  • Humans can show sudden improvements in task performance linked to insight
  • Artificial neural networks can also show insight-like behaviour
  • Insight-like behaviour in neural networks is caused by noise, attentional gating and regularisation

Paper Content

Introduction

  • Ability to learn from experience is common to animals and some artificial agents
  • Neural networks trained with SGD are a current theory of human learning
  • Humans may sometimes learn in an abrupt manner
  • Insights occur when an agent finds a novel problem solution by restructuring an existing task representation
  • Insights involve unconscious processes becoming conscious
  • Insights can be accompanied by a feeling of relief or pleasure
  • Insights are related to brain regions distinct from those associated with gradual learning
  • Insight-like behaviour can emerge from gradual learning algorithms
  • Insights trigger abrupt behavioural changes
  • Insights occur selectively in some subjects
  • Insights occur “spontaneously” without external cues
  • Regularisation and gating can cause discontinuities in learning in neural networks

Results

  • 99 participants and 99 neural networks performed a decision task
  • Task required a binary choice about circular arrays of moving dots
  • Dots were characterized by two features with different degrees of noise
  • Task provided a hidden opportunity to improve one’s decision strategy
  • Initial training phase only motion direction predicted correct choice
  • Later phase both features could be used to determine choice
  • Post-experimental questionnaire asked if participants noticed a rule, how long it took, and if they paid attention to colour

Human behaviour

  • Participants learned the response mapping for the four motion directions well
  • Noise was added to the motion, while the colour remained uncorrelated
  • Performance was heavily diminished in the conditions with the largest amounts of motion noise
  • Performance improvements largely beyond these low baseline levels can only be attributed to colour use
  • Noise level continued to influence performance in the motion and colour phase
  • Onset of the colour correlation triggered performance improvements across all coherence levels
  • 57.6% of participants reported using colour, 42.4% indicated not to have noticed or used the colour
  • Performance best fit by a non-linear sigmoid function, indicating at least a subsection of putative insight participants
  • 48.5% of participants had values larger than the 100% percentile of the control distribution, suggesting abrupt insight occurred selectively
  • 79.2% of insight participants self-reported to have used colour to make correct choices
  • Insight subjects started to perform significantly better in the lowest coherence trials once the motion and colour phase started
  • Reaction times reflected the same improvements upon switching to the colour strategy
  • Insight moments were defined as the time points of inflection of the fitted sigmoid function
  • Average delay of insight onset was 1.3 task blocks (130 trials)

Neural network behaviour

  • 99 network models were trained on a decision making task with two input nodes (colour and motion) and one output node
  • Each input node was sampled from a normal distribution with a mean and standard deviation
  • The network multiplied each input node by two parameters, a corresponding weight and a gate
  • L1-regularisation was added to the gate weights to introduce competitive dynamics between the input channels
  • The network was trained with online gradient descent with Gaussian white noise added to the gradient update and a fixed learning rate
  • The network was trained in a curriculum matched to the human task
  • The input sequences the networks received were sampled from the same ten input sequences that humans were exposed to
  • The networks were given a slightly longer training phase than the humans
  • L2-regularisation was also added to the gate weights to compare the effect of the aggressiveness of the regulariser
  • Non-regularised networks showed no insight-like behaviour

Origins of insight-like behaviour in neural networks

  • Established behavioural similarity between L1-networks and humans in an insight task
  • Investigated dynamics of gate weights and effects of noise in insight vs. no-insight networks
  • Investigated role of regularisation strength parameter λ
  • Colour gate weight gradients were significantly larger in insight compared to no-insight L1-networks
  • Motion gate weight gradients did not differ
  • Insight networks had larger colour gate weight gradients even before any behavioural changes were apparent
  • Change point analysis confirmed the onset of the motion and colour phase to be the change point of the colour gradient mean
  • Difference between insight and no-insight networks for colour gates around the individually fitted switch points
  • Insight networks’ increased use of colour inputs was particularly evident at the end of learning
  • Colour weights were significantly larger already at the start of learning in insight networks
  • Varying the level of noise added during gradient updating increased the proportion of networks exhibiting insight-like behaviour
  • Adding noise to only weights during the updates was sufficient to induce insight-like behaviour
  • Adding noise only to the colour parameter updates quickly led to substantial amounts of insight-like behavioural switches
  • Momentary noise fluctuations were mostly responsible for the effects
  • Regularisation parameter λ affects two of the key characteristics of human insight -selectivity and delay

Discussion

  • Investigated insight-like learning behaviour in humans and neural networks
  • Binary decision making task with hidden regularity
  • Subset of regularised neural networks with multiplicative gates of input channels displayed spontaneous, jump-like learning
  • Networks exhibited all key characteristics of human insight-like behaviour
  • Trained with standard stochastic gradient descent
  • Behavioural characteristics of aha-moments can arise from gradual learning mechanisms
  • Factors causing insight-like behaviour in L1-networks: noise, attentional gating, regularisation
  • Noise, regularisation involved in generation of insights
  • Possible link between sleep, synaptic scaling and insight
  • Cognitive control as regularised optimisation
  • Occurrence of insight-like behaviour specific to L1-regularised networks
  • L1-regularisation beneficial in more complex task settings

Methods

Task

  • Employed perceptual decision task with binary choice about circular arrays of moving dots
  • Two features: motion direction (4 orthogonal directions) and colour (orange or purple)
  • Noise level of motion feature varied in 5 steps
  • Colour difficulty constant
  • Task coded in JavaScript
  • Participants restricted to use desktops, Firefox or Chrome browser
  • 200 moving dots with a radius of 7 pixels each
  • Dots had a lifetime of 10 frames before they were replaced
  • Trial duration 2000 ms
  • Binary feedback symbol (happy or sad smiley)
  • Inter trial interval (ITI) of 400, 600, 800 or 1000 ms
  • First 400 trials, motion phase, correct binary choice related to stimulus motion
  • Two response keys, “X” and “M”
  • Motion coherence set to 100% in first block
  • Second task block introduced 3 lowest levels of motion noise
  • Assessed how well participants had learned to discriminate motion direction after 4th block
  • Motion and colour phase, colour feature predictive of correct choice in addition to motion feature
  • Last task block, asked participants if they noticed a rule, used colour feature, and replicated mapping between stimulus colour and motion directions

Human participants

  • Recruited participants aged 18-30 online
  • Required participants to show learning of stimulus classification
  • Probed accuracy on 3 easiest, least noisiest coherence levels
  • Excluded 96 participants due to insufficient accuracy
  • 99 participants passed accuracy criterion and completed both task phases
  • 34 participants excluded due to technical problems or quitting
  • All participants gave informed consent
  • Protocol approved by local ethics committee
  • Participants received 3£ for first task phase, 7£ for both task phases

Neural networks

  • Used same classification procedure for neural networks
  • Funding parties had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript

Modelling of insight-like switches

  • Linear model with two free parameters
  • Step model with three free parameters
  • Sigmoid function with three free parameters
  • Preference for sigmoid function over linear and step models

Human participants

  • Modeled human participants’ data using sigmoid functions
  • Defined criterion to assess whether subject had switched to colour strategy
  • Corrected for general fit of data to model
  • Identified insight participants whose corrected slope steepness was outside of control group’s distribution
  • 57.6% of participants indicated they used colour to press correctly, 79.2% of insight participants overlapped with self reports

Hidden layer model

  • Tested task on more complex neural network model
  • Applied L1-regularisation on gate weights
  • 18.2% of networks exhibited insight-like behaviour
  • Applied L2-regularisation on gate weights, 51.5% of networks exhibited insight-like behaviour
  • Results mirror observations from one-layer linearity

Weight and gate differences between l1-and l2-regularised networks

  • At correlation onset, motion and colour weights were similar between L1 and L2 networks
  • After learning, colour weights were higher in L2 networks than L1 networks
  • Gate weights were lower in L1 networks than L2 networks at correlation onset
  • After learning, colour gate weights were higher in L2 networks than L1 networks
  • After learning, motion gate weights were lower in L2 networks than L1 networks

Gaussian noise differences at weights and gates between insight and no-insight networks

  • Comparing Gaussian noise at weights and gates around switch points revealed no differences between insight and no-insight networks
  • No differences in noise at start or end of learning
  • Stimuli and task design shown in Figure 1
  • Humans: task performance and insight-like strategy switches shown in Figure 2
  • L1-regularised neural networks: task performance and insight-like strategy switches shown in Figure 3
  • Gate and weight size differences at start and end of learning and dynamics shown in Figure 4
  • Switch-aligned performance and switch point distributions for L1-and L2-regularised neural networks shown in Figure 5