Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

  • Finding answers to given questions is important in science
  • Coming up with good questions is important in science
  • Artificial scientists can learn to answer given questions and invent new questions
  • Artificial scientists are biased towards the simplest, least costly experiments that still have surprising outcomes
  • An empirical analysis of automatic generation of interesting experiments is presented

Paper Content

Introduction & previous work

  • Two important things in science: finding answers to given questions and coming up with good questions
  • Artificial systems can be used to implement the creative part of science
  • Artificial scientists equipped with artificial curiosity and creativity have been described in publications for three decades
  • An early artificial Q&A system designed to invent and answer questions was the intrinsic-motivation-based adversarial system from 1990
  • Two artificial NNs: controller C and world model M
  • M minimizes its error, C tries to find sequences of output actions that maximize the error of M
  • Artificial Q&A system from 1997 can ask arbitrary abstract questions with computable answers
  • Reward-maximizing C tries to come up with questions whose answers surprise the other
  • Artificial scientists maximize the sum of external rewards and intrinsic rewards
  • POWERPLAY framework (2011) enumerates the set of all formalisable questions
  • One Big Net For Everything offers a simplified NN version of POWERPLAY
  • Empirical investigation of two settings: (1) generation of experiments driven by model prediction error, and (2) an approach where C generates pure thought experiments in the form of RNN weight matrices
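
The adversarial interplay between C and M described above can be illustrated with a minimal sketch. It is not the 1990 system itself: the linear model, the squared-error measure, the learning rate, and the names `LinearModel`, `intrinsic_reward`, and `total_reward` are all assumptions made for illustration.

```python
import numpy as np

def intrinsic_reward(prediction, observation):
    """Squared prediction error of M. C is rewarded when this is large."""
    diff = np.asarray(prediction, dtype=float) - np.asarray(observation, dtype=float)
    return float(np.sum(diff ** 2))

def total_reward(external, intrinsic, weight=1.0):
    """Sum of external and intrinsic reward; `weight` is a hypothetical trade-off factor."""
    return external + weight * intrinsic

class LinearModel:
    """Toy stand-in for the world model M: predicts the observation as W @ action."""

    def __init__(self, obs_dim, act_dim, lr=0.1):
        self.W = np.zeros((obs_dim, act_dim))
        self.lr = lr

    def predict(self, action):
        return self.W @ action

    def update(self, action, observation):
        # One gradient step on 0.5 * ||W @ action - observation||^2.
        error = self.predict(action) - observation
        self.W -= self.lr * np.outer(error, action)

# Repeating the same action becomes less rewarding for C as M learns.
model = LinearModel(obs_dim=2, act_dim=2)
action = np.array([1.0, 0.0])
observation = np.array([1.0, -1.0])
reward_before = intrinsic_reward(model.predict(action), observation)
model.update(action, observation)
reward_after = intrinsic_reward(model.predict(action), observation)
```

In this toy setting the intrinsic reward for a repeated action shrinks after each model update, which is what pushes a reward-maximizing controller towards actions whose consequences M cannot yet predict.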

Self-invented experiments encoded as neural networks

  • System allows for design of computational experiments with binary yes/no outcomes
  • Experiments can run for multiple time steps
  • Controller and model can be implemented as LSTMs
  • Controller has START unit to propose experiments
  • Experiment has HALT and RESULT units
  • Experiment outcome is 1 if the activation of the RESULT unit exceeds 0.5, and 0 otherwise
  • Model predicts experiment outcome before it is executed
  • Reward for controller is proportional to model’s surprise
  • Alternative reward based on compression progress
  • Negative reward for inefficient experiments
  • Most initial experiments are thought experiments
  • Model and controller can be trained by backpropagation
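
The mechanics listed above (an experiment that runs for several steps, halts, and yields a binary outcome that the model tries to predict) can be sketched as follows. This is a simplified illustration under stated assumptions: the experiment is a small input-free sigmoid RNN, the unit indices, the constant bias unit, the HALT threshold of 0.5, the cross-entropy measure of surprise, and the per-step cost `step_cost` are hypothetical choices, not details taken from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def run_experiment(W, max_steps=50, halt_idx=0, result_idx=1):
    """Run an input-free RNN whose weight matrix W encodes the experiment.

    Returns (binary outcome, number of steps used). Assumes max_steps >= 1.
    """
    h = np.zeros(W.shape[0])
    h[-1] = 1.0  # hypothetical constant bias unit
    for step in range(1, max_steps + 1):
        h = sigmoid(W @ h)
        h[-1] = 1.0
        if h[halt_idx] > 0.5:  # assumed halting rule
            break
    outcome = 1 if h[result_idx] > 0.5 else 0
    return outcome, step

def surprise_reward(predicted_prob, outcome, steps, step_cost=0.01):
    """Cross-entropy surprise of the model's prediction, minus a runtime penalty."""
    p = np.clip(predicted_prob, 1e-8, 1.0 - 1e-8)
    surprise = -(outcome * np.log(p) + (1 - outcome) * np.log(1.0 - p))
    return float(surprise - step_cost * steps)

# A 3-unit experiment: the bias unit drives both HALT and RESULT high.
W = np.zeros((3, 3))
W[0, 2] = 5.0
W[1, 2] = 5.0
outcome, steps = run_experiment(W)
reward = surprise_reward(predicted_prob=0.1, outcome=outcome, steps=steps)
```

The runtime penalty is one simple way to express the bias towards cheap experiments: two experiments that surprise the model equally are ranked by how few steps they need.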

Experimental evaluation

  • Automatic generation of experiments encoded as NNs
  • Evaluated empirically
  • Two setups: adversarial intrinsic reward and pure thought experiments encoded as RNNs
  • Adversarial intrinsic reward encourages experiments in differentiable environment
  • Experiments aid discovery of goal states in sparse reward setting
  • Pure thought experiments guided by information gain reward

Generating experiments in a differentiable environment

  • Reinforcement learning (RL) usually involves exploration in an environment with non-differentiable dynamics
  • RL methods such as policy gradients are used
  • A fully differentiable environment is introduced to simplify investigation and focus on self-invented experiments
  • Environment is a 2D force field with position and velocity states and real-valued force vectors as actions
  • Negative reward of -0.1 for each time step and a large reward of 100 for reaching goal state
  • Environment is deterministic and experiments are independent of each other
  • Model M is a simple MLP with parameters w
  • Intrinsic reward signal is non-differentiable
  • Reward based on information gain
  • Average runtime of experiments increases slightly over time
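
A rough sketch of an environment with the properties listed above is given below. Only the state (position and velocity), the real-valued force actions, the determinism, and the reward values of -0.1 and 100 come from the summary. The class name `ForceFieldEnv`, the force field itself, the time step `dt`, the goal location and radius, the Euler integration, and the choice to give 100 instead of -0.1 on the goal step are assumptions. Plain NumPy is used for readability; obtaining gradients through the dynamics would require porting the same arithmetic to an automatic differentiation framework.

```python
import numpy as np

class ForceFieldEnv:
    """Deterministic 2D point mass moved by an action force plus a background field."""

    def __init__(self, goal=(1.0, 1.0), goal_radius=0.1, dt=0.1):
        self.goal = np.array(goal, dtype=float)
        self.goal_radius = goal_radius
        self.dt = dt
        self.reset()

    def reset(self):
        self.pos = np.zeros(2)
        self.vel = np.zeros(2)
        return np.concatenate([self.pos, self.vel])

    def field(self, pos):
        """Hypothetical background force; the actual field is not specified here."""
        return -0.1 * pos

    def step(self, action):
        force = np.asarray(action, dtype=float) + self.field(self.pos)
        # Explicit Euler update: every operation is smooth in state and action.
        self.vel = self.vel + self.dt * force
        self.pos = self.pos + self.dt * self.vel
        done = bool(np.linalg.norm(self.pos - self.goal) < self.goal_radius)
        reward = 100.0 if done else -0.1
        return np.concatenate([self.pos, self.vel]), reward, done

env = ForceFieldEnv()
obs, reward, done = env.step([1.0, 0.0])
```

Because the reward is -0.1 on every step away from the goal, an agent without some form of intrinsic motivation receives no signal that points towards the goal, which is the sparse-reward situation the self-invented experiments are meant to help with.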

Pure RNN thought experiments

  • Experiment setup uses feedforward NNs and a differentiable intrinsic reward function.
  • Investigates thought experiments with no environment interactions, using RNNs without inputs and an intrinsic curiosity reward based on information gain.
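
One common way to quantify information gain for a binary outcome is the KL divergence between the model's predictive distribution after and before it has been trained on the result. The sketch below uses that definition as an assumption; the paper's exact formulation may differ, and the names `bernoulli_kl` and `information_gain` are hypothetical.

```python
import numpy as np

def bernoulli_kl(p, q, eps=1e-8):
    """KL divergence KL(Bernoulli(p) || Bernoulli(q)) in nats."""
    p = np.clip(p, eps, 1.0 - eps)
    q = np.clip(q, eps, 1.0 - eps)
    return float(p * np.log(p / q) + (1.0 - p) * np.log((1.0 - p) / (1.0 - q)))

def information_gain(p_before, p_after):
    """Reward for an experiment: how far the model's belief about the outcome moved.

    p_before: model's predicted probability of outcome 1 before seeing the result.
    p_after:  the same prediction after the model was updated on the result.
    """
    return bernoulli_kl(p_after, p_before)

# An experiment that shifts the model's belief from 0.5 to 0.9 is rewarded;
# one that leaves the belief unchanged earns nothing.
gain_informative = information_gain(0.5, 0.9)
gain_redundant = information_gain(0.5, 0.5)
```

Under this definition an experiment whose outcome the model already predicts correctly yields almost no reward, so the controller gains nothing from repeating experiments the model has already learned.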

Conclusion and future work

  • Extended the neural Controller-Model (CM) framework with the notion of arbitrary self-invented computational experiments with binary outcomes
  • Experiments are encoded as weight matrices of RNNs generated by the controller
  • Model has to predict the outcome of an experiment based on its parameters
  • Show that self-invented abstract experiments facilitate the discovery of rewarding goal states
  • Over time, controller is forced to create longer experiments
  • Second setup: controller generates pure abstract thought experiments in the form of RNNs
  • Over time, newly generated experiments result in less intrinsic information gain reward
  • Later experiments tend to have slightly longer runtime
  • Scaling these methods to more complex environments is challenging
  • Algorithm 2 summarizes the method described in Section 3.2
  • Efficient approximation of the policy gradients for the controller is achieved through an actor-critic method
  • Input to the LSTM history encoder is the sequence of the last 1000 experiments that have been executed
  • Hyperparameters listed in Table 2