Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

  • Proposes a new method for approximating intractable likelihood functions and posterior densities in Bayesian surrogate modeling
  • Method involves training three complementary networks in an end-to-end fashion
  • Method can be used to estimate marginal likelihood and posterior predictive estimation
  • Benchmarked against state-of-the-art Bayesian methods and proposed a powerful diagnostic for joint calibration
  • Investigated ability of recurrent likelihood networks to emulate complex time series models

Paper Content

Introduction

  • Surrogate modeling and simulation-based inference are two important parts of a new generation of methods for simulation science
  • Surrogate modeling seeks to approximate the intractable likelihood function
  • Simulation-based inference aims to approximate the intractable posterior distribution of a complex generative model
  • Specialized neural approximators have been developed to solve the intractable problem
  • JANA is a Bayesian neural framework for simultaneous amortized SM and SBI
  • JANA enables accurate solutions to downstream tasks like the estimation of marginal and posterior predictive distributions
  • JANA outperforms or is on par with other methods given identical simulation budgets
  • JANA unlocks the potential of powerful Bayesian tools for model comparison, validation, and calibration
  • JANA can compute marginal likelihoods and rapidly produce posterior samples and normalized likelihood estimates of new data instances

Method

Problem formulation

  • Bayesian models are specified as a triple of a simulation program, externalized randomness, and prior knowledge about simulation parameters.
  • Forward inference is used to obtain an implicit likelihood from the joint distribution of the simulation program and externalized randomness.
  • The posterior distribution of the latent parameters is not available in closed form.
  • The marginal likelihood is used to compare and assign preferences to competing models.
  • Posterior predictive performance is used to compare and validate Bayesian models.
  • The expected log-predictive density is a widely-applied metric to measure posterior predictive performance.
  • Joint training leverages the symmetry between the posterior distribution and the implicit likelihood.

Posterior network

  • Normalizing flow between θ and a latent variable z θ is implemented by a posterior network P φ
  • Normalizing flow is realized by a conditional invertible neural network (cINN)
  • Summary network sub-module H ψ is optimized to extract maximally informative data representations H ψ (x)

Likelihood network

  • The likelihood network implements a normalizing flow between x and a multivariate Gaussian latent variable.
  • The likelihood network is implemented as a cINN.
  • The parameter vector θ is fed directly to the conditional coupling layers of the cINN.
  • The design of the coupling layers needs to be tailored according to the probabilistic symmetry of p(x | θ).

Simulation-based training

  • Aim for a fully amortized approach
  • Evaluate normalized densities for any pair (θ, x)
  • Generate conditional random draws θ | x and x | θ
  • Prescribe a simple distribution to summary network outputs
  • Minimize a criterion to enable error detection and model criticism
  • Detect potential simulation gaps during inference

Validation methodology: joint calibration

  • Faithful uncertainty representation is necessary for self-consistent and interpretable simulation-based inference.
  • Simulation-based calibration is a diagnostic method that considers the performance of a sampling algorithm over the entire joint distribution.
  • If the posterior and likelihood networks are accurate, then the equality implied by Eq. 10 holds.
  • Violations of this equality indicate errors in joint training.

Use cases for joint learning

  • Posterior Predictive Estimation is used to estimate the expected predictive performance of a Bayesian model.
  • ELPD cannot be computed for Bayesian models with intractable likelihoods or sequential neural estimators.
  • ELPD can be efficiently approximated using posterior draws.
  • Marginal Likelihood Estimation is used to compute a marginal likelihood.
  • Amortized surrogate simulators can generate additional data for the posterior network or a black-box optimizer.
  • Approximate Bayesian Computation is a family of algorithms used for SBI
  • ABC uses prior distributions and simulators to generate approximate posterior draws
  • ABC-SMC, ABC-MCMC, and Synthetic Likelihoods are more sophisticated ABC methods
  • Summary functions are used to reduce raw data in ABC
  • Neural networks can be used to learn informative summary statistics
  • Synthetic Likelihoods are more suitable for high-dimensional summary statistics
  • Particle MCMC is used for exact Bayesian inference
  • Neural Posterior Estimation methods specialize neural approximators for inference
  • Neural Likelihood Estimation methods target the intractable likelihood function
  • SNPLA is a method for joint posterior and likelihood estimation

Experiments

  • JANA is used in 12 Bayesian models across 4 experiments
  • Experiments 1-3 are trained without the MMD criterion
  • Code for reproducing all experiments is in the Appendix

Ten benchmark experiments

  • Experiment demonstrates fidelity of proposed architecture and utility of calibration checks
  • Deviate from original problem setting by approximating both posterior and likelihood
  • Validate results on larger held-out set of 1,000 simulations
  • Goal is to demonstrate feasibility of joint amortization and utility of JSBC diagnostic
  • Stable training and good calibration observed across 10 benchmark models
  • JSBC diagnostic reveals good calibration and systematic deviations due to likelihood network
  • JSBC diagnostic can pinpoint reasons for joint miscalibration

Two moons: method comparison

  • Focused on Two Moons benchmark
  • Compared JANA to popular sequential methods
  • Model characterized by bimodal posterior
  • Used same setup from Wiqvist et al. (2021)
  • Repeated experiment 10 times with fixed budget
  • JANA captures local patterns after 2,000 training samples
  • JANA outperforms sequential methods
  • Joint performance of amortized method comparable to non-amortized sequential methods

Exchangeable diffusion model

  • Demonstrates amortized marginal likelihood and ELPD estimation based on a mechanistic model of decision making
  • Compares results with state-of-the-art likelihood-based methods
  • Results indicate well-calibrated joint approximation and accurate posterior and likelihood estimation
  • Provides interface to PyMC for easy model building and use of existing samplers

Markovian compartmental model

  • Experiment demonstrates surrogate simulations of a complex non-exchangeable model of infectious diseases
  • Model features 34 parameters
  • Implement likelihood network as a recurrent cINN
  • Train posterior network using MMD criterion
  • Synthetic outbreak trajectories compared to outputs of original simulator
  • Good emulation across a variety of parameter configurations
  • Surrogate network accurately approximates median trajectory and variability
  • Posterior and joint calibration of two networks using (J)SBC
  • Deficiencies of likelihood network observed
  • Amortization makes a principled Bayesian workflow easier
  • Weight sharing approach for various model structures challenging
  • JANA operates with arbitrary conditional density approximators
  • Summary network needed for various sizes and shapes of data

D implementation details and additional results

  • Experiments are implemented using BayesFlow library
  • Adam optimizer used with learning rate between 0.0005 and 0.001
  • NVIDIA T4 graphics accelerator with 16GB of GPU memory used for training

D.1 experiment 1: ten benchmarks

  • Model specifications from Lueckmann et al. (2021) imported from BayesFlow library
  • Table 1 contains an overview of the benchmarks and core network settings
  • Full network configurations in Appendix
  • Figures show loss history, calibration diagnostics, and simulation budget of 10,000
  • Bernoulli GLM Raw model augmented with independent random variates
  • Recurrent likelihood networks emulate complex Bayesian SDE models
  • Samples from approximate posterior distribution for Two Moons experiment
  • True and synthetic likelihood align perfectly
  • Joint approximation of all parameters is well calibrated
  • Log marginal likelihood and ELPD estimates of JANA closely approximate those obtained via bridge sampling and PSIS-LOO
  • Jointly amortized neural approximation: offline training using a pre-simulated training set