Abstract
- Proposes a new method for approximating intractable likelihood functions and posterior densities in Bayesian surrogate modeling
- Method involves training three complementary networks in an end-to-end fashion
- Method enables amortized marginal likelihood and posterior predictive estimation
- Benchmarks the method against state-of-the-art Bayesian methods and proposes a powerful diagnostic for joint calibration
- Investigates the ability of recurrent likelihood networks to emulate complex time series models
Paper Content
Introduction
- Surrogate modeling and simulation-based inference are two important parts of a new generation of methods for simulation science
- Surrogate modeling seeks to approximate the intractable likelihood function
- Simulation-based inference aims to approximate the intractable posterior distribution of a complex generative model
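- Specialized neural approximators have been developed to tackle each of these intractable problems
- JANA (jointly amortized neural approximation) is a Bayesian neural framework for simultaneous amortized surrogate modeling (SM) and simulation-based inference (SBI)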
- JANA enables accurate solutions to downstream tasks like the estimation of marginal and posterior predictive distributions
- JANA outperforms or is on par with other methods given identical simulation budgets
- JANA unlocks the potential of powerful Bayesian tools for model comparison, validation, and calibration
- JANA can compute marginal likelihoods and rapidly produce posterior samples and normalized likelihood estimates of new data instances
Method
Problem formulation
- Bayesian models are specified as a triple of a simulation program, externalized randomness, and prior knowledge about simulation parameters.
- Forward inference (running the simulation program over the externalized randomness) defines an implicit likelihood p(x | θ).
- The posterior distribution of the latent parameters is not available in closed form.
- The marginal likelihood is used to compare and assign preferences to competing models.
- Posterior predictive performance is used to compare and validate Bayesian models.
- The expected log-predictive density is a widely-applied metric to measure posterior predictive performance.
- Joint training leverages the symmetry between the posterior distribution and the implicit likelihood.
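In standard notation, the quantities in this section can be written as follows. This is a sketch: the symbols G for the simulation program, ξ for the externalized randomness, and x_new for unseen data are assumed here rather than quoted from the paper.

```latex
\begin{align}
p(x \mid \theta) &= \int \delta\big(x - G(\theta, \xi)\big)\, p(\xi \mid \theta)\, d\xi
  && \text{(implicit likelihood)} \\
p(\theta \mid x) &= \frac{p(x \mid \theta)\, p(\theta)}{p(x)}
  && \text{(posterior)} \\
p(x) &= \int p(x \mid \theta)\, p(\theta)\, d\theta
  && \text{(marginal likelihood)} \\
\mathrm{ELPD} &= \mathbb{E}_{x_{\mathrm{new}}}\!\left[ \log \int p(x_{\mathrm{new}} \mid \theta)\, p(\theta \mid x)\, d\theta \right]
  && \text{(expected log-predictive density)}
\end{align}
```

The posterior and the implicit likelihood are both conditional densities over the same pair (θ, x), which is the symmetry that joint training exploits.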
Posterior network
- Normalizing flow between θ and a latent variable z_θ is implemented by a posterior network P_φ
- Normalizing flow is realized by a conditional invertible neural network (cINN)
- Summary network sub-module H_ψ is optimized to extract maximally informative data representations H_ψ(x)
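A cINN is typically built from conditional coupling layers. The following is a minimal NumPy sketch of one conditional affine coupling layer, not the architecture used in the paper: the class name `ConditionalAffineCoupling` is hypothetical, single linear maps stand in for the scale and shift sub-networks, and a random vector stands in for the summary output H_ψ(x).

```python
import numpy as np


class ConditionalAffineCoupling:
    """Toy conditional affine coupling layer (hypothetical, for illustration).

    The input is split into two parts (u1, u2). u1 passes through unchanged,
    while u2 is scaled and shifted by functions of (u1, condition).
    """

    def __init__(self, dim, cond_dim, rng):
        self.d1 = dim // 2
        d2 = dim - self.d1
        in_dim = self.d1 + cond_dim
        # Assumption: linear maps replace the usual scale/shift neural networks.
        self.W_s = 0.1 * rng.standard_normal((in_dim, d2))
        self.W_t = 0.1 * rng.standard_normal((in_dim, d2))

    def _scale_shift(self, u1, cond):
        h = np.concatenate([u1, cond], axis=-1)
        s = np.tanh(h @ self.W_s)  # bounded log-scale for numerical stability
        t = h @ self.W_t
        return s, t

    def forward(self, u, cond):
        u1, u2 = u[:, :self.d1], u[:, self.d1:]
        s, t = self._scale_shift(u1, cond)
        v2 = u2 * np.exp(s) + t
        log_det = s.sum(axis=-1)  # log |det Jacobian| of the transformation
        return np.concatenate([u1, v2], axis=-1), log_det

    def inverse(self, v, cond):
        v1, v2 = v[:, :self.d1], v[:, self.d1:]
        s, t = self._scale_shift(v1, cond)  # v1 equals u1, so s and t match
        u2 = (v2 - t) * np.exp(-s)
        return np.concatenate([v1, u2], axis=-1)


rng = np.random.default_rng(0)
layer = ConditionalAffineCoupling(dim=4, cond_dim=3, rng=rng)
theta = rng.standard_normal((5, 4))    # batch of parameter vectors
summary = rng.standard_normal((5, 3))  # stands in for H_psi(x)
z, log_det = layer.forward(theta, summary)
theta_rec = layer.inverse(z, summary)  # recovers theta up to float error
```

Because the first part of the input is passed through unchanged, the inverse can recompute the same scale and shift, which is what makes the layer invertible with a cheap log-determinant. Practical flows stack several such layers with permutations in between.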
Likelihood network
- The likelihood network implements a normalizing flow between x and a multivariate Gaussian latent variable.
- The likelihood network is implemented as a cINN.
- The parameter vector θ is fed directly to the conditional coupling layers of the cINN.
- The design of the coupling layers needs to be tailored according to the probabilistic symmetry of p(x | θ).
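Because the likelihood network is a normalizing flow, its density follows from the change-of-variables formula. The sketch below assumes the symbols L_η for the likelihood network, z_x for its Gaussian latent variable, and l_η for the resulting surrogate likelihood; these names are illustrative.

```latex
\begin{align}
z_x &= L_\eta(x; \theta), \\
\log l_\eta(x \mid \theta) &= \log \mathcal{N}(z_x \mid 0, I)
  + \log \left| \det \frac{\partial z_x}{\partial x} \right|
\end{align}
```

Sampling x | θ runs the flow in reverse: draw z_x from the Gaussian and apply the inverse of L_η with the same θ.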
Simulation-based training
- Aim for a fully amortized approach
- Evaluate normalized densities for any pair (θ, x)
- Generate conditional random draws θ | x and x | θ
- Prescribe a simple distribution to summary network outputs
- Minimize a criterion to enable error detection and model criticism
- Detect potential simulation gaps during inference
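One common criterion for pulling summary network outputs toward a simple distribution is the maximum mean discrepancy (MMD), which the experiments below also mention. The following is a minimal sketch of a biased MMD estimate with a Gaussian kernel, assuming a standard normal reference distribution and a fixed bandwidth; the function names and the choice of kernel are illustrative and not taken from the paper.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth=1.0):
    """Pairwise Gaussian kernel matrix between the rows of a and b."""
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def mmd_squared(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    k_xx = gaussian_kernel(x, x, bandwidth).mean()
    k_yy = gaussian_kernel(y, y, bandwidth).mean()
    k_xy = gaussian_kernel(x, y, bandwidth).mean()
    return k_xx + k_yy - 2.0 * k_xy


rng = np.random.default_rng(1)
reference = rng.standard_normal((500, 2))        # draws from the prescribed N(0, I)
well_specified = rng.standard_normal((500, 2))   # summaries that match the reference
shifted = rng.standard_normal((500, 2)) + 3.0    # summaries from a shifted distribution

mmd_ok = mmd_squared(well_specified, reference)  # close to zero
mmd_gap = mmd_squared(shifted, reference)        # clearly larger
```

At inference time, a large value for the summaries of observed data relative to the reference would hint at a simulation gap. In training, a term like this would be added to the negative log-densities of the two networks; the weighting between the terms is a design choice not covered here.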
Validation methodology: joint calibration
- Faithful uncertainty representation is necessary for self-consistent and interpretable simulation-based inference.
- Simulation-based calibration is a diagnostic method that considers the performance of a sampling algorithm over the entire joint distribution.
- If the posterior and likelihood networks are accurate, then the equality implied by Eq. 10 holds.
- Violations of this equality indicate errors in joint training.
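The rank-based check behind simulation-based calibration can be sketched as follows. This is an illustrative toy, not the paper's implementation: a conjugate Gaussian model with a known exact posterior replaces the neural networks, and all function names are hypothetical. For the joint variant, the `simulate` argument would be the likelihood network's sampler instead of the true simulator, which is an assumption based on the description above.

```python
import numpy as np


def sbc_ranks(sample_prior, simulate, sample_posterior, n_sims, n_draws, rng):
    """Rank of each true parameter among its posterior draws."""
    ranks = np.empty(n_sims, dtype=int)
    for i in range(n_sims):
        theta_true = sample_prior(rng)
        x = simulate(theta_true, rng)
        draws = sample_posterior(x, n_draws, rng)
        ranks[i] = int(np.sum(draws < theta_true))
    return ranks


# Toy model: theta ~ N(0, 1), x | theta ~ N(theta, 1),
# so the exact posterior is theta | x ~ N(x / 2, 1 / 2).
def sample_prior(rng):
    return rng.standard_normal()


def simulate(theta, rng):
    return theta + rng.standard_normal()


def exact_posterior(x, n_draws, rng):
    return x / 2.0 + np.sqrt(0.5) * rng.standard_normal(n_draws)


def overconfident_posterior(x, n_draws, rng):
    # Deliberately too narrow, to show what miscalibration looks like.
    return x / 2.0 + 0.2 * rng.standard_normal(n_draws)


rng = np.random.default_rng(2)
n_draws = 99
ranks_ok = sbc_ranks(sample_prior, simulate, exact_posterior, 500, n_draws, rng)
ranks_bad = sbc_ranks(sample_prior, simulate, overconfident_posterior, 500, n_draws, rng)

# Calibrated ranks are uniform on {0, ..., n_draws}; an overconfident
# approximation piles ranks up at the two extremes.
extreme_ok = np.mean((ranks_ok == 0) | (ranks_ok == n_draws))
extreme_bad = np.mean((ranks_bad == 0) | (ranks_bad == n_draws))
```

In practice the ranks are inspected through histograms or empirical CDF plots, one per parameter.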
Use cases for joint learning
- Posterior predictive estimation is used to estimate the expected predictive performance of a Bayesian model.
- ELPD cannot be computed for Bayesian models with intractable likelihoods or sequential neural estimators.
- ELPD can be efficiently approximated using posterior draws.
- Marginal likelihood estimation: with both a posterior and a likelihood approximator available, the marginal likelihood can be computed by rearranging Bayes' rule.
- Amortized surrogate simulators can generate additional data for the posterior network or a black-box optimizer.
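Both use cases can be illustrated on a toy conjugate Gaussian model in which closed-form densities stand in for the trained posterior and likelihood networks. This is a sketch under that assumption; the function names are hypothetical. The marginal likelihood follows from rearranging Bayes' rule, log p(x) = log p(x | θ) + log p(θ) − log p(θ | x), and the log posterior predictive density of a new point is approximated by averaging likelihood values over posterior draws.

```python
import numpy as np


def normal_logpdf(value, mean, var):
    return -0.5 * (np.log(2.0 * np.pi * var) + (value - mean) ** 2 / var)


# Toy model: theta ~ N(0, 1), x | theta ~ N(theta, 1).
def log_prior(theta):
    return normal_logpdf(theta, 0.0, 1.0)


def log_likelihood(x, theta):
    # Stands in for the likelihood network.
    return normal_logpdf(x, theta, 1.0)


def log_posterior(theta, x):
    # Stands in for the posterior network (exact posterior N(x / 2, 1 / 2)).
    return normal_logpdf(theta, x / 2.0, 0.5)


def log_marginal_likelihood(x, theta):
    """Bayes' rule rearranged; valid for any theta if both densities are exact."""
    return log_likelihood(x, theta) + log_prior(theta) - log_posterior(theta, x)


def log_posterior_predictive(x_new, posterior_draws):
    """Monte Carlo estimate of log p(x_new | x) via a stable log-mean-exp."""
    log_liks = log_likelihood(x_new, posterior_draws)
    m = log_liks.max()
    return m + np.log(np.mean(np.exp(log_liks - m)))


rng = np.random.default_rng(3)
x_obs, x_new = 1.2, 0.4

log_ml = log_marginal_likelihood(x_obs, theta=0.3)
log_ml_exact = normal_logpdf(x_obs, 0.0, 2.0)  # analytic marginal: N(0, 2)

draws = x_obs / 2.0 + np.sqrt(0.5) * rng.standard_normal(20000)
lpd = log_posterior_predictive(x_new, draws)
lpd_exact = normal_logpdf(x_new, x_obs / 2.0, 1.5)  # analytic predictive: N(x / 2, 3 / 2)
```

With approximate networks the rearranged estimate varies with θ, so averaging over several posterior draws is a natural way to stabilize it. Averaging the log predictive density over held-out data points gives an ELPD estimate.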
Related work
- Approximate Bayesian Computation is a family of algorithms used for SBI
- ABC uses prior distributions and simulators to generate approximate posterior draws
- ABC-SMC, ABC-MCMC, and Synthetic Likelihoods are more sophisticated ABC methods
- Summary functions are used to reduce raw data in ABC
- Neural networks can be used to learn informative summary statistics
- Synthetic Likelihoods are more suitable for high-dimensional summary statistics
- Particle MCMC is used for exact Bayesian inference
- Neural Posterior Estimation methods specialize neural approximators for inference
- Neural Likelihood Estimation methods target the intractable likelihood function
- SNPLA is a method for joint posterior and likelihood estimation
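For contrast with the neural approaches, the simplest ABC variant, rejection sampling, can be sketched in a few lines. The toy Gaussian model, the tolerance `epsilon`, and the function name are assumptions made for illustration; the raw observation is used directly instead of a summary function.

```python
import numpy as np


def abc_rejection(x_obs, n_proposals, epsilon, rng):
    """Keep prior draws whose simulated data fall within epsilon of x_obs."""
    theta = rng.standard_normal(n_proposals)          # theta ~ N(0, 1) prior
    x_sim = theta + rng.standard_normal(n_proposals)  # x | theta ~ N(theta, 1)
    return theta[np.abs(x_sim - x_obs) < epsilon]


rng = np.random.default_rng(4)
abc_draws = abc_rejection(x_obs=1.0, n_proposals=200_000, epsilon=0.1, rng=rng)
# For this toy model the exact posterior is N(0.5, 0.5), so the accepted
# draws should have mean and variance close to 0.5.
```

Only a small fraction of the proposals is accepted, and the whole procedure has to be repeated for each new observation, which is the cost that amortized methods aim to avoid.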
Experiments
- JANA is evaluated on 12 Bayesian models across 4 experiments
- Experiments 1-3 are trained without the MMD criterion
- Code for reproducing all experiments is in the Appendix
Ten benchmark experiments
- Experiment demonstrates fidelity of proposed architecture and utility of calibration checks
- Deviate from original problem setting by approximating both posterior and likelihood
- Validate results on larger held-out set of 1,000 simulations
- Goal is to demonstrate feasibility of joint amortization and utility of JSBC diagnostic
- Stable training and good calibration observed across 10 benchmark models
- JSBC diagnostic reveals good calibration and systematic deviations due to likelihood network
- JSBC diagnostic can pinpoint reasons for joint miscalibration
Two moons: method comparison
- Focused on Two Moons benchmark
- Compared JANA to popular sequential methods
- Model characterized by bimodal posterior
- Used the same setup as Wiqvist et al. (2021)
- Repeated experiment 10 times with fixed budget
- JANA captures local patterns after 2,000 training samples
- JANA outperforms sequential methods
- Joint performance of amortized method comparable to non-amortized sequential methods
Exchangeable diffusion model
- Demonstrates amortized marginal likelihood and ELPD estimation based on a mechanistic model of decision making
- Compares results with state-of-the-art likelihood-based methods
- Results indicate well-calibrated joint approximation and accurate posterior and likelihood estimation
- Provides interface to PyMC for easy model building and use of existing samplers
Markovian compartmental model
- Experiment demonstrates surrogate simulations of a complex non-exchangeable model of infectious diseases
- Model features 34 parameters
- Implement likelihood network as a recurrent cINN
- Train posterior network using MMD criterion
- Synthetic outbreak trajectories compared to outputs of original simulator
- Good emulation across a variety of parameter configurations
- Surrogate network accurately approximates median trajectory and variability
- Posterior and joint calibration of the two networks assessed using (J)SBC
- Deficiencies of the likelihood network observed
- Amortization makes a principled Bayesian workflow easier
- Weight sharing approach for various model structures challenging
- JANA operates with arbitrary conditional density approximators
- Summary network needed for various sizes and shapes of data
D Implementation details and additional results
- Experiments are implemented using BayesFlow library
- Adam optimizer used with learning rate between 0.0005 and 0.001
- NVIDIA T4 graphics accelerator with 16GB of GPU memory used for training
D.1 Experiment 1: ten benchmarks
- Model specifications from Lueckmann et al. (2021) imported from BayesFlow library
- Table 1 contains an overview of the benchmarks and core network settings
- Full network configurations in Appendix
- Figures show loss history and calibration diagnostics for a simulation budget of 10,000
- Bernoulli GLM Raw model augmented with independent random variates
- Recurrent likelihood networks emulate complex Bayesian SDE models
- Samples from approximate posterior distribution for Two Moons experiment
- True and synthetic likelihood align perfectly
- Joint approximation of all parameters is well calibrated
- Log marginal likelihood and ELPD estimates of JANA closely approximate those obtained via bridge sampling and PSIS-LOO
- Jointly amortized neural approximation: offline training using a pre-simulated training set