Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

  • Category theory has been applied to various scientific domains.
  • DisCoPyro combines categorical structures with amortized variational inference.
  • DisCoPyro can be applied to program learning for variational autoencoders.
  • DisCoPyro provides mathematical foundations and concrete applications.
  • DisCoPyro is compared to other models (e.g. neuro-symbolic models).
  • DisCoPyro could contribute to the development of artificial general intelligence.

Paper Content

Introduction

  • Category theory has been applied to various mathematical sciences and machine learning.
  • DisCoPyro is a probabilistic generative model with amortized variational inference.
  • DisCoPyro is designed to develop human-level artificial general intelligence.

Notation

  • SMCs are built from objects and sets of morphisms between them
  • Operads are built from types and sets of morphisms between them
  • SMCs have a product operation and a unit of the product
  • Categories support composition of morphisms
  • Operads support indexed composition of morphisms

Foundations of discopyro

  • Finitary signature in an SMC consists of free operad over a signature and objects in the SMC
  • Free operad over a signature consists of trees with finitely many branches and leaves
  • Reason about composition as nesting rather than just transitive combination
  • Represented as directed acyclic hypergraphs
  • Algorithm 1 produces hypergraph with vertices corresponding to non-product type
  • Transition distance between two indexed vertices
  • Probabilistic generative model over morphisms in the free operad
  • Acyclicity requires connections between boxes without forming cycles
  • User-specified wiring diagram to write prior distribution over latent variables
  • Inference from data by Bayesian inversion

Model learning and variational bayesian inference

  • Bayesian inversion relies on evaluating the model evidence
  • Model evidence has no closed form solution
  • Transform high-dimensional integral into an expectation
  • DisCoPyro provides two methods for constructing expectation
  • Jensen’s Inequality provides lower bound to true model evidence
  • Maximizing evidence lower bound estimates model parameters

Example application and training

  • Framework connects morphism to data via likelihood with intermediate latent random variables
  • Allows for broad variety of applications
  • Example application of framework to deep probabilistic program learning for generative modeling
  • Performance of application as generative model described in Section 3.2

Deep probabilistic program learning with discopyro

  • Constructed an operad O with generators implementing Pyro building blocks for deep generative models
  • Specified one-box wiring diagram to parameterize DisCoPyro generative model
  • Trained free operad model on MNIST and downsampled Omniglot dataset
  • DisCoPyro provides amortized variational inference over its own random variables
  • Generative model provides proposal over morphisms in free operad
  • Neural network design for proposal specified as q φ (z | x, f θ )
  • Application has complete proposal density

Experimental results and performance comparison

  • Our free operad model outperforms other structured deep generative models in terms of log model evidence.
  • Older baselines fix a composition structure ahead of time, while our model learns it from data.
  • The Omniglot challenge requires a model to be usable for classification, latent feature recognition, concept generation, and exemplar generation.

Discussion

  • DisCoPyro is a generative Bayesian structure learning system
  • It uses category theory, operad theory, and variational Bayesian inference
  • It is competitive against other models on a challenge dataset
  • It can model human intelligence across more domains than handwritten characters
  • It can be applied to chemical reaction networks, natural language processing, and the systematicity of human intelligence
  • It uses an absorbing Markov chain to sample paths between types