Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

  • AI algorithms are inspired by physics and use stochastic fluctuations
  • Thermodynamic AI is a mathematical framework that unifies these algorithms
  • Thermodynamic AI hardware uses stochastic fluctuations as a computational resource
  • Thermodynamic AI hardware is a novel form of computing using s-bits and s-modes

Paper Content

Ii. stochasticity as a computing resource

  • Fluctuation is used to describe deviation from average value
  • Stochasticity is a precise mathematical description
  • Stochasticity is a resource that can be used to accomplish tasks
  • Randomness is a resource used in cryptography and computing
  • Stochasticity and randomness can be interconverted
  • Stochasticity can be used in generative modeling, optimization algorithms and financial asset integration

Iii. unification of intelligent algorithms

  • Unifications are powerful and highly sought after in physics
  • Goal is to motivate a hardware paradigm relevant to multiple AI applications
  • Mathematical unification of AI algorithms under same framework
  • Algorithms belong to class called Thermodynamic AI algorithms
  • Consist of two subroutines: SDE evolution and Maxwell’s demon observation
  • Mathematical unification can be useful outside of hardware development
  • Fundamental building blocks of thermodynamic AI hardware are dynamic
  • s-bits and s-modes are continuous-time Markov chains
  • s-modes can be implemented with electrical thermal and shot noise
  • Amplitude of stochasticity must be independently controllable
  • s-modes can be represented by voltage on a node in a circuit
  • Dynamics of s-modes can be constrained by adding electrical components
  • s-bits randomly flip states at times sampled from exponential distributions

C. problem geometry, inductive bias, and connectivity

  • Problem geometry can be used to design and program Thermodynamic AI Hardware.
  • Geometry can be 1D, 2D, 3D, or represented as a graph.
  • Geometry can be used to limit the search space and speed up training.
  • Thermodynamic AI Hardware is a hybrid digital-analog system.

Vii. states, operators, and superoperators

  • State of an s-unit lives in a vector space
  • Vector space is R2N
  • Transition matrix Q(t) is 4N-dimensional
  • Gates can be viewed as superoperators that act on the operator space
  • Superoperator space is N4-dimensional
  • Drift matrix schedule, drift vector schedule, and diffusion matrix schedule
  • Bra-ket notation for vector spaces
  • Drift matrix schedule, drift vector schedule, and diffusion matrix schedule can be viewed as gate sequences
  • Superoperator Q belongs to 16N-dimensional linear superoperator space
  • Continuous or discrete approach to gates
  • Continuous approach is analogous to pulse-level control
  • Matrix elements of A t , B t , and C t are continuous in t
  • Generator approach avoids having to specify timedependence of every matrix element of the gate

B. discrete approach to gates

  • Pulse-level control is natural for s-mode systems.
  • Continuous pulses are more efficient than discrete gates.

C. gate sequence as a software program

  • Three gate sequences represent a complete software program
  • Generator-based formalism used to write gate sequences
  • Gate sequence decomposes A t into a set of f gates
  • Gate sequences affect parameters of dynamics in Eq. (14)

D. special cases of gates

  • Gate B t acts on a single s-mode
  • Gate B t multiplies the first element of the b 0 vector by a time-dependent function
  • Gate B t acts on all s-modes independently
  • Gate A t and C t can act on a single s-mode or all s-modes independently
  • Gate A t and C t can affect the diagonal or off-diagonal elements of A 0 and C 0
  • Entropy of the system may naturally change over time
  • Probability distribution is Gaussian
  • Variance and entropy are interchangeable
  • Entropy can increase or decrease over time depending on the hardware drift and diffusion matrices

B. complicated entropy dynamics in ai applications

  • AI applications require complicated entropy dynamics.
  • Entropy needs to be reduced from a high to a low uncertainty situation.
  • Entropy dynamics needed for AI applications cannot be achieved with an isolated physical system.

D. maxwell’s demon

E. maxwell’s demon as a hardware component

  • MD is a component of the bare-bones hardware
  • Need to connect MD hardware to s-unit hardware
  • Can construct MD device in digital, analog or hybrid digital-analog approaches
  • Digital approach is a neural network stored on a CPU
  • Need to interconvert signals between thermodynamic hardware and CPU
  • MD takes in time and state vector as inputs
  • Outputs a vector which needs to be converted to physical form
  • Vector applied to s-unit system to give rise to drift term in SDE

G. training the maxwell’s demon

  • Equation (41) assumes MD has some level of intelligence
  • MD needs to be trained to be intelligent
  • MD output should depend on trainable parameters
  • Isolated training (ex situ) involves mimicking s-unit system with digital hardware
  • In situ training involves interacting s-unit system with MD system
  • Benefits of in situ training include using physical hardware to accelerate computation of loss functions and learning to correct errors
  • Issues to consider when constructing MD device include expressibility, signal interconversion, and latency
  • MD output can be described in terms of a differential equation
  • MD device receives analog inputs from s-mode system
  • MD device can be fully analog or hybrid digital-analog
  • Alternative approach to constructing MD system involves thinking of output as a force
  • MD device has a latent variable that evolves over time
  • MD device stores a potential energy function that can be time-dependent

X. thermodynamic error correction and noise robustness

A. noise plaguing other computing paradigms

  • Hardware noise is a major issue for quantum computing and analog computing.
  • Noise can make efficient algorithms inefficient, eliminating the quantum speedup.
  • Digital computers became more precise and economical in the 1950s-1970s, leading to the decline of analog computing.

B. using noise to one’s advantage

  • Thermodynamic AI uses noise as a fundamental ingredient in the hardware.
  • Noise is seen as essential, not a nuisance.
  • Noise sources can be both intentional and unintentional.

C. noise preserves the mathematical framework

  • Hardware can be intentionally designed with a drift matrix, drift vector, and diffusion matrix.
  • Unintentional and uncharacterized hardware noise can occur.
  • True values of the relevant matrices and vectors can be perturbed away from the original design.

D. maxwell’s demon learns to correct errors

  • Maxwell’s Demon (MD) device is a key ingredient in Thermodynamic AI hardware
  • MD system allows for error correction
  • Loss function is used to measure performance of hardware
  • MD system is trained in presence of physical s-mode system
  • MD system is able to correct for errors or noise in hardware
  • Thermodynamic AI systems have inherent robustness to hardware noise

Xi. application: thermodynamic diffusion models

  • Time-series data is important for financial analysis, market prediction, epidemiology, and medical data analysis.
  • Discrete neural networks and latent ODEs have been used to interpolate and extrapolate time-series data.
  • Latent SDEs have been explored for fitting and extrapolating time-series data.

B. fitting into our thermodynamic ai framework

  • Discussing using Thermodynamic AI hardware as either a latent ODE or latent SDE
  • Using an s-mode device combined with a parameterized Maxwell’s demon device to generate a parameterized SDE
  • Model in Fig. 18 has three subroutines: Encoder, Latent Thermodynamic AI hardware

C. description of diffusion hardware

  • Thermodynamic Diffusion Model can be implemented with analog electrical circuits or continuous-variable optical systems
  • Model has multiple degrees of freedom which correspond to the s-modes
  • Function generator can multiply diffusion and drift terms by time-dependent functions
  • Data can be uploaded and downloaded from the device by initializing and measuring the continuous state variables
  • Score network acts as a Maxwell’s Demon to reduce the physical system’s entropy

D. analog score network

  • Score network can take many physical forms, including digital and analog
  • Latency issues can arise when using digital score network with analog s-mode system
  • Figure 12 shows an analog circuit for score network
  • Subroutines of evaluating q and r can be digital or analog neural networks
  • Alternative means of constructing analog score network is force-based approach

Xii. application: thermodynamic deep learning

  • TDL is a term for applying Thermodynamic AI Hardware to deep learning
  • BDL allows for uncertainty quantification on the predicted output of neural networks
  • Current digital hardware is not able to perform BDL with both high accuracy and fast speed
  • Thermodynamic AI Hardware could potentially accelerate BDL to make large-scale BDL feasible

Background

  • Machine learning systems are often overconfident in their predictions.
  • Overconfidence can be catastrophic for high-stakes applications.
  • Overconfidence is caused by limited training data.
  • Uncertainty quantification (UQ) can help make machine learning more reliable and trustworthy.
  • UQ can provide guidance for when to defer to human judgement.
  • Bayesian deep learning is a continuous-time approach to UQ.

Fitting into our thermodynamic ai framework

  • Weight diffuser corresponds to s-mode device
  • Posterior drift network corresponds to Maxwell’s demon device

Xiii. application: thermodynamic monte carlo

  • Monte Carlo algorithms are used in finance, physics, chemistry, and machine learning
  • Monte Carlo algorithms approximate integrals involving probability distributions
  • Markov Chain Monte Carlo (MCMC) is a popular strategy for constructing samplers
  • MCMC operates by constructing a Markov chain with the target distribution as its stationary distribution
  • Langevin Monte Carlo (LMC) and Hamiltonian Monte Carlo (HMC) are two key algorithms
  • HMC is widely used for statistical analysis and learning
  • HMC proposes new samples using a combination of gradient information and Hamiltonian dynamics
  • No U-Turn sampler (NUTS) is an extension of HMC
  • Stochastic Gradient Hamiltonian Monte Carlo (SGHMC) is an extension of HMC for large problem sizes

B. connection to langevin monte carlo

  • SG-HMC and Langevin Monte Carlo (LMC) have a close connection
  • SGLD is a procedure for Bayesian posterior sampling of the parameters of a machine learning model
  • Parameters are prevented from freezing at a particular value by a noise term
  • Thermodynamic fluctuations are a resource for posterior inference
  • Logarithm of a distribution can be treated as an energy
  • Data becomes part of a time dependent diffusion vector

C. fitting into our thermodynamic ai framework

  • Algorithms can fit into a framework
  • Terminology introduced in section IX H 4 is mapped to s-unit formalism

D. description of monte carlo hardware

  • Coupled differential equations can be implemented on Thermodynamic AI hardware.
  • Computing derivatives of position and momentum involves diagonalizing matrices, which has a computation cost of O(n3).
  • Thermodynamic AI hardware can help alleviate the bottleneck of sampling for many applications.

Xiv. application: thermodynamic annealing

B. sde approach to simulated annealing

  • Reference [75] provides a mathematical framework for simulated annealing.
  • The framework is based on a system of equations for a state variable x and an auxiliary variable p.
  • The dynamics of x are stochastic and in the long-time limit, it is distributed according to a Boltzmann probability distribution.
  • This allows the exploration of the extrema landscape of the loss function L.

C. fitting into our thermodynamic ai framework

  • Equations 83 and 84 fit into the framework for Thermodynamic AI hardware.
  • Auxiliary SDE maps to s-mode device, S corresponds to coefficient C(t), -(1/2)D corresponds to coefficient A(t), -∇L(x) corresponds to demon vector d.
  • Optimization ODE maps to evolution of latent variable in Maxwell’s demon device.
  • Framework discussed in Sec. IX H 4 involves a forced-based Maxwell’s demon.
  • Mass matrix set to identity: M = I.