Abstract
- Many AI algorithms are inspired by physics and employ stochastic fluctuations
- Thermodynamic AI is a mathematical framework that unifies these algorithms
- Thermodynamic AI hardware uses stochastic fluctuations as a computational resource
- Thermodynamic AI hardware is a novel form of computing using s-bits and s-modes
Paper Content
II. Stochasticity as a computing resource
- Fluctuation is used to describe deviation from average value
- Stochasticity is a precise mathematical description
- Stochasticity is a resource that can be used to accomplish tasks
- Randomness is a resource used in cryptography and computing
- Stochasticity and randomness can be interconverted
- Stochasticity can be used in generative modeling, optimization algorithms and financial asset integration
III. Unification of intelligent algorithms
- Unifications are powerful and highly sought after in physics
- Goal is to motivate a hardware paradigm relevant to multiple AI applications
- Mathematical unification of AI algorithms under same framework
- Algorithms belong to class called Thermodynamic AI algorithms
- Consist of two subroutines: SDE evolution and Maxwell’s demon observation
- Mathematical unification can be useful outside of hardware development
- Fundamental building blocks of thermodynamic AI hardware are dynamic
- s-bits and s-modes are continuous-time Markov chains
- s-modes can be implemented with electrical noise sources such as thermal noise and shot noise
- Amplitude of stochasticity must be independently controllable
- s-modes can be represented by voltage on a node in a circuit
- Dynamics of s-modes can be constrained by adding electrical components
- s-bits randomly flip states at times sampled from exponential distributions
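The last bullet above describes an s-bit as a two-state system that flips at exponentially distributed times. The following is a minimal simulation sketch of that idea as a two-state continuous-time Markov chain. The function name `simulate_s_bit`, the ±1 state encoding, and the flip rates are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def simulate_s_bit(rate_up, rate_down, t_max, seed=0):
    """Simulate a two-state continuous-time Markov chain (hypothetical s-bit model).

    rate_up   : flip rate out of state -1 (into +1)
    rate_down : flip rate out of state +1 (into -1)
    Returns the flip times and the state held after each flip.
    """
    rng = np.random.default_rng(seed)
    t, state = 0.0, -1
    times, states = [0.0], [state]
    while True:
        rate = rate_up if state == -1 else rate_down
        # The waiting time until the next flip is exponential with mean 1/rate.
        t += rng.exponential(1.0 / rate)
        if t >= t_max:
            break
        state = -state
        times.append(t)
        states.append(state)
    return np.array(times), np.array(states)

T_MAX = 5000.0
times, states = simulate_s_bit(rate_up=2.0, rate_down=1.0, t_max=T_MAX)
dwell = np.diff(np.append(times, T_MAX))   # time spent in each visited state
frac_up = dwell[states == 1].sum() / dwell.sum()
```

For these rates the long-run fraction of time spent in the +1 state should approach `rate_up / (rate_up + rate_down)`, which is the stationary distribution of the chain.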
C. Problem geometry, inductive bias, and connectivity
- Problem geometry can be used to design and program Thermodynamic AI Hardware.
- Geometry can be 1D, 2D, 3D, or represented as a graph.
- Geometry can be used to limit the search space and speed up training.
- Thermodynamic AI Hardware is a hybrid digital-analog system.
VII. States, operators, and superoperators
- State of an s-unit lives in a vector space
- Vector space is $\mathbb{R}^{2^N}$
- Transition matrix $Q(t)$ is $4^N$-dimensional
- Gates can be viewed as superoperators that act on the operator space
- Superoperator space is $N^4$-dimensional
- Drift matrix schedule, drift vector schedule, and diffusion matrix schedule
- Bra-ket notation for vector spaces
- Drift matrix schedule, drift vector schedule, and diffusion matrix schedule can be viewed as gate sequences
- Superoperator $Q$ belongs to a $16^N$-dimensional linear superoperator space
- Continuous or discrete approach to gates
- Continuous approach is analogous to pulse-level control
- Matrix elements of $A_t$, $B_t$, and $C_t$ are continuous in $t$
- Generator approach avoids having to specify the time dependence of every matrix element of the gate
B. Discrete approach to gates
- Pulse-level control is natural for s-mode systems.
- Continuous pulses are more efficient than discrete gates.
C. Gate sequence as a software program
- Three gate sequences represent a complete software program
- Generator-based formalism used to write gate sequences
- Gate sequence decomposes $A_t$ into a set of f gates
- Gate sequences affect parameters of dynamics in Eq. (14)
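To make the idea of three schedules acting as a program concrete, here is a minimal sketch. Eq. (14) is not reproduced in this summary, so the code assumes a linear SDE of the form dx = (A_t x + b_t) dt + C_t dW as a stand-in. The helper names `scale` and `run_program`, and the choice of modulating a fixed matrix by a scalar function of time, are illustrative assumptions.

```python
import numpy as np

def scale(base, f):
    """Hypothetical gate: multiply a fixed array by a time-dependent function f(t)."""
    return lambda t: f(t) * base

def run_program(A_t, b_t, C_t, x0, t_max, dt=1e-3, seed=0):
    """Euler-Maruyama integration of the assumed dynamics
    dx = (A_t x + b_t) dt + C_t dW, driven by three schedules."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    n_steps = int(round(t_max / dt))
    for k in range(n_steps):
        t = k * dt
        dW = rng.standard_normal(x.size) * np.sqrt(dt)
        x = x + (A_t(t) @ x + b_t(t)) * dt + C_t(t) @ dW
    return x

# A "program": drift matrix schedule, drift vector schedule, diffusion matrix schedule.
A0 = -np.eye(2)          # relaxation toward the origin
b0 = np.zeros(2)         # no constant drift
C0 = np.zeros((2, 2))    # diffusion switched off, so the run is deterministic
constant = lambda t: 1.0

x_final = run_program(scale(A0, constant), scale(b0, constant), scale(C0, constant),
                      x0=[1.0, 2.0], t_max=1.0)
```

With diffusion switched off and a constant drift matrix of minus the identity, the state should decay approximately as exp(-t), which gives a simple sanity check before turning on noise or time-dependent schedules.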
D. Special cases of gates
- Gate $B_t$ can act on a single s-mode
- For example, gate $B_t$ can multiply the first element of the $b_0$ vector by a time-dependent function
- Gate $B_t$ can also act on all s-modes independently
- Gates $A_t$ and $C_t$ can act on a single s-mode or on all s-modes independently
- Gates $A_t$ and $C_t$ can affect the diagonal or off-diagonal elements of $A_0$ and $C_0$
- Entropy of the system may naturally change over time
- Probability distribution is Gaussian
- For a Gaussian distribution, entropy is a monotonic function of the variance, so the two can be used interchangeably
- Entropy can increase or decrease over time depending on the hardware drift and diffusion matrices
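The entropy bullets above can be illustrated with a one-dimensional example. The sketch below assumes an Ornstein-Uhlenbeck process dx = -a x dt + c dW (scalar drift `a` and diffusion `c` are illustrative choices, not the paper's hardware parameters). Its variance has a closed form, and the Gaussian entropy follows from the standard formula H = (1/2) ln(2πe σ²).

```python
import numpy as np

def ou_variance(t, var0, a, c):
    """Variance at time t of dx = -a*x dt + c dW, started with variance var0."""
    var_inf = c**2 / (2.0 * a)          # stationary variance
    return var_inf + (var0 - var_inf) * np.exp(-2.0 * a * t)

def gaussian_entropy(var):
    """Differential entropy (in nats) of a 1D Gaussian with variance var."""
    return 0.5 * np.log(2.0 * np.pi * np.e * var)

t = np.linspace(0.0, 3.0, 50)
# Start above the stationary variance c^2/(2a) = 0.5: entropy falls.
h_cooling = gaussian_entropy(ou_variance(t, var0=4.0, a=1.0, c=1.0))
# Start below the stationary variance: entropy rises.
h_heating = gaussian_entropy(ou_variance(t, var0=0.1, a=1.0, c=1.0))
```

Whether entropy rises or falls here depends only on how the initial variance compares with the ratio of diffusion to drift, which mirrors the statement that the direction of entropy change is set by the drift and diffusion matrices.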
B. Complicated entropy dynamics in AI applications
- AI applications require complicated entropy dynamics.
- Entropy needs to be reduced from a high to a low uncertainty situation.
- Entropy dynamics needed for AI applications cannot be achieved with an isolated physical system.
D. Maxwell’s demon
E. Maxwell’s demon as a hardware component
- The Maxwell’s demon (MD) is a component of the bare-bones hardware
- Need to connect MD hardware to s-unit hardware
- Can construct MD device in digital, analog or hybrid digital-analog approaches
- Digital approach is a neural network stored on a CPU
- Need to interconvert signals between thermodynamic hardware and CPU
- MD takes in time and state vector as inputs
- Outputs a vector which needs to be converted to physical form
- Vector applied to s-unit system to give rise to drift term in SDE
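The feedback loop described in the last three bullets can be sketched in simulation: at each step the demon reads the time and state, returns a vector, and that vector enters the SDE as the drift. In this sketch the demon is a hand-written restoring rule rather than a trained neural network, and the names `evolve_with_demon`, `restoring_demon`, and all numerical values are illustrative assumptions.

```python
import numpy as np

def evolve_with_demon(demon, x0, c, t_max, dt=1e-2, seed=0):
    """Euler-Maruyama integration of dx = d(t, x) dt + c dW,
    where d(t, x) is the vector returned by the demon."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    n_steps = int(round(t_max / dt))
    for k in range(n_steps):
        t = k * dt
        d = demon(t, x)   # demon observes time and state, outputs a drift vector
        x = x + d * dt + c * np.sqrt(dt) * rng.standard_normal(x.shape)
    return x

def no_demon(t, x):
    """Baseline: free diffusion with no feedback."""
    return np.zeros_like(x)

def restoring_demon(t, x, k=5.0, target=1.0):
    """Hypothetical demon that pushes every mode toward a target value."""
    return -k * (x - target)

x0 = np.zeros(2000)   # 2000 independent one-dimensional modes
free = evolve_with_demon(no_demon, x0, c=1.0, t_max=5.0)
steered = evolve_with_demon(restoring_demon, x0, c=1.0, t_max=5.0)
```

Comparing `free.var()` with `steered.var()` shows the demon's effect: without feedback the spread keeps growing, while the feedback drift concentrates the modes around the target.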
G. Training the Maxwell’s demon
- Equation (41) assumes MD has some level of intelligence
- MD needs to be trained to be intelligent
- MD output should depend on trainable parameters
- Isolated training (ex situ) involves mimicking s-unit system with digital hardware
- In situ training involves interacting s-unit system with MD system
- Benefits of in situ training include using physical hardware to accelerate computation of loss functions and learning to correct errors
- Issues to consider when constructing MD device include expressibility, signal interconversion, and latency
- MD output can be described in terms of a differential equation
- MD device receives analog inputs from s-mode system
- MD device can be fully analog or hybrid digital-analog
- Alternative approach to constructing MD system involves thinking of output as a force
- MD device has a latent variable that evolves over time
- MD device stores a potential energy function that can be time-dependent
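The force-based view in the last three bullets can be written as d(t, x) = -∇U(t, x): the demon stores a potential energy function and outputs its negative gradient. The sketch below computes that force numerically. The function names, the finite-difference scheme, and the harmonic potential with growing stiffness are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def force_from_potential(potential, t, x, eps=1e-5):
    """Demon output d(t, x) = -grad_x U(t, x), via central finite differences."""
    x = np.asarray(x, dtype=float)
    force = np.empty_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        force[i] = -(potential(t, x + step) - potential(t, x - step)) / (2.0 * eps)
    return force

def harmonic_potential(t, x, k0=1.0):
    """Hypothetical time-dependent potential whose stiffness grows linearly in t."""
    return 0.5 * k0 * (1.0 + t) * np.dot(x, x)

d = force_from_potential(harmonic_potential, t=1.0, x=[1.0, -2.0])
```

For this potential the exact force is -k0 (1 + t) x, so the numerical result can be checked directly against the analytic expression.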
X. Thermodynamic error correction and noise robustness
A. Noise plaguing other computing paradigms
- Hardware noise is a major issue for quantum computing and analog computing.
- Noise can make efficient algorithms inefficient, eliminating the quantum speedup.
- Digital computers became more precise and economical in the 1950s-1970s, leading to the decline of analog computing.
B. Using noise to one’s advantage
- Thermodynamic AI uses noise as a fundamental ingredient in the hardware.
- Noise is seen as essential, not a nuisance.
- Noise sources can be both intentional and unintentional.
C. Noise preserves the mathematical framework
- Hardware can be intentionally designed with a drift matrix, drift vector, and diffusion matrix.
- Unintentional and uncharacterized hardware noise can occur.
- True values of the relevant matrices and vectors can be perturbed away from the original design.
D. Maxwell’s demon learns to correct errors
- Maxwell’s Demon (MD) device is a key ingredient in Thermodynamic AI hardware
- MD system allows for error correction
- Loss function is used to measure performance of hardware
- MD system is trained in presence of physical s-mode system
- MD system is able to correct for errors or noise in hardware
- Thermodynamic AI systems have inherent robustness to hardware noise
XI. Application: thermodynamic diffusion models
- Time-series data is important for financial analysis, market prediction, epidemiology, and medical data analysis.
- Discrete neural networks and latent ODEs have been used to interpolate and extrapolate time-series data.
- Latent SDEs have been explored for fitting and extrapolating time-series data.
B. Fitting into our Thermodynamic AI framework
- Thermodynamic AI hardware can serve as either a latent ODE or a latent SDE
- An s-mode device combined with a parameterized Maxwell’s demon device generates a parameterized SDE
- Model in Fig. 18 has three subroutines: Encoder, Latent Thermodynamic AI hardware
C. Description of diffusion hardware
- Thermodynamic Diffusion Model can be implemented with analog electrical circuits or continuous-variable optical systems
- Model has multiple degrees of freedom which correspond to the s-modes
- Function generator can multiply diffusion and drift terms by time-dependent functions
- Data can be uploaded and downloaded from the device by initializing and measuring the continuous state variables
- Score network acts as a Maxwell’s Demon to reduce the physical system’s entropy
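The role of the score network as an entropy-reducing demon can be illustrated with a toy reverse-diffusion sampler. In this sketch the data distribution is a one-dimensional Gaussian, so the score is known in closed form and stands in for a trained score network. The forward process dx = -(β/2) x dt + √β dW, the constants `BETA`, `MU`, `S`, and the function names are all illustrative assumptions rather than the paper's hardware model.

```python
import numpy as np

BETA = 2.0          # constant noise rate (assumed)
MU, S = 2.0, 0.5    # toy data distribution N(MU, S^2) (assumed)

def marginal(t):
    """Mean and variance at time t of the forward process
    dx = -0.5*BETA*x dt + sqrt(BETA) dW started from N(MU, S^2)."""
    decay = np.exp(-BETA * t)
    return MU * np.sqrt(decay), S**2 * decay + (1.0 - decay)

def score(x, t):
    """Exact score grad_x log p_t(x); stands in for a trained score network."""
    mean, var = marginal(t)
    return -(x - mean) / var

def reverse_sample(n, t_max=4.0, n_steps=400, seed=0):
    """Integrate the reverse-time SDE from t_max down to 0 with Euler-Maruyama."""
    rng = np.random.default_rng(seed)
    dt = t_max / n_steps
    x = rng.standard_normal(n)   # start from the high-entropy prior N(0, 1)
    for k in range(n_steps):
        t = t_max - k * dt
        # Reverse drift: minus the forward drift plus BETA times the score.
        drift = 0.5 * BETA * x + BETA * score(x, t)
        x = x + drift * dt + np.sqrt(BETA * dt) * rng.standard_normal(n)
    return x

samples = reverse_sample(20000)
```

The samples start with unit variance and should end close to the narrower data distribution, so the score term is what lowers the entropy of the ensemble during the reverse pass.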
D. Analog score network
- Score network can take many physical forms, including digital and analog
- Latency issues can arise when using digital score network with analog s-mode system
- Figure 12 shows an analog circuit for score network
- Subroutines of evaluating q and r can be digital or analog neural networks
- Alternative means of constructing analog score network is force-based approach
XII. Application: thermodynamic deep learning
- Thermodynamic deep learning (TDL) is a term for applying Thermodynamic AI Hardware to deep learning
- Bayesian deep learning (BDL) allows for uncertainty quantification on the predicted output of neural networks
- Current digital hardware is not able to perform BDL with both high accuracy and fast speed
- Thermodynamic AI Hardware could potentially accelerate BDL to make large-scale BDL feasible
Background
- Machine learning systems are often overconfident in their predictions.
- Overconfidence can be catastrophic for high-stakes applications.
- Overconfidence is caused by limited training data.
- Uncertainty quantification (UQ) can help make machine learning more reliable and trustworthy.
- UQ can provide guidance for when to defer to human judgement.
- Bayesian deep learning is a continuous-time approach to UQ.
Fitting into our Thermodynamic AI framework
- Weight diffuser corresponds to s-mode device
- Posterior drift network corresponds to Maxwell’s demon device
XIII. Application: thermodynamic Monte Carlo
- Monte Carlo algorithms are used in finance, physics, chemistry, and machine learning
- Monte Carlo algorithms approximate integrals involving probability distributions
- Markov Chain Monte Carlo (MCMC) is a popular strategy for constructing samplers
- MCMC operates by constructing a Markov chain with the target distribution as its stationary distribution
- Langevin Monte Carlo (LMC) and Hamiltonian Monte Carlo (HMC) are two key algorithms
- HMC is widely used for statistical analysis and learning
- HMC proposes new samples using a combination of gradient information and Hamiltonian dynamics
- No U-Turn sampler (NUTS) is an extension of HMC
- Stochastic Gradient Hamiltonian Monte Carlo (SGHMC) is an extension of HMC for large problem sizes
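As a reference point for the HMC bullets above, here is a minimal digital sketch of plain HMC with leapfrog integration and a Metropolis accept step, targeting a two-dimensional standard normal. It illustrates the textbook algorithm only, not NUTS, SGHMC, or the paper's hardware implementation. The function name `hmc_sample`, the step size, the trajectory length, and the identity mass matrix are illustrative choices.

```python
import numpy as np

def hmc_sample(log_prob, grad_log_prob, x0, n_samples, step=0.2, n_leapfrog=10, seed=0):
    """Plain Hamiltonian Monte Carlo with identity mass matrix."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    samples = np.empty((n_samples, x.size))
    for i in range(n_samples):
        p = rng.standard_normal(x.shape)       # resample momentum
        x_new, p_new = x.copy(), p.copy()
        # Leapfrog integration of Hamiltonian dynamics.
        p_new += 0.5 * step * grad_log_prob(x_new)
        for _ in range(n_leapfrog - 1):
            x_new += step * p_new
            p_new += step * grad_log_prob(x_new)
        x_new += step * p_new
        p_new += 0.5 * step * grad_log_prob(x_new)
        # Metropolis correction based on the change in total energy.
        h_old = -log_prob(x) + 0.5 * (p @ p)
        h_new = -log_prob(x_new) + 0.5 * (p_new @ p_new)
        if np.log(rng.uniform()) < h_old - h_new:
            x = x_new
        samples[i] = x
    return samples

def log_prob(x):
    return -0.5 * (x @ x)     # standard normal, up to a constant

def grad_log_prob(x):
    return -x

samples = hmc_sample(log_prob, grad_log_prob, x0=np.zeros(2), n_samples=5000)
```

The gradient of the log density plays the role of a force and the momentum variable carries the sampler across the distribution, which is the combination of gradient information and Hamiltonian dynamics mentioned above.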
B. Connection to Langevin Monte Carlo
- SGHMC and Langevin Monte Carlo (LMC) have a close connection
- Stochastic Gradient Langevin Dynamics (SGLD) is a procedure for Bayesian posterior sampling of the parameters of a machine learning model
- Parameters are prevented from freezing at a particular value by a noise term
- Thermodynamic fluctuations are a resource for posterior inference
- Logarithm of a distribution can be treated as an energy
- Data becomes part of a time-dependent diffusion vector
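The SGLD bullets above can be made concrete with a small digital example: sampling the posterior over the mean of a Gaussian with unit variance, using minibatch gradients plus injected noise. The model, the prior variance, the step size, and the function name `sgld_gaussian_mean` are illustrative assumptions chosen so that the exact posterior is available for comparison.

```python
import numpy as np

def sgld_gaussian_mean(y, n_steps=6000, batch=100, eps=1e-4, prior_var=100.0, seed=0):
    """SGLD for the mean theta of data y_i ~ N(theta, 1) with prior N(0, prior_var)."""
    rng = np.random.default_rng(seed)
    n = y.size
    theta = 0.0
    chain = np.empty(n_steps)
    for k in range(n_steps):
        idx = rng.choice(n, size=batch, replace=False)
        grad_prior = -theta / prior_var
        # Minibatch estimate of the full-data log-likelihood gradient.
        grad_lik = (n / batch) * np.sum(y[idx] - theta)
        # Gradient step plus Gaussian noise of variance eps keeps theta from freezing.
        theta += 0.5 * eps * (grad_prior + grad_lik) + np.sqrt(eps) * rng.standard_normal()
        chain[k] = theta
    return chain

data_rng = np.random.default_rng(1)
y = data_rng.normal(1.5, 1.0, size=1000)
chain = sgld_gaussian_mean(y)[1000:]               # discard burn-in
posterior_mean = y.sum() / (y.size + 1.0 / 100.0)  # exact conjugate posterior mean
```

Without the noise term the update would reduce to stochastic gradient ascent and collapse to a point estimate. With it, the spread of `chain` approximates the posterior uncertainty, which is the sense in which fluctuations act as a resource for inference.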
C. Fitting into our Thermodynamic AI framework
- The Monte Carlo algorithms above fit into the Thermodynamic AI framework
- Terminology introduced in Sec. IX H 4 is mapped to s-unit formalism
D. Description of Monte Carlo hardware
- Coupled differential equations can be implemented on Thermodynamic AI hardware.
- Computing derivatives of position and momentum involves diagonalizing matrices, which has a computation cost of $O(n^3)$.
- Thermodynamic AI hardware can help alleviate the bottleneck of sampling for many applications.
XIV. Application: thermodynamic annealing
B. SDE approach to simulated annealing
- Reference [75] provides a mathematical framework for simulated annealing.
- The framework is based on a system of equations for a state variable x and an auxiliary variable p.
- The dynamics of x are stochastic and in the long-time limit, it is distributed according to a Boltzmann probability distribution.
- This allows the exploration of the extrema landscape of the loss function L.
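The idea that stochastic dynamics with a Boltzmann stationary distribution can explore the loss landscape is sketched below with a simplified stand-in. The framework of Reference [75] uses a state variable x together with an auxiliary variable p; the code drops the auxiliary variable and uses overdamped Langevin dynamics dx = -∇L(x) dt + √(2T) dW with a slowly decreasing temperature T. The double-well loss, the cooling schedule, and all names are illustrative assumptions.

```python
import numpy as np

def loss(x):
    """Toy double-well loss; the tilt makes the left well the global minimum."""
    return (x**2 - 1.0)**2 + 0.3 * x

def grad_loss(x):
    return 4.0 * x * (x**2 - 1.0) + 0.3

def anneal(n_chains=500, n_steps=20000, dt=5e-3, t_hot=1.0, t_cold=0.02, seed=0):
    """Overdamped Langevin dynamics with a geometric cooling schedule."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-2.0, 2.0, size=n_chains)
    temps = t_hot * (t_cold / t_hot) ** (np.arange(n_steps) / (n_steps - 1))
    for temp in temps:
        noise = np.sqrt(2.0 * temp * dt) * rng.standard_normal(n_chains)
        x = x - grad_loss(x) * dt + noise
    return x

x_final = anneal()
frac_global = np.mean(x_final < 0.0)   # fraction of chains ending in the global well
```

At high temperature the chains hop between both wells, and as the temperature drops the Boltzmann weights increasingly favor the lower well, so most chains should settle near the global minimum rather than the local one.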
C. Fitting into our Thermodynamic AI framework
- Eqs. (83) and (84) fit into the framework for Thermodynamic AI hardware.
- Auxiliary SDE maps to the s-mode device: S corresponds to coefficient C(t), -(1/2)D corresponds to coefficient A(t), and -∇L(x) corresponds to demon vector d.
- Optimization ODE maps to evolution of latent variable in Maxwell’s demon device.
- Framework discussed in Sec. IX H 4 involves a force-based Maxwell’s demon.
- Mass matrix set to identity: M = I.