Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

Many AI algorithms are inspired by physics and use stochastic fluctuations
Thermodynamic AI is a mathematical framework that unifies these algorithms
Thermodynamic AI hardware uses stochastic fluctuations as a computational resource
Thermodynamic AI hardware is a novel form of computing using s-bits and s-modes

Paper Content

II. Stochasticity as a computing resource

Fluctuation is used to describe deviation from average value
Stochasticity is a precise mathematical description
Stochasticity is a resource that can be used to accomplish tasks
Randomness is a resource used in cryptography and computing
Stochasticity and randomness can be interconverted
Stochasticity can be used in generative modeling, optimization algorithms and financial asset integration

III. Unification of intelligent algorithms

Unifications are powerful and highly sought after in physics
Goal is to motivate a hardware paradigm relevant to multiple AI applications
Mathematical unification of AI algorithms under same framework
Algorithms belong to class called Thermodynamic AI algorithms
Consist of two subroutines: SDE evolution and Maxwell’s demon observation
Mathematical unification can be useful outside of hardware development
Fundamental building blocks of thermodynamic AI hardware are dynamic
s-bits and s-modes are continuous-time Markov chains
s-modes can be implemented with electrical thermal and shot noise
Amplitude of stochasticity must be independently controllable
s-modes can be represented by voltage on a node in a circuit
Dynamics of s-modes can be constrained by adding electrical components
s-bits randomly flip states at times sampled from exponential distributions
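
The flipping behaviour described above can be illustrated with a small simulation. This is only a sketch of a two-state continuous-time Markov chain, not the hardware model from the paper: the function names and the flip rates `rate_up` and `rate_down` are assumptions introduced here for illustration.

```python
import random

def simulate_s_bit(rate_up, rate_down, t_end, seed=0):
    """Toy s-bit: a two-state continuous-time Markov chain.

    rate_up is the 0 -> 1 flip rate, rate_down the 1 -> 0 flip rate
    (hypothetical parameter names). Returns a list of (time, state) pairs.
    """
    rng = random.Random(seed)
    t, state = 0.0, 0
    trajectory = [(t, state)]
    while True:
        rate = rate_up if state == 0 else rate_down
        t += rng.expovariate(rate)  # waiting time is exponentially distributed
        if t >= t_end:
            break
        state = 1 - state  # flip the bit
        trajectory.append((t, state))
    return trajectory

def fraction_in_state_one(trajectory, t_end):
    """Fraction of the interval [0, t_end] that the s-bit spends in state 1."""
    total = 0.0
    segments = zip(trajectory, trajectory[1:] + [(t_end, None)])
    for (t0, s0), (t1, _) in segments:
        if s0 == 1:
            total += t1 - t0
    return total / t_end

trajectory = simulate_s_bit(rate_up=2.0, rate_down=1.0, t_end=2000.0)
print(fraction_in_state_one(trajectory, 2000.0))
```

For long runs the printed fraction should approach `rate_up / (rate_up + rate_down)`, the stationary probability of state 1 for this chain.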

C. Problem geometry, inductive bias, and connectivity

Problem geometry can be used to design and program Thermodynamic AI Hardware.
Geometry can be 1D, 2D, 3D, or represented as a graph.
Geometry can be used to limit the search space and speed up training.
Thermodynamic AI Hardware is a hybrid digital-analog system.

VII. States, operators, and superoperators

State of an s-unit lives in a vector space
Vector space is $\mathbb{R}^{2^N}$
Transition matrix Q(t) is $4^N$-dimensional
Gates can be viewed as superoperators that act on the operator space
Superoperator space is $N^4$-dimensional
Drift matrix schedule, drift vector schedule, and diffusion matrix schedule
Bra-ket notation for vector spaces
Drift matrix schedule, drift vector schedule, and diffusion matrix schedule can be viewed as gate sequences
Superoperator Q belongs to $16^N$-dimensional linear superoperator space
Continuous or discrete approach to gates
Continuous approach is analogous to pulse-level control
Matrix elements of $A_t$, $B_t$, and $C_t$ are continuous in t
Generator approach avoids having to specify time dependence of every matrix element of the gate

B. Discrete approach to gates

Pulse-level control is natural for s-mode systems.
Continuous pulses are more efficient than discrete gates.

C. Gate sequence as a software program

Three gate sequences represent a complete software program
Generator-based formalism used to write gate sequences
Gate sequence decomposes $A_t$ into a set of f gates
Gate sequences affect parameters of dynamics in Eq. (14)
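
To make the idea of three time-dependent schedules concrete, here is a minimal Euler-Maruyama sketch. The summary does not reproduce Eq. (14), so the linear form dx = (A(t) x + b(t)) dt + C(t) dW used below is an assumption, as are the function name `evolve_s_modes` and the convention of passing each schedule as a Python callable of time.

```python
import numpy as np

def evolve_s_modes(x0, A, b, C, t_end, n_steps, seed=0):
    """Integrate dx = (A(t) x + b(t)) dt + C(t) dW with Euler-Maruyama.

    A, b, C are callables returning the drift matrix, drift vector and
    diffusion matrix at time t (an assumed stand-in for the three schedules).
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    dt = t_end / n_steps
    for k in range(n_steps):
        t = k * dt
        dW = rng.normal(0.0, np.sqrt(dt), size=x.shape)  # Brownian increment
        x = x + (A(t) @ x + b(t)) * dt + C(t) @ dW
    return x

# Example "program": constant relaxation, a time-dependent push on the first
# s-mode only, and a diffusion amplitude that is switched off over time.
A = lambda t: -np.eye(2)
b = lambda t: np.array([np.sin(t), 0.0])
C = lambda t: 0.3 * np.exp(-t) * np.eye(2)
print(evolve_s_modes([1.0, 2.0], A, b, C, t_end=5.0, n_steps=5000))
```

Passing schedules as functions of time mirrors the continuous, pulse-level view; a discrete gate sequence could be modelled by making the callables piecewise constant.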

D. Special cases of gates

Gate $B_t$ acts on a single s-mode
Gate $B_t$ multiplies the first element of the $b_0$ vector by a time-dependent function
Gate $B_t$ acts on all s-modes independently
Gate $A_t$ and $C_t$ can act on a single s-mode or all s-modes independently
Gate $A_t$ and $C_t$ can affect the diagonal or off-diagonal elements of $A_0$ and $C_0$
Entropy of the system may naturally change over time
Probability distribution is Gaussian
For a Gaussian distribution, entropy is a monotonic function of the variance, so the two can be used interchangeably
Entropy can increase or decrease over time depending on the hardware drift and diffusion matrices
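
A short worked example may help here. It uses a single s-mode obeying the one-dimensional linear SDE $dx = -a\,x\,dt + c\,dW$, where the symbols $a$ (drift strength) and $c$ (diffusion amplitude) are assumptions chosen for illustration rather than the paper's notation.

```latex
% Differential entropy of a 1D Gaussian with variance \sigma^2:
H = \tfrac{1}{2} \ln\!\left( 2 \pi e \, \sigma^2 \right)

% Variance dynamics for dx = -a x \, dt + c \, dW:
\frac{d\sigma^2}{dt} = -2 a \, \sigma^2 + c^2

% Hence the entropy rate:
\frac{dH}{dt} = \frac{1}{2\sigma^2} \frac{d\sigma^2}{dt} = -a + \frac{c^2}{2\sigma^2}
```

Under these assumptions the entropy rises when $c^2 > 2a\sigma^2$ and falls when $c^2 < 2a\sigma^2$, which is one simple way the drift and diffusion coefficients decide the direction of entropy change.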

B. Complicated entropy dynamics in AI applications

AI applications require complicated entropy dynamics.
Entropy needs to be reduced from a high to a low uncertainty situation.
Entropy dynamics needed for AI applications cannot be achieved with an isolated physical system.

D. Maxwell’s demon

E. Maxwell’s demon as a hardware component

MD is a component of the bare-bones hardware
Need to connect MD hardware to s-unit hardware
Can construct MD device in digital, analog or hybrid digital-analog approaches
Digital approach is a neural network stored on a CPU
Need to interconvert signals between thermodynamic hardware and CPU
MD takes in time and state vector as inputs
Outputs a vector which needs to be converted to physical form
Vector applied to s-unit system to give rise to drift term in SDE

G. Training the Maxwell’s demon

Equation (41) assumes MD has some level of intelligence
MD needs to be trained to be intelligent
MD output should depend on trainable parameters
Isolated training (ex situ) involves mimicking s-unit system with digital hardware
In situ training involves interacting s-unit system with MD system
Benefits of in situ training include using physical hardware to accelerate computation of loss functions and learning to correct errors
Issues to consider when constructing MD device include expressibility, signal interconversion, and latency
MD output can be described in terms of a differential equation
MD device receives analog inputs from s-mode system
MD device can be fully analog or hybrid digital-analog
Alternative approach to constructing MD system involves thinking of output as a force
MD device has a latent variable that evolves over time
MD device stores a potential energy function that can be time-dependent
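
The force-based picture can be sketched in a few lines. The assumption here is that the demon's output is the negative gradient of its stored potential with respect to the state; the summary only says the output is thought of as a force, so this convention, the function name `demon_force`, and the example potential are all illustrative.

```python
import numpy as np

def demon_force(potential, t, x, eps=1e-5):
    """Return -grad_x U(t, x) using central finite differences.

    potential is a callable U(t, x) -> float standing in for the
    (possibly time-dependent) potential energy stored by the demon.
    """
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (potential(t, x + step) - potential(t, x - step)) / (2 * eps)
    return -grad

# Hypothetical potential: a quadratic well that stiffens over time.
well = lambda t, x: 0.5 * (1.0 + t) * np.sum(x**2)
print(demon_force(well, t=1.0, x=[1.0, -2.0]))
```

In a digital demon the gradient would more likely come from automatic differentiation; finite differences are used here only to keep the sketch dependency-free.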

X. Thermodynamic error correction and noise robustness

A. Noise plaguing other computing paradigms

Hardware noise is a major issue for quantum computing and analog computing.
Noise can make efficient algorithms inefficient, eliminating the quantum speedup.
Digital computers became more precise and economical in the 1950s-1970s, leading to the decline of analog computing.

B. Using noise to one’s advantage

Thermodynamic AI uses noise as a fundamental ingredient in the hardware.
Noise is seen as essential, not a nuisance.
Noise sources can be both intentional and unintentional.

C. Noise preserves the mathematical framework

Hardware can be intentionally designed with a drift matrix, drift vector, and diffusion matrix.
Unintentional and uncharacterized hardware noise can occur.
True values of the relevant matrices and vectors can be perturbed away from the original design.

D. Maxwell’s demon learns to correct errors

Maxwell’s Demon (MD) device is a key ingredient in Thermodynamic AI hardware
MD system allows for error correction
Loss function is used to measure performance of hardware
MD system is trained in presence of physical s-mode system
MD system is able to correct for errors or noise in hardware
Thermodynamic AI systems have inherent robustness to hardware noise
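
A toy illustration of this error-correction idea follows. Everything in it is an assumption made for the sketch: the one-dimensional s-mode dx = (-x + b + error + theta) dt + c dW, the demon reduced to a single trainable offset `theta`, and a loss equal to half the squared gap between the measured mean and a target mean. The training loop never sees `hidden_error` directly; it only observes samples from the simulated "hardware".

```python
import numpy as np

def run_hardware(theta, hidden_error, rng, n_traj=2000, n_steps=200, dt=0.05):
    """Toy s-mode ensemble with an uncharacterized offset in its drift vector."""
    b_design, c = 1.0, 0.5  # designed drift vector and diffusion amplitude
    x = np.zeros(n_traj)
    for _ in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(dt), size=n_traj)
        x = x + (-x + b_design + hidden_error + theta) * dt + c * dW
    return x

def train_in_situ(hidden_error, target_mean=1.0, lr=0.5, n_iters=30, seed=0):
    """Tune the demon offset theta using only measurements of the hardware."""
    rng = np.random.default_rng(seed)
    theta = 0.0
    for _ in range(n_iters):
        samples = run_hardware(theta, hidden_error, rng)
        residual = samples.mean() - target_mean
        # Gradient of 0.5 * residual**2 w.r.t. theta; the stationary mean
        # shifts one-for-one with theta in this toy model.
        theta -= lr * residual
    return theta

print(train_in_situ(hidden_error=0.3))
```

The learned offset should settle close to the negative of the hidden error, so the demon ends up cancelling a perturbation it was never told about.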

XI. Application: thermodynamic diffusion models

Time-series data is important for financial analysis, market prediction, epidemiology, and medical data analysis.
Discrete neural networks and latent ODEs have been used to interpolate and extrapolate time-series data.
Latent SDEs have been explored for fitting and extrapolating time-series data.

B. Fitting into our Thermodynamic AI framework

Discussing using Thermodynamic AI hardware as either a latent ODE or latent SDE
Using an s-mode device combined with a parameterized Maxwell’s demon device to generate a parameterized SDE
Model in Fig. 18 has three subroutines: Encoder, Latent Thermodynamic AI hardware

C. Description of diffusion hardware

Thermodynamic Diffusion Model can be implemented with analog electrical circuits or continuous-variable optical systems
Model has multiple degrees of freedom which correspond to the s-modes
Function generator can multiply diffusion and drift terms by time-dependent functions
Data can be uploaded and downloaded from the device by initializing and measuring the continuous state variables
Score network acts as a Maxwell’s Demon to reduce the physical system’s entropy
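
The entropy-reducing role of the score network can be seen in a small digital simulation. This sketch is not the paper's hardware: it assumes a forward noising process dx = -x dt + sqrt(2) dW, Gaussian data with mean `mu` and standard deviation `s`, and it replaces the trained score network with the analytic score of the noised Gaussian so the example stays self-contained.

```python
import numpy as np

def reverse_diffusion_samples(mu, s, n_samples=20000, t_max=5.0,
                              n_steps=1000, seed=0):
    """Sample N(mu, s^2) by running the reverse-time SDE from pure noise."""
    rng = np.random.default_rng(seed)
    dt = t_max / n_steps
    x = rng.normal(0.0, 1.0, size=n_samples)  # high-entropy starting state
    for k in range(n_steps):
        t = t_max - k * dt
        # Marginal of the forward process at time t for Gaussian data.
        mean_t = mu * np.exp(-t)
        var_t = s**2 * np.exp(-2.0 * t) + 1.0 - np.exp(-2.0 * t)
        score = -(x - mean_t) / var_t  # analytic stand-in for the score network
        z = rng.normal(0.0, 1.0, size=n_samples)
        # One backward Euler-Maruyama step: drift f = -x, diffusion g^2 = 2.
        x = x + (x + 2.0 * score) * dt + np.sqrt(2.0 * dt) * z
    return x

samples = reverse_diffusion_samples(mu=2.0, s=0.5)
print(samples.mean(), samples.std())
```

The samples start with unit variance and end concentrated around `mu` with spread close to `s`, which is the high-to-low uncertainty transition that the score term drives.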

D. Analog score network

Score network can take many physical forms, including digital and analog
Latency issues can arise when using digital score network with analog s-mode system
Figure 12 shows an analog circuit for score network
Subroutines of evaluating q and r can be digital or analog neural networks
Alternative means of constructing analog score network is force-based approach

XII. Application: thermodynamic deep learning

TDL is a term for applying Thermodynamic AI Hardware to deep learning
BDL allows for uncertainty quantification on the predicted output of neural networks
Current digital hardware is not able to perform BDL with both high accuracy and fast speed
Thermodynamic AI Hardware could potentially accelerate BDL to make large-scale BDL feasible

Background

Machine learning systems are often overconfident in their predictions.
Overconfidence can be catastrophic for high-stakes applications.
Overconfidence is caused by limited training data.
Uncertainty quantification (UQ) can help make machine learning more reliable and trustworthy.
UQ can provide guidance for when to defer to human judgement.
Bayesian deep learning is a continuous-time approach to UQ.

Fitting into our Thermodynamic AI framework

Weight diffuser corresponds to s-mode device
Posterior drift network corresponds to Maxwell’s demon device

XIII. Application: thermodynamic Monte Carlo

Monte Carlo algorithms are used in finance, physics, chemistry, and machine learning
Monte Carlo algorithms approximate integrals involving probability distributions
Markov Chain Monte Carlo (MCMC) is a popular strategy for constructing samplers
MCMC operates by constructing a Markov chain with the target distribution as its stationary distribution
Langevin Monte Carlo (LMC) and Hamiltonian Monte Carlo (HMC) are two key algorithms
HMC is widely used for statistical analysis and learning
HMC proposes new samples using a combination of gradient information and Hamiltonian dynamics
No U-Turn sampler (NUTS) is an extension of HMC
Stochastic Gradient Hamiltonian Monte Carlo (SGHMC) is an extension of HMC for large problem sizes

B. Connection to Langevin Monte Carlo

SGHMC and Langevin Monte Carlo (LMC) have a close connection
Stochastic gradient Langevin dynamics (SGLD) is a procedure for Bayesian posterior sampling of the parameters of a machine learning model
Parameters are prevented from freezing at a particular value by a noise term
Thermodynamic fluctuations are a resource for posterior inference
Logarithm of a distribution can be treated as an energy
Data becomes part of a time-dependent diffusion vector
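
The role of the noise term and of the energy interpretation can be sketched with a plain Langevin sampler. Note the swap: this is the unadjusted Langevin algorithm with a full gradient, not minibatch SGLD, and the Gaussian target, step size, and function names are assumptions chosen so the result is easy to check.

```python
import numpy as np

def langevin_samples(grad_energy, n_chains=5000, n_steps=2000,
                     step=0.1, seed=0):
    """Run independent Langevin chains targeting p(theta) ~ exp(-U(theta)).

    grad_energy returns dU/dtheta, where the energy U is minus the
    logarithm of the target distribution.
    """
    rng = np.random.default_rng(seed)
    theta = np.zeros(n_chains)
    for _ in range(n_steps):
        noise = rng.normal(0.0, 1.0, size=n_chains)
        # Gradient step downhill in energy plus a noise kick that keeps the
        # parameters from freezing at the energy minimum.
        theta = theta - 0.5 * step * grad_energy(theta) + np.sqrt(step) * noise
    return theta

def gaussian_grad_energy(theta, mean=1.0, std=2.0):
    """Gradient of U(theta) = (theta - mean)^2 / (2 std^2)."""
    return (theta - mean) / std**2

samples = langevin_samples(gaussian_grad_energy)
print(samples.mean(), samples.std())
```

Without the noise term every chain would collapse to `mean`; with it, the spread of the chains approximates the target's standard deviation.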

C. Fitting into our Thermodynamic AI framework

Algorithms can fit into a framework
Terminology introduced in section IX H 4 is mapped to s-unit formalism

D. Description of Monte Carlo hardware

Coupled differential equations can be implemented on Thermodynamic AI hardware.
Computing derivatives of position and momentum involves diagonalizing matrices, which has a computation cost of $O(n^3)$.
Thermodynamic AI hardware can help alleviate the bottleneck of sampling for many applications.

XIV. Application: thermodynamic annealing

B. SDE approach to simulated annealing

Reference [75] provides a mathematical framework for simulated annealing.
The framework is based on a system of equations for a state variable x and an auxiliary variable p.
The dynamics of x are stochastic and in the long-time limit, it is distributed according to a Boltzmann probability distribution.
This allows the exploration of the extrema landscape of the loss function L.

C. Fitting into our Thermodynamic AI framework

Eqs. (83) and (84) fit into the framework for Thermodynamic AI hardware.
Auxiliary SDE maps to s-mode device: S corresponds to coefficient C(t), -(1/2)D corresponds to coefficient A(t), and -∇L(x) corresponds to demon vector d.
Optimization ODE maps to evolution of latent variable in Maxwell’s demon device.
Framework discussed in Sec. IX H 4 involves a force-based Maxwell’s demon.
Mass matrix set to identity: M = I.
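
The mapping above can be turned into a small numerical sketch. Eqs. (83) and (84) are not reproduced in this summary, so the form used here is an assumption inferred from the listed correspondences: an auxiliary SDE dp = (-∇L(x) - (1/2) D p) dt + S dW and an optimization ODE dx = p dt with M = I. The decaying noise schedule and all parameter names are also illustrative choices.

```python
import numpy as np

def thermodynamic_annealing(grad_loss, x0, n_steps=5000, dt=0.01,
                            damping=2.0, noise0=1.0, decay_time=5.0, seed=0):
    """Anneal x toward a minimum of L using an auxiliary noisy variable p."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    p = np.zeros_like(x)
    for k in range(n_steps):
        t = k * dt
        # Assumed schedule for S: noise amplitude shrinks as annealing proceeds.
        noise_amp = noise0 * np.exp(-t / decay_time)
        dW = rng.normal(0.0, np.sqrt(dt), size=x.shape)
        # Auxiliary SDE (s-mode device): demon vector -grad L, drift -(1/2) D p.
        p = p + (-grad_loss(x) - 0.5 * damping * p) * dt + noise_amp * dW
        # Optimization ODE (latent variable of the demon), with M = I.
        x = x + p * dt
    return x

# Hypothetical loss L(x) = (x - 3)^2 / 2 with its minimum at x = 3.
print(thermodynamic_annealing(lambda x: x - 3.0, x0=[0.0]))
```

Early on the noise lets x explore the loss landscape; once the amplitude has decayed, the damped dynamics settle into the nearest minimum.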
