Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

Representation of data involves investigator choices
Choices lead to exact symmetry
Symmetries include coordinate freedom, gauge symmetry and units covariance
Goal is to understand implications of passive symmetries for machine learning
Discuss links to causal modeling
Implementation of passive symmetries valuable for generalizing out of sample

Paper Content

Introduction

ML has been inspired by mathematical physics
Kernel trick and statistical mechanics techniques used in ML
Representation of observables involves investigator choices
ML methods should be written in a form that is equivariant to changes in investigator choices
Geometric Principle: Laws of physics must be expressed as geometric relationships between geometric objects
Symmetries of coordinate freedom and gauge symmetry
Analogs of these symmetries could have big impacts in ML
Two types of symmetries: passive and active
Passive symmetries apply to all data analysis problems
Guidance on how to structure ML models to respect passive symmetries

Passive symmetries

Passive symmetries arise from redundancies or free parameters in data representation
Active symmetries arise from observed or empirical invariances of physical laws
Passive symmetries can be established without observations
Active symmetries are rare and usually only appear in natural-science contexts
Active and passive transformations are similar mathematically
To enforce a passive symmetry, all relevant contextual information must be incorporated
Ricci calculus is used to make objects equivariant to coordinate diffeomorphisms
Passive symmetries have not featured in ML practice, but could be significant

Example: units covariance

Units covariance is a passive symmetry that states the behavior of a system does not depend on the units system used.
Dropping a mass from a height and launching a mass at a velocity can be answered using dimensional arguments.
Answers to these questions do not depend on the input mass.
Units covariance has been used in ML methods to improve training, predictive accuracy, and out-of-sample generalization.
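
The dropped-mass argument above can be sketched as a small linear solve over unit exponents. This is an illustration only, not code from the paper: the names `dimension_matrix`, `target`, and `exponents` are hypothetical, and it assumes the only inputs are the mass m, the height h, and the gravitational acceleration g.

```python
import numpy as np

# Rows are base dimensions (mass, length, time).
# Columns are the input variables (m, h, g).
dimension_matrix = np.array([
    [1.0, 0.0, 0.0],   # mass exponents of m, h, g
    [0.0, 1.0, 1.0],   # length exponents of m, h, g
    [0.0, 0.0, -2.0],  # time exponents of m, h, g
])

# The fall time has dimensions of time only.
target = np.array([0.0, 0.0, 1.0])

# Solve for (a, b, c) such that m**a * h**b * g**c has units of time.
exponents = np.linalg.solve(dimension_matrix, target)
print(exponents)
```

The solve gives exponents 0, 1/2, and -1/2 for m, h, and g, so the fall time scales as sqrt(h/g) and the mass drops out, matching the claim that the answer does not depend on the input mass.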

Formal definition

Consider $X$ to be the state space of a physical system
We consider a family of maps $\{\Phi_i : X \to H_i\}_{i \in I}$
Two encodings $\Phi_i$ and $\Phi_j$ are compatible if there exists an invertible morphism
Not all observables are compatible
Passive symmetries are the groupoid of invertible morphisms between compatible encodings
Imposing a passive symmetry on an ML model can lead to generalization improvements
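
As one possible reading of compatible encodings, the sketch below treats a length recorded in meters and in centimeters as two encodings of the same state, related by an invertible rescaling. All names here (`to_centimeters`, `to_meters`, `commutes_with_rescaling`, and the two example functions) are hypothetical and chosen for illustration. The check "convert then compute equals compute then convert" is an assumed simplification of the passive symmetry, not the paper's formal construction.

```python
import math

METERS_TO_CENTIMETERS = 100.0  # hypothetical invertible map between two encodings

def to_centimeters(length_m):
    return length_m * METERS_TO_CENTIMETERS

def to_meters(length_cm):
    return length_cm / METERS_TO_CENTIMETERS

def square_area(side):
    # Units covariant: the output rescales as (length factor) ** 2.
    return side ** 2

def side_plus_area(side):
    # Not units covariant: adds a length to an area.
    return side + side ** 2

def commutes_with_rescaling(func, side_m):
    # Compare "convert, then compute" with "compute, then convert the area".
    computed_in_cm = func(to_centimeters(side_m))
    converted_after = func(side_m) * METERS_TO_CENTIMETERS ** 2
    return math.isclose(computed_in_cm, converted_after)

round_trip = to_meters(to_centimeters(2.0))  # the map is invertible
covariant_ok = commutes_with_rescaling(square_area, 2.0)
mixed_units_ok = commutes_with_rescaling(side_plus_area, 2.0)
print(round_trip, covariant_ok, mixed_units_ok)
```

Only the function whose output has consistent units gives the same physical answer in both encodings, which is the kind of constraint a passive symmetry places on a model.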

Experiments and examples

Planck discovered a formula for black-body radiation intensity
Planck’s constant h was introduced
Classical physics had an “ultraviolet catastrophe”
Quantum mechanics solved the problem by cutting off ultraviolet modes
Villar et al. 2022 used units covariant regression to predict intensity
Units covariant regression found a constant with units consistent with h

Springy double pendulum:

Double pendulum connected by springs is a toy example used in equivariant ML demonstrations
Final conditions are related to initial conditions, and the dynamics are classically chaotic
System subject to passive O(3) symmetry and active O(2) symmetry
O(3) passive symmetry requires all relevant vectors to be transformed identically
Experiment to predict dynamics of double pendulum using O(3)-equivariant models
Models implemented are Known-g, No-g, and Learned-g
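
The requirement that all relevant vectors be transformed identically can be checked numerically on a toy function. The sketch below is not the paper's model: `toy_vector_function` is a hypothetical stand-in built from scalar products, and the specific rotation, position, and gravity vectors are arbitrary choices for the example.

```python
import numpy as np

def toy_vector_function(position, gravity):
    # Scalar products multiplying input vectors: O(3)-equivariant by construction.
    return (position @ gravity) * position + (position @ position) * gravity

# Rotation by 90 degrees about the x-axis, an element of O(3).
rotation = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 0.0, -1.0],
    [0.0, 1.0, 0.0],
])
position = np.array([1.0, 2.0, 3.0])
gravity = np.array([0.0, 0.0, -1.0])

rotated_output = rotation @ toy_vector_function(position, gravity)

# Rotate every input vector, including gravity.
all_inputs_rotated = toy_vector_function(rotation @ position, rotation @ gravity)
# Rotate the position but leave gravity fixed.
gravity_left_fixed = toy_vector_function(rotation @ position, gravity)

equivariant_with_g = np.allclose(all_inputs_rotated, rotated_output)
equivariant_without_g = np.allclose(gravity_left_fixed, rotated_output)
print(equivariant_with_g, equivariant_without_g)
```

Equivariance holds only when the gravity vector is rotated along with the position. This illustrates why leaving out a relevant vector such as g can break the symmetry the model is supposed to respect.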

Connections with causality

Passive symmetries can be applied to causal models
Dimensional analysis assumes all relevant quantities are specified
Difficulty in causal inference is related to knowing all confounding variables
Experiments can indicate which variables are relevant
Prior knowledge from related problems can inform which variables are relevant

Connections to current ML practice

ML implementations don’t impose exact symmetries
Data augmentation can approximate equivariances
Two approaches to optimize equivariant functions: parameterizing or finding invariant features
Symplectic networks preserve differential 2-form
Equivariant ML models have scientific applications
Implicit bias, generalization error, and sample complexity of equivariant ML models have been studied
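
The second approach listed above, finding invariant features, can be illustrated with pairwise inner products. This is a minimal sketch under the assumption that the inputs are a set of 3-vectors and the symmetry is O(3); the function name `gram_matrix` and the example vectors are hypothetical.

```python
import numpy as np

def gram_matrix(vectors):
    # Pairwise inner products of the row vectors; unchanged by any orthogonal map.
    return vectors @ vectors.T

# Rotation by 90 degrees about the x-axis, an element of O(3).
rotation = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 0.0, -1.0],
    [0.0, 1.0, 0.0],
])
vectors = np.array([
    [1.0, 2.0, 3.0],
    [0.0, 0.0, -1.0],
    [0.5, -1.0, 2.0],
])

# Apply the same rotation to every row vector.
rotated_vectors = vectors @ rotation.T

features_invariant = np.allclose(gram_matrix(vectors), gram_matrix(rotated_vectors))
coordinates_changed = not np.allclose(vectors, rotated_vectors)
print(features_invariant, coordinates_changed)
```

The raw coordinates change under the rotation while the inner products do not, so any model that consumes only these features is invariant without further constraints on its parameters.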

Dos and don’ts

Principal Component Analysis is a dimensionally invalid method
Changing the units of one variable changes all the “principal components”
PCA does the right thing if all features have the same units
Output of PCA is sensitive to units system if features have different units
Kernel functions with inputs of different units cannot obey the passive symmetry of units covariance
Optimization of scalar cost function must obey passive geometric groups
Common ML methods violate rules if features are normalized differently or have different units
Neural nets violate many of these rules: nonlinear functions can only be applied to scalars
L1 and L∞ norms are almost always inconsistent with passive symmetries
Regularizers are often not units covariant
Latent variable models and ICA must incorporate gauge transformations correctly
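
The sensitivity of PCA to the units system can be reproduced in a few lines. The sketch below uses synthetic data and is an assumption-laden illustration: the feature names, the random seed, and the helper `first_principal_component` are hypothetical, and PCA is computed directly from an SVD of the centered data.

```python
import numpy as np

def first_principal_component(data):
    # Leading right singular vector of the centered data matrix.
    centered = data - data.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[0]

rng = np.random.default_rng(0)
length_m = rng.normal(0.0, 1.0, 200)   # lengths in meters
mass_kg = rng.normal(0.0, 10.0, 200)   # masses in kilograms
data_meters = np.column_stack([length_m, mass_kg])

# Same measurements with the length column expressed in millimeters.
data_millimeters = data_meters * np.array([1000.0, 1.0])

pc_meters = first_principal_component(data_meters)
pc_millimeters = first_principal_component(data_millimeters)
print(pc_meters, pc_millimeters)
```

With lengths in meters the leading component points almost entirely along the mass axis, and with lengths in millimeters it points almost entirely along the length axis, even though the underlying measurements are identical.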

Discussion

Passive symmetries are present in most ML or data-analysis tasks.
Enforcing these symmetries should improve ML methods’ generalization capabilities.
Implementation of passive symmetries can be difficult due to missing elements in the problem formulation.
Symmetries can be exact or approximate, depending on the context.
Convolutional structure in image models might be related to observer symmetries.

A. Springy double pendulum

We consider a dissipationless spherical double pendulum with springs.
The kinetic and potential energy of the system are given by equations.
The prediction task is to learn the positions and momenta over a set of later times given the initializations of the pendulum positions and momenta.
The training inputs consist of 500 different initializations of the pendulum positions and momenta.
We consider three different O(3)-equivariant models depending on how the gravitational acceleration vector is involved.
Known-g model uses the gravitational acceleration vector as an input feature.
Learned-g model uses the gravitational acceleration vector as a learnable variable.
No-g model does not use the gravitational acceleration vector as an input feature.
The model is evaluated on a test data set with $T = 150$ and $t_0 = 0$.
The performance of the three predictive models is based on the state relative error at a given time t.
Figure 1 depicts the difference between active and passive transformations.
Figure 2 shows the prediction of the intensity of black body radiation and the performance of learning the dynamics of the springy double pendulum.
Inference of causal structure is possible without any training data or interventions.
