Link to paper
The full paper is available here.
You can also find the paper on PapersWithCode here.
Abstract
- Representation of data involves investigator choices
- Choices lead to exact symmetry
- Symmetries include coordinate freedom, gauge symmetry and units covariance
- Goal is to understand implications of passive symmetries for machine learning
- Discuss links to causal modeling
- Implementation of passive symmetries valuable for generalizing out of sample
Paper Content
Introduction
- ML has been inspired by mathematical physics
- Kernel trick and statistical mechanics techniques used in ML
- Representation of observables involves investigator choices
- ML methods should be written in a form that is equivariant to changes in investigator choices
- Geometric Principle: Laws of physics must be expressed as geometric relationships between geometric objects
- Symmetries of coordinate freedom and gauge symmetry
- Analogs of these symmetries could have big impacts in ML
- Two types of symmetries: passive and active
- Passive symmetries apply to all data analysis problems
- Guidance on how to structure ML models to respect passive symmetries
Passive symmetries
- Passive symmetries arise from redundancies or free parameters in data representation
- Active symmetries arise from observed or empirical invariances of physical laws
- Passive symmetries can be established without observations
- Active symmetries are rare and usually only appear in natural-science contexts
- Active and passive transformations are similar mathematically
- To enforce a passive symmetry, all relevant contextual information must be incorporated
- Ricci calculus is used to make objects equivariant to coordinate diffeomorphisms
- Passive symmetries have not featured in ML practice, but could be significant
Example: units covariance
- Units covariance is a passive symmetry that states the behavior of a system does not depend on the units system used.
- Dropping a mass from a height and launching a mass at a velocity can be answered using dimensional arguments.
- Answers to these questions do not depend on the input mass.
- Units covariance has been used in ML methods to improve training, predictive accuracy, and out-of-sample generalization.
Formal definition
- Consider X to be the state space of a physical system
- We consider a family of maps {Φ i : X → H i } i∈I
- Two encodings Φ i and Φ j are compatible if there exists an invertible morphism
- Not all observables are compatible
- Passive symmetries are the groupoid of invertible morphisms between compatible encodings
- Imposing a passive symmetry on an ML model can lead to generalization improvements
Experiments and examples
- Planck discovered a formula for black-body radiation intensity
- Planck’s constant h was introduced
- Classical physics had an “ultraviolet catastrophe”
- Quantum mechanics solved the problem by cutting off ultraviolet modes
- Villar et al. 2022 used units covariant regression to predict intensity
- Units covariant regression found a constant with units consistent with h
Springy double pendulum:
- Double pendulum connected by springs is a toy example used in equivariant ML demonstrations
- Final conditions related to initial conditions and dynamics is classically chaotic
- System subject to passive O(3) symmetry and active O(2) symmetry
- O(3) passive symmetry requires all relevant vectors to be transformed identically
- Experiment to predict dynamics of double pendulum using O(3)-equivariant models
- Models implemented are Known-g, No-g, and Learned-g
Connections with causality
- Passive symmetries can be applied to causal models
- Dimensional analysis assumes all relevant quantities are specified
- Difficulty in causal inference is related to knowing all confounding variables
- Experiments can indicate which variables are relevant
- Prior knowledge from related problems can inform which variables are relevant
Connections to current ml practice
- ML implementations don’t impose exact symmetries
- Data augmentation can approximate equivariances
- Two approaches to optimize equivariant functions: parameterizing or finding invariant features
- Symplectic networks preserve differential 2-form
- Equivariant ML models have scientific applications
- Implicit bias, generalization error, and sample complexity of equivariant ML models have been studied
Dos and don’ts
- Principal Component Analysis is a dimensionally invalid method
- Changing the units of one variable changes all the “principal components”
- PCA does the right thing if all features have the same units
- Output of PCA is sensitive to units system if features have different units
- Kernel functions with inputs of different units cannot obey the passive symmetry of units covariance
- Optimization of scalar cost function must obey passive geometric groups
- Common ML methods violate rules if features are normalized differently or have different units
- Neural nets violate many rules, nonlinear functions can only be applied to scalars
- L1 and L∞ norms are almost always inconsistent with passive symmetries
- Regularizers are often not units covariant
- Latent variable models and ICA must incorporate gauge transformations correctly
Discussion
- Passive symmetries are present in most ML or data-analysis tasks.
- Enforcing these symmetries should improve ML methods’ generalization capabilities.
- Implementation of passive symmetries can be difficult due to missing elements in the problem formulation.
- Symmetries can be exact or approximate, depending on the context.
- Convolutional structure in image models might be related to observer symmetries.
A. springy double pendulum
- We consider a dissipationless spherical double pendulum with springs.
- The kinetic and potential energy of the system are given by equations.
- The prediction task is to learn the positions and momenta over a set of later times given the initializations of the pendulum positions and momenta.
- The training inputs consist of 500 different initializations of the pendulum positions and momenta.
- We consider three different O(3)-equivariant models depending on how the gravitational acceleration vector is involved.
- Known-g model uses the gravitational acceleration vector as an input feature.
- Learned-g model uses the gravitational acceleration vector as an learnable variable.
- No-g model does not use the gravitational acceleration vector as an input feature.
- The model is evaluated on a test data set with T = 150 and t 0 = 0.
- The performance of the three predictive models is based on the state relative error at a given time t.
- Figure 1 depicts the difference between active and passive transformations.
- Figure 2 shows the prediction of the intensity of black body radiation and the performance of learning the dynamics of the springy double pendulum.
- Inference of causal structure is possible without any training data or interventions.