Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

  • Representation of data involves investigator choices
  • These choices lead to exact symmetries
  • Symmetries include coordinate freedom, gauge symmetry and units covariance
  • Goal is to understand implications of passive symmetries for machine learning
  • Discuss links to causal modeling
  • Implementation of passive symmetries valuable for generalizing out of sample

Paper Content

Introduction

  • ML has been inspired by mathematical physics
  • Kernel trick and statistical mechanics techniques used in ML
  • Representation of observables involves investigator choices
  • ML methods should be written in a form that is equivariant to changes in investigator choices
  • Geometric Principle: Laws of physics must be expressed as geometric relationships between geometric objects
  • Symmetries of coordinate freedom and gauge symmetry
  • Analogs of these symmetries could have big impacts in ML
  • Two types of symmetries: passive and active
  • Passive symmetries apply to all data analysis problems
  • Guidance on how to structure ML models to respect passive symmetries

Passive symmetries

  • Passive symmetries arise from redundancies or free parameters in data representation
  • Active symmetries arise from observed or empirical invariances of physical laws
  • Passive symmetries can be established without observations
  • Active symmetries are rare and usually only appear in natural-science contexts
  • Active and passive transformations are similar mathematically
  • To enforce a passive symmetry, all relevant contextual information must be incorporated
  • Ricci calculus is used to make objects equivariant to coordinate diffeomorphisms
  • Passive symmetries have not featured in ML practice, but could be significant

Example: units covariance

  • Units covariance is a passive symmetry that states the behavior of a system does not depend on the units system used.
  • Questions about a mass dropped from a height or launched at a velocity can be answered using dimensional arguments.
  • Answers to these questions do not depend on the input mass.
  • Units covariance has been used in ML methods to improve training, predictive accuracy, and out-of-sample generalization.
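
The dimensional argument for the dropped mass can be sketched numerically. The snippet below is an illustration only, not code from the paper: it assumes the relevant inputs are the height h, the mass m, and the gravitational acceleration g, and it solves for the exponents that give a quantity with dimensions of time. The names `D`, `target`, and `fall_time_scaling` are hypothetical.

```python
import numpy as np

# Rows: base dimensions (mass, length, time).
# Columns: input quantities (height h, mass m, gravitational acceleration g).
D = np.array([
    [0.0, 1.0, 0.0],    # mass exponents of h, m, g
    [1.0, 0.0, 1.0],    # length exponents of h, m, g
    [0.0, 0.0, -2.0],   # time exponents of h, m, g
])
target = np.array([0.0, 0.0, 1.0])  # the fall time has dimensions of time

# Solve D @ exponents = target for the powers of h, m, g.
exponents = np.linalg.solve(D, target)  # approximately [0.5, 0.0, -0.5]
a_h, a_m, a_g = exponents

def fall_time_scaling(h, m, g):
    """Fall time up to a dimensionless constant (assumed inputs: h, m, g)."""
    return h**a_h * m**a_m * g**a_g

# Re-express the same drop in feet and pounds; seconds are unchanged.
FT_PER_M = 3.28084
LB_PER_KG = 2.20462
t_si = fall_time_scaling(10.0, 1.0, 9.8)
t_imperial = fall_time_scaling(10.0 * FT_PER_M, 1.0 * LB_PER_KG, 9.8 * FT_PER_M)
print(exponents, t_si, t_imperial)
```

The mass exponent comes out as zero, which is the sense in which the answer does not depend on the input mass, and the two unit systems give the same time because the expression is units covariant.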

Formal definition

  • Consider X to be the state space of a physical system
  • We consider a family of maps {Φ_i : X → H_i}_{i∈I}
  • Two encodings Φ_i and Φ_j are compatible if there exists an invertible morphism
  • Not all observables are compatible
  • Passive symmetries are the groupoid of invertible morphisms between compatible encodings
  • Imposing a passive symmetry on an ML model can lead to generalization improvements
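
One way to write the compatibility condition explicitly is sketched below. The summary above does not name the morphism, so the symbol g_{ij} and the exact form of the condition are assumptions made for illustration.

```latex
% Assumed notation: g_{ij} is the invertible morphism between encodings i and j.
\Phi_j = g_{ij} \circ \Phi_i,
\qquad g_{ij} : H_i \to H_j \ \text{invertible},
\qquad g_{ji} = g_{ij}^{-1}.

% Composites of morphisms are again morphisms, giving the groupoid structure:
\Phi_k = g_{jk} \circ \Phi_j = (g_{jk} \circ g_{ij}) \circ \Phi_i .
```

Under this reading, the passive symmetry is the statement that no encoding is preferred: anything computed from Φ_i can be translated to Φ_j by applying g_{ij}.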

Experiments and examples

  • Planck discovered a formula for black-body radiation intensity
  • Planck’s constant h was introduced
  • Classical physics had an “ultraviolet catastrophe”
  • Quantum mechanics solved the problem by cutting off ultraviolet modes
  • Villar et al. 2022 used units covariant regression to predict intensity
  • Units covariant regression found a constant with units consistent with h

Springy double pendulum:

  • Double pendulum connected by springs is a toy example used in equivariant ML demonstrations
  • The dynamics relating final conditions to initial conditions are classically chaotic
  • System subject to passive O(3) symmetry and active O(2) symmetry
  • O(3) passive symmetry requires all relevant vectors to be transformed identically
  • Experiment to predict dynamics of double pendulum using O(3)-equivariant models
  • Models implemented are Known-g, No-g, and Learned-g
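
The requirement that all relevant vectors be transformed identically can be checked on a toy function. The sketch below is not one of the paper's models. It assumes a hypothetical map `equivariant_update` built from invariant inner products, and compares the case where gravity is transformed with the state against the case where gravity is held fixed.

```python
import numpy as np

def equivariant_update(vectors):
    """Map a stack of 3-vectors, shape (n, 3), to one 3-vector.

    The coefficients depend only on the Gram matrix of inner products,
    which is unchanged by orthogonal transformations, so the output
    transforms like the inputs. This is a toy function, not the paper's.
    """
    gram = vectors @ vectors.T            # (n, n) invariant scalars
    coeffs = np.tanh(gram).sum(axis=1)    # (n,) invariant coefficients
    return coeffs @ vectors               # (3,) linear combination of inputs

def random_orthogonal(rng):
    """Draw a 3x3 orthogonal matrix from the QR factorization of a random matrix."""
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    return q

rng = np.random.default_rng(0)
position = rng.normal(size=3)
momentum = rng.normal(size=3)
gravity = np.array([0.0, 0.0, -9.8])
Q = random_orthogonal(rng)

# Passive symmetry: every vector, including gravity, is re-expressed.
full = np.stack([position, momentum, gravity])
transformed_output = equivariant_update(full) @ Q.T
known_g_gap = np.linalg.norm(equivariant_update(full @ Q.T) - transformed_output)

# Gravity left fixed while the state is rotated: the equality fails.
partial = np.stack([position @ Q.T, momentum @ Q.T, gravity])
fixed_g_gap = np.linalg.norm(equivariant_update(partial) - transformed_output)

print(known_g_gap, fixed_g_gap)
```

The first gap is at the level of floating-point round-off, while the second is not, which illustrates why a model that omits the gravity vector cannot be exactly O(3)-equivariant for this system.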

Connections with causality

  • Passive symmetries can be applied to causal models
  • Dimensional analysis assumes all relevant quantities are specified
  • Difficulty in causal inference is related to knowing all confounding variables
  • Experiments can indicate which variables are relevant
  • Prior knowledge from related problems can inform which variables are relevant

Connections to current ML practice

  • ML implementations don’t impose exact symmetries
  • Data augmentation can approximate equivariances
  • Two approaches to optimize equivariant functions: parameterizing or finding invariant features
  • Symplectic networks preserve a differential 2-form
  • Equivariant ML models have scientific applications
  • Implicit bias, generalization error, and sample complexity of equivariant ML models have been studied

Dos and don’ts

  • Principal Component Analysis is a dimensionally invalid method
  • Changing the units of one variable changes all the “principal components”
  • PCA does the right thing if all features have the same units
  • Output of PCA is sensitive to units system if features have different units
  • Kernel functions with inputs of different units cannot obey the passive symmetry of units covariance
  • Optimization of scalar cost function must obey passive geometric groups
  • Common ML methods violate rules if features are normalized differently or have different units
  • Neural nets violate many of these rules; nonlinear functions can only be applied to scalars
  • L1 and L∞ norms are almost always inconsistent with passive symmetries
  • Regularizers are often not units covariant
  • Latent variable models and ICA must incorporate gauge transformations correctly

Discussion

  • Passive symmetries are present in most ML or data-analysis tasks.
  • Enforcing these symmetries should improve ML methods’ generalization capabilities.
  • Implementation of passive symmetries can be difficult due to missing elements in the problem formulation.
  • Symmetries can be exact or approximate, depending on the context.
  • Convolutional structure in image models might be related to observer symmetries.

A. Springy double pendulum

  • We consider a dissipationless spherical double pendulum with springs.
  • The kinetic and potential energy of the system are given by equations.
  • The prediction task is to learn the positions and momenta over a set of later times given the initializations of the pendulum positions and momenta.
  • The training inputs consist of 500 different initializations of the pendulum positions and momenta.
  • We consider three different O(3)-equivariant models depending on how the gravitational acceleration vector is involved.
  • Known-g model uses the gravitational acceleration vector as an input feature.
  • Learned-g model uses the gravitational acceleration vector as a learnable variable.
  • No-g model does not use the gravitational acceleration vector as an input feature.
  • The model is evaluated on a test data set with T = 150 and t_0 = 0.
  • The performance of the three predictive models is based on the state relative error at a given time t.
  • Figure 1 depicts the difference between active and passive transformations.
  • Figure 2 shows the prediction of the intensity of black body radiation and the performance of learning the dynamics of the springy double pendulum.
  • Inference of causal structure is possible without any training data or interventions.