Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

  • Schur-Weyl duality between partition algebra and symmetric group provides a stronger theoretical foundation for characterizing permutation equivariant neural networks.
  • Unifies two separate bodies of literature and corrects some widely quoted results in the machine learning community.
  • Graphical representation of basis of set partitions used to find basis of matrices for learnable, linear, permutation equivariant layer functions.
  • Number of weights in layer functions can be calculated by looking at certain paths through McKay quiver for $M_n$.
  • Approach generalizes to construction of neural networks equivariant to local symmetries.

Paper Content

Introduction

  • Permutation equivariant neural networks are used to learn from data that has permutation symmetry.
  • Two major issues remain in the research topic.
  • Representation theory of the partition algebra is used to construct permutation equivariant neural networks.
  • Combinatorial objects are used to construct the neural networks.
  • Schur-Weyl duality is used to calculate the number of weights.

Preliminaries

A word on notation

  • Denote the symmetric group of permutations of n objects by S_n
  • Field of scalars is C, but results generalize to any field F of characteristic zero, such as R
  • Tensor products taken over C, unless otherwise stated
  • [n] represents the set {1, ..., n}

Representations of the symmetric group and its branching rules

  • Representation of S_n is a choice of vector space V over C and a group homomorphism ρ
  • Focus on finite-dimensional vector spaces V
  • V is a module over the group algebra C[S_n]
  • Partitions of nonnegative integer n and their associated Young frames
  • Partition λ of n, written λ ⊢ n, is a weakly decreasing tuple of positive integers that sum to n
  • A partition with k parts has length k
  • To every partition λ ⊢ n, associate a Young frame of shape λ
  • Set of coordinates for each Young frame
  • Box at coordinates (i, j) is removable (addable) if removing (adding) it leaves a valid Young frame
  • Young tableau of shape λ is a bijection between set {1, 2, …, n} and boxes of Young frame
  • Young tableau is standard if numbers increase along every row and down every column
  • Young tabloids are equivalence classes of Young tableaux of shape λ under row equivalence (same entries in each row)
  • S_n has a representation whose basis is the set of all Young tabloids of shape λ
  • This representation is denoted M^λ
  • M^λ may or may not be irreducible
  • Its decomposition into irreducible representations of S_n contains exactly one copy of the irreducible representation S^λ_n
  • S^λ_n is called a Specht module
  • The Specht modules S^λ_n, λ ⊢ n, form a complete set of irreducible representations of S_n up to equivalence
  • Basis of S^λ_n in bijective correspondence with set of all standard Young tableaux of shape λ
  • Dimension of S^λ_n given by hook-length formula
  • Branching rule for irreducible representations of S_n (restriction/induction versions)
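
The hook-length formula mentioned above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names `hook_lengths` and `specht_dimension` are hypothetical, and a partition is assumed to be given as a weakly decreasing tuple of positive integers.

```python
from math import factorial


def hook_lengths(shape):
    """Return the hook length of every box in the Young frame of `shape`."""
    if not shape:
        return []
    # Column lengths of the frame (the conjugate partition).
    cols = [sum(1 for row in shape if row > j) for j in range(shape[0])]
    return [
        (shape[i] - j - 1) + (cols[j] - i - 1) + 1  # arm + leg + the box itself
        for i in range(len(shape))
        for j in range(shape[i])
    ]


def specht_dimension(shape):
    """Dimension of the Specht module of `shape`: n! divided by the product of hooks."""
    product_of_hooks = 1
    for h in hook_lengths(shape):
        product_of_hooks *= h
    return factorial(sum(shape)) // product_of_hooks
```

As a sanity check, the squares of the dimensions over all partitions of n should sum to n!, the order of S_n.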

A special isomorphism between two representations of S_n

  • M^{[n−1,1]} is a permutation representation of S_n of dimension n, denoted M_n.
  • M_n has a basis consisting of all tabloids of shape [n − 1, 1].
  • Any k-tensor power of the permutation representation, M_n^{⊗k}, is a representation of S_n.
  • (C^n)^{⊗k} is isomorphic to M_n^{⊗k} as representations of S_n.

Permutation equivariant neural networks

  • Permutation equivariant neural networks are composed of linear and non-linear maps between representations of the symmetric group
  • A permutation equivariant map is a map between two representations of the symmetric group that commutes with the action of S_n
  • Permutation invariance is a special case of permutation equivariance
  • Neural networks built from such maps are permutation equivariant functions
  • Neural networks can be made permutation invariant by choosing the representation in the final layer to be the d-dimensional trivial representation of S_n
  • The composition of any number of permutation equivariant functions is itself permutation equivariant
  • We are interested in calculating the exact form of any permutation equivariant neural network
  • We are interested in understanding the representations and the learnable, linear, permutation equivariant layer functions between them
  • We want to calculate the number of weight parameters and find the matrix form of each layer function
  • We assume throughout most of the paper that the feature dimension for all of our representations is one
  • Layer functions do not take into account any bias terms
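
For reference, the equivariance condition on a layer function can be written out as follows. The symbols f, ρ_k and ρ_l are notation assumed here for illustration: f is a layer function from the k-th to the l-th tensor power of M_n, and ρ_k, ρ_l are the corresponding actions of S_n.

```latex
f\bigl(\rho_k(g)\, v\bigr) = \rho_l(g)\, f(v)
\qquad \text{for all } g \in S_n \text{ and } v \in M_n^{\otimes k}
```

Invariance is the special case in which ρ_l(g) is the identity for every g, so that f(ρ_k(g) v) = f(v).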

How walking on a graph results in a tensor power representation

  • Let V be a finite-dimensional S_n-module
  • A quiver is a tuple Q = (Q_0, Q_1, s, e)
  • Elements of Q_0 are called nodes
  • Elements of Q_1 are called arrows
  • For an arrow α, s(α) is its start vertex and e(α) its end vertex
  • McKay quiver associated with V is denoted Q_V(S_n)
  • Nodes are partitions λ ⊢ n corresponding to irreducible S_n-modules
  • Arrows from λ to μ if S^μ_n appears in S^λ_n ⊗ V, one arrow for each copy
  • Adjacency matrix A_V associated with Q_V(S_n)
  • A walk of k steps from λ to μ is a sequence of k consecutive arrows leading from λ to μ
  • Each step on a walk of k steps corresponds to tensoring with V
  • Multiplicity m_k^μ of S^μ_n in V^{⊗k} is the number of walks of k steps from [n] to μ
  • Choosing V = M_n reduces the decomposition of S^λ_n ⊗ M_n to removal and addition operations on boxes of the Young frame
  • m_k^μ can be found using character theory or powers of the adjacency matrix
  • Tensor Identity: S^λ_n ⊗ M_n is isomorphic to Ind^{S_n}_{S_{n−1}}(Res^{S_n}_{S_{n−1}}(S^λ_n))
  • Restriction-Induction Bratteli diagram B(S_n, S_{n−1})
  • Vertices in row k are the elements of Λ(S_n) that appear in M_n^{⊗k}
  • Vertices in row k + 1/2 are the elements of Λ(S_{n−1}) that appear in Res^{S_n}_{S_{n−1}}(M_n^{⊗k})
  • Edges between vertices in adjacent rows
  • m_k^μ is the number of paths from [n] on level 0 to μ ∈ Λ_k(S_n) on level k
  • Multiplicities for λ on given level computed iteratively from previous level
  • Bijection between indexing of irreducibles of symmetric group and partition algebra
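
The iterative computation of multiplicities can be sketched as below: each tensoring step with M_n removes one removable box from a Young frame and then adds one addable box. This is a minimal sketch under that reading of the list above. The helper names `remove_box`, `add_box` and `multiplicities` are hypothetical, and partitions are assumed to be tuples of positive integers.

```python
from collections import Counter


def remove_box(shape):
    """All partitions obtained from `shape` by removing one removable box."""
    out = []
    for i, row in enumerate(shape):
        below = shape[i + 1] if i + 1 < len(shape) else 0
        if row > below:  # the last box of row i is removable
            new = shape[:i] + (row - 1,) + shape[i + 1:]
            out.append(tuple(r for r in new if r > 0))
    return out


def add_box(shape):
    """All partitions obtained from `shape` by adding one addable box."""
    out = []
    for i, row in enumerate(shape):
        if i == 0 or row < shape[i - 1]:  # row i can grow by one box
            out.append(shape[:i] + (row + 1,) + shape[i + 1:])
    out.append(shape + (1,))  # a new row of length one is always addable
    return out


def multiplicities(n, k):
    """Multiplicity of each irreducible of S_n in the k-th tensor power of M_n."""
    level = Counter({(n,): 1})  # level 0: the trivial representation [n]
    for _ in range(k):
        nxt = Counter()
        for lam, m in level.items():
            for mu in remove_box(lam):      # restrict to S_{n-1}
                for nu in add_box(mu):      # induce back up to S_n
                    nxt[nu] += m
        level = nxt
    return dict(level)
```

For n = 3 and k = 2 this gives multiplicity 2 for [3], 3 for [2, 1] and 1 for [1, 1, 1], which matches the dimension count 2·1 + 3·2 + 1·1 = 9 = 3².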

Schur-Weyl duality and centraliser algebras

  • M_n^{⊗k} decomposes as an S_n-module
  • Double Centraliser Theorem: if A is a semisimple subalgebra of End(V) and B = End_A(V), then B is a semisimple algebra and A = End_B(V)
  • Choosing A = C[S_n] and V = M_n^{⊗k}, the centraliser B = End_{S_n}(M_n^{⊗k}) is a semisimple algebra
  • S_n is in Schur-Weyl duality with its centraliser algebra on M_n^{⊗k}

Understanding permutation equivariant neural networks through the irreducibles of the symmetric group

  • M_n^{⊗k} has a structure related to S_n and its centraliser algebra.
  • Theorem 10 states that the following are equal: the multiplicity of the irreducible S_n-module S^λ_n in M_n^{⊗k}, the number of walks of k steps from the node [n] to the node λ in Q_{M_n}(S_n), and the ([n], λ) entry of the kth power of the adjacency matrix A_{M_n} for Q_{M_n}(S_n).

The number of weights in a k-order permutation equivariant layer function

  • Number of weights in a k-order permutation equivariant layer function is equal to dim End_{S_n}(M_n^{⊗k})
  • Number of weights can be calculated using decomposition and Schur’s Lemma
  • Number of weights can also be calculated using a combinatorial approach
  • Number of weights is the number of pairs of walks of k steps from the node [n] to a common node λ in Q_{M_n}(S_n), summed over all λ
  • Equivalently, it is the number of walks of length 2k that start and end at node [n], grouped by the node λ reached on the kth step
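
As a cross-check on the walk counts, dim End_{S_n}(M_n^{⊗k}) also equals the number of set partitions of a 2k-element set with at most n blocks, which is a sum of Stirling numbers of the second kind. The sketch below computes that sum; the function names are hypothetical and not taken from the paper.

```python
def stirling2_row(m):
    """Return [S(m, 0), ..., S(m, m)], Stirling numbers of the second kind."""
    row = [1]  # S(0, 0) = 1
    for i in range(1, m + 1):
        new = [0] * (i + 1)
        for t in range(1, i + 1):
            # S(i, t) = t * S(i-1, t) + S(i-1, t-1), with S(i-1, i) = 0
            new[t] = (t * row[t] if t < i else 0) + row[t - 1]
        row = new
    return row


def restricted_bell(m, n):
    """Number of set partitions of an m-element set with at most n blocks."""
    row = stirling2_row(m)
    return sum(row[t] for t in range(min(m, n) + 1))


def num_equivariant_weights(n, k):
    """Weights in a k-order equivariant layer with feature dimension one, no bias."""
    return restricted_bell(2 * k, n)
```

For example, `num_equivariant_weights(3, 2)` counts the set partitions of a 4-element set with at most 3 blocks, which is 1 + 7 + 6 = 14, one fewer than the Bell number 15 that applies once n ≥ 4.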

The number of weights in a k-order permutation invariant layer function

  • The number of weights in a k-order permutation invariant layer function is equal to dim Hom_{S_n}(M_n^{⊗k}, M_n^{⊗0}).
  • The number of weights in a k-order permutation invariant layer function is equal to the number of walks of length k that start and end at the node [n] in Q_{M_n}(S_n).
  • The number of weights in a k-order permutation invariant layer function is equal to the multiplicity of the irreducible S_n-module S^{[n]}_n in M_n^{⊗k}.
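
Written as a formula for k ≥ 1, the three descriptions above agree with a count of set partitions. Here B(k, n) denotes the number of set partitions of [k] with at most n blocks and S(k, t) denotes a Stirling number of the second kind; the symbol m_k^{[n]} for the multiplicity is notation assumed for this sketch.

```latex
\dim \operatorname{Hom}_{S_n}\bigl(M_n^{\otimes k}, M_n^{\otimes 0}\bigr)
  = m_k^{[n]}
  = \mathrm{B}(k, n)
  = \sum_{t=1}^{\min(k,\,n)} S(k, t)
```

For instance, with n = 2 and k = 3 this gives S(3, 1) + S(3, 2) = 1 + 3 = 4 weights.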

The partition algebra and its consequences for k-order permutation equivariant layer functions

  • Constructed an algebra P_k(n) with a basis of combinatorial diagrams
  • Combinatorial diagrams correspond bijectively to all possible set partitions of a set of 2k elements
  • Can calculate standard basis matrices of End_{S_n}(M_n^{⊗k})
  • Argument adapted from Benkart and Halverson (2019a,b)
  • Argument expanded from Jones (1994)
  • Jones (1994) first to find algebra homomorphism given in Section 6.5

The partition algebra, P_k(n)

  • Consider the set {1, ..., 2k} of 2k elements, denoted [2k].
  • A set partition of [2k] splits it into disjoint non-empty subsets, called blocks.
  • Each diagram has two rows of k vertices: label the top row 1, ..., k and the bottom row k + 1, ..., 2k.
  • Draw edges between vertices such that connected components correspond to blocks.
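
To make the diagrams concrete, the set partitions of [2k] that index them can be enumerated recursively. The generator below is a minimal sketch with a hypothetical name, `set_partitions`; each partition is returned as a list of blocks, and each block would be drawn as one connected component of a diagram.

```python
def set_partitions(elements):
    """Yield every set partition of the list `elements` as a list of blocks."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for partition in set_partitions(rest):
        # Place `first` into each existing block in turn ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or into a new block of its own.
        yield [[first]] + partition


# Usage: the diagrams for k = 2 are indexed by the set partitions of [4].
k = 2
diagrams = list(set_partitions(list(range(1, 2 * k + 1))))
```

Filtering `diagrams` by `len(p) <= n` keeps only the set partitions with at most n blocks.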

The orbit basis of P_k(n)

  • P_k(n) is constructed from a set of diagrams, each corresponding to a set partition in Π_{2k}
  • A partial ordering is defined on the set partitions in Π_{2k}
  • A set of elements in P_k(n), the orbit basis, is indexed by the set partitions in Π_{2k}
  • The transition matrix between the diagram basis and the orbit basis is unitriangular
  • The standard basis of M_n is identified with the basis labelled by {v_a | a ∈ [n]}
  • The elements v_{i_1} ⊗ v_{i_2} ⊗ ... ⊗ v_{i_k} over all tuples I := (i_1, i_2, ..., i_k) ∈ [n]^k form a basis of M_n^{⊗k}
  • The basis elements of End_{S_n}(M_n^{⊗k}) correspond bijectively to the set partitions π in Π_{2k} having at most n blocks
  • Each basis element of End_{S_n}(M_n^{⊗k}) is formed by summing the matrix units indexed by each pair (I, J) in the corresponding S_n-orbit
  • An example block labelling of a set partition π is (I_π, J_π) = (1, 2, 1, 2, 3, 4, 5, 4)
  • M_n^{⊗2k} has a basis consisting of tensor products of 2k standard basis vectors of M_n
  • For every set partition π in Π_{2k} having at most n blocks, an S_n-submodule of M_n^{⊗2k} is defined
  • The t-size ordered subsets of [n] correspond bijectively to the basis elements of M^{[n−t, 1^t]}
  • Construction of vector space P_k^l(n) consisting of combinatorial diagrams indexed by elements of Π_{l+k}
  • Diagram basis and orbit basis of P_k^l(n)
  • Map of vector spaces Φ^l_{k,n} : P_k^l(n) → Hom_{S_n}(M_n^{⊗k}, M_n^{⊗l})
  • Dimension of Hom_{S_n}(M_n^{⊗k}, M_n^{⊗l}) is B(l + k, n), the number of set partitions of [l + k] having at most n blocks; this equals the Bell number B(l + k) when n ≥ l + k
  • Kernel of Φ^l_{k,n} is the C-linear span of orbit basis elements corresponding to set partitions in Π_{l+k} that have more than n blocks
  • Weight matrix of a k-order to l-order permutation equivariant neural network layer function in the standard basis of M_n can be found from the orbit basis diagrams x_π in P_k^l(n), which correspond bijectively to set partitions π in Π_{l+k,n}
  • Isomorphism between Hom_{S_n}(M_n^{⊗k}, M_n^{⊗l}) and Hom_{S_n}(M_n^{⊗p}, M_n^{⊗q}) for any p, q ∈ Z_{≥0} such that q + p = l + k
  • Unfolding of orbit basis diagrams in P_k^l(n) to P_{l+k}^0(n) and folding of first q nodes from left to obtain orbit basis diagram in P_p^q(n)
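
The orbit construction can be illustrated with a small pure-Python sketch. It groups the index pairs (I, J) by their equality pattern, which plays the role of a set partition of [l + k] with at most n blocks, and sums the matrix units in each group. The function names are hypothetical, indices are 0-based rather than drawn from [n], and this is an illustration of the idea rather than the paper's own procedure.

```python
from itertools import permutations, product


def pattern(values):
    """Relabel a tuple by order of first appearance, e.g. (2, 0, 2) -> (0, 1, 0)."""
    seen = {}
    return tuple(seen.setdefault(v, len(seen)) for v in values)


def orbit_basis(n, k, l):
    """Map each equality pattern to its 0/1 matrix with n**l rows and n**k columns."""
    rows = list(product(range(n), repeat=l))
    cols = list(product(range(n), repeat=k))
    basis = {}
    for r, I in enumerate(rows):
        for c, J in enumerate(cols):
            key = pattern(I + J)  # which positions of (I, J) share a value
            if key not in basis:
                basis[key] = [[0] * len(cols) for _ in rows]
            basis[key][r][c] = 1  # add the matrix unit E_{I,J} to its orbit sum
    return basis


def is_equivariant(matrix, n, k, l):
    """Check W[sigma(I)][sigma(J)] == W[I][J] for every permutation sigma of n points."""
    rows = list(product(range(n), repeat=l))
    cols = list(product(range(n), repeat=k))
    row_index = {I: r for r, I in enumerate(rows)}
    col_index = {J: c for c, J in enumerate(cols)}
    for sigma in permutations(range(n)):
        for I in rows:
            for J in cols:
                sI = tuple(sigma[i] for i in I)
                sJ = tuple(sigma[j] for j in J)
                if matrix[row_index[sI]][col_index[sJ]] != matrix[row_index[I]][col_index[J]]:
                    return False
    return True
```

For n = 3 and k = l = 1 the two basis matrices are the identity and the all-ones matrix with a zero diagonal. A learnable layer is then a linear combination of the basis matrices, with one weight per pattern.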

Adding features and biases

  • Maron et al. (2019) improved upon a result and provided a clearer sense of the form of the standard basis matrices for the spaces under consideration.
  • The feature dimension of the layers was assumed to be one, simplifying the analysis.
  • The results can be adapted for the case where the feature dimension of the layers is greater than one.
  • To include bias terms in the layer functions of a permutation equivariant neural network, the layer functions must be redefined.
  • The bias vector can be found by finding the basis elements of Hom_{S_n}(M_n^{⊗0}, M_n^{⊗l}).

A generalisation to layer functions that are equivariant to a product of symmetric groups

  • Construct neural networks that are equivariant to a product of symmetric groups
  • Respect only the symmetries in each group of n_r objects
  • Define external tensor product representation of direct product group G × H
  • Map P_{k_1}^{l_1}(n_1) ⊗ ... ⊗ P_{k_m}^{l_m}(n_m) onto the Hom-space given in (8.4)
  • Find basis of Hom-space by considering all possible side-by-side combinations of orbit basis diagrams
  • Dimension of the Hom-space given in (8.4) is ∏_{r=1}^{m} d_{k_r} d_{l_r} B(l_r + k_r, n_r)
  • Recover result of Hartford et al. (2018) by mapping orbit basis of direct product of partition algebras onto Hom-space
  • Learn layer functions that are equivariant to permutations of the features
  • Hom-space given in (8.16) has dimension B(2m + Σ_{r=1}^{m} (l_r + k_r), Σ_{r=1}^{m} (n_r + d_r))
  • Extend (8.16) to consider layer functions on tensors with feature spaces that are tensors

Related work

  • Martin (1990, 1994, 1996) introduced the partition algebra
  • Jones (1994) constructed a surjective algebra homomorphism between the partition algebra and the tensor power centraliser algebra
  • Benkart and Halverson led the development of the theory
  • Benkart and Moon (2018) wrote a paper on walking on graphs and its connection to the centraliser algebra
  • Benkart (2014, 2016) studied the McKay quiver and Schur-Weyl duality
  • Halverson and Ram (2005) wrote a seminal paper on the partition algebra
  • Halverson (2001, 2019) studied characters and summarised main results
  • Bowman et al. (2013) looked at Kronecker coefficients
  • Benkart and Halverson (2019a,b) and Benkart et al. (2017) showed how the partition algebra can be used to construct the invariant theory of the symmetric group
  • Zaheer et al. (2017) introduced the first permutation equivariant neural network
  • Hartford et al. (2018) considered permutation equivariant neural networks
  • Maron et al. (2019) studied linear permutation equivariant and invariant neural network layers
  • Finzi et al. (2021) recognised that the dimensions of the Hom-spaces are not independent of n
  • Pan and Kondor (2022) assumed the dimension of the Hom-spaces is independent of n

Conclusion

  • Combinatorial representation theory of partition algebra provides theoretical background for understanding permutation equivariant neural networks
  • McKay quiver decomposes any tensor power of a representation into irreducibles of S_n
  • Schur-Weyl duality between S_n and centraliser algebra End_{S_n}(M_n^{⊗k})
  • Multiplicities of irreducibles in decomposition are dimensions of simple modules of centraliser algebra
  • Orbit basis of partition algebra and related vector spaces finds form of layer functions
  • Equivariant across feature dimension for all tensor spaces
  • Schur-Weyl duality between group and algebra of diagrams to understand structure of neural networks
  • Example of block labelling for set partition
  • Structure of McKay quiver and restriction-induction Bratteli diagram linked to construction of permutation equivariant neural networks
  • Number of elements in Π_{2k,n} is equal to 20 with 9 blocks
  • Diagram d_π consists of two rows of k vertices and edges between vertices
  • Dimension of Hom-space given in (8.16) is equal to k-th Bell number