Link to paper
The full paper is available here.
You can also find the paper on PapersWithCode here.
Abstract
- Schur-Weyl duality between partition algebra and symmetric group provides a stronger theoretical foundation for characterizing permutation equivariant neural networks.
- Unifies two separate bodies of literature and corrects some widely quoted results in machine learning community.
- Graphical representation of basis of set partitions used to find basis of matrices for learnable, linear, permutation equivariant layer functions.
- Number of weights in layer functions can be calculated by looking at certain paths through McKay quiver for $M_n$.
- Approach generalizes to construction of neural networks equivariant to local symmetries.
Paper Content
Introduction
- Permutation equivariant neural networks are used to learn from data that has permutation symmetry.
- Two major issues remain in the research topic.
- Representation theory of the partition algebra is used to construct permutation equivariant neural networks.
- Combinatorial objects are used to construct the neural networks.
- Schur-Weyl duality is used to calculate the number of weights.
Preliminaries
A word on notation
- Denote the symmetric group of permutations of n objects by S n
- Field of scalars is C, but results generalize to any field F of characteristic zero, such as R
- Tensor products taken over C, unless otherwise stated
- [n] represents the set {1, . . . , n}
Representations of the symmetric group and its branching rules
- Representation of S_n is a choice of vector space V over C and a group homomorphism ρ
- Focus on finite-dimensional vector spaces V
- V is a module over the group algebra C[S_n]
- Partitions of nonnegative integer n and their associated Young frames
- Partition λ of n is a tuple of nonnegative integers
- Partition has k parts, length is k
- To every partition λ n, associate a Young frame of shape λ
- Set of coordinates for each Young frame
- Box at coordinates (i, j) is removable/addable
- Young tableau of shape λ is a bijection between set {1, 2, …, n} and boxes of Young frame
- Young tableau is standard if numbers are increasing in each row/column
- Equivalence classes of Young tableaux of shape λ
- Representation of S_n has basis set of all Young tabloids of shape λ
- Representation is M_λ
- M_λ may or may not be irreducible
- As part of decomposition into irreducible representations of S_n, one copy of irreducible representation S_λ_n
- S_λ_n is called Specht module
- Complete set of irreducible representations for S_n up to equivalence
- Basis of S_λ_n in bijective correspondence with set of all standard Young tableaux of shape λ
- Size of S_λ_n given by hook-length formula
- Branching rule for irreducible representations of S_n (restriction/induction versions)
A special isomorphism between two representations of s n .
- M [n−1,1] is a permutation representation of S n of dimension n.
- M n has a basis consisting of all tabloids of shape [n − 1, 1].
- Any k-tensor power of the permutation representation, M ⊗k n , is a representation of S n.
- C n k is isomorphic to M ⊗k n as representations of S n.
Permutation equivariant neural networks
- Permutation equivariant neural networks are composed of linear and non-linear maps between representations of the symmetric group
- Permutation equivariance is a map between two representations of the symmetric group
- Permutation invariance is a special case of permutation equivariance
- Neural networks are permutation equivariant functions
- Neural networks can be made permutation invariant by choosing the representation in the final layer to be the d-dimensional trivial representation of S n
- The composition of any number of permutation equivariant functions is itself permutation equivariant
- We are interested in calculating the exact form of any permutation equivariant neural network
- We are interested in understanding the representations and the learnable, linear, permutation equivariant layer functions between them
- We want to calculate the number of weight parameters and find the matrix form of each layer function
- We assume throughout most of the paper that the feature dimension for all of our representations is one
- Layer functions do not take into account any bias terms
How walking on a graph results in a tensor power representation
- Finite-dimensional S n -module V
- Quiver Q = (Q 0 , Q 1 , s, e)
- Elements of Q 0 called nodes
- Elements of Q 1 called arrows
- s(α) start vertex, e(α) end vertex
- McKay quiver associated with V, Q V (S n )
- Nodes are partitions λ n corresponding to irreducible S n -modules
- Arrows between λ and µ if S µ n appears in S λ n ⊗ V
- Adjacency matrix A V associated with Q V (S n )
- Walk of k steps from λ to µ if k arrows from λ to µ
- Each step on walk of k steps corresponds to tensoring with V
- Multiplicity of S µ n in V ⊗k is number of walks of k steps from [n] to µ
- V = M n reduces decomposition of S λ n ⊗ M n to removal and addition operations on boxes of Young frame
- Character theory or adjacency matrix to find m µ k
- Tensor Identity for S λ n ⊗ M n
- Restriction-Induction Bratteli diagram B(S n , S n−1 )
- Vertices in row k are elements of Λ(S n ) in M ⊗k n
- Vertices in row k+ 1 2 are elements of Λ(S n−1 ) in Res Sn S n−1 (M ⊗k n )
- Edges between vertices in adjacent rows
- m µ k is number of paths from [n] on level 0 to µ ∈ Λ k (S n ) on level k
- Multiplicities for λ on given level computed iteratively from previous level
- Bijection between indexing of irreducibles of symmetric group and partition algebra
Schur-weyl duality and centraliser algebras
- M ⊗k n decomposes as an S n -module
- Double Centraliser Theorem states that B is a semisimple algebra and A = End B (V )
- Choosing A = C[S n ] and V = M ⊗k n , B = End Sn (M ⊗k n ) is a semisimple algebra
- S n is in Schur-Weyl duality with its centraliser algebra on M ⊗k n
Understanding permutation equivariant neural networks through the irreducibles of the symmetric group
- M ⊗k n has a structure related to S n and its centraliser algebra.
- Theorem 10 states that the multiplicity of the irreducible S n -module S λ n in M ⊗k n, the number of walks of k steps from the node [n] to the node λ in Q Mn (S n ), and the kth power of the adjacency matrix A Mn for Q Mn (S n) are the same.
The number of weights in a k-order permutation equivariant layer function
- Number of weights in a k-order permutation equivariant layer function is equal to dim End Sn (M ⊗k n)
- Number of weights can be calculated using decomposition and Schur’s Lemma
- Number of weights can also be calculated using a combinatorial approach
- Number of weights is the sum of all pairs of walks of k steps from the node [n] to the node λ in Q Mn (S n)
- Number of weights is equal to the number of walks of length 2k that start and end at node [n] and pass through λ on the kth step
The number of weights in a k-order permutation invariant layer function
- The number of weights in a k-order permutation invariant layer function is equal to dim Hom Sn (M ⊗k n , M ⊗0 n ).
- The number of weights in a k-order permutation invariant layer function is equal to the number of walks of length k that start and end at Mn.
- The number of weights in a k-order permutation invariant layer function is equal to the multiplicity of the irreducible S n -module S [n] n in M ⊗k n.
The partition algebra and its consequences for k-order permutation equivariant layer functions
- Constructed an algebra P k (n) with a basis of combinatorial diagrams
- Combinatorial diagrams correspond bijectively to all possible partitions of a set of 2k elements
- Can calculate standard basis matrices of End Sn (M ⊗k n )
- Argument adapted from Benkart and Halverson (2019a,b)
- Argument expanded from Jones (1994)
- Jones (1994) first to find algebra homomorphism given in Section 6.5
The partition algebra, p k (n)
- Consider the set {1, . . . , 2k} of 2k elements, denoted [2k].
- Set partition of [2k] into subsets, called blocks.
- Label top row 1, . . . , k and bottom row k + 1, . . . , 2k.
- Draw edges between vertices such that connected components correspond to blocks.
The orbit basis of p k (n)
- P k (n) is constructed from a set of diagrams corresponding to a set partition in Π 2k
- A partial ordering is defined on the set partitions in Π 2k
- A set of elements in P k (n) is indexed by the set partitions of Π 2k
- The transition matrix between the diagram basis and the set is unitriangular
- The standard basis of M n is identified with the basis labelled by {v a | a ∈ [n]}
- The elements over all tuples I := (i 1 , i 2 , . . . , i k ) ∈ [n] k form a basis of M ⊗k n
- The elements of End Sn (M ⊗k n ) correspond bijectively to all set partitions π in Π 2k having at most n blocks
- The basis elements of End Sn (M ⊗k n ) are formed by summing the matrix units indexed by each pair in the orbit
- The block labelling of π is (I π , J π ) = (1, 2, 1, 2, 3, 4, 5, 4)
- M ⊗2k n has a basis consisting of elements of the form
- For every set partition π in Π 2k having at most n blocks, an S n -submodule of M ⊗2k n is defined
- The t-size ordered subsets of [n] correspond bijectively to the basis elements of M [n−t,1 t ]
Constructing other permutation invariant and equivariant layer functions through related partition diagrams
- Construction of vector space consisting of combinatorial diagrams indexed by elements of Π l+k
- Diagram basis and orbit basis of P l k (n)
- Map of vector spaces Φ l k,n : P l k (n) → Hom Sn (M ⊗k n , M ⊗l n )
- Dimension of Hom Sn (M ⊗k n , M ⊗l n ) is B(l + k)
- Kernel of Φ l k,n is C-linear span of orbit basis elements corresponding to set partitions in Π l+k that have more than n blocks
- Weight matrix of k-order to l-order permutation equivariant neural network layer function in standard basis of M n can be found by finding orbit basis diagrams x π in P l k (n) corresponding bijectively to set partitions π in Π l+k,n
- Isomorphism between Hom Sn (M ⊗k n , M ⊗l n ) and Hom Sn (M ⊗p n , M ⊗q n ) for any p, q ∈ Z ≥0 such that q + p = l + k
- Unfolding of orbit basis diagrams in P l k (n) to P 0 l+k (n) and folding of first q nodes from left to obtain orbit basis diagram in P q p (n)
Adding features and biases
- Maron et al. (2019) improved upon a result and provided a clearer sense of the form of the standard basis matrices for the spaces under consideration.
- The feature dimension of the layers was assumed to be one, simplifying the analysis.
- The results can be adapted for the case where the feature dimension of the layers is greater than one.
- To include bias terms in the layer functions of a permutation equivariant neural network, the layer functions must be redefined.
- The bias vector can be found by finding the basis elements of Hom Sn (M ⊗0 n , M ⊗l n).
A generalisation to layer functions that are equivariant to a product of symmetric groups
- Construct neural networks that are equivariant to a product of symmetric groups
- Respect only the symmetries in each group of n r objects
- Define external tensor product representation of direct product group G × H
- Map P l 1 k 1 (n 1 ) ⊗ • • • ⊗ P lm km (n m ) onto Hom-space given in (8.4)
- Find basis of Hom-space by considering all possible side-by-side combinations of orbit basis diagrams
- Dimension of Hom-space given in (8.4) is m r=1 d kr d lr B(l r + k r , n r )
- Recover result of Hartford et al. (2018) by mapping orbit basis of direct product of partition algebras onto Hom-space
- Learn layer functions that are equivariant to permutations of the features
- Hom-space given in (8.16) has dimension B(2m+ m r=1 (l r +k r ), m r=1 (n r +d r ))
- Extend (8.16) to consider layer functions on tensors with feature spaces that are tensors
Related work
- Martin (1990, 1994, 1996) introduced the partition algebra
- Jones (1994) constructed a surjective algebra homomorphism between the partition algebra and the tensor power centraliser algebra
- Benkart and Halverson led the development of the theory
- Benkart and Moon (2018) wrote a paper on walking on graphs and its connection to the centraliser algebra
- Benkart (2014, 2016) studied the McKay quiver and Schur-Weyl duality
- Halverson and Ram (2005) wrote a seminal paper on the partition algebra
- Halverson (2001, 2019) studied characters and summarised main results
- Bowman et al. (2013) looked at Kronecker coefficients
- Benkart and Halverson (2019a,b) and Benkart et al. (2017) showed how the partition algebra can be used to construct the invariant theory of the symmetric group
- Zaheer et al. (2017) introduced the first permutation equivariant neural network
- Hartford et al. (2018) considered permutation equivariant neural networks
- Maron et al. (2019) studied linear permutation equivariant and invariant neural network layers
- Finzi et al. (2021) recognised that the dimensions of the Hom-spaces are not independent of n
- Pan and Kondor (2022) assumed the dimension of the Hom-spaces is independent of n
Conclusion
- Combinatorial representation theory of partition algebra provides theoretical background for understanding permutation equivariant neural networks
- McKay quiver decomposes any tensor power of representation into irreducibles of S n
- Schur-Weyl duality between S n and centraliser algebra End Sn (M ⊗k n )
- Multiplicities of irreducibles in decomposition are dimensions of simple modules of centraliser algebra
- Orbit basis of partition algebra and related vector spaces finds form of layer functions
- Equivariant across feature dimension for all tensor spaces
- Schur-Weyl duality between group and algebra of diagrams to understand structure of neural networks
- Example of block labelling for set partition
- Structure of McKay quiver and restriction-induction Bratteli diagram linked to construction of permutation equivariant neural networks
- Number of elements in Π 2k,n is equal to 20 with 9 blocks
- Diagram d π consists of two rows of k vertices and edges between vertices
- Dimension of Hom-space given in (8.16) is equal to k-th Bell number