Link to paper
The full paper is available here.
You can also find the paper on PapersWithCode here.
Abstract
- Transformers are used for a variety of modalities, including images, audio, video, and undirected graphs.
- Transformers for directed graphs are an underexplored topic.
- Two direction- and structure-aware positional encodings for directed graphs are proposed.
- The extra directionality information is useful for downstream tasks.
- Model outperforms prior state of the art on Open Graph Benchmark Code2 by 14.7%.
Paper Content
Introduction
- Transformers are used in many machine learning models
- Transformers are used for competitive programming, question answering, and combinatorial optimization
- Transformers have been used for graph learning tasks
- Most prior works focus on undirected graphs, but directed graphs are important
- Attention mechanism needs to be aware of graph structure
- Positional encodings can incorporate structural information
- Directional positional encodings are proposed
- Directional random walk encodings are proposed
- Directional positional encodings are predictive for different distance measures
- Directional positional encodings can improve GNNs
- Directed graphs can reduce input dimensionality
- Representing source code as a sequence is standard
- Contributions: connection between sinusoidal positional encodings and Laplacian, spectral positional encodings, random walk positional encodings, predictiveness of structure-aware positional encodings, correctness prediction for sorting networks, modeling sequence of program statements as directed graph, new state of the art on OGB Code2 dataset
Sinusoidal and laplacian encodings
- Positional Encodings (PEs) are used to introduce a domain-specific inductive bias.
- Sinusoidal positional encodings are proposed by Vaswani et al. (2017).
- Eigenvectors of the Laplacian generalize sinusoidal positional encodings to graphs.
- Sinusoidal encodings sweep frequencies using a geometric series.
- Graphs generalize sequences to sets of tokens/nodes with arbitrary connections.
- Graph Fourier Transformation (GFT) is defined via the eigendecomposition of the combinatorial Laplacian.
- Eigenvectors of the Laplacian are real-valued and have sign invariance.
- Cosine Transformation Type II is a possible set of eigenvectors of the combinatorial Laplacian for a sequence.
Directional spectral encodings
- Propose to use Magnetic Laplacian, a direction-aware generalization of combinatorial Laplacian
- Encodes direction with complex numbers
- Eigenvectors used for structure-aware positional encoding
- Magnetic Laplacian is Hermitian matrix
- Equivalent to combinatorial Laplacian for q = 0 and q ≤ 0.25
- Exp iΘ (q) term encodes edge direction
- Potential q determines ratio of real and imaginary part
- Eigenvectors encode direction
- Normalized first eigenvector minimizes conflicting edges
- Propose to scale potential q with number of nodes and amount of directed edges
- Choose c ∈ C by convention
- MagLapNet preprocesses eigenvectors before using them as positional encodings
- SignNet resolves sign ambiguity of eigenvectors
Directional random walks
- Random walks can be used to encode node positions in a graph.
- Random walks can improve the expressiveness of GNNs.
- Random walks can be applied to directed graphs, but come with caveats.
- Random walks are best suited for capturing short distances.
Positional encodings playground
- Defined two directional structure-aware positional encodings
- Assessed efficacy by predicting (relative) distance measures on (directed) graphs
- Hypothesized that a good positional encoding should distinguish between ancestors/successors and have a notion of distance on the graph
- Predicted if a node is reachable acknowledging the edge directions
- Studied prediction of adjacent nodes and directed/undirected shortest path distance
- Used transformer encoder with Magnetic Laplacian, direction-aware random walk, and eigenvectors of the combinatorial Laplacian
- Assessed with F1 score and Root Mean Squared Error
- Results showed Magnetic Laplacian outperformed Combinatorial Laplacian for directedness, random walk encodings performed similarly to Magnetic Laplacian for validation but outperformed on test
Application: sorting networks
- Sorting networks are a class of comparison-based algorithms that sort any input sequence of fixed size with a static sequence of comparators.
- We use this task to make the implications of symmetrization and sequentialization explicit.
- We construct a dataset consisting of 800,000 training instances for equally probable sequence lengths 7 ≤ p train ≤ 11.
- There exist correct and incorrect sorting networks that map to the same undirected graph.
- Representing directed graphs as sequences can introduce a huge amount of arbitrary orderedness.
- Magnetic Laplacian eigenvectors outperform all other positional encodings.
- Sinusoidal encodings combined with the Magnetic Laplacian outperform all other encodings.
- GNNs with directional information perform on par with a transformer encoder with the combination of Magnetic Laplacian and sinusoidal positional encodings.
- Magnetic Laplacian eigenvectors can also help a GNN’s generalization.
- Random walk encodings struggle to generalize to longer sequences and harm performance.
Application: function name prediction
- Function name prediction is an established task in the graph learning community
- Graphs for source code used for machine learning retain the sequential connections between instructions
- Open Graph Benchmark Code2 dataset represents 450,000 functions with its Abstract Syntax Tree (AST) and sequential connections
- State-of-the-art model is susceptible to semantics-preserving permutation of the source code
- Graph construction is inspired by Bieber et al. (2022)
- Constructs a Directed Acyclic Graph (DAG) for each “block”
- Connects the statements between blocks considering the control flow
- Addresses the (non-) commutative properties for basic python operations via edge features
- Does not reference the tokenized source code
Related work
- Traditional graph metrics used for positional encodings
- Relative positional encodings used in Zügner et al. (2021)
- Sinusoidal positional encodings proposed by Luo (2022)
- Spectral encodings based on SVD used by Hussain et al. (2022)
Conclusion
- We propose positional encodings for directed graphs based on the Magnetic Laplacian and random walks.
- Both positional encodings can help transformers to gain structure awareness.
- Directed graphs can lower the effective input dimensionality.
- The combinatorial Laplacian is a discretization of the Laplace-Beltrami operator on Riemannian manifolds.
- The eigenvectors of the combinatorial Laplacian are widely used in graph machine learning.
- The Magnetic Laplacian bridges the gap using complex values.
- The first eigenvector of the Magnetic Laplacian is related to the angular synchronization problem.
- The eigenvectors of the Magnetic Laplacian are plotted in Fig. 3.
- The eigenvectors of the Laplacian (without symmetrization) are plotted in Fig. D.1 and D.2.
- Repeated eigenvalues can be a source of ambiguity in the eigenvectors.
- For disconnected components, the eigenvectors and eigenvalues resolve as if we would decompose each component independently.