Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

Transformers are used for a variety of modalities, including images, audio, video, and undirected graphs.
Transformers for directed graphs are an underexplored topic.
Two direction- and structure-aware positional encodings for directed graphs are proposed.
The extra directionality information is useful for downstream tasks.
The model improves on the prior state of the art on Open Graph Benchmark Code2 by a relative 14.7%.

Transformers are used in many machine learning models
Transformers are used for competitive programming, question answering, and combinatorial optimization
Transformers have been used for graph learning tasks
Most prior works focus on undirected graphs, but directed graphs are important
Attention mechanism needs to be aware of graph structure
Positional encodings can incorporate structural information
Directional positional encodings are proposed
Directional random walk encodings are proposed
Directional positional encodings are predictive for different distance measures
Directional positional encodings can improve GNNs
Directed graphs can reduce input dimensionality
Representing source code as a sequence is standard
Contributions: connection between sinusoidal positional encodings and Laplacian, spectral positional encodings, random walk positional encodings, predictiveness of structure-aware positional encodings, correctness prediction for sorting networks, modeling sequence of program statements as directed graph, new state of the art on OGB Code2 dataset

Positional Encodings (PEs) are used to introduce a domain-specific inductive bias.
Sinusoidal positional encodings are proposed by Vaswani et al. (2017).
Eigenvectors of the Laplacian generalize sinusoidal positional encodings to graphs.
Sinusoidal encodings sweep frequencies using a geometric series.
Graphs generalize sequences to sets of tokens/nodes with arbitrary connections.
The Graph Fourier Transform (GFT) is defined via the eigendecomposition of the combinatorial Laplacian.
Eigenvectors of the Laplacian are real-valued and are defined only up to their sign.
The basis of the Discrete Cosine Transform of Type II (DCT-II) is a possible set of eigenvectors of the combinatorial Laplacian for a sequence.
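The link between a sequence and the cosine basis can be checked numerically. The following is a minimal sketch, assuming the sequence is modeled as an undirected path graph and using NumPy; the helper names `path_laplacian` and `dct2_basis` are illustrative and not taken from the paper.

```python
import numpy as np

def path_laplacian(n):
    """Combinatorial Laplacian L = D - A of an undirected path with n nodes."""
    A = np.zeros((n, n))
    idx = np.arange(n - 1)
    A[idx, idx + 1] = 1.0
    A[idx + 1, idx] = 1.0
    return np.diag(A.sum(axis=1)) - A

def dct2_basis(n):
    """Columns are unnormalized DCT-II vectors cos(pi * k * (u + 0.5) / n)."""
    u = np.arange(n)[:, None]  # position in the sequence
    k = np.arange(n)[None, :]  # frequency index
    return np.cos(np.pi * k * (u + 0.5) / n)

n = 8
L = path_laplacian(n)
V = dct2_basis(n)
# Eigenvalue belonging to the k-th cosine vector of the path Laplacian.
lam = 2.0 - 2.0 * np.cos(np.pi * np.arange(n) / n)
# L @ V should equal V scaled column-wise by the eigenvalues.
residual = np.abs(L @ V - V * lam).max()
print(residual)  # close to zero, up to floating-point error
```

Each column of `V` is a cosine of a different frequency over the positions, which is the sense in which Laplacian eigenvectors play the role of sinusoidal encodings on a sequence.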

Propose to use Magnetic Laplacian, a direction-aware generalization of combinatorial Laplacian
Encodes direction with complex numbers
Eigenvectors used for structure-aware positional encoding
Magnetic Laplacian is Hermitian matrix
Equivalent to the combinatorial Laplacian for q = 0; the potential q is restricted to q ≤ 0.25
The exp(iΘ^(q)) term encodes edge direction
Potential q determines ratio of real and imaginary part
Eigenvectors encode direction
Normalized first eigenvector minimizes conflicting edges
Propose to scale potential q with the number of nodes and the number of directed edges
Eigenvectors are only defined up to a complex factor c ∈ ℂ, which is chosen by convention
MagLapNet preprocesses eigenvectors before using them as positional encodings
SignNet resolves sign ambiguity of eigenvectors
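The construction above can be sketched numerically. This is a minimal sketch, assuming the commonly used unnormalized definition L^(q) = D_s − A_s ⊙ exp(iΘ^(q)) with Θ^(q)_{u,v} = 2πq(A_{u,v} − A_{v,u}), where A_s is the symmetrized adjacency matrix and D_s its degree matrix. The function name `magnetic_laplacian` and the toy graph are illustrative, and the preprocessing done by MagLapNet is not reproduced here.

```python
import numpy as np

def magnetic_laplacian(A, q):
    """Unnormalized Magnetic Laplacian (assumed definition, see lead-in)."""
    A = np.asarray(A, dtype=float)
    A_s = np.maximum(A, A.T)             # symmetrized adjacency
    D_s = np.diag(A_s.sum(axis=1))       # degrees of the symmetrized graph
    theta = 2.0 * np.pi * q * (A - A.T)  # antisymmetric phase matrix
    return D_s - A_s * np.exp(1j * theta)

# Toy directed path 0 -> 1 -> 2 -> 3.
A = np.zeros((4, 4))
A[[0, 1, 2], [1, 2, 3]] = 1.0

L_q = magnetic_laplacian(A, q=0.25)
L_0 = magnetic_laplacian(A, q=0.0)

# Hermitian: equal to its own conjugate transpose.
is_hermitian = np.allclose(L_q, L_q.conj().T)

# For q = 0 the phases vanish and the combinatorial Laplacian remains.
A_s = np.maximum(A, A.T)
matches_combinatorial = np.allclose(L_0, np.diag(A_s.sum(axis=1)) - A_s)

# A Hermitian matrix has real eigenvalues and complex eigenvectors.
eigvals, eigvecs = np.linalg.eigh(L_q)
print(is_hermitian, matches_combinatorial)
```

With q = 0.25, the entry for a one-directional edge is purely imaginary, and its sign flips when the edge is reversed, which is how direction enters the eigenvectors.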

Defined two directional structure-aware positional encodings
Assessed efficacy by predicting (relative) distance measures on (directed) graphs
Hypothesized that a good positional encoding should distinguish between ancestors/successors and have a notion of distance on the graph
Predicted if a node is reachable acknowledging the edge directions
Studied prediction of adjacent nodes and directed/undirected shortest path distance
Used transformer encoder with Magnetic Laplacian, direction-aware random walk, and eigenvectors of the combinatorial Laplacian
Assessed with F1 score and Root Mean Squared Error
Results showed that the Magnetic Laplacian outperformed the combinatorial Laplacian on direction-dependent targets; random walk encodings performed similarly to the Magnetic Laplacian on validation data but outperformed it on test data
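The distance targets mentioned above (reachability, directed and undirected shortest path distance) can be computed with breadth-first search. This is a minimal sketch, assuming unweighted graphs stored as adjacency dictionaries; the helper names and the toy graph are illustrative and do not reproduce the paper's data pipeline.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest path distance (in hops) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def symmetrize(adj):
    """Drop edge directions by adding every edge in both directions."""
    und = {}
    for u, nbrs in adj.items():
        und.setdefault(u, set())
        for v in nbrs:
            und[u].add(v)
            und.setdefault(v, set()).add(u)
    return und

# Toy directed acyclic graph: 0 -> 1 -> 3 and 0 -> 2 -> 3.
adj = {0: [1, 2], 1: [3], 2: [3], 3: []}

directed = bfs_distances(adj, 1)                # respects edge direction
undirected = bfs_distances(symmetrize(adj), 1)  # ignores edge direction
reachable_from_1 = set(directed) - {1}          # successors of node 1

print(directed)
print(undirected)
```

From node 1, only node 3 is reachable along the edge directions, while every node is reachable once directions are dropped. A positional encoding that ignores direction cannot separate these two cases.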

Sorting networks are a class of comparison-based algorithms that sort any input sequence of fixed size with a static sequence of comparators.
We use this task to make the implications of symmetrization and sequentialization explicit.
We construct a dataset consisting of 800,000 training instances for equally probable sequence lengths 7 ≤ p_train ≤ 11.
There exist correct and incorrect sorting networks that map to the same undirected graph.
Representing directed graphs as sequences can introduce a huge amount of arbitrary orderedness.
Magnetic Laplacian eigenvectors outperform all other positional encodings.
Sinusoidal encodings combined with the Magnetic Laplacian outperform all other encodings.
GNNs with directional information perform on par with a transformer encoder with the combination of Magnetic Laplacian and sinusoidal positional encodings.
Magnetic Laplacian eigenvectors can also help a GNN’s generalization.
Random walk encodings struggle to generalize to longer sequences and harm performance.
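For reference, the label of this task (whether a comparator network sorts every input) can be computed by brute force. This is a minimal sketch that relies on the zero-one principle: a comparator network sorts all inputs if and only if it sorts all binary inputs. The function names and example networks are illustrative and not taken from the paper's dataset.

```python
from itertools import product

def apply_network(comparators, values):
    """Apply comparators (i, j) with i < j in order: the smaller value goes to wire i."""
    values = list(values)
    for i, j in comparators:
        if values[i] > values[j]:
            values[i], values[j] = values[j], values[i]
    return values

def is_sorting_network(comparators, n):
    """Zero-one principle: checking all 2**n binary inputs is sufficient."""
    return all(
        apply_network(comparators, bits) == sorted(bits)
        for bits in product((0, 1), repeat=n)
    )

correct = [(0, 1), (1, 2), (0, 1)]  # three-wire bubble-sort network
incorrect = [(0, 1), (1, 2)]        # last comparator missing

print(is_sorting_network(correct, 3))    # True
print(is_sorting_network(incorrect, 3))  # False, e.g. input (1, 1, 0) ends as (1, 0, 1)
```

For the sequence lengths used here (up to 11), the exhaustive check covers at most 2^11 = 2048 binary inputs per network.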

Function name prediction is an established task in the graph learning community
Graphs for source code used for machine learning retain the sequential connections between instructions
The Open Graph Benchmark Code2 dataset represents 450,000 functions, each with its Abstract Syntax Tree (AST) and sequential connections
State-of-the-art model is susceptible to semantics-preserving permutation of the source code
Graph construction is inspired by Bieber et al. (2022)
Constructs a Directed Acyclic Graph (DAG) for each “block”
Connects the statements between blocks considering the control flow
Addresses the (non-)commutative properties of basic Python operations via edge features
Does not reference the tokenized source code

We propose positional encodings for directed graphs based on the Magnetic Laplacian and random walks.
Both positional encodings can help transformers to gain structure awareness.
Directed graphs can lower the effective input dimensionality.
The combinatorial Laplacian is a discretization of the Laplace-Beltrami operator on Riemannian manifolds.
The eigenvectors of the combinatorial Laplacian are widely used in graph machine learning.
The Magnetic Laplacian bridges the gap using complex values.
The first eigenvector of the Magnetic Laplacian is related to the angular synchronization problem.
The eigenvectors of the Magnetic Laplacian are plotted in Fig. 3.
The eigenvectors of the Laplacian (without symmetrization) are plotted in Fig. D.1 and D.2.
Repeated eigenvalues can be a source of ambiguity in the eigenvectors.
For disconnected components, the eigenvectors and eigenvalues resolve as if each component were decomposed independently.
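The statement about disconnected components can be illustrated on a small example. This is a minimal sketch, assuming the combinatorial Laplacian of an undirected graph with two components; the toy graph and the helper name `laplacian` are illustrative.

```python
import numpy as np

def laplacian(A):
    """Combinatorial Laplacian L = D - A."""
    A = np.asarray(A, dtype=float)
    return np.diag(A.sum(axis=1)) - A

A1 = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])  # path on three nodes
A2 = np.array([[0, 1], [1, 0]])                   # single edge

# Joint graph: block-diagonal adjacency, no edges between the components.
A = np.zeros((5, 5))
A[:3, :3] = A1
A[3:, 3:] = A2

joint = np.linalg.eigvalsh(laplacian(A))
separate = np.sort(np.concatenate([
    np.linalg.eigvalsh(laplacian(A1)),
    np.linalg.eigvalsh(laplacian(A2)),
]))
print(np.allclose(joint, separate))  # True
```

The eigenvalue 0 appears once per component, so this example also contains a repeated eigenvalue, which is the source of ambiguity noted above.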