Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

  • Transformers are used for a variety of modalities, including images, audio, video, and undirected graphs.
  • Transformers for directed graphs are an underexplored topic.
  • Two direction- and structure-aware positional encodings for directed graphs are proposed.
  • The extra directionality information is useful for downstream tasks.
  • Model outperforms prior state of the art on Open Graph Benchmark Code2 by 14.7%.

Paper Content

Introduction

  • Transformers are used in many machine learning models
  • Transformers are used for competitive programming, question answering, and combinatorial optimization
  • Transformers have been used for graph learning tasks
  • Most prior works focus on undirected graphs, but directed graphs are important
  • Attention mechanism needs to be aware of graph structure
  • Positional encodings can incorporate structural information
  • Directional positional encodings are proposed
  • Directional random walk encodings are proposed
  • Directional positional encodings are predictive for different distance measures
  • Directional positional encodings can improve GNNs
  • Directed graphs can reduce input dimensionality
  • Representing source code as a sequence is standard
  • Contributions: connection between sinusoidal positional encodings and Laplacian, spectral positional encodings, random walk positional encodings, predictiveness of structure-aware positional encodings, correctness prediction for sorting networks, modeling sequence of program statements as directed graph, new state of the art on OGB Code2 dataset

Sinusoidal and laplacian encodings

  • Positional Encodings (PEs) are used to introduce a domain-specific inductive bias.
  • Sinusoidal positional encodings are proposed by Vaswani et al. (2017).
  • Eigenvectors of the Laplacian generalize sinusoidal positional encodings to graphs.
  • Sinusoidal encodings sweep frequencies using a geometric series.
  • Graphs generalize sequences to sets of tokens/nodes with arbitrary connections.
  • Graph Fourier Transformation (GFT) is defined via the eigendecomposition of the combinatorial Laplacian.
  • Eigenvectors of the Laplacian are real-valued and have sign invariance.
  • Cosine Transformation Type II is a possible set of eigenvectors of the combinatorial Laplacian for a sequence.

Directional spectral encodings

  • Propose to use Magnetic Laplacian, a direction-aware generalization of combinatorial Laplacian
  • Encodes direction with complex numbers
  • Eigenvectors used for structure-aware positional encoding
  • Magnetic Laplacian is Hermitian matrix
  • Equivalent to combinatorial Laplacian for q = 0 and q ≤ 0.25
  • Exp iΘ (q) term encodes edge direction
  • Potential q determines ratio of real and imaginary part
  • Eigenvectors encode direction
  • Normalized first eigenvector minimizes conflicting edges
  • Propose to scale potential q with number of nodes and amount of directed edges
  • Choose c ∈ C by convention
  • MagLapNet preprocesses eigenvectors before using them as positional encodings
  • SignNet resolves sign ambiguity of eigenvectors

Directional random walks

  • Random walks can be used to encode node positions in a graph.
  • Random walks can improve the expressiveness of GNNs.
  • Random walks can be applied to directed graphs, but come with caveats.
  • Random walks are best suited for capturing short distances.

Positional encodings playground

  • Defined two directional structure-aware positional encodings
  • Assessed efficacy by predicting (relative) distance measures on (directed) graphs
  • Hypothesized that a good positional encoding should distinguish between ancestors/successors and have a notion of distance on the graph
  • Predicted if a node is reachable acknowledging the edge directions
  • Studied prediction of adjacent nodes and directed/undirected shortest path distance
  • Used transformer encoder with Magnetic Laplacian, direction-aware random walk, and eigenvectors of the combinatorial Laplacian
  • Assessed with F1 score and Root Mean Squared Error
  • Results showed Magnetic Laplacian outperformed Combinatorial Laplacian for directedness, random walk encodings performed similarly to Magnetic Laplacian for validation but outperformed on test

Application: sorting networks

  • Sorting networks are a class of comparison-based algorithms that sort any input sequence of fixed size with a static sequence of comparators.
  • We use this task to make the implications of symmetrization and sequentialization explicit.
  • We construct a dataset consisting of 800,000 training instances for equally probable sequence lengths 7 ≤ p train ≤ 11.
  • There exist correct and incorrect sorting networks that map to the same undirected graph.
  • Representing directed graphs as sequences can introduce a huge amount of arbitrary orderedness.
  • Magnetic Laplacian eigenvectors outperform all other positional encodings.
  • Sinusoidal encodings combined with the Magnetic Laplacian outperform all other encodings.
  • GNNs with directional information perform on par with a transformer encoder with the combination of Magnetic Laplacian and sinusoidal positional encodings.
  • Magnetic Laplacian eigenvectors can also help a GNN’s generalization.
  • Random walk encodings struggle to generalize to longer sequences and harm performance.

Application: function name prediction

  • Function name prediction is an established task in the graph learning community
  • Graphs for source code used for machine learning retain the sequential connections between instructions
  • Open Graph Benchmark Code2 dataset represents 450,000 functions with its Abstract Syntax Tree (AST) and sequential connections
  • State-of-the-art model is susceptible to semantics-preserving permutation of the source code
  • Graph construction is inspired by Bieber et al. (2022)
  • Constructs a Directed Acyclic Graph (DAG) for each “block”
  • Connects the statements between blocks considering the control flow
  • Addresses the (non-) commutative properties for basic python operations via edge features
  • Does not reference the tokenized source code
  • Traditional graph metrics used for positional encodings
  • Relative positional encodings used in Zügner et al. (2021)
  • Sinusoidal positional encodings proposed by Luo (2022)
  • Spectral encodings based on SVD used by Hussain et al. (2022)

Conclusion

  • We propose positional encodings for directed graphs based on the Magnetic Laplacian and random walks.
  • Both positional encodings can help transformers to gain structure awareness.
  • Directed graphs can lower the effective input dimensionality.
  • The combinatorial Laplacian is a discretization of the Laplace-Beltrami operator on Riemannian manifolds.
  • The eigenvectors of the combinatorial Laplacian are widely used in graph machine learning.
  • The Magnetic Laplacian bridges the gap using complex values.
  • The first eigenvector of the Magnetic Laplacian is related to the angular synchronization problem.
  • The eigenvectors of the Magnetic Laplacian are plotted in Fig. 3.
  • The eigenvectors of the Laplacian (without symmetrization) are plotted in Fig. D.1 and D.2.
  • Repeated eigenvalues can be a source of ambiguity in the eigenvectors.
  • For disconnected components, the eigenvectors and eigenvalues resolve as if we would decompose each component independently.