Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

  • Graph Neural Networks (GNNs) can be used to compute graph queries.
  • GNNs can use arbitrary real weights and a wide class of activation functions.
  • GNNs with random initialization and global readout can compute the same queries as bounded depth Boolean circuits with threshold gates.
  • Queries computable by a single GNN with piecewise linear activations and rational weights are definable in GFO+C without built-in relations.

Paper Content

Introduction

  • Graph neural networks (GNNs) are deep learning models for graph data
  • GNNs describe a distributed algorithm carrying out local computations at the vertices of the input graph
  • Messages sent between vertices are vectors of reals and depend on the current state of the sender and receiver
  • Vertex aggregates messages and computes new state depending on old state and aggregated messages
  • GNNs are “analogue” computation models operating on and with real numbers
  • Functions on graphs must be isomorphism invariant and functions on vertices must be equivariant
  • Focus on Boolean functions with {0, 1}-vector input and Boolean output
  • Unary queries on labelled graphs are Boolean functions
  • Central result: unary query on labelled graphs is computable by polynomial-weight bounded-depth family of GNNs with rpl approximable activation functions if and only if it is definable in the guarded fragment GFO+C nu of first-order logic with counting and with built-in relations
  • Strength of GNNs can be increased by extending input signals with random component
  • Characterisation of TC 0 : unary query on labelled graphs is computable by polynomial-weight bounded-depth family of GNNs with random initialisation and with rpl approximable activation functions if and only if it is computable in TC 0
  • GNNs and WL algorithm are related
  • GNNs can distinguish two graphs if and only if WL algorithm can
  • GNNs can be compared to WL algorithm to measure expressiveness
  • GNNs may be exponentially large or polynomial in size
  • Unary queries definable in GC are computable by GNNs
  • Queries computable by GNNs may not be definable in first-order logic
  • MPLang is a logic that operates directly on reals
  • GNN computations can be quantified in classical complexity
  • FNNs can be used to decide subsets of {0, 1}p
  • Language is decidable by FNNs if and only if it is in TC 0
  • Corollary 1.2 does not follow easily from FNNs result

Techniques

  • Theorem 1.4 states that a query computable by a GNN with rational weights and piecewise linear activations is definable in GFO+C.
  • GC ⊆ GNN ⊆ GFO+C, but both inclusions are strict.
  • To prove Theorem 1.4, GNN computations must be simulated in GFO+C.
  • Theorem 1.1 is proved by approximating arbitrary GNNs with rational weights and piecewise linear activations.
  • The Adleman Trick is used to trade randomness for non-uniformity.

Structure of this paper

  • Developed machinery for rational arithmetic in first-order logic with counting and guarded fragment
  • Introduced GNNs and proved Theorem 1.4
  • Proved forward and backward directions of Theorem 1.1
  • Proved Theorem 1.3

Preliminaries

  • Sets of integers, nonnegative integers, positive integers, rational numbers, and real numbers are denoted by Z, N, N >0 , Q, R respectively.
  • Dyadic rationals are rationals whose denominator is a power of two.
  • Binary representation of n ∈ N is denoted by bin(n).
  • Bitsize of n is length of the binary representation.
  • Tuples are denoted using boldface letters.
  • Empty tuple is denoted ∅.
  • Vector norm is often the ℓ ∞ -norm.

Functions and approximations

  • Unique minimal representation of a linear function with minimal number of pieces
  • Thresholds of linear function are points where it is nonlinear
  • Rational if all parameters of minimal representation are dyadic rationals
  • Bitsize of rational linear function is sum of bitsizes of all parameters
  • Continuous linear function is Lipschitz continuous
  • Example of rational piecewise linear function is rectified linear unit
  • Notion of approximation between functions on reals allows for both additive and multiplicative error
  • Function is rpl-approximable if continuous rational piecewise linear function of polynomial bitsize approximates it
  • Examples of rpl-approximable functions include logistic, hyperbolic tangent, soft plus, and exponential linear units

Graphs and signals

  • Graphs are used as data structures for logics and graph neural networks
  • Graphs are either undirected or directed acyclic graphs (dags)
  • Vertex set of all graphs is finite and nonempty
  • Order of a graph is the size of its vertex set
  • Bitsize of a graph is the size of its representation
  • Out-degree and in-degree of a vertex in an undirected graph is denoted by
  • Nodes of in-degree 0 are sources and nodes of out-degree 0 are sinks
  • Depth of a node in a dag is the length of the longest path from a source to it
  • Graph signals are real-valued features on vertices of graphs
  • Signals can be viewed as matrices in R V (G)×ℓ
  • Vector norms can be applied to graph signals
  • Signals can be restricted to subsets of V (G)

Boolean circuits

  • Boolean circuits are directed acyclic graphs with nodes labeled as negation, disjunction, or conjunction.
  • Sources are input nodes and sinks are output nodes.
  • The depth of a circuit is the maximum length of a path from an input node to an output node.
  • The order of a circuit is the number of nodes and the size is the number of nodes plus the number of edges.
  • Non-uniform TC 0 is the class of all languages that are decided by a family of threshold circuits of bounded depth and polynomial size.
  • Standard arithmetic functions on the bit representations of natural numbers can be computed by bounded-depth polynomial-size threshold circuits.
  • ADD, MUL, DIV, ITADD, and LEQ are computable by dlogtime uniform families of bounded-depth polynomial-size threshold circuits.

Feedforward neural networks

  • Formalise feedforward neural networks in a similar way as Boolean circuits
  • Triple V, E, (a v ) v∈V is an FNN architecture
  • Weights and biases associated with edges and nodes
  • Sources of the dag are input nodes, sinks are output nodes
  • Define order, depth, input and output dimension
  • Activation functions are Lipschitz continuous
  • Output of FNN can be linearly bounded in the input
  • FNNs can be piecewise linear or rational piecewise linear

Relational structures

  • A vocabulary is a finite set of relation symbols.
  • A structure consists of a finite set and a relation for each relation symbol.
  • A graph can be viewed as an {E}-structure, where E is a binary symbol.
  • A pair (G, b) is a graph with a Boolean signal.
  • A query is an equivariant mapping that associates a mapping with each structure in a class.

First-order logic with counting

  • Fix a vocabulary τ
  • Two types of variables: vertex and number
  • Vertex variables denoted by x, number variables denoted by y, z for unspecified
  • Sets of FO+C-formulas and FO+C-terms defined inductively
  • τ -interpretation is a pair (A, a)
  • Value ⟦θ⟧ (A,a) ∈ N for each FO+C-term θ and Boolean value ⟦ϕ⟧ (A,a) ∈ {0, 1} for each FO+C-formula ϕ
  • Free variables of an FO+C-expression ξ defined inductively
  • Closed term is a term without free variables, sentence is a formula without free variables
  • FO+C-formula ϕ(x 1 , . . . , x k ) defines a k-ary query on the class of τ -structures
  • Standard arithmetical and logical operators used as abbreviations
  • FO+C corresponds to FOCN({P ≤ })
  • All FO+C-terms are polynomially bounded

Descriptive complexity

  • FO+C is related to the complexity class TC 0
  • Structures need to be expanded by a linear order of the vertex set
  • A binary relation symbol ⩽ is introduced
  • An ordered τ-structure is a τ ∪ {⩽}-structure
  • There is a simple canonical representation of ordered structures as bitstrings
  • Theorem 3.3 states that L(C) is in uniform TC 0 if and only if there is an FO+C sentence
  • Unary queries can be defined in order-invariant FO+C

Non-uniformity and built-in relations

  • Non-uniformity in descriptive complexity is captured by adding built-in relations.
  • Built-in numerical relations are subsets of Nk for some k ≥ 0.
  • FO+C nu captures (non-uniform) TC 0.

Types and second-order variables

  • Counting extension of first-order logic refers to a 2-sorted extension of relational structures
  • Variables have types: vertex (v) and number (n)
  • Relation variables (X, Y) have type {t}
  • Function variables (U, V) have type t → n
  • Atomic formulas X(ξ 1 , . . . , ξ k ) and terms U (ξ, . . . , ξ k )
  • FO+C does not allow quantification over relation and function variables
  • Arithmetic on bitwise representations of integers is expressible in FO+C
  • Lemma 3.2 does not hold if term θ contains function variables
  • Lemma 3.7 (Bennett [5]): FO+C-formula bit(y, y ′ ) and FO+C-term len(y)
  • FO+C-term bin(n) ≤ n for all n ≥ 1
  • Formula ϕ(y, z) defines a function z ↦ y
  • Trick to obtain a term η(z) such that ⟦η⟧ A (c) = f A (c)
  • Relation variable Y of type n to encode binary representations of natural numbers
  • 0-ary function variable U to specify bound on number of bits
  • Formula χ and term θ to specify numbers
  • Lemma 3.9: arithmetical FO+C-formulas add, sub and arithmetical FO+C-terms bd-add, bd-sub
  • Lemma 3.10: arithmetical FO+C-formula s-itadd and arithmetical FO+C-term bd-s-itadd3
  • Encoding of sequences by natural numbers

Rational arithmetic

  • Need to lift results to arithmetic on rational numbers
  • Problem with iterated addition: denominator of sum can get too large
  • Work with arithmetic on dyadic rationals
  • Represent dyadic rationals by tuples (r, I, s, t)
  • Representation is not unique
  • Each dyadic rational has a unique representation (crep)
  • Representation needs four variables
  • Introduce shortcuts (r-schemas and r-expressions)
  • Lemma 3.18: arithmetical r-expression to define canonical representation
  • Lemma 3.19: arithmetical r-expressions for add, sub, and mul
  • Lemma 3.20: r-expressions for iterated addition, max, and min
  • Lemma 3.21: arithmetical r-expression for division

Evaluating feedforward neural networks

  • We can simulate rational piecewise-linear FNNs
  • We need an integer k and three families of dyadic rationals
  • We define an L-schema and an F-schema
  • We define an arithmetical r-expression
  • We define an FO+C-formula
  • We define a decomposable FO 2 +C-formula
  • We define a bijection between vertices and initial segment of N
  • We define a distinguished number variable for every vertex variable
  • We define a guarded fragment GFO+C
  • We define an intermediate fragment GFO+C gc

Useful bounds

  • Lemma 4.2 states that there is a constant γ that can be used to prove a local bound for all graphs, signals, and vertices.
  • Lemma 4.3 states that there is a constant λ that can be used to prove a bound for all graphs, signals, and vertices.
  • Lemma 2.5 provides constants γ msg and γ comb that can be used to prove the local bound.