Link to paper
The full paper is available here.
You can also find the paper on PapersWithCode here.
Abstract
- Graph Neural Networks (GNNs) can be used to compute graph queries.
- GNNs can use arbitrary real weights and a wide class of activation functions.
- GNNs with random initialization and global readout can compute the same queries as bounded depth Boolean circuits with threshold gates.
- Queries computable by a single GNN with piecewise linear activations and rational weights are definable in GFO+C without built-in relations.
Paper Content
Introduction
- Graph neural networks (GNNs) are deep learning models for graph data
- GNNs describe a distributed algorithm carrying out local computations at the vertices of the input graph
- Messages sent between vertices are vectors of reals and depend on the current state of the sender and receiver
- Vertex aggregates messages and computes new state depending on old state and aggregated messages
- GNNs are “analogue” computation models operating on and with real numbers
- Functions on graphs must be isomorphism invariant and functions on vertices must be equivariant
- Focus on Boolean functions with {0, 1}-vector input and Boolean output
- Unary queries on labelled graphs are Boolean functions
- Central result: unary query on labelled graphs is computable by polynomial-weight bounded-depth family of GNNs with rpl approximable activation functions if and only if it is definable in the guarded fragment GFO+C nu of first-order logic with counting and with built-in relations
- Strength of GNNs can be increased by extending input signals with random component
- Characterisation of TC 0 : unary query on labelled graphs is computable by polynomial-weight bounded-depth family of GNNs with random initialisation and with rpl approximable activation functions if and only if it is computable in TC 0
Related work
- GNNs and WL algorithm are related
- GNNs can distinguish two graphs if and only if WL algorithm can
- GNNs can be compared to WL algorithm to measure expressiveness
- GNNs may be exponentially large or polynomial in size
- Unary queries definable in GC are computable by GNNs
- Queries computable by GNNs may not be definable in first-order logic
- MPLang is a logic that operates directly on reals
- GNN computations can be quantified in classical complexity
- FNNs can be used to decide subsets of {0, 1}p
- Language is decidable by FNNs if and only if it is in TC 0
- Corollary 1.2 does not follow easily from FNNs result
Techniques
- Theorem 1.4 states that a query computable by a GNN with rational weights and piecewise linear activations is definable in GFO+C.
- GC ⊆ GNN ⊆ GFO+C, but both inclusions are strict.
- To prove Theorem 1.4, GNN computations must be simulated in GFO+C.
- Theorem 1.1 is proved by approximating arbitrary GNNs with rational weights and piecewise linear activations.
- The Adleman Trick is used to trade randomness for non-uniformity.
Structure of this paper
- Developed machinery for rational arithmetic in first-order logic with counting and guarded fragment
- Introduced GNNs and proved Theorem 1.4
- Proved forward and backward directions of Theorem 1.1
- Proved Theorem 1.3
Preliminaries
- Sets of integers, nonnegative integers, positive integers, rational numbers, and real numbers are denoted by Z, N, N >0 , Q, R respectively.
- Dyadic rationals are rationals whose denominator is a power of two.
- Binary representation of n ∈ N is denoted by bin(n).
- Bitsize of n is length of the binary representation.
- Tuples are denoted using boldface letters.
- Empty tuple is denoted ∅.
- Vector norm is often the ℓ ∞ -norm.
Functions and approximations
- Unique minimal representation of a linear function with minimal number of pieces
- Thresholds of linear function are points where it is nonlinear
- Rational if all parameters of minimal representation are dyadic rationals
- Bitsize of rational linear function is sum of bitsizes of all parameters
- Continuous linear function is Lipschitz continuous
- Example of rational piecewise linear function is rectified linear unit
- Notion of approximation between functions on reals allows for both additive and multiplicative error
- Function is rpl-approximable if continuous rational piecewise linear function of polynomial bitsize approximates it
- Examples of rpl-approximable functions include logistic, hyperbolic tangent, soft plus, and exponential linear units
Graphs and signals
- Graphs are used as data structures for logics and graph neural networks
- Graphs are either undirected or directed acyclic graphs (dags)
- Vertex set of all graphs is finite and nonempty
- Order of a graph is the size of its vertex set
- Bitsize of a graph is the size of its representation
- Out-degree and in-degree of a vertex in an undirected graph is denoted by
- Nodes of in-degree 0 are sources and nodes of out-degree 0 are sinks
- Depth of a node in a dag is the length of the longest path from a source to it
- Graph signals are real-valued features on vertices of graphs
- Signals can be viewed as matrices in R V (G)×ℓ
- Vector norms can be applied to graph signals
- Signals can be restricted to subsets of V (G)
Boolean circuits
- Boolean circuits are directed acyclic graphs with nodes labeled as negation, disjunction, or conjunction.
- Sources are input nodes and sinks are output nodes.
- The depth of a circuit is the maximum length of a path from an input node to an output node.
- The order of a circuit is the number of nodes and the size is the number of nodes plus the number of edges.
- Non-uniform TC 0 is the class of all languages that are decided by a family of threshold circuits of bounded depth and polynomial size.
- Standard arithmetic functions on the bit representations of natural numbers can be computed by bounded-depth polynomial-size threshold circuits.
- ADD, MUL, DIV, ITADD, and LEQ are computable by dlogtime uniform families of bounded-depth polynomial-size threshold circuits.
Feedforward neural networks
- Formalise feedforward neural networks in a similar way as Boolean circuits
- Triple V, E, (a v ) v∈V is an FNN architecture
- Weights and biases associated with edges and nodes
- Sources of the dag are input nodes, sinks are output nodes
- Define order, depth, input and output dimension
- Activation functions are Lipschitz continuous
- Output of FNN can be linearly bounded in the input
- FNNs can be piecewise linear or rational piecewise linear
Relational structures
- A vocabulary is a finite set of relation symbols.
- A structure consists of a finite set and a relation for each relation symbol.
- A graph can be viewed as an {E}-structure, where E is a binary symbol.
- A pair (G, b) is a graph with a Boolean signal.
- A query is an equivariant mapping that associates a mapping with each structure in a class.
First-order logic with counting
- Fix a vocabulary τ
- Two types of variables: vertex and number
- Vertex variables denoted by x, number variables denoted by y, z for unspecified
- Sets of FO+C-formulas and FO+C-terms defined inductively
- τ -interpretation is a pair (A, a)
- Value ⟦θ⟧ (A,a) ∈ N for each FO+C-term θ and Boolean value ⟦ϕ⟧ (A,a) ∈ {0, 1} for each FO+C-formula ϕ
- Free variables of an FO+C-expression ξ defined inductively
- Closed term is a term without free variables, sentence is a formula without free variables
- FO+C-formula ϕ(x 1 , . . . , x k ) defines a k-ary query on the class of τ -structures
- Standard arithmetical and logical operators used as abbreviations
- FO+C corresponds to FOCN({P ≤ })
- All FO+C-terms are polynomially bounded
Descriptive complexity
- FO+C is related to the complexity class TC 0
- Structures need to be expanded by a linear order of the vertex set
- A binary relation symbol ⩽ is introduced
- An ordered τ-structure is a τ ∪ {⩽}-structure
- There is a simple canonical representation of ordered structures as bitstrings
- Theorem 3.3 states that L(C) is in uniform TC 0 if and only if there is an FO+C sentence
- Unary queries can be defined in order-invariant FO+C
Non-uniformity and built-in relations
- Non-uniformity in descriptive complexity is captured by adding built-in relations.
- Built-in numerical relations are subsets of Nk for some k ≥ 0.
- FO+C nu captures (non-uniform) TC 0.
Types and second-order variables
- Counting extension of first-order logic refers to a 2-sorted extension of relational structures
- Variables have types: vertex (v) and number (n)
- Relation variables (X, Y) have type {t}
- Function variables (U, V) have type t → n
- Atomic formulas X(ξ 1 , . . . , ξ k ) and terms U (ξ, . . . , ξ k )
- FO+C does not allow quantification over relation and function variables
- Arithmetic on bitwise representations of integers is expressible in FO+C
- Lemma 3.2 does not hold if term θ contains function variables
- Lemma 3.7 (Bennett [5]): FO+C-formula bit(y, y ′ ) and FO+C-term len(y)
- FO+C-term bin(n) ≤ n for all n ≥ 1
- Formula ϕ(y, z) defines a function z ↦ y
- Trick to obtain a term η(z) such that ⟦η⟧ A (c) = f A (c)
- Relation variable Y of type n to encode binary representations of natural numbers
- 0-ary function variable U to specify bound on number of bits
- Formula χ and term θ to specify numbers
- Lemma 3.9: arithmetical FO+C-formulas add, sub and arithmetical FO+C-terms bd-add, bd-sub
- Lemma 3.10: arithmetical FO+C-formula s-itadd and arithmetical FO+C-term bd-s-itadd3
- Encoding of sequences by natural numbers
Rational arithmetic
- Need to lift results to arithmetic on rational numbers
- Problem with iterated addition: denominator of sum can get too large
- Work with arithmetic on dyadic rationals
- Represent dyadic rationals by tuples (r, I, s, t)
- Representation is not unique
- Each dyadic rational has a unique representation (crep)
- Representation needs four variables
- Introduce shortcuts (r-schemas and r-expressions)
- Lemma 3.18: arithmetical r-expression to define canonical representation
- Lemma 3.19: arithmetical r-expressions for add, sub, and mul
- Lemma 3.20: r-expressions for iterated addition, max, and min
- Lemma 3.21: arithmetical r-expression for division
Evaluating feedforward neural networks
- We can simulate rational piecewise-linear FNNs
- We need an integer k and three families of dyadic rationals
- We define an L-schema and an F-schema
- We define an arithmetical r-expression
- We define an FO+C-formula
- We define a decomposable FO 2 +C-formula
- We define a bijection between vertices and initial segment of N
- We define a distinguished number variable for every vertex variable
- We define a guarded fragment GFO+C
- We define an intermediate fragment GFO+C gc
Useful bounds
- Lemma 4.2 states that there is a constant γ that can be used to prove a local bound for all graphs, signals, and vertices.
- Lemma 4.3 states that there is a constant λ that can be used to prove a bound for all graphs, signals, and vertices.
- Lemma 2.5 provides constants γ msg and γ comb that can be used to prove the local bound.