Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

Graph Neural Networks (GNNs) can be used to compute graph queries.
GNNs can use arbitrary real weights and a wide class of activation functions.
GNNs with random initialization and global readout can compute the same queries as bounded depth Boolean circuits with threshold gates.
Queries computable by a single GNN with piecewise linear activations and rational weights are definable in GFO+C without built-in relations.

Paper Content

Introduction

Graph neural networks (GNNs) are deep learning models for graph data
GNNs describe a distributed algorithm carrying out local computations at the vertices of the input graph
Messages sent between vertices are vectors of reals and depend on the current state of the sender and receiver
Vertex aggregates messages and computes new state depending on old state and aggregated messages
GNNs are “analogue” computation models operating on and with real numbers
Functions on graphs must be isomorphism invariant and functions on vertices must be equivariant
Focus on Boolean functions with {0, 1}-vector input and Boolean output
Unary queries on labelled graphs are Boolean functions
Central result: unary query on labelled graphs is computable by polynomial-weight bounded-depth family of GNNs with rpl approximable activation functions if and only if it is definable in the guarded fragment GFO+C nu of first-order logic with counting and with built-in relations
Strength of GNNs can be increased by extending input signals with random component
Characterisation of TC 0 : unary query on labelled graphs is computable by polynomial-weight bounded-depth family of GNNs with random initialisation and with rpl approximable activation functions if and only if it is computable in TC 0

GNNs and WL algorithm are related
GNNs can distinguish two graphs if and only if WL algorithm can
GNNs can be compared to WL algorithm to measure expressiveness
GNNs may be exponentially large or polynomial in size
Unary queries definable in GC are computable by GNNs
Queries computable by GNNs may not be definable in first-order logic
MPLang is a logic that operates directly on reals
GNN computations can be quantified in classical complexity
FNNs can be used to decide subsets of {0, 1}p
Language is decidable by FNNs if and only if it is in TC 0
Corollary 1.2 does not follow easily from FNNs result

Techniques

Theorem 1.4 states that a query computable by a GNN with rational weights and piecewise linear activations is definable in GFO+C.
GC ⊆ GNN ⊆ GFO+C, but both inclusions are strict.
To prove Theorem 1.4, GNN computations must be simulated in GFO+C.
Theorem 1.1 is proved by approximating arbitrary GNNs with rational weights and piecewise linear activations.
The Adleman Trick is used to trade randomness for non-uniformity.

Structure of this paper

Developed machinery for rational arithmetic in first-order logic with counting and guarded fragment
Introduced GNNs and proved Theorem 1.4
Proved forward and backward directions of Theorem 1.1
Proved Theorem 1.3

Preliminaries

Sets of integers, nonnegative integers, positive integers, rational numbers, and real numbers are denoted by Z, N, N >0 , Q, R respectively.
Dyadic rationals are rationals whose denominator is a power of two.
Binary representation of n ∈ N is denoted by bin(n).
Bitsize of n is length of the binary representation.
Tuples are denoted using boldface letters.
Empty tuple is denoted ∅.
Vector norm is often the ℓ ∞ -norm.

Functions and approximations

Unique minimal representation of a linear function with minimal number of pieces
Thresholds of linear function are points where it is nonlinear
Rational if all parameters of minimal representation are dyadic rationals
Bitsize of rational linear function is sum of bitsizes of all parameters
Continuous linear function is Lipschitz continuous
Example of rational piecewise linear function is rectified linear unit
Notion of approximation between functions on reals allows for both additive and multiplicative error
Function is rpl-approximable if continuous rational piecewise linear function of polynomial bitsize approximates it
Examples of rpl-approximable functions include logistic, hyperbolic tangent, soft plus, and exponential linear units

Graphs and signals

Graphs are used as data structures for logics and graph neural networks
Graphs are either undirected or directed acyclic graphs (dags)
Vertex set of all graphs is finite and nonempty
Order of a graph is the size of its vertex set
Bitsize of a graph is the size of its representation
Out-degree and in-degree of a vertex in an undirected graph is denoted by
Nodes of in-degree 0 are sources and nodes of out-degree 0 are sinks
Depth of a node in a dag is the length of the longest path from a source to it
Graph signals are real-valued features on vertices of graphs
Signals can be viewed as matrices in R V (G)×ℓ
Vector norms can be applied to graph signals
Signals can be restricted to subsets of V (G)

Boolean circuits

Boolean circuits are directed acyclic graphs with nodes labeled as negation, disjunction, or conjunction.
Sources are input nodes and sinks are output nodes.
The depth of a circuit is the maximum length of a path from an input node to an output node.
The order of a circuit is the number of nodes and the size is the number of nodes plus the number of edges.
Non-uniform TC 0 is the class of all languages that are decided by a family of threshold circuits of bounded depth and polynomial size.
Standard arithmetic functions on the bit representations of natural numbers can be computed by bounded-depth polynomial-size threshold circuits.
ADD, MUL, DIV, ITADD, and LEQ are computable by dlogtime uniform families of bounded-depth polynomial-size threshold circuits.

Feedforward neural networks

Formalise feedforward neural networks in a similar way as Boolean circuits
Triple V, E, (a v ) v∈V is an FNN architecture
Weights and biases associated with edges and nodes
Sources of the dag are input nodes, sinks are output nodes
Define order, depth, input and output dimension
Activation functions are Lipschitz continuous
Output of FNN can be linearly bounded in the input
FNNs can be piecewise linear or rational piecewise linear

Relational structures

A vocabulary is a finite set of relation symbols.
A structure consists of a finite set and a relation for each relation symbol.
A graph can be viewed as an {E}-structure, where E is a binary symbol.
A pair (G, b) is a graph with a Boolean signal.
A query is an equivariant mapping that associates a mapping with each structure in a class.

First-order logic with counting

Fix a vocabulary τ
Two types of variables: vertex and number
Vertex variables denoted by x, number variables denoted by y, z for unspecified
Sets of FO+C-formulas and FO+C-terms defined inductively
τ -interpretation is a pair (A, a)
Value ⟦θ⟧ (A,a) ∈ N for each FO+C-term θ and Boolean value ⟦ϕ⟧ (A,a) ∈ {0, 1} for each FO+C-formula ϕ
Free variables of an FO+C-expression ξ defined inductively
Closed term is a term without free variables, sentence is a formula without free variables
FO+C-formula ϕ(x 1 , . . . , x k ) defines a k-ary query on the class of τ -structures
Standard arithmetical and logical operators used as abbreviations
FO+C corresponds to FOCN({P ≤ })
All FO+C-terms are polynomially bounded

Descriptive complexity

FO+C is related to the complexity class TC 0
Structures need to be expanded by a linear order of the vertex set
A binary relation symbol ⩽ is introduced
An ordered τ-structure is a τ ∪ {⩽}-structure
There is a simple canonical representation of ordered structures as bitstrings
Theorem 3.3 states that L(C) is in uniform TC 0 if and only if there is an FO+C sentence
Unary queries can be defined in order-invariant FO+C

Non-uniformity and built-in relations

Non-uniformity in descriptive complexity is captured by adding built-in relations.
Built-in numerical relations are subsets of Nk for some k ≥ 0.
FO+C nu captures (non-uniform) TC 0.

Types and second-order variables

Counting extension of first-order logic refers to a 2-sorted extension of relational structures
Variables have types: vertex (v) and number (n)
Relation variables (X, Y) have type {t}
Function variables (U, V) have type t → n
Atomic formulas X(ξ 1 , . . . , ξ k ) and terms U (ξ, . . . , ξ k )
FO+C does not allow quantification over relation and function variables
Arithmetic on bitwise representations of integers is expressible in FO+C
Lemma 3.2 does not hold if term θ contains function variables
Lemma 3.7 (Bennett [5]): FO+C-formula bit(y, y ′ ) and FO+C-term len(y)
FO+C-term bin(n) ≤ n for all n ≥ 1
Formula ϕ(y, z) defines a function z ↦ y
Trick to obtain a term η(z) such that ⟦η⟧ A (c) = f A (c)
Relation variable Y of type n to encode binary representations of natural numbers
0-ary function variable U to specify bound on number of bits
Formula χ and term θ to specify numbers
Lemma 3.9: arithmetical FO+C-formulas add, sub and arithmetical FO+C-terms bd-add, bd-sub
Lemma 3.10: arithmetical FO+C-formula s-itadd and arithmetical FO+C-term bd-s-itadd3
Encoding of sequences by natural numbers

Rational arithmetic

Need to lift results to arithmetic on rational numbers
Problem with iterated addition: denominator of sum can get too large
Work with arithmetic on dyadic rationals
Represent dyadic rationals by tuples (r, I, s, t)
Representation is not unique
Each dyadic rational has a unique representation (crep)
Representation needs four variables
Introduce shortcuts (r-schemas and r-expressions)
Lemma 3.18: arithmetical r-expression to define canonical representation
Lemma 3.19: arithmetical r-expressions for add, sub, and mul
Lemma 3.20: r-expressions for iterated addition, max, and min
Lemma 3.21: arithmetical r-expression for division

Evaluating feedforward neural networks

We can simulate rational piecewise-linear FNNs
We need an integer k and three families of dyadic rationals
We define an L-schema and an F-schema
We define an arithmetical r-expression
We define an FO+C-formula
We define a decomposable FO 2 +C-formula
We define a bijection between vertices and initial segment of N
We define a distinguished number variable for every vertex variable
We define a guarded fragment GFO+C
We define an intermediate fragment GFO+C gc

Useful bounds

Lemma 4.2 states that there is a constant γ that can be used to prove a local bound for all graphs, signals, and vertices.
Lemma 4.3 states that there is a constant λ that can be used to prove a bound for all graphs, signals, and vertices.
Lemma 2.5 provides constants γ msg and γ comb that can be used to prove the local bound.

Link to paper#

Abstract#

Paper Content#

Introduction#

Related work#

Techniques#

Structure of this paper#

Preliminaries#

Functions and approximations#

Graphs and signals#

Boolean circuits#

Feedforward neural networks#

Relational structures#

First-order logic with counting#

Descriptive complexity#

Non-uniformity and built-in relations#

Types and second-order variables#

Rational arithmetic#

Evaluating feedforward neural networks#

Useful bounds#

Link to paper

Abstract

Paper Content

Introduction

Related work

Techniques

Structure of this paper

Preliminaries

Functions and approximations

Graphs and signals

Boolean circuits

Feedforward neural networks

Relational structures

First-order logic with counting

Descriptive complexity

Non-uniformity and built-in relations

Types and second-order variables

Rational arithmetic

Evaluating feedforward neural networks

Useful bounds