Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

Node features of graph neural networks become more similar with increased network depth, known as over-smoothing.
Definition of over-smoothing is unified and new quantitative measures are introduced.
Over-smoothing is demonstrated empirically on different graphs.
Approaches for mitigating over-smoothing are reviewed and tested on real-world datasets.
Mitigating over-smoothing is necessary but not sufficient for building deep GNNs.
Definition of over-smoothing is extended to continuous-time GNNs.

Graph Neural Networks (GNNs) are powerful tools for learning on relational and interaction data
GNNs have been applied to a variety of tasks, such as computer vision, recommender systems, transportation, etc.
The number of layers in a neural network is often considered to be important for performance
GNNs are often relatively shallow and have few layers
Issues impairing the performance of deep GNNs include graph bottlenecks, over-squashing, and over-smoothing
Over-smoothing is when node features converge to the same constant value as the number of layers increases
Recent literature has focused on defining over-smoothing through measures of node feature similarities
This article aims to unify existing approaches and define over-smoothing in a formal and tractable manner
It also reviews approaches to mitigate over-smoothing and provides empirical evaluation

G is an undirected graph with v nodes and e edges
Each node has an m-dimensional feature vector
Message-Passing GNN updates the node features
Local (1-neighborhood) coupling of the form (F(X, G)) i = F(X i , X j∈Ni )
Over-smoothing is defined as layer-wise exponential convergence of the node-similarity measure to zero
Node-similarity measure must satisfy triangle inequality and subadditivity
Definition can be generalized to disconnected graphs

Existing approaches to measure over-smoothing in deep GNNs are based on Dirichlet energy on graphs
Dirichlet energy is a node-similarity measure that satisfies conditions 1 and 2 of the over-smoothing definition
Mean Average Distance (MAD) is not a node-similarity measure and is always zero in the scalar case
MAD does converge exponentially to zero for increasing number of layers if the GNN over-smooths
Dirichlet energy should be favored over MAD
Other measures that constitute a node-similarity measure can be used
Rusch et al. (2022) have empirically demonstrated the qualitative behavior described in Definition 1
Exponential convergence of the layer-wise over-smoothing measure is necessary for GNN to suffer from over-smoothing

Normalization and regularization can reduce over-smoothing in deep GNNs
Explicit regularization techniques measure over-smoothing using Dirichlet energy
Implicit regularization techniques add noise to the optimization process
Change of GNN dynamics can mitigate over-smoothing
Residual connections can be added to deep GNNs
Residual connections aggregate all node features of every layer of a deep GNN at the final layer
Residual connections can lead to major improvements over competing methods

Over-smoothing can be mitigated by several methods
Adding a bias vector to a deep GCN with shared parameters can keep the layer-wise Dirichlet energy constant
Keeping the Dirichlet energy constant is not enough to construct well performing deep GNNs
PairNorm exhibits an approximately constant layer-wise Dirichlet energy but its performance drops exponentially
G2-GCN keeps the node-similarity measure approximately constant and increases its expressive power for increasing number of layers

GNNs are continuous in depth
Message-passing propagation is modeled by differential equations
Different vector fields yield different architectures
Over-smoothing is defined as exponential convergence of node-similarity measure to zero

Stacking multiple message-passing layers is necessary to process relational data with long-range interactions.
Over-smoothing is a central challenge in constructing deep GNNs.
An axiomatic definition of over-smoothing is provided.
Measures for over-smoothing are tested on three different graph datasets.
Approaches to mitigate over-smoothing are reviewed and tested.
Balancing the ability to mitigate over-smoothing and expressive power is necessary.