Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

  • Node features of graph neural networks become more similar with increased network depth, known as over-smoothing.
  • Definition of over-smoothing is unified and new quantitative measures are introduced.
  • Over-smoothing is demonstrated empirically on different graphs.
  • Approaches for mitigating over-smoothing are reviewed and tested on real-world datasets.
  • Mitigating over-smoothing is necessary but not sufficient for building deep GNNs.
  • Definition of over-smoothing is extended to continuous-time GNNs.

Paper Content

Introduction

  • Graph Neural Networks (GNNs) are powerful tools for learning on relational and interaction data
  • GNNs have been applied to a variety of tasks, such as computer vision, recommender systems, transportation, etc.
  • The number of layers in a neural network is often considered to be important for performance
  • GNNs are often relatively shallow and have few layers
  • Issues impairing the performance of deep GNNs include graph bottlenecks, over-squashing, and over-smoothing
  • Over-smoothing is when node features converge to the same constant value as the number of layers increases
  • Recent literature has focused on defining over-smoothing through measures of node feature similarities
  • This article aims to unify existing approaches and define over-smoothing in a formal and tractable manner
  • It also reviews approaches to mitigate over-smoothing and provides empirical evaluation

Definition of over-smoothing

  • G is an undirected graph with v nodes and e edges
  • Each node has an m-dimensional feature vector
  • Message-Passing GNN updates the node features
  • Local (1-neighborhood) coupling of the form $(F(X, G))_i = F(X_i, X_{j \in N_i})$, where $N_i$ is the set of neighbors of node i
  • Over-smoothing is defined as layer-wise exponential convergence of the node-similarity measure to zero
  • A node-similarity measure must (1) be zero exactly when all nodes share the same feature vector and (2) satisfy the triangle inequality (subadditivity)
  • Definition can be generalized to disconnected graphs
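
The bullets above can be written out compactly. The following is a sketch of the notation, not a verbatim quote from the paper: the symbols $\mu$ (node-similarity measure), $\sigma$ (nonlinearity), $\theta^n$ (layer parameters), and the constants $C_1, C_2$ are assumed names, since the summary does not state them.

```latex
% Layer-wise update of an N-layer message-passing GNN on the graph G
\[
  X^{n} = \sigma\big(F_{\theta^{n}}(X^{n-1}, G)\big), \qquad n = 1, \dots, N .
\]

% Node-similarity measure \mu : R^{v x m} -> R_{>= 0}
\begin{align*}
  &\text{(1)}\quad \mu(X) = 0 \iff \exists\, c \in \mathbb{R}^{m} \text{ such that } X_i = c \text{ for all nodes } i, \\
  &\text{(2)}\quad \mu(X + Y) \le \mu(X) + \mu(Y).
\end{align*}

% Over-smoothing with respect to \mu: layer-wise exponential convergence to zero
\[
  \mu(X^{n}) \le C_1 e^{-C_2 n}, \qquad n = 0, \dots, N, \quad C_1, C_2 > 0 .
\]
```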

Over-smoothing measures

  • Existing approaches to measure over-smoothing in deep GNNs are based on Dirichlet energy on graphs
  • Dirichlet energy is a node-similarity measure that satisfies conditions 1 and 2 of the over-smoothing definition
  • Mean Average Distance (MAD) is not a node-similarity measure: in the scalar case it is zero whenever neighboring features share the same sign, even if their values differ
  • MAD does converge exponentially to zero for increasing number of layers if the GNN over-smooths
  • Dirichlet energy should be favored over MAD
  • Other measures that constitute a node-similarity measure can be used
  • Rusch et al. (2022) have empirically demonstrated the qualitative behavior described in Definition 1
  • Exponential convergence of the layer-wise over-smoothing measure is necessary for a GNN to be considered to suffer from over-smoothing
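
To make the difference between the two measures concrete, here is a minimal NumPy sketch. It assumes the neighbor-sum forms E(X) = (1/v) Σ_i Σ_{j∈N_i} ||X_i − X_j||² for the Dirichlet energy and (1/v) Σ_i Σ_{j∈N_i} (1 − cos(X_i, X_j)) for MAD. The function names, the three-node path graph, and the feature values are hypothetical choices for illustration.

```python
import numpy as np

def dirichlet_energy(X, A):
    """(1/v) * sum_i sum_{j in N_i} ||X_i - X_j||^2 for a 0/1 adjacency matrix A."""
    v = X.shape[0]
    diff = X[:, None, :] - X[None, :, :]      # (v, v, m) pairwise differences
    sq_dist = (diff ** 2).sum(axis=-1)        # (v, v) squared distances
    return float((A * sq_dist).sum() / v)

def mad(X, A):
    """(1/v) * sum_i sum_{j in N_i} (1 - cos(X_i, X_j)); assumes no all-zero rows."""
    v = X.shape[0]
    norms = np.linalg.norm(X, axis=1)
    cos = (X @ X.T) / np.outer(norms, norms)  # pairwise cosine similarities
    return float((A * (1.0 - cos)).sum() / v)

# Toy path graph 0 - 1 - 2 (an assumption for illustration).
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)

X_distinct = np.array([[1.0], [2.0], [3.0]])  # scalar features, all different
X_constant = np.array([[2.0], [2.0], [2.0]])  # fully smoothed features

energy_distinct = dirichlet_energy(X_distinct, A)
energy_constant = dirichlet_energy(X_constant, A)
mad_distinct = mad(X_distinct, A)

print(energy_distinct, energy_constant, mad_distinct)
```

In this example the Dirichlet energy separates the two feature sets, while MAD reports zero for the clearly non-constant scalar features because all of them are positive.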

Methods

  • Normalization and regularization can reduce over-smoothing in deep GNNs
  • Explicit regularization techniques measure over-smoothing using Dirichlet energy
  • Implicit regularization techniques add noise to the optimization process
  • Change of GNN dynamics can mitigate over-smoothing
  • Residual connections can be added to deep GNNs
  • Some residual-style schemes, such as jumping-knowledge connections, aggregate the node features of every layer of a deep GNN at the final layer
  • Residual connections can lead to major improvements over competing methods
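
As a rough illustration of how a change in dynamics can affect smoothing, the sketch below compares plain mean aggregation with an initial-residual update (the form popularized by GCNII). It is heavily simplified: there are no weights or nonlinearities, and the ring graph, the mixing coefficient `alpha`, and the features are assumptions. It is not the plain Res-GCN baseline from the evaluation, and it says nothing about accuracy.

```python
import numpy as np

def ring_graph(v):
    """Adjacency matrix of a ring with v nodes (toy graph, an assumption)."""
    A = np.zeros((v, v))
    idx = np.arange(v)
    A[idx, (idx + 1) % v] = 1.0
    A[idx, (idx - 1) % v] = 1.0
    return A

def dirichlet_energy(X, A):
    """(1/v) * sum_i sum_{j in N_i} ||X_i - X_j||^2."""
    diff = X[:, None, :] - X[None, :, :]
    return float((A * (diff ** 2).sum(axis=-1)).sum() / X.shape[0])

v, depth, alpha = 6, 50, 0.2
A = ring_graph(v)
A_hat = A + np.eye(v)                            # add self-loops
P = A_hat / A_hat.sum(axis=1, keepdims=True)     # row-stochastic mean aggregation
X0 = np.arange(2 * v, dtype=float).reshape(v, 2)

X_plain, X_res = X0.copy(), X0.copy()
for _ in range(depth):
    X_plain = P @ X_plain                            # plain propagation
    X_res = (1 - alpha) * (P @ X_res) + alpha * X0   # mix the input back in

energy_start = dirichlet_energy(X0, A)
energy_plain = dirichlet_energy(X_plain, A)
energy_res = dirichlet_energy(X_res, A)

print(energy_start, energy_plain, energy_res)
```

In this toy setting the plain propagation drives the Dirichlet energy towards zero, while the initial-residual variant settles at a non-zero value because part of the input is re-injected at every layer.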

Empirical evaluation

  • Evaluated effectiveness of methods to mitigate over-smoothing in deep GNNs
  • Used 3 different graphs (Texas, Cora, Cornell5)
  • Fixed one node-similarity measure (Dirichlet energy)
  • DropEdge-GCN and Res-GCN suffer from over-smoothing
  • Other methods keep layer-wise Dirichlet energy approximately constant

Risk of sacrificing expressivity to mitigate over-smoothing

  • Over-smoothing can be mitigated by several methods
  • Adding a bias vector to a deep GCN with shared parameters can keep the layer-wise Dirichlet energy constant
  • Keeping the Dirichlet energy constant is not enough to construct well performing deep GNNs
  • PairNorm exhibits an approximately constant layer-wise Dirichlet energy but its performance drops exponentially
  • G2-GCN keeps the node-similarity measure approximately constant and increases its expressive power for increasing number of layers
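
The PairNorm observation can be reproduced in spirit with a small sketch. The code below applies a center-and-rescale normalization after each mean-aggregation step. The weight-free linear propagation, the ring graph, and the `scale` value are assumptions. It only shows that the Dirichlet energy stays bounded away from zero; whether the resulting features remain useful for a task is a separate question that this toy does not address.

```python
import numpy as np

def ring_graph(v):
    """Adjacency matrix of a ring with v nodes (toy graph, an assumption)."""
    A = np.zeros((v, v))
    idx = np.arange(v)
    A[idx, (idx + 1) % v] = 1.0
    A[idx, (idx - 1) % v] = 1.0
    return A

def dirichlet_energy(X, A):
    """(1/v) * sum_i sum_{j in N_i} ||X_i - X_j||^2."""
    diff = X[:, None, :] - X[None, :, :]
    return float((A * (diff ** 2).sum(axis=-1)).sum() / X.shape[0])

def pairnorm(X, scale=1.0):
    """Center features across nodes, then rescale to a fixed mean squared row norm."""
    Xc = X - X.mean(axis=0, keepdims=True)
    rms = np.sqrt((Xc ** 2).sum(axis=1).mean())
    return scale * Xc / rms

v, depth = 6, 50
A = ring_graph(v)
A_hat = A + np.eye(v)                            # add self-loops
P = A_hat / A_hat.sum(axis=1, keepdims=True)     # row-stochastic mean aggregation
X0 = np.arange(2 * v, dtype=float).reshape(v, 2)

X_plain, X_pn = X0.copy(), X0.copy()
for _ in range(depth):
    X_plain = P @ X_plain            # plain propagation over-smooths
    X_pn = pairnorm(P @ X_pn)        # normalization after every layer

energy_plain = dirichlet_energy(X_plain, A)
energy_pairnorm = dirichlet_energy(X_pn, A)

print(energy_plain, energy_pairnorm)
```

Because the normalization fixes the total feature norm and removes the constant component, the energy cannot collapse here, regardless of how informative the features are.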

Extension to continuous-time GNNs

  • Continuous-time GNNs treat depth as a continuous time variable rather than a discrete layer index
  • Message-passing propagation is modeled by differential equations
  • Different vector fields yield different architectures
  • Over-smoothing is defined as exponential convergence of the node-similarity measure to zero as time increases
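
One simple vector field that fits this picture is the graph heat equation dX/dt = −L X, with L the graph Laplacian. It is chosen here only as an assumed example and is not a specific architecture from the paper. The sketch integrates it with forward Euler steps on a toy ring graph and estimates the exponential decay rate of the Dirichlet energy from two time points. The step size `h` and the graph are hypothetical choices.

```python
import numpy as np

def ring_graph(v):
    """Adjacency matrix of a ring with v nodes (toy graph, an assumption)."""
    A = np.zeros((v, v))
    idx = np.arange(v)
    A[idx, (idx + 1) % v] = 1.0
    A[idx, (idx - 1) % v] = 1.0
    return A

def dirichlet_energy(X, A):
    """(1/v) * sum_i sum_{j in N_i} ||X_i - X_j||^2."""
    diff = X[:, None, :] - X[None, :, :]
    return float((A * (diff ** 2).sum(axis=-1)).sum() / X.shape[0])

v, h, steps = 6, 0.1, 200
A = ring_graph(v)
L = np.diag(A.sum(axis=1)) - A                   # combinatorial graph Laplacian
X = np.arange(2 * v, dtype=float).reshape(v, 2)

energies = [dirichlet_energy(X, A)]
for _ in range(steps):
    X = X - h * (L @ X)                          # forward Euler step for dX/dt = -L X
    energies.append(dirichlet_energy(X, A))

# Empirical decay rate of the energy between t = 5 and t = 10.
rate = -np.log(energies[100] / energies[50]) / (50 * h)

print(energies[0], energies[-1], rate)
```

A positive, roughly constant `rate` over time is what exponential convergence looks like in practice. A method that mitigates over-smoothing in the continuous-time sense would instead keep the measure from decaying in this way.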

Conclusion

  • Stacking multiple message-passing layers is necessary to process relational data with long-range interactions.
  • Over-smoothing is a central challenge in constructing deep GNNs.
  • An axiomatic definition of over-smoothing is provided.
  • Measures for over-smoothing are tested on three different graph datasets.
  • Approaches to mitigate over-smoothing are reviewed and tested.
  • Balancing the ability to mitigate over-smoothing and expressive power is necessary.