Link to paper
The full paper is available here.
You can also find the paper on PapersWithCode here.
Abstract
- Node features of graph neural networks (GNNs) become increasingly similar as network depth grows, a phenomenon known as over-smoothing.
- A unified definition of over-smoothing is proposed, based on quantitative measures of node-feature similarity.
- Over-smoothing is demonstrated empirically on different graphs.
- Approaches for mitigating over-smoothing are reviewed and tested on real-world datasets.
- Mitigating over-smoothing is necessary but not sufficient for building deep GNNs.
- Definition of over-smoothing is extended to continuous-time GNNs.
Paper Content
Introduction
- Graph Neural Networks (GNNs) are powerful tools for learning on relational and interaction data
- GNNs have been applied in a variety of domains, such as computer vision, recommender systems, and transportation
- The number of layers in a neural network is often considered to be important for performance
- In practice, however, GNNs are usually shallow, with only a few layers
- Issues impairing the performance of deep GNNs include graph bottlenecks, over-squashing, and over-smoothing
- Over-smoothing refers to node features converging to the same constant value as the number of layers increases
- Recent literature has focused on defining over-smoothing through measures of node feature similarities
- This article aims to unify existing approaches and define over-smoothing in a formal and tractable manner
- It also reviews approaches to mitigate over-smoothing and evaluates them empirically
Definition of over-smoothing
- $\mathcal{G}$ is an undirected graph with $v$ nodes and $e$ edges
- Each node $i$ has an $m$-dimensional feature vector $\mathbf{X}_i$; together they form $\mathbf{X} \in \mathbb{R}^{v \times m}$
- A message-passing GNN updates the node features layer by layer, $\mathbf{X}^n = \mathbf{F}(\mathbf{X}^{n-1}, \mathcal{G})$ for $n = 1, \dots, N$
- Local (1-neighborhood) coupling of the form $(\mathbf{F}(\mathbf{X}, \mathcal{G}))_i = \mathbf{F}(\mathbf{X}_i, \{\mathbf{X}_j\}_{j \in \mathcal{N}_i})$
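- Over-smoothing is defined as layer-wise exponential convergence of a node-similarity measure $\mu$ to zero: $\mu(\mathbf{X}^n) \leq C_1 e^{-C_2 n}$ for $n = 0, \dots, N$, with constants $C_1, C_2 > 0$
- A node-similarity measure $\mu$ must (1) vanish if and only if all node features equal the same constant vector, and (2) be subadditive, i.e., satisfy the triangle inequality $\mu(\mathbf{X} + \mathbf{Y}) \leq \mu(\mathbf{X}) + \mu(\mathbf{Y})$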
- Definition can be generalized to disconnected graphs
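Both conditions can be checked numerically on a small example. The following is a minimal sketch, assuming the unnormalized Dirichlet energy $\mathcal{E}(\mathbf{X}) = \frac{1}{v}\sum_i \sum_{j \in \mathcal{N}_i} \|\mathbf{X}_i - \mathbf{X}_j\|^2$ as the underlying quantity and a hypothetical 4-cycle graph stored as an adjacency list; the function names are illustrative only. It also shows why a square root is taken: the raw energy is quadratic, so doubling the features quadruples it and subadditivity fails.

```python
import math
import random

# Hypothetical toy graph: an undirected 4-cycle stored as an adjacency list.
C4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}


def dirichlet_energy(X, neighbors):
    """Unnormalized Dirichlet energy: (1/v) * sum_i sum_{j in N_i} ||X_i - X_j||^2."""
    total = 0.0
    for i, nbrs in neighbors.items():
        for j in nbrs:
            total += sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
    return total / len(X)


def mu(X, neighbors):
    """Candidate node-similarity measure: the square root of the Dirichlet energy."""
    return math.sqrt(dirichlet_energy(X, neighbors))


def add(X, Y):
    """Element-wise sum of two v x m feature matrices given as nested lists."""
    return [[a + b for a, b in zip(x, y)] for x, y in zip(X, Y)]


if __name__ == "__main__":
    rng = random.Random(0)
    X = [[rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(4)]
    Y = [[rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(4)]

    # Condition 1: identical node features give mu = 0 (on a connected graph, only then).
    print(mu([[0.5, -2.0]] * 4, C4))  # 0.0
    # Condition 2: subadditivity holds for mu ...
    print(mu(add(X, Y), C4) <= mu(X, C4) + mu(Y, C4))  # True
    # ... but not for the raw energy, which is quadratic: E(2X) = 4 E(X).
    print(dirichlet_energy(add(X, X), C4) > 2 * dirichlet_energy(X, C4))  # True
```

A random check like this can only falsify a candidate measure, not prove it; for the square-rooted energy, subadditivity follows from it being a (semi)norm of the edge-wise feature differences.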
Over-smoothing measures
- Existing approaches to measuring over-smoothing in deep GNNs are mainly based on the Dirichlet energy on graphs
- The square root of the Dirichlet energy is a node-similarity measure, i.e., it satisfies both conditions of a node-similarity measure
- The Mean Average Distance (MAD) is not a node-similarity measure: for instance, in the scalar case it vanishes for any positive node features, even unequal ones
- Nevertheless, MAD does converge exponentially to zero with increasing depth if the GNN over-smooths
- The Dirichlet energy should still be favored over MAD, since only the former yields a node-similarity measure
- Any other quantity that satisfies both conditions of a node-similarity measure can be used as well
- Rusch et al. (2022) have empirically demonstrated the qualitative behavior described in Definition 1
- Exponential convergence of the layer-wise over-smoothing measure is necessary for a GNN to suffer from over-smoothing
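The scalar-case failure of MAD is easy to reproduce. This sketch assumes a cosine-distance form of MAD summed over edges, $\frac{1}{v}\sum_i\sum_{j\in\mathcal{N}_i}(1 - \cos(\mathbf{X}_i, \mathbf{X}_j))$, which may differ in normalization from other published variants, and it assumes no feature vector is all zeros. The 4-cycle graph and function names are hypothetical.

```python
import math

# Hypothetical toy graph: an undirected 4-cycle stored as an adjacency list.
C4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}


def norm(x):
    return math.sqrt(sum(a * a for a in x))


def mad(X, neighbors):
    """Cosine-distance MAD over edges; assumes no feature vector is all zeros."""
    total = 0.0
    for i, nbrs in neighbors.items():
        for j in nbrs:
            cos = sum(a * b for a, b in zip(X[i], X[j])) / (norm(X[i]) * norm(X[j]))
            total += 1.0 - cos
    return total / len(X)


def sqrt_dirichlet(X, neighbors):
    """Square root of the unnormalized Dirichlet energy (a node-similarity measure)."""
    total = 0.0
    for i, nbrs in neighbors.items():
        for j in nbrs:
            total += sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
    return math.sqrt(total / len(X))


if __name__ == "__main__":
    # Scalar (m = 1), positive, and clearly NOT all equal.
    X = [[1.0], [5.0], [2.0], [9.0]]
    print(mad(X, C4))             # 0.0: MAD cannot tell these features apart
    print(sqrt_dirichlet(X, C4))  # positive: the Dirichlet-based measure can
```

The cosine only sees directions, and in one dimension every positive number points the same way, so condition 1 of a node-similarity measure breaks.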
Methods
- Normalization and regularization can reduce over-smoothing in deep GNNs
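- Explicit regularization techniques use the Dirichlet energy as the measure of over-smoothing and constrain it directly during training
- Implicit regularization techniques add noise to the optimization process, e.g., DropEdge randomly removes edges during training
- Changing the underlying GNN dynamics can also mitigate over-smoothing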
- Residual connections can be added to deep GNNs
- A related design aggregates the node features of every layer of a deep GNN at the final layer
- Residual connections can lead to major improvements over competing methods
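The mechanics of two of these methods fit in a few lines. The example below is illustrative only: `drop_edge` samples a sparser undirected graph in the spirit of DropEdge, and `residual` wraps an arbitrary layer $\mathbf{F}$ as $\mathbf{X} \mapsto \mathbf{X} + \mathbf{F}(\mathbf{X}, \mathcal{G})$. `mean_aggregate` is a hypothetical weight-free layer, not any published architecture, and training and learnable weights are omitted.

```python
import random


def drop_edge(neighbors, p, rng):
    """Keep each undirected edge independently with probability 1 - p.

    Both directions of an edge are dropped together, so the sampled graph stays
    undirected. Assumes integer node ids and a symmetric adjacency list."""
    kept = {i: [] for i in neighbors}
    for i, nbrs in neighbors.items():
        for j in nbrs:
            if i < j and rng.random() >= p:  # visit each undirected edge once
                kept[i].append(j)
                kept[j].append(i)
    return kept


def mean_aggregate(X, neighbors):
    """Hypothetical weight-free layer: each node averages its neighbors' features.
    Nodes left isolated (e.g. after dropping edges) keep their own features."""
    out = []
    for i, x in enumerate(X):
        nbrs = neighbors[i]
        if not nbrs:
            out.append(list(x))
        else:
            out.append([sum(X[j][k] for j in nbrs) / len(nbrs) for k in range(len(x))])
    return out


def residual(layer):
    """Wrap a message-passing layer F as the residual update X -> X + F(X, G)."""
    def wrapped(X, neighbors):
        FX = layer(X, neighbors)
        return [[a + b for a, b in zip(x, f)] for x, f in zip(X, FX)]
    return wrapped


if __name__ == "__main__":
    triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
    rng = random.Random(0)
    X = [[1.0], [2.0], [3.0]]
    # In training, a fresh graph would typically be sampled for every epoch.
    sampled = drop_edge(triangle, p=0.5, rng=rng)
    print(sampled)
    print(residual(mean_aggregate)(X, triangle))  # [[3.5], [4.0], [4.5]]
```

Dropping both directions of an edge together keeps the sampled graph undirected, matching the undirected-graph assumption of the definition.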
Empirical evaluation
- Evaluated the effectiveness of methods for mitigating over-smoothing in deep GNNs
- Used 3 different graphs (Texas, Cora, Cornell5)
- Fixed one node-similarity measure (Dirichlet energy)
- DropEdge-GCN and Res-GCN suffer from over-smoothing
- Other methods keep layer-wise Dirichlet energy approximately constant
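The layer-wise measurement behind this evaluation can be imitated on a toy problem. As a sketch only, assuming a weight-free linear GCN-style propagation $\mathbf{X} \leftarrow \tilde{\mathbf{D}}^{-1/2}(\mathbf{A} + \mathbf{I})\tilde{\mathbf{D}}^{-1/2}\mathbf{X}$ rather than any of the architectures evaluated in the source, it records $\mu(\mathbf{X}^n)$ (the square root of the unnormalized Dirichlet energy) per layer and estimates the decay rate $C_2$ from a least-squares fit of $\log \mu(\mathbf{X}^n)$ against $n$. On the hypothetical 4-cycle, every non-constant eigenmode of the propagation matrix has eigenvalue $\pm 1/3$, so the fitted rate should be $\ln 3$.

```python
import math
import random

# Hypothetical toy graph: an undirected 4-cycle stored as an adjacency list.
C4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}


def gcn_propagate(X, neighbors):
    """One weight-free GCN-style step X <- D^{-1/2} (A + I) D^{-1/2} X,
    where D holds node degrees counted with the added self-loop."""
    deg = {i: len(nbrs) + 1 for i, nbrs in neighbors.items()}
    out = []
    for i, x in enumerate(X):
        row = [a / deg[i] for a in x]  # self-loop contribution
        for j in neighbors[i]:
            w = 1.0 / math.sqrt(deg[i] * deg[j])
            row = [r + w * a for r, a in zip(row, X[j])]
        out.append(row)
    return out


def sqrt_dirichlet(X, neighbors):
    """Square root of the unnormalized Dirichlet energy."""
    total = 0.0
    for i, nbrs in neighbors.items():
        for j in nbrs:
            total += sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
    return math.sqrt(total / len(X))


def layerwise_mu(X, neighbors, num_layers):
    """Return [mu(X^0), mu(X^1), ..., mu(X^N)] under repeated propagation."""
    mus = [sqrt_dirichlet(X, neighbors)]
    for _ in range(num_layers):
        X = gcn_propagate(X, neighbors)
        mus.append(sqrt_dirichlet(X, neighbors))
    return mus


def fitted_rate(mus):
    """Negated least-squares slope of log(mu_n) vs n, an estimate of C_2.
    Requires every mu_n > 0."""
    ns = list(range(len(mus)))
    ys = [math.log(m) for m in mus]
    n_bar, y_bar = sum(ns) / len(ns), sum(ys) / len(ys)
    num = sum((n - n_bar) * (y - y_bar) for n, y in zip(ns, ys))
    den = sum((n - n_bar) ** 2 for n in ns)
    return -num / den


if __name__ == "__main__":
    rng = random.Random(0)
    X0 = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(4)]
    mus = layerwise_mu(X0, C4, num_layers=10)
    print(fitted_rate(mus), math.log(3))  # the two should agree closely
```

The depth is kept small on purpose: after many layers, $\mu$ approaches floating-point round-off and the logarithm becomes unreliable.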
Risk of sacrificing expressivity to mitigate over-smoothing
- Over-smoothing can be mitigated by several methods
- Adding a bias vector to a deep GCN with shared parameters can keep the layer-wise Dirichlet energy constant
- Keeping the Dirichlet energy constant is not sufficient to construct well-performing deep GNNs
- PairNorm exhibits an approximately constant layer-wise Dirichlet energy, but its performance drops exponentially with increasing depth
- G2-GCN keeps the node-similarity measure approximately constant, and its expressive power increases with the number of layers
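PairNorm's constant energy can be seen directly from its normalization step. The sketch below assumes the basic centering-and-rescaling form (subtract the mean feature, then rescale so the mean squared row norm equals $s^2$) applied after the same kind of hypothetical weight-free propagation on a 4-cycle. On this graph the layer-wise Dirichlet energy stays fixed instead of decaying; the sketch says nothing about accuracy, which is where PairNorm falls short.

```python
import math
import random

# Hypothetical toy graph: an undirected 4-cycle stored as an adjacency list.
C4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}


def gcn_propagate(X, neighbors):
    """Weight-free GCN-style step X <- D^{-1/2} (A + I) D^{-1/2} X (self-loops in D)."""
    deg = {i: len(nbrs) + 1 for i, nbrs in neighbors.items()}
    out = []
    for i, x in enumerate(X):
        row = [a / deg[i] for a in x]
        for j in neighbors[i]:
            w = 1.0 / math.sqrt(deg[i] * deg[j])
            row = [r + w * a for r, a in zip(row, X[j])]
        out.append(row)
    return out


def pair_norm(X, s=1.0, eps=1e-12):
    """Center the features, then rescale so that (1/v) * sum_i ||x_i||^2 is (about) s^2."""
    v, m = len(X), len(X[0])
    mean = [sum(x[k] for x in X) / v for k in range(m)]
    centered = [[x[k] - mean[k] for k in range(m)] for x in X]
    rms = math.sqrt(eps + sum(a * a for x in centered for a in x) / v)  # eps avoids 0/0
    return [[s * a / rms for a in x] for x in centered]


def sqrt_dirichlet(X, neighbors):
    """Square root of the unnormalized Dirichlet energy."""
    total = 0.0
    for i, nbrs in neighbors.items():
        for j in nbrs:
            total += sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
    return math.sqrt(total / len(X))


if __name__ == "__main__":
    rng = random.Random(0)
    X = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(4)]
    mus = [sqrt_dirichlet(X, C4)]
    for _ in range(10):
        X = pair_norm(gcn_propagate(X, C4))  # propagate, then normalize
        mus.append(sqrt_dirichlet(X, C4))
    print([round(m, 6) for m in mus])  # on this graph: flat after the first layer
```

Centering removes the constant component that diffusion converges to, and rescaling undoes the shrinkage of everything else, so the measure cannot decay even though no new information is added.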
Extension to continuous-time GNNs
- Continuous-time GNNs are continuous in depth (time)
- Message-passing propagation is modeled by differential equations, $\mathbf{X}'(t) = \mathbf{F}(\mathbf{X}(t), \mathcal{G})$
- Different choices of the vector field $\mathbf{F}$ yield different architectures
- Over-smoothing is defined as exponential convergence in time of a node-similarity measure to zero, $\mu(\mathbf{X}(t)) \leq C_1 e^{-C_2 t}$ for $t \geq 0$
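For a concrete instance, the sketch below assumes the graph heat equation $\mathbf{X}_i'(t) = \sum_{j \in \mathcal{N}_i}(\mathbf{X}_j(t) - \mathbf{X}_i(t))$ as the vector field, integrated with forward Euler on a hypothetical 4-cycle, with $\mu$ taken as the square root of the unnormalized Dirichlet energy. The smallest non-zero Laplacian eigenvalue of the 4-cycle is 2, so for a small enough step size $\mu(\mathbf{X}(t)) \leq \mu(\mathbf{X}(0))\, e^{-2t}$ should hold, i.e., $C_1 = \mu(\mathbf{X}(0))$ and $C_2 = 2$.

```python
import math
import random

# Hypothetical toy graph: an undirected 4-cycle stored as an adjacency list.
C4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}


def heat_vector_field(X, neighbors):
    """Graph heat equation: dX_i/dt = sum_{j in N_i} (X_j - X_i)."""
    return [[sum(X[j][k] - x[k] for j in neighbors[i]) for k in range(len(x))]
            for i, x in enumerate(X)]


def sqrt_dirichlet(X, neighbors):
    """Square root of the unnormalized Dirichlet energy."""
    total = 0.0
    for i, nbrs in neighbors.items():
        for j in nbrs:
            total += sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
    return math.sqrt(total / len(X))


def euler_mu_trajectory(X, neighbors, dt, steps):
    """Integrate X'(t) = F(X(t)) with forward Euler; return mu(X(n * dt)) for n = 0..steps."""
    mus = [sqrt_dirichlet(X, neighbors)]
    for _ in range(steps):
        F = heat_vector_field(X, neighbors)
        X = [[a + dt * f for a, f in zip(x, fx)] for x, fx in zip(X, F)]
        mus.append(sqrt_dirichlet(X, neighbors))
    return mus


if __name__ == "__main__":
    rng = random.Random(0)
    X0 = [[rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(4)]
    dt = 0.01
    mus = euler_mu_trajectory(X0, C4, dt, steps=300)
    # Check mu(t) <= C1 * exp(-C2 * t) with C1 = mu(0), C2 = 2 (spectral gap of the 4-cycle).
    print(all(m <= mus[0] * math.exp(-2.0 * n * dt) + 1e-12 for n, m in enumerate(mus)))  # True
```

Forward Euler is used only for simplicity; each step multiplies a Laplacian mode with eigenvalue $\lambda$ by $1 - \lambda\,\Delta t$, which for small $\Delta t$ decays at least as fast as $e^{-\lambda \Delta t}$.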
Conclusion
- Stacking multiple message-passing layers is necessary to process relational data with long-range interactions.
- Over-smoothing is a central challenge in constructing deep GNNs.
- An axiomatic definition of over-smoothing is provided.
- Measures for over-smoothing are tested on three different graph datasets.
- Approaches to mitigate over-smoothing are reviewed and tested.
- The ability to mitigate over-smoothing must be balanced against preserving expressive power.