Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

Deep Learning (DL) has problems such as feature redundancy and vanishing/exploding gradients.
Riemannian-based DL uses geometric optimization to update parameters on Riemannian manifolds.
This article surveys the application of geometric optimization in DL networks for AI tasks.
Toolboxes that implement optimization on manifolds are discussed.
The performance of different deep geometric optimization methods is compared.

Paper Content

Introduction

Increasing computing power has enabled deep neural networks to be successful in various tasks.
Deep learning models often contain many layers and parameters, which can be challenging to optimize.
Geometric optimization can reduce parameters and convert constrained optimization problems into unconstrained ones.
Geometric optimization has been applied to various deep neural networks, such as CNN, RNN and ViT.
This article reviews the theory and applications of geometric optimization in shallow and deep learning.
It also investigates representative manifold optimization toolboxes and compares performance of different geometric deep learning methods.

Geometric optimization theory

Optimization problems are used to find the maximum or minimum value of a cost function.
Conventional optimization methods can be used to solve unconstrained optimization problems.
Constrained optimization problems can be transformed into unconstrained problems using Lagrange multipliers or a barrier penalty function.
Geometric optimization methods are developed to exploit the underlying geometry of a cost function.
Geometric optimization methods use Riemannian optimizers to find an optimal solution.

Geometric optimization process on manifolds

Figure 3 shows the update process in geometric optimization
Each point on the manifold has a corresponding tangent space
The tangent space has an inner product, which defines the lengths of tangent vectors and the angles between them
A Riemannian gradient is a tangent vector on the tangent space
A geodesic is a locally shortest path between two points on the manifold
The geodesic defined by the negative Riemannian gradient reveals the next point in the optimization direction
Exponential mapping and retraction operation are used to map a point from the tangent space to the manifold

Gradient descent optimizers

Optimization problems can be abstracted as $\min_{\theta \in E} f(\theta)$, where f is the cost function, θ are the trainable parameters, and E is the Euclidean space
There are a variety of standard optimizers for Equation (7)
Gradient descent is a basic optimization strategy
SGD can accelerate convergence
SGD-M is developed to maintain the inertia of the previous step
RMSProp can adaptively determine the learning rate of parameters
Gradient descent takes the form $\theta_{t+1} = \theta_t - \lambda \nabla f(\theta_t)$, where f is the cost function and λ is a hyper-parameter representing the step size
SGD uses random mini-batches of training data to update parameters
SGD-M exerts the influence of the last update on the current update
RMSProp scales each parameter's step by a running average of its recent squared gradients
Euclidean gradient descent can be transferred to Riemannian manifolds
Constrained SGD-M and constrained RMSProp are instances of generalizing gradient descent optimizers from Euclidean space to manifolds
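
The Euclidean update rules listed above can be written out in a few lines. This is a sketch of the textbook forms of the updates; the function names, the default hyper-parameters, and the toy quadratic cost are assumptions for illustration and are not taken from the paper. The full gradient is used here for simplicity; SGD would replace it with a mini-batch estimate.

```python
import numpy as np

def gd_step(theta, grad, lr):
    """Plain gradient descent: move against the gradient."""
    return theta - lr * grad

def sgdm_step(theta, grad, velocity, lr, momentum=0.9):
    """Momentum: the previous update (velocity) carries over into the current one."""
    velocity = momentum * velocity - lr * grad
    return theta + velocity, velocity

def rmsprop_step(theta, grad, sq_avg, lr, decay=0.9, eps=1e-8):
    """RMSProp: divide each step by a running root-mean-square of past gradients."""
    sq_avg = decay * sq_avg + (1.0 - decay) * grad ** 2
    return theta - lr * grad / (np.sqrt(sq_avg) + eps), sq_avg

def grad_fn(theta):
    """Gradient of the toy cost 0.5 * ||theta||^2."""
    return theta

theta0 = np.array([1.0, -2.0])

theta_gd = theta0.copy()
for _ in range(300):
    theta_gd = gd_step(theta_gd, grad_fn(theta_gd), lr=0.1)

theta_m, vel = theta0.copy(), np.zeros(2)
for _ in range(300):
    theta_m, vel = sgdm_step(theta_m, grad_fn(theta_m), vel, lr=0.1)

theta_r, sq = theta0.copy(), np.zeros(2)
for _ in range(1000):
    theta_r, sq = rmsprop_step(theta_r, grad_fn(theta_r), sq, lr=0.01)
```

A Riemannian variant would replace `grad` with the Riemannian gradient and the addition with a retraction, and would transport the momentum or scaling state between tangent spaces.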

Manifold examples

Different kinds of matrix manifolds have different geometry structures and advantages when applying geometric optimization to deep learning.
The oblique manifold is useful for dictionary learning because its matrices have unit-norm columns.
The Stiefel manifold helps optimize RNNs because matrices on it have orthonormal, and therefore uncorrelated, columns.
Common manifold structures include Stiefel, oblique, Graßmann, product, quotient, SPD, sphere, and unitary.
SPD matrices are used for image and video statistical representations.
The Stiefel manifold is compact (closed and bounded), so a continuous cost function attains its optimum on it.
The oblique manifold is the set of matrices with unit-norm columns.
Unlike the Stiefel manifold, whose points are individual orthonormal bases, each point of the Graßmann manifold represents an entire subspace.
Unitary matrices are the extension of orthogonal matrices to the complex domain.
Lie groups are real or complex manifolds with group structure.
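
To make the definitions above concrete, the following sketch builds one point on several of these manifolds from the same random matrix. The construction methods (QR factorization, column normalization, a Gram matrix plus a small ridge) are common choices assumed for illustration; the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((6, 3))

# Stiefel: orthonormal columns, obtained here from a reduced QR factorization.
Q, _ = np.linalg.qr(M)

# Oblique: every column rescaled to unit Euclidean norm.
O = M / np.linalg.norm(M, axis=0, keepdims=True)

# SPD: a Gram matrix plus a small ridge is symmetric positive definite.
S = M.T @ M + 1e-3 * np.eye(3)

# Grassmann: Q and Q @ R span the same subspace for any orthogonal R,
# so they are the same Grassmann point although they differ as Stiefel points.
R, _ = np.linalg.qr(rng.standard_normal((3, 3)))
P1 = Q @ Q.T                # projector onto span(Q)
P2 = (Q @ R) @ (Q @ R).T    # projector onto span(Q @ R)
```

Comparing the projectors `P1` and `P2` is one simple way to check that two orthonormal bases represent the same subspace.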

Applications in classical machine learning

Classical machine learning methods have been successful in solving AI problems.
Solving large categories of constrained classical machine learning problems in Euclidean space is difficult.
Geometric optimization can decrease the difficulty by treating constrained problems as unconstrained ones on Riemannian manifolds.

Dimension reduction

Dimension reduction (DR) is a process of finding a lower-dimensional representation of given data samples.
DR approaches can use linear or nonlinear transformations.
A generic algorithmic framework to find an optimal solution involves a maximization problem.
Solutions of the maximization problem are rotation invariant.
Most linear DR methods begin by maximizing $\operatorname{tr}(V^\top A V)$, while nonlinear DR methods construct a graph by connecting nearby points.
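
The trace maximization and its rotation invariance can be checked numerically. The sketch below assumes the usual orthonormality constraint V^T V = I and takes A to be a sample covariance matrix, as in PCA; the data, the function names, and the target dimension k = 2 are illustrative assumptions.

```python
import numpy as np

def trace_objective(A, V):
    """The linear DR objective tr(V^T A V)."""
    return np.trace(V.T @ A @ V)

def top_k_eigvecs(A, k):
    """Eigenvectors of symmetric A for its k largest eigenvalues."""
    w, U = np.linalg.eigh(A)      # eigenvalues in ascending order
    return U[:, -k:], w[-k:]

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5)) * np.array([3.0, 2.0, 1.0, 0.5, 0.1])
Xc = X - X.mean(axis=0)
A = Xc.T @ Xc / len(Xc)           # sample covariance matrix

V, top_vals = top_k_eigvecs(A, k=2)
R, _ = np.linalg.qr(rng.standard_normal((2, 2)))   # arbitrary 2 x 2 orthogonal matrix
Q, _ = np.linalg.qr(rng.standard_normal((5, 2)))   # some other orthonormal basis

best = trace_objective(A, V)          # equals the sum of the top-k eigenvalues
rotated = trace_objective(A, V @ R)   # unchanged: the solution is rotation invariant
other = trace_objective(A, Q)         # never larger than best
```

Because `V` and `V @ R` give the same objective value, the solution is really a subspace rather than a particular basis, which is why such problems are naturally posed on the Graßmann manifold.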

Inverse problem

Inverse problems have a significant impact on practical applications.
Inverse problems involve reconstructing inputs from outputs.
Solutions to inverse problems can be achieved by confining the parameter matrix W to reside on a smooth Riemannian manifold.

Dictionary learning

Dictionary learning is used to obtain the most essential features of input data.
X is expanded into a linear combination of D1, …, Dn.
Dictionary learning aims to learn a D that makes most of the coefficients in Φ zero or close to zero, so that the representation is sparse.

Analysis operator learning

Analysis operator learning assumes that a few operators are enough to represent high-dimensional variables.
The operators are hidden and not observed.
The goal is to find these hidden operators to simplify the original variables.
Analysis operator learning is formulated as an optimization problem on the positive manifold M.

Temporal model

Temporal probability model composed of transition and sensor models
Transition model describes state evolution over time
Sensor model describes observation process
Temporal model used for filtering, prediction and smoothing
Transition process of states modeled with Gaussian noise
Observation process modeled with Gaussian noise
Temporal models divided into hidden Markov models and linear dynamic systems
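
For a linear dynamic system with Gaussian transition and observation noise, filtering has a closed form, the Kalman filter. The sketch below shows one predict-and-update step for a scalar state. The model x_t = a x_{t-1} + w, z_t = h x_t + v, and the parameter names `a`, `h`, `q` (transition noise variance), and `r` (sensor noise variance) are assumptions made for this example, not notation from the paper.

```python
import numpy as np

def kalman_step(mean, var, z, a=1.0, h=1.0, q=0.01, r=0.25):
    """One filtering step for a scalar linear-Gaussian model."""
    # Predict with the transition model.
    mean_pred = a * mean
    var_pred = a * a * var + q
    # Correct with the sensor model and the new observation z.
    gain = var_pred * h / (h * h * var_pred + r)
    mean_new = mean_pred + gain * (z - h * mean_pred)
    var_new = (1.0 - gain * h) * var_pred
    return mean_new, var_new

# Simulate a random walk observed through a noisy sensor, then filter it.
rng = np.random.default_rng(0)
state, mean, var = 0.0, 0.0, 1.0
estimates = []
for _ in range(200):
    state = state + rng.normal(scale=np.sqrt(0.01))   # transition noise
    z = state + rng.normal(scale=np.sqrt(0.25))       # sensor noise
    mean, var = kalman_step(mean, var, z)
    estimates.append(mean)
```

Prediction applies only the first half of the step, and smoothing additionally runs a backward pass over the stored estimates.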

Applications in deep learning

Deep learning methods are increasingly combined with geometric optimization
Geometric optimization techniques vary with different deep learning backbones (e.g., CNN, RNN and GNN)
The orthogonal manifold is widely used in geometric CNNs to reduce feature redundancy

Geometric CNN

Deep CNNs have achieved success in computer vision tasks
CNNs learn features from large-scale data using convolution, activation, and pooling structures
Problems such as training instability and feature redundancy can be alleviated by geometric optimization approaches
Kernel space maps original features to a higher dimensional space
Geometric regularization imposes restrictions on the parameters of the optimization function
Quasi-CNN architectures mimic traditional CNN architecture and establish a new architecture suitable for the manifold structure
SPDNet and GrNet are examples of quasi-CNN architectures
SPDNet uses bilinear mapping, eigenvalue rectification, and eigenvalue logarithm layers
GrNet uses full-rank mapping, re-orthonormalization, inner product, and orthonormal mapping layers
Manifold regularization can be used to enhance the nonlinear locality constraints of CNN parameters
SURFMNet uses orthogonality constraints to regularize a convolution layer
Huang et al. incorporated a Lie group structure to parameter matrices in the deep human action recognition network
Chen et al. proposed a deep manifold learning framework to learn manifold information and deep representations of action videos

Geometric RNN

RNNs are designed to process sequential data
RNNs can capture spatial and temporal dependencies between the sequential input
RNNs can be applied in tasks such as speech recognition, text prediction, and machine translation
RNNs generate output predictions based on input weight matrix, recurrent weight matrix, previous hidden state, input bias, pointwise nonlinearity function, current hidden state, output weight matrix, and output bias
Gradient of the loss function with respect to the hidden state can be computed
Exploding and vanishing gradient problem of RNN can be alleviated with orthogonal constraints
uRNN parameterizes the unitary hidden-to-hidden matrix by composing simple unitary matrices
Full-capacity uRNN is proposed to cover all N × N unitary matrices
expRNN exploits the exponential map to achieve orthogonal constraints
OMDSM optimizes DNN over multiple dependent Stiefel manifolds
Soft orthogonal constraints can be explored
Householder matrix can be used to reduce time complexity of parameterizing unitary matrices
GORU designs a forget gate to pay little attention to extraneous information

Geometric GNN

GNN can be used to construct a learning network based on irregular graphs
GNN encodes vertexes as feature vectors and models edges as a relationship matrix
GNN can take advantage of the graph structure and update the feature information of each vertex
GIL incorporates Euclidean space with hyperbolic geometry to model both low-dimensional regular data and complex hierarchical structures
MRDGCN integrates manifold regularization into GCN to model dynamic structure information
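
The vertex update described above can be sketched with a single graph convolution layer. The symmetric normalization with self-loops used here is the widely used GCN propagation rule and is an assumption for illustration; it is not the specific formulation of GIL or MRDGCN, and the function and variable names are hypothetical.

```python
import numpy as np

def gcn_layer(adj, feats, weight):
    """One graph convolution: average neighbor features, transform them, apply ReLU."""
    a_hat = adj + np.eye(adj.shape[0])                 # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))      # inverse square root of degrees
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ feats @ weight, 0.0)

# A path graph 0 - 1 - 2: edges as a relationship (adjacency) matrix,
# vertices as one-hot feature vectors.
adj = np.array([[0.0, 1.0, 0.0],
                [1.0, 0.0, 1.0],
                [0.0, 1.0, 0.0]])
feats = np.eye(3)
weight = np.eye(3)
out = gcn_layer(adj, feats, weight)
```

With identity features and weights, `out` is just the normalized adjacency matrix, which makes it easy to see that vertices 0 and 2 exchange no information in a single layer.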

Geometric optimization for other deep learning methods

Robust Time Series Prediction uses low-rank constraint and feature selection to deal with noisy disturbances
Medical Reconstruction combines CNN and SToRM with conjugate gradients for fast, high-quality MRI reconstruction
Transfer Learning uses knowledge distillation to transfer model knowledge from a well-trained model to a compact model
Optimal Transport uses Riemannian gradient descent and generalized doubly stochastic manifold to measure distance between two probability distributions
Robots use geometry-aware Bayesian optimization with Matérn kernel to incorporate domain geometry into optimization algorithm
Continual Learning uses low-rank orthogonal manifold to project gradient into disjoint subspace and alleviate catastrophic forgetting

Toolbox

Toolboxes can help build neural networks
Manopt, Pymanopt, McTorch, and Geomstats are classic toolboxes for manifold geometries and optimization algorithms
Manopt and Pymanopt are limited to shallow learning optimizations
McTorch extends PyTorch for deep learning optimizations
Geoopt is cheaper than McTorch
Geomstats has two core modules for geometry and learning
TheanoGeometry uses Theano for symbolic calculations and Riemannian geometry

Performance evaluation

GORU outperforms other ORNNs on the MNIST dataset
expRNN uses the surjective exponential map to realize orthogonal parameterization
uRNN uses simple unitary matrices to construct the unitary hidden-to-hidden matrix
full-capacity uRNN overcomes the bottleneck of uRNN
soRNN uses regularization terms to realize orthogonal parameterization
ORNN exploits Householder matrices to enforce the orthogonal constraint
SPDNet and GrNet achieve better classification results than state-of-the-art methods
Hariri et al. method achieves highest precision on BU-3DFE and Bosphorus datasets
SPDNet and GrNet outperform state-of-the-art methods on action recognition and face recognition tasks
SRMR outperforms state-of-the-art non-manifold methods on scene recognition datasets
Different architecture settings affect classification accuracy

Conclusions and future work

Reviewed progress of optimizing deep learning networks on manifolds
Needs further research on dataset-oriented geometric optimization
Needs further research on model-oriented geometric optimization
Needs further research on manifold-oriented geometric optimization
