Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

Deep neural networks are accurate and powerful wavefunction ansatz for solving the electronic Schr"odinger equation.
Optimizing the wavefunction from scratch for each new system is computationally costly.
A novel neural network ansatz is proposed which maps uncorrelated, computationally cheap Hartree-Fock orbitals to correlated, high-accuracy neural network orbitals.
This ansatz is capable of learning a single wavefunction across multiple compounds and geometries.
Pre-training of a generalized wavefunction model across different compounds and geometries could lead to a foundation wavefunction model.
This model could yield high-accuracy ab-initio energies with minimal computational effort.

Paper Content

Introduction

Accurate predictions of quantum mechanical properties for molecules is important for development of new compounds
Solution to Schrödinger equation yields wavefunction and electron density
Computing accurate approximations to Schrödinger equation quickly becomes computationally intractable with increasing number of particles
Deep-learning-based Variational Monte Carlo (DL-VMC) methods have emerged as a high-accuracy approach with favorable scaling
Improvements and applications have emerged, such as enhancements of architecture, optimization and overall approach
DL-VMC has been adapted to many different systems and observables
DL-VMC does not require any input beyond the Hamiltonian
Computational cost for large-scale use is high
We propose a novel neural network ansatz which does not depend explicitly on the number of particles
Our model exhibits strong generalization when transferring weights from small molecules to larger, similar molecules

Results

A multi-compound wavefunction ansatz

Existing high-accuracy approaches for neural network wavefunctions have a specific structure.
This structure is unsuited to represent wavefunctions for multiple different compounds at once.
Several potential options exist to overcome this challenge, such as generating a fixed number of orbitals or using separate matrices for each molecule.

Properties of our ansatz

Constant parameter count
Equivariant to sign of HF-orbital
Locality and high expressivity

Size consistency of the ansatz

Design goal of ansatz is to allow transfer of weights from small systems to larger systems.
Test case is chains of equally spaced Hydrogen atoms of increasing lengths.
Ansatz achieves high zero-shot-accuracy in interpolation and extrapolation regimes.
Fine-tuning pre-trained model yields near perfect agreement with specialized MRCI+Q method.
Results outperform GLOBE+ FermiNet and GLOBE+Moon.

Equivariance with respect to hf-phase

Our orbitals are equivariant with respect to a change of sign of the Hartree Fock orbitals.
We tested our proposed architecture by rotating a H2O molecule and comparing it to a standard backflow matrix.
The standard backflow-type architecture had a spike in the HF-pre-training loss at the position of the sign flip, causing slower convergence.
Phase alignment is not possible in all circumstances.

Transfer to larger, chemically similar compounds

Experiment tests generalization and transferability of approach
Train ansatz on dataset of multiple geometries of small compound
Re-use weights for different geometries of larger compound
Compare results to DeepErwin and GLOBE
Measure accuracy with mean energy error and maximum relative energy error
Our approach yields lower and more consistent energies

Towards a first foundation model for neural network wavefunctions

Compiled a dataset of 360 distorted geometries across 18 different compounds
Pre-trained a basemodel for 500,000 steps on this diverse dataset
Evaluated performance when computing Potential Energy Surfaces
Compared results against a baseline model
Fine-tuning pre-trained model yields substantially lower energies than usual optimization from a HF-pre-trained model

Scaling behaviour

Increasing the amount of pre-training leads to better results
Accuracy depends on model size and number of compounds in pre-training dataset
Results indicate approach may be sufficient to train accurate multi-compound, multigeometry foundation model for wavefunctions

Discussion

Presents an ansatz for deep-learning based VMC
Can be applied to molecules of arbitrary size
Properties include extensivity, zero-shot prediction of wavefunctions, invariance to phase of orbitals, and fast finetuning
First successful demonstration of a general wavefunction trained on diverse dataset
Pre-training on large data and fine-tuning on specific problems can be applied to wavefunctions
Outperforms conventional variational methods
Open questions and limitations to be addressed in future work
Code, dataset, and model parameters open sourced

Methods

Variational monte carlo

Molecules can be described by the time-independent Schrödinger equation
Electrons are divided into spin-up and spin-down
Finding the groundstate wavefunction requires finding the solution to the Schrödinger equation with the lowest eigenvalue
Approximate solution can be found through minimization of the loss using a parameterized trial wavefunction
Expectation value is computed by drawing samples from the unnormalized probability distribution using Markov Chain Monte Carlo
Calculation typically consists of three steps: supervised HF-pre-training, variational optimization, and evaluation

Obtaining orbital descriptors from hartree-fock

Hartree-Fock method uses single-particle orbitals to obtain coefficients and orbitals
Orbitals can be localized using different metrics and localization schemes
Foster-Boys method is used in this paper
Graph convolutional neural network is used to add context about surrounding atoms

Mapping orbital descriptors to wavefunctions

Combining a high-dimensional electron embedding with a function of the orbital descriptor to obtain entry Φ ik of the Slater determinant
Functions GCN a θ , f a θ , and g s θ are trainable and (anti-)symmetric with respect to change in sign of their argument c
Using message-passing architecture outlined in [5] to obtain electron embeddings, which is invariant with respect to permutation of electrons of the same spin
All samples in a batch come from the same geometry
Relying on the second order optimizer K-FAC [35,36]
Increasing initial damping by 10x and decreasing it to 1 × 10 −3 with a inverse scheduler
Pre-trained runs used a 5x lower initial learning rate with a learning rate scheduler offset o = 32, 000
Foundation model used 512, 000 optimization steps
Small-and medium-sized model differ from the large model by the number of hidden layers, neurons per layer, and iterations of the GCN

Link to paper#

Abstract#

Paper Content#

Introduction#

Results#

A multi-compound wavefunction ansatz#

Properties of our ansatz#

Size consistency of the ansatz#

Equivariance with respect to hf-phase#

Transfer to larger, chemically similar compounds#

Towards a first foundation model for neural network wavefunctions#

Scaling behaviour#

Discussion#

Methods#

Variational monte carlo#

Obtaining orbital descriptors from hartree-fock#

Mapping orbital descriptors to wavefunctions#

Link to paper

Abstract

Paper Content

Introduction

Results

A multi-compound wavefunction ansatz

Properties of our ansatz

Size consistency of the ansatz

Equivariance with respect to hf-phase

Transfer to larger, chemically similar compounds

Towards a first foundation model for neural network wavefunctions

Scaling behaviour

Discussion

Methods

Variational monte carlo

Obtaining orbital descriptors from hartree-fock

Mapping orbital descriptors to wavefunctions