Abstract
- Machine-learned coarse-grained (CG) models can simulate large molecular complexes.
- Training accurate CG models is a challenge.
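- Commonly used methods for mapping atomistic forces to the CG resolution are statistically inefficient and can be incorrect when the atomistic simulation contains constraints.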
- Optimized force maps can lead to improved CG force-fields.
Paper Content
Introduction
- Atomistic molecular dynamics (MD) simulations are limited by their computational cost.
- Coarse-graining is used to reduce the computational burden.
- Finding a force-field that accurately represents physical interactions is a challenge.
- Many approaches to parameterizing bottom-up CG force-fields require repeated simulations.
- Minimizing the mean-squared deviation between a candidate CG force-field and suitably mapped atomistic forces (force matching) yields the many-body potential of mean force (PMF).
- No work has directly and systematically investigated the influence of the mapping that projects fine-grained forces to the CG resolution.
- Overfitting can occur when parameterizing a machine-learned force-field on a finite trajectory.
- Noise in the mapped forces can lead to high data requirements.
- Designing the force mapping to reduce this noise improves trained CG force-fields.
- Multiple force mappings are valid, provided they obey consistency requirements.
- A variational statement identifies the force mapping whose mapped forces have minimal noise.
- High noise and constraint-inconsistent force mappings degrade learned CG force-fields.
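The link between force noise and data requirements can be illustrated with a deliberately simple sketch. It assumes a hypothetical one-dimensional CG coordinate R whose mean force is -k R, and mapped forces equal to that mean force plus independent Gaussian noise of standard deviation `noise_std`; `fit_k` and `mean_sq_error_of_k` are illustrative helpers, not code from the paper.

```python
import numpy as np

def fit_k(R, F):
    """Least-squares force matching for the one-parameter model force -k * R."""
    return -np.dot(R, F) / np.dot(R, R)

def mean_sq_error_of_k(noise_std, n_frames, k_true=2.0, n_repeats=200, seed=0):
    """Average squared error of the fitted k over repeated synthetic datasets."""
    rng = np.random.default_rng(seed)
    errors = []
    for _ in range(n_repeats):
        R = rng.standard_normal(n_frames)          # sampled CG coordinates
        eps = rng.standard_normal(n_frames)        # unit Gaussian noise
        F = -k_true * R + noise_std * eps          # noisy mapped forces around the mean force
        errors.append((fit_k(R, F) - k_true) ** 2)
    return float(np.mean(errors))

for noise_std in (0.5, 5.0):
    for n_frames in (100, 1000):
        print(noise_std, n_frames, mean_sq_error_of_k(noise_std, n_frames))
```

In this toy model the mean-squared error of the fitted k scales roughly as noise_std² / n_frames, so lowering the noise by a factor of 10 is worth about a factor of 100 in data.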
Defining valid force mapping operators
- Ciccotti et al. derived a relation between the CG mean force and the atomistic forces
- The force mapping operator must be orthogonal to any atomistic constraints
- It must also be compatible with the configurational mapping
- For a fixed configurational mapping, the force mapping operator is generally not unique
- Noid et al. defined conditions that satisfy both orthogonality and compatibility
- This flexibility of the force mapping operator can be exploited for noise reduction
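A minimal numerical check of the two conditions, assuming linear maps written as coefficient matrices that act identically on each Cartesian component: compatibility is tested as Phi @ M.T = I and orthogonality as Phi annihilating each constraint gradient. The three-atom geometry and the single rigid O-H bond are hypothetical; slicing (keeping only the bead atom's force) is compared with basic aggregation, here taken to mean summing the forces over the whole molecule.

```python
import numpy as np

def is_compatible(M, Phi):
    """Compatibility of a linear force map with a linear configurational map: Phi M^T = I."""
    return np.allclose(Phi @ M.T, np.eye(M.shape[0]))

def is_orthogonal(Phi, constraint_grads):
    """Orthogonality: the force map annihilates every constraint gradient (each of shape (N, 3))."""
    return all(np.allclose(Phi @ g, 0.0) for g in constraint_grads)

# Hypothetical three-atom molecule (O, H1, H2), positions in nm; one CG bead on the oxygen.
r = np.array([[0.0, 0.0, 0.0],
              [0.0957, 0.0, 0.0],
              [-0.024, 0.0927, 0.0]])
M = np.array([[1.0, 0.0, 0.0]])               # bead position = oxygen position

Phi_sliced = np.array([[1.0, 0.0, 0.0]])      # keep only the oxygen force
Phi_aggregated = np.array([[1.0, 1.0, 1.0]])  # sum forces over the molecule

# Rigid O-H1 bond: sigma(r) = |r_O - r_H1|^2 - b^2; gradient with respect to each atom
grad_sigma = np.zeros_like(r)
grad_sigma[0] = 2.0 * (r[0] - r[1])
grad_sigma[1] = -2.0 * (r[0] - r[1])

for name, Phi in [("sliced", Phi_sliced), ("aggregated", Phi_aggregated)]:
    print(name, is_compatible(M, Phi), is_orthogonal(Phi, [grad_sigma]))
# prints: "sliced True False" then "aggregated True True"
```

Both maps are compatible with the configurational map, but only the aggregated one cancels the constraint force λ∇σ, whose magnitude depends on the fluctuating Lagrange multiplier λ.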
Dual variational principle for force matching and noise-reduction
- The force residual decomposes into a PMF error and a noise term
- The PMF error stems from the limited expressivity of the CG model and from finite data
- The noise stems from the stochastic nature of the mapped training forces
- Noise can dominate the force residual, leading to high variance and data inefficiency
- The force mapping can be chosen to reduce noise and improve the training objective
- A new dual variational principle is derived for force matching and noise reduction
- Optimizing the force mapping facilitates finding the CG potential, and vice versa
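The decomposition of the residual can be checked numerically on a hypothetical one-dimensional toy. It assumes mapped forces equal to the mean force -R plus independent noise, and a deliberately imperfect CG model force -a R. Because the noise averages to zero at fixed R, the cross term vanishes in expectation, and the residual splits into noise plus PMF error up to Monte Carlo error.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000
R = rng.standard_normal(n)                        # CG coordinate samples
F_pmf = -R                                        # toy mean force, i.e. minus the PMF gradient
F_mapped = F_pmf + 2.0 * rng.standard_normal(n)   # mapped forces scatter around the mean force
a = 0.5
F_model = -a * R                                  # deliberately imperfect CG model force

residual = np.mean((F_mapped - F_model) ** 2)     # force-matching residual
noise = np.mean((F_mapped - F_pmf) ** 2)          # noise term (independent of the model)
pmf_error = np.mean((F_pmf - F_model) ** 2)       # PMF error (depends on the model)
print(f"residual={residual:.3f}  noise+pmf_error={noise + pmf_error:.3f}")
```

In expectation the noise term is about 4 and the PMF error about 0.25 here, so the noise dominates even though the model is clearly wrong. The noise term does not depend on the model, so it leaves the minimizer unchanged but inflates the variance of the objective.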
Computationally efficient optimization of linear force mappings
- The variational principle is used to optimize the force residual
- The expression in Eq. (4) is computed at each optimization step
- A configuration-independent (“linear”) force map is constructed to minimize the average magnitude of the mapped forces
- Control variates are used in the force averaging procedure to reduce gradient noise
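Because the average squared mapped force is quadratic in the coefficients of a linear force map and the compatibility condition is linear, the optimization can be sketched as an equality-constrained quadratic program solved through its KKT system. This is a minimal sketch under those assumptions, not the authors' implementation; the two-atom data with large opposing bond forces are synthetic and hypothetical.

```python
import numpy as np

def optimize_linear_force_map(forces, A, b):
    """Minimize the mean squared mapped force for one CG bead subject to A @ phi = b.

    forces: array (T, N, 3) of atomistic forces; returns coefficients phi of shape (N,).
    """
    T, N, _ = forces.shape
    C = np.einsum("tid,tjd->ij", forces, forces) / T   # C[i, j] = <f_i . f_j>
    k = A.shape[0]
    # KKT system for: minimize phi^T C phi subject to A phi = b
    kkt = np.block([[2.0 * C, A.T], [A, np.zeros((k, k))]])
    rhs = np.concatenate([np.zeros(N), b])
    return np.linalg.solve(kkt, rhs)[:N]

def mean_sq_mapped_force(forces, phi):
    mapped = np.einsum("i,tid->td", phi, forces)        # mapped force per frame, (T, 3)
    return float(np.mean(np.sum(mapped ** 2, axis=1)))

# Hypothetical two-atom example: a stiff bond produces large, opposing forces x on the atoms.
rng = np.random.default_rng(0)
T = 5000
x = 10.0 * rng.standard_normal((T, 3))
forces = np.stack([rng.standard_normal((T, 3)) + x,
                   rng.standard_normal((T, 3)) - x], axis=1)   # (T, 2, 3)

A = np.array([[1.0, 0.0]])   # compatibility with a bead placed on atom 0: phi . m = 1
b = np.array([1.0])
phi_opt = optimize_linear_force_map(forces, A, b)
for name, phi in [("sliced", np.array([1.0, 0.0])),
                  ("aggregated", np.array([1.0, 1.0])),
                  ("optimized", phi_opt)]:
    print(name, phi.round(3), round(mean_sq_mapped_force(forces, phi), 2))
```

The optimized coefficients land close to, but not exactly at, basic aggregation, and by construction the result is never worse than either baseline on the same data. Orthogonality to a rigid distance constraint between atoms i and j could be added as an extra row of A enforcing phi_i - phi_j = 0, since the two constraint gradients are equal and opposite.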
Results
- The choice of force mapping can affect the quality of the CG force-field
- A low-dimensional CG potential is used to model a water dimer
- Issues caused by atomistic constraints are visualized and discussed
- The effect on high-dimensional CG neural network potentials is investigated
- Neural network potentials are trained to reproduce the folding behavior of the miniproteins Chignolin and Trp Cage
Water dimers demonstrate the importance of force mappings
- Water dimer system contains two TIP3P molecules interacting via Coulomb and Lennard-Jones potentials
- Two datasets created by running MD simulations with and without rigid bond and angle constraints
- Most favorable configuration is dimer state with oxygen-oxygen distance slightly below 0.3 nm
- CG potential defined as linear combination of radial basis functions on oxygen-oxygen distance
- Two aspects of coarse-graining task varied: force mapping and training data
- The rigid system is used to discuss the influence of atomistic constraints
- The orthogonality condition in Eq. (5) ensures that force mappings eliminate spurious contributions from atomistic constraint forces
- Basic aggregated forces capture hydrogen-bond-driven water-water attraction
- Sliced forces dominated by noise produced by fluctuations in intramolecular bonds and angles
- Basic aggregation reduces noise in mapped forces and helps train CG potentials on finite datasets
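A CG potential that is a linear combination of radial basis functions on the O-O distance is linear in its coefficients, so force matching reduces to ordinary least squares. The sketch below assumes Gaussian basis functions with hypothetical centers and width, and checks the fit on noise-free synthetic forces generated from a known set of coefficients; real mapped forces would be noisy, and the choice of force map then controls how fast the estimate converges.

```python
import numpy as np

# Hypothetical basis: Gaussians centred between 0.25 and 0.6 nm on the O-O distance.
CENTERS = np.linspace(0.25, 0.6, 8)
WIDTH = 0.05

def rbf_derivatives(d):
    """d(phi_k)/dd for each Gaussian basis function; shape (len(d), K)."""
    diff = d[:, None] - CENTERS[None, :]
    return -diff / WIDTH**2 * np.exp(-0.5 * (diff / WIDTH) ** 2)

def fit_rbf_potential(R1, R2, F1, F2):
    """Least-squares force matching of U(d) = sum_k theta_k phi_k(d) for two CG beads."""
    disp = R1 - R2
    d = np.linalg.norm(disp, axis=1)
    e = disp / d[:, None]                        # unit vector pointing from bead 2 to bead 1
    g = rbf_derivatives(d)                       # (T, K)
    X1 = -e[:, :, None] * g[:, None, :]          # CG force on bead 1 is -U'(d) e
    X = np.concatenate([X1, -X1], axis=1)        # force on bead 2 is the opposite
    y = np.concatenate([F1, F2], axis=1)
    theta, *_ = np.linalg.lstsq(X.reshape(-1, len(CENTERS)), y.reshape(-1), rcond=None)
    return theta

# Synthetic check: forces generated from a known RBF potential (no noise).
rng = np.random.default_rng(1)
T = 2000
theta_true = rng.standard_normal(len(CENTERS))
R2 = rng.standard_normal((T, 3))
u = rng.standard_normal((T, 3))
u /= np.linalg.norm(u, axis=1, keepdims=True)
d = rng.uniform(0.25, 0.6, size=T)
R1 = R2 + d[:, None] * u
F1 = -(rbf_derivatives(d) @ theta_true)[:, None] * u
F2 = -F1
theta_fit = fit_rbf_potential(R1, R2, F1, F2)
```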
Optimized forces improve protein models
- Proposed force mappings improve coarse-graining of proteins using high-dimensional force-fields
- Chignolin and Trp Cage miniproteins used to investigate CG force-field design
- Proteins modeled by preserving the Cα positions
- Two force mapping operators tested: basic aggregation and optimized
- Reference atomistic simulations used constrained bonds to hydrogens
- CG force-fields validated using MD and time-lagged independent component analysis (TICA)
- Free energy surfaces compared in three ways
- Effect of reduced dataset size and cross validation investigated
- Sliced forces produce worst accuracy
- Optimized forces increase data efficiency by a factor of 3
- Optimized forces result in lower force residuals
- Aggregated configurational maps may be less likely to produce errors
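The two numerical steps behind this kind of validation can be sketched as follows, assuming linear TICA with symmetrized covariance estimates and a histogram-based free energy F = -kT ln p on the two slowest components. The synthetic three-feature trajectory stands in for featurized MD data (the actual features, lag time and binning are not given in this summary), and `tica` and `free_energy_2d` are illustrative helpers rather than the authors' code.

```python
import numpy as np

def tica(X, lag, n_components=2):
    """Linear TICA: solve C_tau v = lambda C_0 v for a featurized trajectory X of shape (T, d)."""
    X = X - X.mean(axis=0)                        # remove the mean of each feature
    X0, Xt = X[:-lag], X[lag:]
    n = len(X0)
    C0 = (X0.T @ X0 + Xt.T @ Xt) / (2 * n)        # symmetrized instantaneous covariance
    Ct = (X0.T @ Xt + Xt.T @ X0) / (2 * n)        # symmetrized time-lagged covariance
    s, U = np.linalg.eigh(C0)
    W = U / np.sqrt(s)                            # whitening matrix: W.T @ C0 @ W = I
    lam, V = np.linalg.eigh(W.T @ Ct @ W)
    order = np.argsort(lam)[::-1][:n_components]  # slowest processes first
    return lam[order], W @ V[:, order]

def free_energy_2d(Y, bins=40, kT=1.0):
    """Free energy surface -kT ln p on a 2D histogram of projected coordinates Y (T, 2)."""
    p, xedges, yedges = np.histogram2d(Y[:, 0], Y[:, 1], bins=bins, density=True)
    with np.errstate(divide="ignore"):
        F = -kT * np.log(p)                       # empty bins become +inf
    F -= F[np.isfinite(F)].min()                  # shift so the global minimum is zero
    return F, xedges, yedges

# Hypothetical synthetic trajectory: one slow AR(1) process hidden in three mixed features.
rng = np.random.default_rng(0)
T = 20_000
slow = np.zeros(T)
other = np.zeros(T)
fast = rng.standard_normal(T)
for t in range(1, T):
    slow[t] = 0.99 * slow[t - 1] + rng.standard_normal()
    other[t] = 0.5 * other[t - 1] + rng.standard_normal()
X = np.column_stack([slow + fast, slow - fast, other])

lam, vecs = tica(X, lag=1)
Y = (X - X.mean(axis=0)) @ vecs                   # projections onto the two slowest TICs
F, _, _ = free_energy_2d(Y)
```

Projecting atomistic and CG trajectories onto the same TICs (fitted on the atomistic data) puts their free energy surfaces on common axes for comparison.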
Conclusion
- Machine-learned force-fields are becoming more powerful
- Selection of force mapping affects resulting force-field
- Optimized force mapping reduces overfitting and increases accuracy, robustness, and data-efficiency
- Partly decoupling force mapping coefficients from configurational map may improve optimization
- Decomposition of force matching residual specified and discussed
- Minimizing the noise with respect to the force-map parameters η is equivalent to minimizing the average magnitude of the mapped forces
- Quadratic programming used to approximate the optimization of $\langle \lVert F(r;\eta) \rVert_2^2 \rangle_r$
- Water dimer reference systems simulated in OpenMM
- CG energy implemented in PyTorch
- Water datasets subsampled using 10 different random seeds
- Training conducted over 1000 epochs using Adam optimizer
- CLN025 solvated and equilibrated using TIP3P waters and CHARMM22* force-field
- Langevin integrator used with an integration timestep of 4 fs and a friction damping constant of 0.1 ps⁻¹
- Hydrogen-heavy atom bonds holonomically constrained
- Markov state model (MSM)-based sampling approach used
- Atomistic TICs created by featurizing the atomistic trajectory
- CG model of CLN025 defined by retaining only the 10 backbone Cα atoms
- CG force-field defined as a sum of a prior model and a modified SchNet GNN
- Hyperparameters not scanned over for this publication
- Hold-out force residuals for models trained using optimized forces were consistently slightly lower than those for models trained using basic or sliced forces
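The equivalence between noise minimization and minimizing the average squared mapped force can be written out as a short derivation. The notation is an assumption for illustration: M is the linear configurational map, F(r; η) the mapped force for force-map parameters η, F_pmf the mean force, and every admissible η is assumed to satisfy the consistency conditions, so that the conditional mean of F(r; η) at fixed Mr = R equals F_pmf(R).

```latex
\begin{aligned}
\big\langle \lVert F(r;\eta) - F_{\mathrm{pmf}}(Mr) \rVert_2^2 \big\rangle_r
&= \big\langle \lVert F(r;\eta) \rVert_2^2 \big\rangle_r
 - 2\,\big\langle F(r;\eta) \cdot F_{\mathrm{pmf}}(Mr) \big\rangle_r
 + \big\langle \lVert F_{\mathrm{pmf}}(Mr) \rVert_2^2 \big\rangle_r \\
&= \big\langle \lVert F(r;\eta) \rVert_2^2 \big\rangle_r
 - \big\langle \lVert F_{\mathrm{pmf}}(Mr) \rVert_2^2 \big\rangle_r
\end{aligned}
```

The second line uses the tower property: averaging F(r; η) at fixed Mr first replaces it by F_pmf(Mr), so the cross term equals twice the last term. Since that last term does not depend on η, the noise and the average squared mapped force share the same minimizer over η.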