Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

  • AI in the form of deep learning has potential for drug discovery and chemical biology
  • AI can be used to predict protein structure and molecular bioactivity, plan organic synthesis, and design molecules
  • Deep learning efforts in drug discovery have focused on ligand-based approaches
  • Structure-based drug discovery has potential to tackle unsolved challenges
  • Advances in deep learning methodologies and availability of accurate predictions for protein tertiary structure advocate for a renaissance in structure-based approaches for drug discovery guided by AI

Paper Content

Introduction

  • Deep learning is a subfield of AI
  • Uses multi-layer neural networks
  • Used to advance mathematics, investigate galaxies, and generate realistic images
  • Chemistry and biology have seen AI breakthroughs in protein structure prediction, chemical synthesis planning, and atomistic simulations
  • Drug discovery has benefited from deep learning
  • Deep learning can accelerate navigation of chemical space of drug-like molecules
  • Most deep learning studies have focused on ligand-based approaches
  • Structure-based deep learning approaches have not yet attracted comparable interest
  • Deep learning can help address existing drug discovery challenges
  • Deep learning does not require explicit feature engineering
  • Accurate protein structure prediction can accelerate computer-assisted structure-based drug discovery (SBDD)
  • Deep learning for SBDD is still in its infancy but is moving forward quickly

Representing proteins for deep learning

  • Deep learning approaches for SBDD are more intricate than ligand-based approaches.
  • Proteins are described at several levels of structural organization, including primary, secondary, and tertiary structure.
  • Protein representations used for deep learning include primary sequence and tertiary structure.
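As a rough illustration of the sequence representation mentioned above, the sketch below one-hot encodes a primary sequence so that each residue becomes a 20-dimensional vector. The function name, the alphabet ordering, and the handling of non-standard residues are assumptions made for this example and are not taken from the paper.

```python
# Illustrative sketch: one-hot encode a protein primary sequence.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, one-letter codes
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence):
    """Return a list of 20-dimensional 0/1 vectors, one per residue.

    Residues outside the 20 standard amino acids (e.g. 'X') map to an
    all-zero vector; this is a simplification chosen for the sketch.
    """
    encoded = []
    for residue in sequence.upper():
        vector = [0] * len(AMINO_ACIDS)
        index = AA_INDEX.get(residue)
        if index is not None:
            vector[index] = 1
        encoded.append(vector)
    return encoded


# Short placeholder sequence, not a real protein.
example = one_hot_encode("MKT")
```

Learned embeddings are often used in place of one-hot vectors in practice; the fixed encoding here only shows how a sequence turns into numerical input.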

Box 1: glossary of selected terms

  • Binding site: Protein region responsible for interaction with other molecules
  • Convolutional neural network (CNN): Neural network architecture that learns from local information
  • Molecular docking: Computational procedure used to predict the binding mode of a molecule to its target
  • Generative deep learning: Deep learning methods that model data distribution and generate new data points
  • Geometric deep learning: Neural network architectures that incorporate symmetry information
  • Molecular descriptors: Numerical features obtained from molecular representation
  • Ligand: Molecule that binds to protein with high affinity and specificity
  • Reinforcement learning: Machine learning paradigm in which an agent learns to maximize cumulative reward in an environment
  • SMILES: Line notation that encodes the structure of a molecule, such as a small-molecule ligand, as a text string
  • Protein surface: Separation between solvent-accessible and inaccessible regions
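To make the SMILES entry in the glossary concrete, here is a minimal tokenization sketch that splits a SMILES string into the symbols a sequence model would consume. The regular expression is a deliberately simplified assumption (bracket atoms, the two-letter halogens Cl and Br, two-digit ring closures, then single characters) and does not validate chemistry.

```python
import re

# Simplified pattern (an assumption for this sketch): bracket atoms,
# two-letter halogens, two-digit ring closures, then any single character.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize_smiles(smiles):
    """Split a SMILES string into a list of tokens without checking validity."""
    return SMILES_TOKEN.findall(smiles)


aspirin = "CC(=O)Oc1ccccc1C(=O)O"
tokens = tokenize_smiles(aspirin)
```

Joining the tokens reproduces the input string, which is a quick sanity check that no character was dropped.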

Deep learning for structure-based drug discovery

  • SBDD approaches fueled by deep learning are discussed
  • Three key tasks are focused on: binding site detection, drug-target interaction prediction, and structure-based de novo design
  • Protein representation used includes sequence, structure, and surface

Drug-target interaction prediction

  • Predicting interaction between proteins and ligands
  • Models take the amino-acid sequence or the 3D structure of the protein as input
  • AlphaFold v 2.0 used to predict 2,316 structures
  • AlphaFill used to transplant common ligands and cofactors
  • Binding MOAD provides X-ray crystal structures with bound ligands and experimental binding affinities
  • DUD-E used to benchmark molecular docking programs
  • BindingDB provides validation sets of protein-ligand pairs
  • BigBind associates protein structures with ChEMBL assay data
  • KIBA provides bioactivity data for compounds tested against kinases
  • Davis provides binding affinities of inhibitors measured against kinases
  • 3D structure-based and surface-based approaches used to predict binding affinity
  • Deep learning used to optimize ligand pose and predict binding pose

Binding site detection

  • Identification of ‘druggable’ binding sites in proteins is important for SBDD
  • Several methods have been developed for binding site detection
  • Deep learning methods have been used to detect binding sites
  • Sequence-based models use AI to highlight relevant residues
  • 3D structure-based models use spatial information of proteins
  • Surface-based models use protein surfaces

Structure-based de novo design

  • De novo design generates novel chemical entities
  • Sequence-based de novo design uses machine translation
  • 3D structure-based de novo design generates 3D molecular graphs or strings
  • Structure-based de novo design has not been applied prospectively
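For the 3D structure-based binding site detection models mentioned above, per-residue training labels have to come from somewhere. One plausible way, sketched here under the assumption of a simple distance cutoff, is to flag every residue whose representative atom lies close to a bound ligand atom. The 4 Å default, the function name, and the toy coordinates are illustrative choices, not values from the paper.

```python
import math


def label_binding_residues(residue_coords, ligand_coords, cutoff=4.0):
    """Flag residues within `cutoff` angstroms of any ligand atom.

    residue_coords: list of (x, y, z) tuples, one representative atom per residue
    ligand_coords:  list of (x, y, z) tuples, one per ligand atom
    Returns a list of 0/1 labels aligned with residue_coords.
    """
    labels = []
    for residue in residue_coords:
        near = any(math.dist(residue, atom) <= cutoff for atom in ligand_coords)
        labels.append(1 if near else 0)
    return labels


# Toy coordinates: two residues near the ligand atom, one far away.
residues = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
ligand = [(1.0, 0.0, 0.0)]
labels = label_binding_residues(residues, ligand)
```

With these toy coordinates the first two residues are labeled as binding-site residues and the third is not.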

Box 2: simplified depiction of common neural networks applied to protein representations.

  • Protein representations are suited to different neural network architectures
  • Recurrent neural networks (RNNs) process protein primary sequences and capture long-range dependencies
  • Convolutional neural networks (CNNs) are applied to voxelized representations to capture spatial dependencies
  • Graph neural networks (GNNs) operate on molecular graphs and can capture structural and functional relationships
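The voxelized representation used with CNNs can be sketched as follows: atom coordinates are binned into a regular 3D grid, and each cell stores how many atoms fall inside it. The grid size, resolution, and single occupancy channel are assumptions made for brevity; real pipelines typically use several channels, for example one per atom type.

```python
import math


def voxelize(atom_coords, grid_size=8, resolution=1.0, origin=(0.0, 0.0, 0.0)):
    """Count atoms per cell in a cubic grid of grid_size**3 voxels.

    Atoms that fall outside the grid are ignored.
    """
    grid = [[[0] * grid_size for _ in range(grid_size)] for _ in range(grid_size)]
    for atom in atom_coords:
        # Map each coordinate to an integer cell index along its axis.
        cell = [math.floor((c - o) / resolution) for c, o in zip(atom, origin)]
        if all(0 <= index < grid_size for index in cell):
            i, j, k = cell
            grid[i][j][k] += 1
    return grid


# Toy coordinates in angstroms; the last atom lies outside the 8 x 8 x 8 grid.
atoms = [(0.5, 0.5, 0.5), (0.7, 0.2, 0.9), (3.2, 1.1, 7.9), (9.0, 0.0, 0.0)]
grid = voxelize(atoms)
occupied = sum(value for plane in grid for row in plane for value in row)
```

The resulting nested list plays the role of a 3D image: a convolutional filter sliding over it sees local neighborhoods of atoms, which is the spatial dependency the bullet above refers to.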

Gaps, opportunities, and outlook

  • Deep learning for structure-based drug discovery is gaining traction
  • Protein structure prediction is an example of the potential of deep learning
  • SBDD is more challenging than its ligand-based counterpart
  • Geometric deep learning and diffusion models are gaining popularity
  • Training data is limited and biased
  • Deep learning is met with skepticism by experimentalists
  • Methods need to be validated experimentally and made interpretable through explainable AI

Conclusions

  • Deep learning has been used in drug discovery to explore chemical space.
  • It can be used for ligand binding site detection, drug-target interaction prediction, and structure-based de novo design.
  • Data collection and curation are needed to overcome barriers in structure-based methods.