Open Access
Data Set Augmentation Allows Deep Learning-Based Virtual Screening to Better Generalize to Unseen Target Classes and Highlight Important Binding Interactions
Author(s) - Jack Scantlebury, Nathan Brown, F. von Delft, Charlotte M. Deane
Publication year - 2020
Publication title - Journal of Chemical Information and Modeling
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.24
H-Index - 160
eISSN - 1549-960X
pISSN - 1549-9596
DOI - 10.1021/acs.jcim.0c00263
Subject(s) - virtual screening , set (abstract data type) , computer science , training set , artificial intelligence , deep learning , machine learning , data set , protein ligand , ligand (biochemistry) , simple (philosophy) , data mining , bioinformatics , drug discovery , chemistry , biology , biochemistry , philosophy , receptor , organic chemistry , epistemology , programming language
Current deep learning methods for structure-based virtual screening take the structures of both the protein and the ligand as input, but make little or no use of the protein structure when predicting ligand binding. Here, we show how a relatively simple method of data set augmentation forces such deep learning methods to take information from the protein into account. Models trained in this way generalize better: they make better predictions on protein-ligand complexes drawn from a different distribution than the training data. They also assign more meaningful importance to the protein and ligand atoms involved in binding. Overall, our results show that data set augmentation can help deep learning-based virtual screening learn physical interactions rather than data set biases.
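The abstract does not spell out the augmentation itself. The sketch below illustrates the general idea under an explicit assumption that is not taken from the abstract: each active ligand is also paired with a receptor from a *different* target class and that pair is labelled inactive. After this step the same ligand appears with both labels, so a model can no longer separate actives from inactives using ligand features alone and must use the protein. The `Complex` record, the function names and the toy identifiers are all hypothetical. A real pipeline would also need to dock the ligand into the mismatched receptor to obtain a pose; here only the protein-ligand pairing is modelled. A split that holds out whole target classes is included to mirror the "unseen target classes" evaluation in the title.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Complex:
    protein_id: str    # hypothetical receptor identifier
    target_class: str  # hypothetical target-class label, e.g. "kinase"
    ligand_id: str     # hypothetical ligand identifier
    label: int         # 1 = active, 0 = inactive


def augment_with_mismatched_receptors(data, seed=0):
    """Assumed augmentation: copy each active onto a receptor from another
    target class and label the copy inactive (pairing only, no docking)."""
    rng = random.Random(seed)
    proteins_by_class = {}
    for c in data:
        proteins_by_class.setdefault(c.target_class, set()).add(c.protein_id)
    classes = sorted(proteins_by_class)

    augmented = list(data)
    for c in data:
        if c.label != 1:
            continue
        other_classes = [k for k in classes if k != c.target_class]
        if not other_classes:
            continue  # no receptor from another class to pair with
        wrong_class = rng.choice(other_classes)
        wrong_protein = rng.choice(sorted(proteins_by_class[wrong_class]))
        augmented.append(Complex(wrong_protein, wrong_class, c.ligand_id, 0))
    return augmented


def ligand_labels(data):
    """Map each ligand to the set of labels it carries in the data set."""
    labels = {}
    for c in data:
        labels.setdefault(c.ligand_id, set()).add(c.label)
    return labels


def split_by_target_class(data, held_out_classes):
    """Hold out entire target classes so the test set contains unseen classes."""
    held = set(held_out_classes)
    train = [c for c in data if c.target_class not in held]
    test = [c for c in data if c.target_class in held]
    return train, test


example = [
    Complex("P1", "kinase", "L1", 1),
    Complex("P1", "kinase", "L2", 0),
    Complex("P2", "protease", "L3", 1),
    Complex("P2", "protease", "L4", 0),
]

augmented = augment_with_mismatched_receptors(example)
print(len(augmented))                 # two extra inactive pairs were added
print(ligand_labels(augmented)["L1"])  # active ligand now also labelled inactive
```

Before augmentation every copy of ligand `L1` is labelled active, so a ligand-only model could score it perfectly. Afterwards `L1` carries both labels, and only the receptor distinguishes the two cases.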
