
Site-Level Bioactivity of Small-Molecules from Deep-Learned Representations of Quantum Chemistry
Author(s) -
Kathryn Sarullo,
Matthew K. Matlock,
S. Joshua Swamidass
Publication year - 2020
Publication title -
The Journal of Physical Chemistry A
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.756
H-Index - 235
eISSN - 1520-5215
pISSN - 1089-5639
DOI - 10.1021/acs.jpca.0c06231
Subject(s) - generalizability theory , quantum chemistry , electrophile , molecule , quantum chemical , quantitative structure–activity relationship , computer science , chemistry , reactivity (psychology) , small molecule , quantum , computational chemistry , chemical space , deep learning , observable , drug discovery , biological system , artificial intelligence , machine learning , mathematics , physics , organic chemistry , quantum mechanics , medicine , biochemistry , supramolecular chemistry , statistics , alternative medicine , pathology , biology , catalysis
Atom- or bond-level chemical properties of interest in medicinal chemistry, such as drug metabolism and electrophilic reactivity, are important to understand and predict across arbitrary new molecules. Deep learning can map molecular structures to these properties, but the data sets for these tasks are relatively small, which can limit accuracy and generalizability. To overcome this limitation, it would be preferable to model these properties on the basis of the underlying quantum chemical characteristics of small molecules. However, it is difficult to learn higher-level chemical properties from lower-level quantum calculations. To address this challenge, we pretrained deep learning models to compute quantum chemical properties and then reused the intermediate representations constructed by the pretrained network. Transfer learning in this way substantially outperformed models based on chemical graphs alone or on quantum chemical properties alone. This result was robust, observed across five prediction tasks: identifying sites of epoxidation by metabolic enzymes and identifying sites of covalent reactivity with cyanide, glutathione, DNA, and protein. These results suggest that this approach may substantially improve the accuracy of deep learning models for specific chemical structures, such as aromatic systems.
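The two-stage idea described in the abstract can be illustrated with a toy sketch. Stage one pretrains a network on a quantum chemical target. Stage two reuses that network's intermediate representation, frozen, as input to a site-level classifier. This is not the authors' model. The one-layer tanh network, the four per-atom descriptors, the synthetic "quantum" target, and the threshold-based site labels are all hypothetical stand-ins, chosen so the example runs with the Python standard library alone.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def hidden(W, b, x):
    """Intermediate representation: a single tanh layer (hypothetical stand-in for a deep network)."""
    return [math.tanh(sum(w * xj for w, xj in zip(row, x)) + bk) for row, bk in zip(W, b)]


def pretrain(X, y, n_hidden=8, lr=0.05, epochs=200, seed=0):
    """Stage 1: fit a one-hidden-layer regressor to a per-atom 'quantum' target by SGD."""
    rng = random.Random(seed)
    d = len(X[0])
    W = [[rng.gauss(0, 0.5) for _ in range(d)] for _ in range(n_hidden)]
    b = [0.0] * n_hidden
    v = [rng.gauss(0, 0.5) for _ in range(n_hidden)]
    c = 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            h = hidden(W, b, x)
            err = sum(vk * hk for vk, hk in zip(v, h)) + c - t  # gradient of 0.5*err**2 w.r.t. output
            for k in range(n_hidden):
                g = err * v[k] * (1.0 - h[k] ** 2)  # backprop through tanh, using the old v[k]
                v[k] -= lr * err * h[k]
                b[k] -= lr * g
                for j in range(d):
                    W[k][j] -= lr * g * x[j]
            c -= lr * err
    return W, b, v, c


def mse(W, b, v, c, X, y):
    """Mean squared error of the pretrained regressor on the quantum target."""
    preds = [sum(vk * hk for vk, hk in zip(v, hidden(W, b, x))) + c for x in X]
    return sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y)


def train_site_head(W, b, X, labels, lr=0.1, epochs=300):
    """Stage 2: logistic-regression head on the frozen representation (W, b are only read)."""
    feats = [hidden(W, b, x) for x in X]
    w, w0 = [0.0] * len(feats[0]), 0.0
    for _ in range(epochs):
        for h, t in zip(feats, labels):
            p = sigmoid(sum(wk * hk for wk, hk in zip(w, h)) + w0)
            for k in range(len(w)):
                w[k] -= lr * (p - t) * h[k]
            w0 -= lr * (p - t)
    return w, w0


def predict_site(W, b, w, w0, x):
    """Predicted probability that the atom described by x is a reactive site."""
    return sigmoid(sum(wk * hk for wk, hk in zip(w, hidden(W, b, x))) + w0)


def make_data(n=100, seed=1):
    """Hypothetical toy data: 4 made-up per-atom descriptors, a made-up 'quantum' target,
    and site labels defined as target > 0.2 (so labels depend on the quantum property)."""
    rng = random.Random(seed)
    X = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(n)]
    q = [x[0] - 0.5 * x[1] + 0.3 * x[2] * x[3] for x in X]
    labels = [1 if qi > 0.2 else 0 for qi in q]
    return X, q, labels


if __name__ == "__main__":
    X, q, labels = make_data()
    W, b, v, c = pretrain(X, q)
    w, w0 = train_site_head(W, b, X, labels)
    acc = sum((predict_site(W, b, w, w0, x) > 0.5) == bool(t) for x, t in zip(X, labels)) / len(X)
    print(f"pretraining MSE {mse(W, b, v, c, X, q):.3f}, site accuracy {acc:.2f}")
```

The key design choice is that `W` and `b` stay fixed in the second stage. Only the small logistic head is fit to the site labels, which reflects the abstract's motivation: the site-level data sets are small, so the representation is learned from the quantum chemical task instead.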