Generative Adversarial Networks (GANs) Based Synthetic Sampling for Predictive Modeling
Author(s) -
Barigye Stephen J.,
García de la Vega José M.,
Perez‐Castillo Yunierkis
Publication year - 2020
Publication title -
Molecular Informatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.481
H-Index - 68
eISSN - 1868-1751
pISSN - 1868-1743
DOI - 10.1002/minf.202000086
Subject(s) - chemical space, computer science, artificial intelligence, molecular descriptor, synthetic data, machine learning, property (philosophy), pattern recognition (psychology), matthews correlation coefficient, generative model, generative grammar, quantitative structure–activity relationship, data mining, drug discovery, support vector machine, chemistry, philosophy, epistemology, biochemistry
Abstract In the present report we evaluate the possible utility of Generative Adversarial Networks (GANs) in mapping the chemical structural space for molecular property profiles, with the goal of subsequently yielding synthetic (artificial) samples for ligand‐based molecular modeling. Two case studies are considered: BACE‐1 (β‐Secretase 1) and DENV (Dengue Virus) inhibitory activities, with the former focused on data populating and the latter on data balancing tasks. We train GANs using subsamples extracted from datasets for each bioactivity endpoint, and apply the trained networks to generate synthetic examples from the respective bioactivity chemical spaces. Original and synthetic samples are pooled together and employed to build BACE‐1 and DENV inhibitory activity classifiers, whose performance is evaluated over tenfold external validation sets. In both case studies, the obtained classifiers demonstrate satisfactory predictivity, with the former yielding accuracy (ACC) and Matthews correlation coefficient (MCC) values of 0.80 and 0.59, while the latter produces balanced accuracy (BACC) and MCC values of 0.81 and 0.70, respectively. Moreover, the statistics of these classifiers are compared with those of other models in the literature, demonstrating comparable or better performance. These results suggest that GANs may be useful in mapping the chemical space for molecular property profiles of interest, and thus allow for the extraction of synthetic examples for computational modeling.
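The abstract reports ACC, BACC and MCC without defining them. The minimal sketch below computes the three from binary confusion-matrix counts using their standard definitions. The function name `classification_metrics` and the example counts are hypothetical, chosen only for illustration. They are not taken from the paper and do not reproduce its results.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Return ACC, BACC and MCC from binary confusion-matrix counts.

    Hypothetical helper for illustration; not the authors' code.
    """
    total = tp + tn + fp + fn
    # Accuracy: fraction of all samples classified correctly.
    acc = (tp + tn) / total
    # Balanced accuracy: mean of sensitivity and specificity, which is
    # less affected by class imbalance than plain accuracy.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    bacc = (sensitivity + specificity) / 2
    # Matthews correlation coefficient: ranges from -1 to +1; set to 0
    # here when the denominator vanishes (a common convention).
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "BACC": bacc, "MCC": mcc}


# Hypothetical counts, for illustration only (not data from the paper).
metrics = classification_metrics(tp=40, tn=40, fp=10, fn=10)
```

For these made-up counts the function returns ACC = 0.80, BACC = 0.80 and MCC = 0.60. On a balanced set ACC and BACC coincide. They diverge when one class dominates, which is presumably why BACC is the measure reported for the DENV data balancing task.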