Open Access
Molecular Generators and Optimizers Failure Modes
Author(s) -
Mani Manavalan
Publication year - 2021
Publication title -
Malaysian Journal of Medical and Biological Research
Language(s) - English
Resource type - Journals
eISSN - 2313-0016
pISSN - 2313-0008
DOI - 10.18034/mjmbr.v8i2.583
Subject(s) - chembl , computer science , generative model , generative grammar , benchmark (surveying) , function (biology) , virtual screening , machine learning , reusability , artificial intelligence , drug discovery , bioinformatics , programming language , biology , geodesy , software , evolutionary biology , geography
In recent years, there has been growing interest in generative models for molecules in drug development. In de novo molecular design, these models create molecules with desired properties from scratch. They are sometimes used in place of virtual screening, which is limited in practice by the size of the libraries that can be searched: rather than screening existing libraries, generative models can build custom libraries from scratch, and because they can optimize molecules directly toward a desired profile, this time-consuming process can be sped up. The purpose of this work is to show how current shortcomings in evaluating generative models for molecules can be avoided. We cover both distribution-learning and goal-directed generation, with a focus on the latter. Three well-known targets were downloaded from ChEMBL: Janus kinase 2 (JAK2), epidermal growth factor receptor (EGFR), and dopamine receptor D2 (DRD2) (Bento et al. 2014). We preprocessed the data to obtain binary classification tasks. Before computing a scoring function, the data is split into two halves, which we refer to as splits 1 and 2, preserving the ratio of active to inactive compounds. Our goal is to train three bioactivity models with equal predictive performance: one to serve as a scoring function for chemical optimization and the other two to serve as performance evaluation models. Our findings suggest that distribution-learning can attain near-perfect scores on many existing benchmarks even with the most basic and completely useless models. According to benchmark studies, likelihood-based models account for many of the best-performing approaches, and we propose that test-set likelihoods be included in future comparisons.
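The split-and-train setup described above can be sketched in a few lines. This is a hypothetical illustration, not the paper's code: the fingerprint features and labels are synthetic stand-ins for ChEMBL bioactivity data, and RandomForestClassifier is one plausible choice of bioactivity model; the paper does not specify these details.

```python
# Hypothetical sketch of the evaluation setup: binarize bioactivity data for
# one target (e.g. DRD2), split it into two stratified halves, and train
# separate models for scoring vs. independent performance evaluation.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for molecular fingerprints and active/inactive labels.
X = rng.integers(0, 2, size=(400, 64)).astype(float)
w = rng.normal(size=64)
y = (X @ w > np.median(X @ w)).astype(int)  # binary activity labels

# Split into two halves ("split 1" / "split 2"), preserving the
# active/inactive ratio via stratification.
X1, X2, y1, y2 = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0
)

# Model trained on split 1 acts as the scoring function driving optimization;
# a model trained on the held-out split 2 evaluates the optimized molecules.
scoring_model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X1, y1)
evaluation_model = RandomForestClassifier(n_estimators=100, random_state=1).fit(X2, y2)
```

Because the evaluation model never sees split 1, a generator that merely exploits quirks of the scoring function should score poorly under it, which is the failure mode this separation is meant to expose.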
