GA(M)E-QSAR: A Novel, Fully Automatic Genetic-Algorithm-(Meta)-Ensembles Approach for Binary Classification in Ligand-Based Drug Design

Yunierkis Pérez-Castillo, Cosmin Lazar, Jonatan Taminau, Matheus Froeyen, Miguel Ángel Cabrera-Pérez, Ann Nowé

Open Access

Publication year - 2012

Publication title - Journal of Chemical Information and Modeling

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 1.24

H-Index - 160

eISSN - 1549-960X

pISSN - 1549-9596

DOI - 10.1021/ci300146h

Subject(s) - computer science, artificial intelligence, adaboost, feature selection, machine learning, quantitative structure–activity relationship, algorithm, robustness (evolution), genetic algorithm, ensemble learning, classifier (uml), statistical classification, pattern recognition (psychology), data mining, biochemistry, chemistry, gene

Computer-aided drug design has become an important component of the drug discovery process. Despite the advances in this field, no single modeling approach can be successfully applied to the whole range of problems faced during QSAR modeling. Feature selection and ensemble modeling are active areas of research in ligand-based drug design. Here we introduce the GA(M)E-QSAR algorithm, which combines the search and optimization capabilities of Genetic Algorithms with the simplicity of the Adaboost ensemble-based classification algorithm to solve binary classification problems. We also explore the usefulness of Meta-Ensembles trained with Adaboost and Voting schemes to further improve the accuracy, generalization, and robustness of the optimal Adaboost Single Ensemble derived from the Genetic Algorithm optimization. We evaluated the performance of our algorithm on five data sets from the literature and found that it yields classification results similar to or better than those previously reported for these data sets, with a higher enrichment of active compounds relative to the whole actives subset when only the most active chemicals are considered. More importantly, we compared our methodology with state-of-the-art feature selection and classification approaches and found that it can provide highly accurate, robust, and generalizable models. In the case of the Adaboost Ensembles derived from the Genetic Algorithm search, the final models are quite simple, since they consist of a weighted sum of the outputs of single-feature classifiers. Furthermore, the Adaboost scores can be used as a ranking criterion to prioritize chemicals for synthesis and biological evaluation after virtual screening experiments.
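
The abstract describes the final Adaboost models as a weighted sum of single-feature classifiers whose score can also rank compounds. The sketch below illustrates that idea only: it is a generic textbook Adaboost over one-feature threshold classifiers ("stumps") in plain Python, not the authors' implementation, and it leaves out the Genetic Algorithm search and the Meta-Ensembles entirely. The toy descriptor matrix `X`, the labels `y` (+1 active, -1 inactive), and all function names are hypothetical.

```python
import math

def stump_predict(stump, row):
    """A stump looks at ONE feature: +polarity above the threshold, -polarity otherwise."""
    feature, threshold, polarity = stump
    return polarity if row[feature] > threshold else -polarity

def fit_stump(X, y, w):
    """Exhaustively pick the single-feature stump with the lowest weighted error."""
    best_err, best_stump = None, None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds: one below the minimum, then midpoints between values.
        thresholds = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
        for t in thresholds:
            for polarity in (1, -1):
                stump = (j, t, polarity)
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(stump, row) != yi)
                if best_err is None or err < best_err:
                    best_err, best_stump = err, stump
    return best_err, best_stump

def adaboost_fit(X, y, n_rounds=3):
    """Return a list of (alpha, stump) pairs: the weighted single-feature classifiers."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform sample weights
    model = []
    for _ in range(n_rounds):
        err, stump = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, stump))
        # Increase the weight of misclassified samples, decrease the rest, renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def adaboost_score(model, row):
    """Final model output: weighted sum of the single-feature classifier votes."""
    return sum(alpha * stump_predict(stump, row) for alpha, stump in model)

# Hypothetical toy data: 8 compounds, 3 descriptors, +1 = active, -1 = inactive.
X = [
    [0.9, 0.2, 0.1], [0.8, 0.7, 0.3], [0.7, 0.9, 0.2], [0.3, 0.8, 0.9],
    [0.2, 0.1, 0.8], [0.4, 0.3, 0.4], [0.1, 0.6, 0.7], [0.6, 0.4, 0.6],
]
y = [1, 1, 1, 1, -1, -1, -1, -1]

model = adaboost_fit(X, y, n_rounds=3)
scores = [adaboost_score(model, row) for row in X]
predictions = [1 if s > 0 else -1 for s in scores]
# Rank compounds from highest to lowest score, as one might after virtual screening.
ranking = sorted(range(len(X)), key=lambda i: scores[i], reverse=True)

for alpha, (feature, threshold, polarity) in model:
    print(f"weight={alpha:.3f} feature={feature} threshold={threshold:.2f} polarity={polarity:+d}")
print("ranking (best first):", ranking)
```

The sign of the score gives the class, while its magnitude can serve as the ranking criterion mentioned above. In a setup closer to the paper, a Genetic Algorithm would presumably decide which descriptors the boosting step is allowed to use; here every descriptor is available.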
