Open Access
Multiple Machine Learning Comparisons of HIV Cell-based and Reverse Transcriptase Data Sets
Author(s) - Kimberley M. Zorn, Thomas R. Lane, Daniel P. Russo, Alex M. Clark, Vadim Makarov, Sean Ekins
Publication year - 2019
Publication title - Molecular Pharmaceutics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.13
H-Index - 127
eISSN - 1543-8392
pISSN - 1543-8384
DOI - 10.1021/acs.molpharmaceut.8b01297
Subject(s) - machine learning , artificial intelligence , reverse transcriptase , naive bayes classifier , adaboost , nucleoside reverse transcriptase inhibitor , support vector machine , random forest , test set , computer science , decision tree , human immunodeficiency virus (hiv) , medicine , virology , biology , antiretroviral therapy , polymerase chain reaction , viral load , gene , biochemistry
The human immunodeficiency virus (HIV) causes over a million deaths every year and has a huge economic impact in many countries. The first class of drugs approved was the nucleoside reverse transcriptase inhibitors. A newer generation of reverse transcriptase inhibitors has become susceptible to drug-resistant strains of HIV, and hence alternatives are urgently needed. We have recently pioneered the use of Bayesian machine learning to generate models with public data to identify new compounds for testing against different disease targets. The current study used the NIAID ChemDB HIV, Opportunistic Infection and Tuberculosis Therapeutics Database for machine learning studies. We curated and cleaned data from HIV-1 wild-type cell-based and reverse transcriptase (RT) DNA polymerase inhibition assays. Compounds from this database with ≤1 μM HIV-1 RT DNA polymerase activity inhibition and cell-based HIV-1 inhibition are correlated (Pearson r = 0.44, n = 1137, p < 0.0001). Models were trained using multiple machine learning approaches (Bernoulli Naive Bayes, AdaBoost Decision Tree, Random Forest, support vector classification, k-Nearest Neighbors, and deep neural networks, as well as consensus approaches), and their predictive abilities were then compared. Our comparison demonstrated that support vector classification, deep learning, and a consensus approach were generally comparable and not significantly different from each other, both under 5-fold cross-validation and across 24 training and test set combinations. These findings are in line with our previous studies for various targets: training and testing with multiple data sets does not reveal a significant difference between support vector machines and deep neural networks.
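The evaluation protocol described in the abstract (several classifiers compared under 5-fold cross-validation) can be illustrated with a minimal, standard-library-only sketch. Everything in it is an assumption made for illustration: the synthetic binary "fingerprints", the helper names (`make_toy_fingerprints`, `k_fold_indices`, `cross_validate`), the choice of only two of the listed methods (Bernoulli Naive Bayes and k-Nearest Neighbors), and plain accuracy as the metric. It does not reproduce the authors' data, descriptors, software, or statistics.

```python
import math
import random


def make_toy_fingerprints(n=200, n_bits=32, seed=0):
    """Synthetic binary fingerprints (hypothetical data, not from ChemDB).

    "Active" rows (label 1) are enriched in the first 8 bits.
    """
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2  # balanced classes
        row = [
            1 if rng.random() < (0.7 if label == 1 and b < 8 else 0.3) else 0
            for b in range(n_bits)
        ]
        X.append(row)
        y.append(label)
    return X, y


def k_fold_indices(n, k=5, seed=0):
    """Shuffle indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def bernoulli_nb_fit(X, y):
    """Per-class log prior and Laplace-smoothed per-bit probabilities."""
    n_bits = len(X[0])
    model = {}
    for c in (0, 1):
        rows = [x for x, label in zip(X, y) if label == c]
        probs = [
            (sum(r[b] for r in rows) + 1) / (len(rows) + 2)
            for b in range(n_bits)
        ]
        model[c] = (math.log(len(rows) / len(X)), probs)
    return model


def bernoulli_nb_predict(model, x):
    """Return the class with the highest log posterior score."""
    scores = {}
    for c, (log_prior, probs) in model.items():
        scores[c] = log_prior + sum(
            math.log(p) if bit else math.log(1 - p)
            for bit, p in zip(x, probs)
        )
    return max(scores, key=scores.get)


def knn_predict(X_train, y_train, x, k=5):
    """Majority vote among the k nearest rows by Hamming distance."""
    dists = sorted(
        (sum(a != b for a, b in zip(row, x)), label)
        for row, label in zip(X_train, y_train)
    )
    votes = sum(label for _, label in dists[:k])
    return 1 if votes * 2 > k else 0


def cross_validate(X, y, k=5):
    """Per-fold accuracy for each model, using the same folds for both."""
    results = {"bernoulli_nb": [], "knn": []}
    for held_out in k_fold_indices(len(X), k):
        test = set(held_out)
        X_tr = [X[i] for i in range(len(X)) if i not in test]
        y_tr = [y[i] for i in range(len(X)) if i not in test]
        model = bernoulli_nb_fit(X_tr, y_tr)
        nb_hits = sum(bernoulli_nb_predict(model, X[i]) == y[i] for i in held_out)
        knn_hits = sum(knn_predict(X_tr, y_tr, X[i]) == y[i] for i in held_out)
        results["bernoulli_nb"].append(nb_hits / len(held_out))
        results["knn"].append(knn_hits / len(held_out))
    return results


X, y = make_toy_fingerprints()
scores = cross_validate(X, y)
for name, accs in scores.items():
    print(f"{name}: mean accuracy {sum(accs) / len(accs):.3f} over {len(accs)} folds")
```

Evaluating every model on identical folds is what makes the per-fold scores comparable; a real comparison would likely add further metrics and a significance test across folds or data set combinations.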
