Evaluation of the performance of various machine learning methods on the discrimination of the active compounds
Author(s) - Shamsara Jamal
Publication year - 2021
Publication title - Chemical Biology and Drug Design
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.59
H-Index - 77
eISSN - 1747-0285
pISSN - 1747-0277
DOI - 10.1111/cbdd.13819
Subject(s) - support vector machine, artificial intelligence, random forest, hyperparameter, machine learning, computer science, feature selection, matthews correlation coefficient, pattern recognition (psychology), naive bayes classifier, classifier (uml), cross validation, bayesian probability, ensemble learning, feature (linguistics), philosophy, linguistics
The performance of machine learning (ML) methods, including deep learning (DL), was evaluated on a diverse set with and without feature selection (FS). The superior performance of DL on small sets has not been demonstrated previously, yet the sets available for newly identified targets are usually limited in size. This study explored whether FS, hyperparameter search, and an ensemble model can improve ML and DL performance on small sets. QSAR classifier models were developed using K‐nearest neighbors (KN), DL, random forest (RF), naïve Bayesian (NB) classification, support vector machine (SVM), and logistic regression (LR). Generally, the best individual performers were DL and SVM; LR performed similarly to DL and SVM on the small subsets. Nested cross‐validation made it possible to combine different feature vectors with different ML methods into an ensemble model whose performance on the datasets was similar to that of the best performers. The baseline NB model had an overall Matthews correlation coefficient of 0.356, which improved to around 0.66 with NB‐assisted FS followed by SVM/DL classification and to around 0.63 with an ensemble model.
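
For readers unfamiliar with the metric reported above, the following is a minimal sketch of how a Matthews correlation coefficient can be computed for a binary active/inactive classifier. It is not code from the study: the function name `mcc`, the 1 = active / 0 = inactive encoding, and the toy labels are assumptions made for illustration, and returning 0.0 when the denominator vanishes is a common convention rather than something the abstract specifies.

```python
from math import sqrt


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 = active, 0 = inactive).

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Assumed convention: score 0.0 when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Toy labels (made up): 4 actives and 4 inactives, one error in each class.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
print(mcc(y_true, y_pred))  # 0.5
```

The coefficient ranges from -1 (every prediction wrong) through 0 (no better than chance) to +1 (perfect agreement), and it uses all four confusion-matrix cells, so it stays informative when actives and inactives are unevenly represented.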
