Open Access

Analysis of protein features and machine learning algorithms for prediction of druggable proteins

Author(s) - Sun Tanlin, Lai Luhua, Pei Jianfeng

Publication year - 2018

Publication title - Quantitative Biology

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.707

H-Index - 15

eISSN - 2095-4697

pISSN - 2095-4689

DOI - 10.1007/s40484-018-0157-2

Subject(s) - druggability, computer science, machine learning, word2vec, support vector machine, artificial intelligence, pipeline (software), protein function, drug discovery, training set, data mining, bioinformatics, biology, biochemistry, embedding, gene, programming language

Background: Computational tools are widely used in the drug discovery process because they reduce time and cost. Predicting whether a protein is druggable is a fundamental step in the drug research pipeline, and sequence-based protein function prediction plays a vital role in many research areas. Training data, protein feature selection, and machine learning algorithms are the three indispensable elements that determine the success of such models.

Methods: In this study, we tested the performance of different combinations of protein features and machine learning algorithms for druggable protein prediction, using the targets of FDA-approved small molecules as training data. We also enlarged the dataset to include the targets of small molecules under experimental or clinical investigation.

Results: Although the 146-d vector used by Li et al. combined with a neural network achieved the best training accuracy (91.10%), overlapped 3-gram word2vec combined with logistic regression achieved the best prediction accuracy on the independent test set (89.55%) and on newly approved targets. We also trained models on the enlarged dataset that includes targets of small molecules under experimental and clinical investigation; unfortunately, the best training accuracy there was only 75.48%. In addition, we applied our models to predict potential targets as a reference for future studies.

Conclusions: Our study indicates the potential of word2vec for predicting druggable proteins. It also suggests that the training dataset of druggable proteins should not be extended to targets that lack verification. The target prediction package is available at https://github.com/pkumdl/target_prediction .
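To make the "overlapped 3-gram" representation concrete, the following is a minimal Python sketch of how a protein sequence might be split into overlapping 3-grams and pooled into one fixed-length vector. It is not the authors' implementation. The function names, the 8-dimensional vector size, and the mean-pooling step are assumptions made for illustration, and `toy_embedding` is a hypothetical stand-in for a lookup into a trained word2vec model (it only hashes each 3-gram into a deterministic pseudo-vector).

```python
import hashlib
from typing import List


def overlapped_kmers(sequence: str, k: int = 3) -> List[str]:
    """Split a protein sequence into overlapping k-mers with stride 1."""
    seq = sequence.strip().upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def toy_embedding(kmer: str, dim: int = 8) -> List[float]:
    """Hypothetical placeholder for a trained word2vec lookup.

    Hashes the k-mer into a deterministic pseudo-vector with values in
    [-1, 1]. A real pipeline would return learned embeddings instead.
    """
    digest = hashlib.sha256(kmer.encode("ascii")).digest()
    return [digest[i] / 127.5 - 1.0 for i in range(dim)]


def sequence_vector(sequence: str, k: int = 3, dim: int = 8) -> List[float]:
    """Average the k-mer vectors into one fixed-length feature vector.

    Mean pooling is an assumption; the abstract does not say how the
    3-gram vectors are combined.
    """
    kmers = overlapped_kmers(sequence, k)
    if not kmers:
        # Sequence shorter than k: fall back to a zero vector.
        return [0.0] * dim
    vectors = [toy_embedding(kmer, dim) for kmer in kmers]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


if __name__ == "__main__":
    print(overlapped_kmers("MKTAYIAK"))
    # ['MKT', 'KTA', 'TAY', 'AYI', 'YIA', 'IAK']
    print(len(sequence_vector("MKTAYIAK")))
    # 8
```

In a full pipeline, feature vectors of this kind would be passed to a classifier such as logistic regression, with the placeholder embedding replaced by vectors learned from a corpus of protein sequences.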
