A computational approach for predicting drug–target interactions from protein sequence and drug substructure fingerprint information
Author(s) -
Li Yang,
Liu Xiaozhang,
You ZhuHong,
Li LiPing,
Guo JianXin,
Wang Zheng
Publication year - 2021
Publication title -
International Journal of Intelligent Systems
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.291
H-Index - 87
eISSN - 1098-111X
pISSN - 0884-8173
DOI - 10.1002/int.22332
Subject(s) - computer science , artificial intelligence , drug target , data mining , support vector machine , feature (linguistics) , sequence (biology) , protein structure prediction , pattern recognition (psychology) , machine learning , protein structure , medicine , linguistics , philosophy , biology , pharmacology , genetics , physics , nuclear magnetic resonance
Abstract Identifying drug–target interactions (DTIs) is critical for discovering candidate target proteins for new drugs. Traditional experimental methods for discovering DTIs are time‐consuming, tedious, and expensive, and often suffer from high false‐positive and false‐negative rates. Computational methods for predicting DTIs have therefore received extensive attention in recent years. In this paper, we present a prediction model based on drug molecular structure data and protein sequence data. The model proceeds as follows. First, we transform each target protein sequence into a position‐specific scoring matrix (PSSM), so that the features retain biological evolutionary information, and we describe the chemical structure of each drug compound with a feature vector of molecular substructure fingerprints. Second, we use the Legendre moments algorithm to extract new features from the PSSM. Finally, we use a rotation forest classifier to perform the prediction. We tested the model on four gold‐standard data sets: enzymes, G‐protein‐coupled receptors, ion channels, and nuclear receptors. Under five‐fold cross‐validation, the proposed method achieves average accuracies of 0.9026, 0.8260, 0.8703, and 0.7444 on these four data sets. We also compare the proposed method with a support vector machine and other existing approaches. The proposed model outperforms the comparative methods, indicating that it is feasible, effective, and robust for predicting potential DTIs.
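
The feature-extraction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes one common discrete approximation of 2D Legendre moments (a midpoint rule on a grid mapped to [-1, 1]) and treats the PSSM as a plain numeric matrix. The function names `legendre_moments` and `build_pair_features`, the moment order, and the toy inputs are all hypothetical. The abstract does not state the moment order, the fingerprint length, or how the drug and protein features are combined, so simple concatenation is an assumption here, and the rotation forest classifier is left out.

```python
import numpy as np
from numpy.polynomial.legendre import legvander


def legendre_moments(matrix, max_order):
    """Approximate 2D Legendre moments up to max_order in each axis.

    Assumption: the matrix is sampled at cell midpoints of a uniform
    grid on [-1, 1] x [-1, 1], and the integral is approximated by a
    midpoint-rule sum.
    """
    matrix = np.asarray(matrix, dtype=float)
    n_rows, n_cols = matrix.shape

    # Midpoints of n equal cells covering [-1, 1].
    x = -1.0 + (2.0 * np.arange(n_rows) + 1.0) / n_rows
    y = -1.0 + (2.0 * np.arange(n_cols) + 1.0) / n_cols

    # Column p of each matrix holds the Legendre polynomial P_p
    # evaluated at the grid points.
    px = legvander(x, max_order)  # shape (n_rows, max_order + 1)
    py = legvander(y, max_order)  # shape (n_cols, max_order + 1)

    # raw[p, q] = sum_i sum_j P_p(x_i) * P_q(y_j) * matrix[i, j]
    raw = px.T @ matrix @ py

    # Normalisation (2p + 1)(2q + 1) / 4 times the cell area 4 / (N * M).
    orders = 2.0 * np.arange(max_order + 1) + 1.0
    norm = np.outer(orders, orders) / (n_rows * n_cols)

    # Flatten row-major: index p * (max_order + 1) + q.
    return (norm * raw).ravel()


def build_pair_features(pssm, fingerprint, max_order):
    """Concatenate PSSM moment features with drug fingerprint bits.

    Assumption: one drug-target pair is represented by simple
    concatenation of the two descriptors.
    """
    protein_part = legendre_moments(pssm, max_order)
    drug_part = np.asarray(fingerprint, dtype=float)
    return np.concatenate([protein_part, drug_part])


# Toy usage: a random 50 x 20 matrix stands in for a real PSSM, and an
# 8-bit vector stands in for a real substructure fingerprint.
rng = np.random.default_rng(0)
toy_pssm = rng.normal(size=(50, 20))
toy_fingerprint = [1, 0, 0, 1, 1, 0, 1, 0]
pair_features = build_pair_features(toy_pssm, toy_fingerprint, max_order=3)
print(pair_features.shape)
```

Feature vectors built this way for known interacting and non-interacting pairs could then be passed to any classifier under five‐fold cross‐validation. Rotation forest is not included in this sketch.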