Descriptor‐based protein remote homology identification
Author(s) - Zhang Ziding, Kochhar Sunil, Grigorov Martin G.
Publication year - 2005
Publication title - Protein Science
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 3.353
H-Index - 175
eISSN - 1469-896X
pISSN - 0961-8368
DOI - 10.1110/ps.041035505
Subject(s) - support vector machine, artificial intelligence, pattern recognition (psychology), cosine similarity, computer science, homology (biology), sequence alignment, similarity (geometry), structural classification of proteins database, set (abstract data type), identification (biology), computational biology, protein structure, peptide sequence, image (mathematics), biology, genetics, biochemistry, botany, gene, programming language
Abstract Here, we report a novel protein sequence descriptor‐based remote homology identification method that can infer fold relationships without explicit knowledge of structure. In a first phase, we individually benchmarked 13 descriptor types in fold identification experiments on a highly diverse set of protein sequences. The relevant descriptors were related to fold class membership by using simple similarity measures in the descriptor spaces, such as the cosine of the angle between descriptor vectors. Our results revealed that the three best‐performing sets of descriptors were the sequence‐alignment‐based descriptor using PSI‐BLAST E‐values, the descriptors based on the alignment of secondary structural elements (SSEA), and the descriptors based on the occurrence of PROSITE functional motifs. In a second phase, the three top‐performing descriptors were combined to obtain a final method with improved performance, which we named DescFold. Class membership was predicted by Support Vector Machine (SVM) learning. In comparison with the individual PSI‐BLAST‐based descriptor, the rate of remote homology identification increased from 33.7% to 46.3%. We found that the composite set of descriptors was able to identify the true remote homolog for nearly every sixth sequence at the 95% confidence level, or some 10% more than a single PSI‐BLAST search. We benchmarked the DescFold method against several other state‐of‐the‐art fold recognition algorithms on the 172 LiveBench‐8 targets and concluded that it adds value to the existing techniques by providing a confident hit for at least 10% of the sequences not identifiable by the previously known methods.
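The similarity-based assignment mentioned in the abstract can be illustrated with a minimal sketch. It assumes each sequence has already been encoded as a fixed-length numeric descriptor vector and assigns a query to the fold of its most similar library entry by cosine similarity. The function names, fold labels, and toy vectors are hypothetical and are not taken from the paper; the published method combines descriptors and uses SVM learning, which this sketch does not reproduce.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length descriptor vectors."""
    if len(u) != len(v):
        raise ValueError("descriptor vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Convention chosen for this sketch: an all-zero descriptor
        # is treated as dissimilar to everything.
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_fold(query, library):
    """Return (fold_label, similarity) for the library entry closest to query.

    library is a list of (fold_label, descriptor_vector) pairs.
    """
    best_label, best_sim = None, -2.0  # cosine similarity is never below -1
    for label, vec in library:
        sim = cosine_similarity(query, vec)
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label, best_sim


# Toy, made-up descriptor vectors (illustrative only, not real data).
library = [
    ("fold_A", [1.0, 0.0, 2.0]),
    ("fold_B", [0.0, 3.0, 0.0]),
]
label, sim = nearest_fold([2.0, 0.1, 4.1], library)
print(label, round(sim, 3))
```

Cosine similarity depends only on the direction of the descriptor vectors, not their magnitude, so two sequences with proportionally similar descriptor profiles score highly even if the raw values differ in scale.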