Remote homolog detection using local sequence–structure correlations
Author(s) -
Hou Yuna,
Hsu Wynne,
Lee Mong Li,
Bystroff Christopher
Publication year - 2004
Publication title -
Proteins: Structure, Function, and Bioinformatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.699
H-Index - 191
eISSN - 1097-0134
pISSN - 0887-3585
DOI - 10.1002/prot.20221
Subject(s) - support vector machine, pattern recognition (psychology), hidden markov model, feature vector, artificial intelligence, homology (biology), sequence (biology), computer science, structural classification of proteins database, similarity (geometry), markov chain, computational biology, structural alignment, feature (linguistics), sequence alignment, mathematics, protein structure, peptide sequence, genetics, biology, amino acid, gene, machine learning, linguistics, philosophy, image (mathematics), biochemistry
Remote homology detection refers to the detection of structural homology in proteins when there is little or no sequence similarity. In this article, we present a remote homolog detection method called SVM‐HMMSTR that overcomes the reliance on detectable sequence similarity by transforming the sequences into strings of hidden Markov states that represent local folding motif patterns. These state strings are transformed into fixed‐dimension feature vectors for input to a support vector machine. Two sets of features are defined: an order‐independent feature set that captures the amino acid and local structure composition; and an order‐dependent feature set that captures the sequential ordering of the local structures. Tests using the Structural Classification of Proteins (SCOP) 1.53 data set show that the SVM‐HMMSTR gives a significant improvement over several current methods. Proteins 2004. © 2004 Wiley‐Liss, Inc.
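As a rough illustration of how a variable-length state string can become a fixed-dimension feature vector, the sketch below computes two kinds of features: the relative frequency of each state label (order-independent) and the relative frequency of each adjacent pair of labels (order-dependent). This is only an assumption-laden toy, not the feature definitions used by SVM-HMMSTR, which the abstract does not specify. The three-letter alphabet, the use of adjacent pairs, and the function names are all hypothetical, and the amino acid composition part of the order-independent set is omitted.

```python
from collections import Counter
from itertools import product

# Hypothetical alphabet of local-structure state labels (an assumption;
# the real HMMSTR model has many more states than this).
ALPHABET = ["H", "E", "L"]


def composition_features(states, alphabet):
    """Order-independent features: relative frequency of each state label."""
    counts = Counter(states)
    total = len(states) or 1  # avoid division by zero on empty input
    return [counts[s] / total for s in alphabet]


def pair_features(states, alphabet):
    """Order-dependent features: relative frequency of each adjacent pair.

    Adjacent pairs are a simple stand-in for "sequential ordering";
    the paper's actual order-dependent features may differ.
    """
    pairs = Counter(zip(states, states[1:]))
    total = max(len(states) - 1, 1)
    return [pairs[(a, b)] / total for a, b in product(alphabet, repeat=2)]


def feature_vector(states, alphabet=ALPHABET):
    """Concatenate both feature sets into one fixed-length vector."""
    return composition_features(states, alphabet) + pair_features(states, alphabet)


# Two state strings of different lengths map to vectors of the same length.
short_vec = feature_vector(list("HHHLEEL"))
long_vec = feature_vector(list("LLEEEEELLHHHHHHHLL"))
print(len(short_vec), len(long_vec))
```

Because the vector length depends only on the alphabet size (here 3 + 3 × 3 = 12), sequences of any length yield inputs of the same dimension, which is what a support vector machine requires.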