Hidden Markov models that use predicted local structure for fold recognition: Alphabets of backbone geometry
Author(s) -
Karchin Rachel,
Cline Melissa,
Mandel-Gutfreund Yael,
Karplus Kevin
Publication year - 2003
Publication title -
Proteins: Structure, Function, and Bioinformatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.699
H-Index - 191
eISSN - 1097-0134
pISSN - 0887-3585
DOI - 10.1002/prot.10369
Subject(s) - hidden Markov model, protein secondary structure, pattern recognition (psychology), artificial intelligence, computer science, structural alignment, test set, computational biology, generalization, set (abstract data type), sequence alignment, biology, genetics, mathematics, peptide sequence, biochemistry, gene, programming language, mathematical analysis
An important problem in computational biology is predicting the structure of the large number of putative proteins discovered by genome sequencing projects. Fold‐recognition methods attempt to solve the problem by relating the target proteins to known structures, searching for template proteins homologous to the target. Remote homologs that may have significant structural similarity are often not detectable by sequence similarities alone. To address this, we incorporated predicted local structure, a generalization of secondary structure, into two‐track profile hidden Markov models (HMMs). We did not rely on a simple helix‐strand‐coil definition of secondary structure, but experimented with a variety of local structure descriptions, following a principled protocol to establish which descriptions are most useful for improving fold recognition and alignment quality. On a test set of 1298 nonhomologous proteins, HMMs incorporating a 3‐letter STRIDE alphabet improved fold recognition accuracy by 15% over amino‐acid‐only HMMs and 23% over PSI‐BLAST, measured by ROC‐65 numbers. We compared two‐track HMMs to amino‐acid‐only HMMs on a difficult alignment test set of 200 protein pairs (structurally similar with 3–24% sequence identity). HMMs with a 6‐letter STRIDE secondary track improved alignment quality by 62%, relative to DALI structural alignments, while HMMs with an STR track (an expanded DSSP alphabet that subdivides strands into six states) improved by 40% relative to CE. Proteins 2003;51:504–514. © 2003 Wiley‐Liss, Inc.
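
The abstract reports fold-recognition accuracy as ROC‐65 numbers without defining them. As a rough illustration only, the sketch below computes a ROC-n score under one common definition: the area under the ROC curve truncated at the first n false positives, normalized to lie between 0 and 1. The function name `roc_n`, the input layout, the tie handling, and the padding rule used when fewer than n false positives occur are all assumptions made for this example and are not taken from the paper.

```python
def roc_n(scored_hits, n):
    """Truncated, normalized ROC score (one common definition; an assumption here).

    scored_hits: list of (score, is_true_positive) pairs, higher score = better.
    n: number of false positives at which the ROC curve is truncated.
    """
    total_pos = sum(1 for _, is_pos in scored_hits if is_pos)
    if total_pos == 0 or n <= 0:
        return 0.0

    # Rank hits from best to worst score (ties keep their input order).
    ranked = sorted(scored_hits, key=lambda hit: hit[0], reverse=True)

    tp_seen = 0  # true positives ranked above the current position
    fp_seen = 0  # false positives encountered so far
    area = 0     # sum over false positives of the true positives ranked above them
    for _, is_pos in ranked:
        if is_pos:
            tp_seen += 1
        else:
            fp_seen += 1
            area += tp_seen
            if fp_seen == n:
                break

    # Assumed convention: if the list holds fewer than n false positives,
    # the missing ones are treated as ranking below every hit already seen.
    if fp_seen < n:
        area += (n - fp_seen) * tp_seen

    return area / (n * total_pos)


# Hypothetical toy ranking: three true folds and three false hits.
hits = [(9.0, True), (8.0, False), (7.0, True),
        (6.0, True), (5.0, False), (4.0, False)]
print(roc_n(hits, 2))
```

Under this definition a method that ranks every true homolog above the first false positive scores 1.0, so comparing methods by ROC-n emphasizes the top of the ranked list, which is where fold-recognition predictions are actually used.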
