A novel fold recognition method using composite predicted secondary structures
Author(s) - An Yuling, Friesner Richard A.
Publication year - 2002
Publication title - Proteins: Structure, Function, and Bioinformatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.699
H-Index - 191
eISSN - 1097-0134
pISSN - 0887-3585
DOI - 10.1002/prot.10145
Subject(s) - template , protein secondary structure , subsequence , protein data bank (rcsb pdb) , pattern recognition (psychology) , structural alignment , computer science , sss* , multiple sequence alignment , sequence (biology) , cluster analysis , artificial intelligence , sequence alignment , mathematics , peptide sequence , biology , genetics , mathematical analysis , biochemistry , gene , bounded function , programming language
In this work, we introduce a new method for fold recognition using composite secondary structures assembled from different secondary structure prediction servers for a given target sequence. An automatic, complete, and robust way of finding all possible combinations of predicted secondary structure segments (SSS) for the target sequence and clustering them into a few flexible clusters, each containing patterns with the same number of SSS, is developed. The program then chooses plausible homologues in two steps: (i) an SSS‐based alignment excludes impossible templates whose SSS patterns are very different from any of those of the target; (ii) a residue‐based alignment selects good structural templates based on sequence similarity and secondary structure similarity between the target and only those templates left after the first stage. The secondary structure of each residue in the target is selected from one of the predictions to find the best match with the template. Truncation is applied to a target where different predictions vary. In most cases, a target is also divided into N‐terminal and C‐terminal fragments, each of which is used as a separate subsequence. Our program was tested on the fold recognition targets from CASP3 with known PDB codes and some available targets from CASP4. The results are compared with a structural homologue list for each target produced by the CE program (Shindyalov and Bourne, Protein Eng 1998;11:739–747). The program successfully locates homologues with high Z‐score and low root‐mean‐square deviation within the top 30–50 predictions in the overwhelming majority of cases. Proteins 2002;48:352–366. © 2002 Wiley‐Liss, Inc.
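The ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' program. It assumes a three-state alphabet (H = helix, E = strand, C = coil), predictions and template of equal length that are already aligned without gaps, and hypothetical function names (`sss_pattern`, `cluster_by_segment_count`, `composite_match`). It groups only the patterns of the individual predictions by segment count, rather than enumerating all segment combinations as the paper does, and then builds a composite in which each residue takes whichever predicted state agrees with the template.

```python
from collections import defaultdict
from itertools import groupby


def sss_pattern(ss):
    """Collapse a per-residue string into its ordered segment types.

    Coil (C) is dropped, so "CHHHHCCEEEC" becomes "HE".
    """
    return "".join(state for state, _ in groupby(ss) if state in "HE")


def cluster_by_segment_count(predictions):
    """Group the distinct SSS patterns by how many segments they contain."""
    clusters = defaultdict(set)
    for ss in predictions:
        pattern = sss_pattern(ss)
        clusters[len(pattern)].add(pattern)
    return dict(clusters)


def composite_match(predictions, template_ss):
    """Build a composite secondary structure against one template.

    Assumption: all strings have the same length and are aligned
    position by position. At each residue the composite takes the
    template state if any prediction proposes it; otherwise it falls
    back to the first prediction. Returns (composite, match_count).
    """
    composite = []
    matches = 0
    for i, template_state in enumerate(template_ss):
        states = [prediction[i] for prediction in predictions]
        if template_state in states:
            composite.append(template_state)
            matches += 1
        else:
            composite.append(states[0])
    return "".join(composite), matches
```

In a two-stage scheme like the one described, the segment-count clusters could serve as the cheap first filter on templates, with the per-residue match count contributing to the second, residue-based score alongside sequence similarity.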