Template‐based recognition of protein fold within the midnight and twilight zones of protein sequence similarity
Author(s) -
Pirun Mono,
Babnigg Gyorgy,
Stevens Fred J.
Publication year - 2005
Publication title -
Journal of Molecular Recognition
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.401
H-Index - 79
eISSN - 1099-1352
pISSN - 0952-3499
DOI - 10.1002/jmr.728
Subject(s) - template , fold (higher order function) , computational biology , protein sequencing , sequence database , sequence alignment , sequence (biology) , biology , multiple sequence alignment , peptide sequence , computer science , pattern recognition (psychology) , artificial intelligence , genetics , gene , programming language
Most homologous pairs of proteins have no significant sequence similarity to each other and are not identified by direct sequence comparison or profile‐based strategies. However, multiple sequence alignments of low‐similarity homologs typically reveal a limited number of positions that are well conserved despite diversity of function. It may be inferred that conservation at most of these positions reflects the importance of these amino acids to the folding and stability of the protein. As such, these amino acids and their relative positions may define a structural signature. We demonstrate that extraction of this fold template provides a basis for searching the sequence database for patterns consistent with the fold, enabling identification of homologs that are not recognized by global sequence analysis. The fold template method was developed to address the need for a tool that could comprehensively search the midnight and twilight zones of protein sequence similarity without reliance on global statistical significance. Manual implementations of the fold template method were performed on three folds: immunoglobulin, c‐lectin and TIM barrel. Following proof of concept of the template method, an automated version of the approach was developed. This automated fold template method was used to develop fold templates for 10 of the more populated folds in the SCOP database. The fold template method produced three‐dimensional structural motifs or signatures that returned a diverse collection of proteins while maintaining a low false‐positive rate. Although the results of the manual fold template method were more comprehensive than those of the automated method, the diversity of the results from the automated fold template method surpassed that of current methods that rely on statistical significance to infer evolutionary relationships among divergent proteins. Copyright © 2004 John Wiley & Sons, Ltd.
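The general idea in the abstract (conserved alignment positions and their relative spacing used as a search pattern) can be illustrated with a minimal Python sketch. This is not the authors' algorithm, which the abstract does not specify. The function names, the 0.8 conservation threshold, the spacing tolerance `slack`, and the toy sequences are all assumptions made for illustration.

```python
import re
from collections import Counter


def conserved_positions(alignment, min_fraction=0.8):
    """Return (column, residue) pairs for well-conserved alignment columns.

    Hypothetical helper: a column counts as conserved when its most common
    symbol is not a gap and occurs in at least min_fraction of the sequences.
    """
    n = len(alignment)
    positions = []
    for col in range(len(alignment[0])):
        column = [seq[col] for seq in alignment]
        residue, count = Counter(column).most_common(1)[0]
        if residue != "-" and count / n >= min_fraction:
            positions.append((col, residue))
    return positions


def template_regex(positions, slack=1):
    """Turn conserved positions into a regex with tolerant spacers.

    The spacer between two conserved residues accepts the observed column
    distance plus or minus `slack` residues (an assumed tolerance).
    """
    parts = []
    prev_col = None
    for col, residue in positions:
        if prev_col is not None:
            gap = col - prev_col - 1
            parts.append(".{%d,%d}" % (max(0, gap - slack), gap + slack))
        parts.append(re.escape(residue))
        prev_col = col
    return re.compile("".join(parts))


def search(database, pattern):
    """Return the names of sequences that contain the template pattern."""
    return [name for name, seq in database.items() if pattern.search(seq)]


# Toy alignment (invented): only columns 1, 3 and 7 are fully conserved.
alignment = [
    "ACDWGKLC",
    "TCEWSRLC",
    "SCNWAQVC",
]

# Toy database (invented): one sequence fits the spacing, two do not.
database = {
    "hit_divergent": "MMCPWHHHHCQ",
    "miss_no_trp": "MMCPAHHHHCQ",
    "miss_spacing": "CWAAAAAAAAC",
}

template = template_regex(conserved_positions(alignment))
hits = search(database, template)
print(template.pattern)
print(hits)
```

In this sketch a sequence matches on a handful of positions and their approximate spacing alone, regardless of how dissimilar the intervening residues are, which is the sense in which a template search does not depend on global sequence similarity. A realistic implementation would also need residue classes at each position, gap-aware spacing, and an estimate of the false-positive rate, none of which is attempted here.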