A structurally‐defined gap function for pairwise sequence alignment of proteins in the twilight zone
Author(s) - Chapman Barbara S.
Publication year - 2006
Publication title - The FASEB Journal
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.709
H-Index - 277
eISSN - 1530-6860
pISSN - 0892-6638
DOI - 10.1096/fasebj.20.5.a928-b
Subject(s) - pairwise comparison, structural alignment, computer science, multiple sequence alignment, sequence alignment, alignment free sequence analysis, Smith–Waterman algorithm, algorithm, affine transformation, protein structure, theoretical computer science, artificial intelligence, peptide sequence, mathematics, biology, genetics, biochemistry, pure mathematics, gene
Protein classification and annotation from sequence data alone require methods that can recognize remote evolutionary relationships and produce biologically correct alignments for building three-dimensional models. Algorithms proficient at detecting remote homology tend not to align accurately enough for modeling, while good alignment algorithms lack the speed and sensitivity required for comprehensive database searching. The problem is most acute in the twilight zone of sequence identity (15–35% matched residues). Most pairwise alignment algorithms use dynamic programming with affine gap functions that impose arbitrary gap initiation and extension penalties. Yet neither the points at which insertions and deletions occur during evolution nor the composition of the inserted or deleted residues can be modeled by arbitrary penalties, because such penalties ignore the structural context of main-chain conformation, solvent accessibility, and hydrogen bonding. This work proposes a gap scoring function based on the propensity of gaps to open outside of helices and strands, with an extension rule adjusted for the propensity of certain amino acid residues to predominate in turn and coil contexts. Potential gap residues were identified using STRIDE (Frishman & Argos 1995, Proteins 23:566) in high-resolution structures sharing less than 20% pairwise sequence identity, culled from the Protein Data Bank by the PISCES server (Wang & Dunbrack 2003, Bioinformatics 19:1589). Performance of the new function is assessed in JAligner, an open-source implementation of the Smith-Waterman-Gotoh alignment algorithm.
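To make the idea of a structure-dependent gap function concrete, the following is a minimal Python sketch of a Smith-Waterman-Gotoh local alignment score in which the gap-opening cost depends on the secondary-structure code of the residue placed in the gap, and the gap-extension cost depends on the residue type. It is not the function proposed in the abstract and not JAligner code. Every numeric penalty, the simple match/mismatch scoring (a real implementation would use a substitution matrix), the choice of STRIDE codes treated as helix or strand, and the set of "turn-forming" residues are illustrative assumptions.

```python
NEG_INF = float("-inf")

# Assumption: STRIDE codes H, G, I (helices) and E (strand) count as
# regular secondary structure; anything else (e.g. T, C) counts as loop.
STRUCTURED = set("HGIE")

# Assumption: an illustrative set of residues often found in turns and coils.
TURN_FORMERS = set("GPNDS")


def gap_open(ss_code, open_loop=4.0, open_structured=12.0):
    """Cost of starting a gap at a residue with this secondary-structure code.

    The penalty values are hypothetical, chosen only for illustration.
    """
    return open_structured if ss_code in STRUCTURED else open_loop


def gap_extend(residue, extend=1.0, extend_turn=0.5):
    """Cost of adding this residue to an already open gap (hypothetical values)."""
    return extend_turn if residue in TURN_FORMERS else extend


def local_score(a, ss_a, b, ss_b, match=2.0, mismatch=-1.0):
    """Best local alignment score using Gotoh's three-matrix recurrence.

    a, b       : amino acid sequences
    ss_a, ss_b : per-residue secondary-structure strings of the same lengths
    """
    if len(a) != len(ss_a) or len(b) != len(ss_b):
        raise ValueError("each sequence needs one structure code per residue")
    n, m = len(a), len(b)
    H = [[0.0] * (m + 1) for _ in range(n + 1)]      # best score ending at (i, j)
    E = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # ends with b[j-1] in a gap
    F = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # ends with a[i-1] in a gap
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Open a new gap at b[j-1], or extend an existing one over it.
            E[i][j] = max(H[i][j - 1] - gap_open(ss_b[j - 1]),
                          E[i][j - 1] - gap_extend(b[j - 1]))
            # Same for a gap that skips a[i-1].
            F[i][j] = max(H[i - 1][j] - gap_open(ss_a[i - 1]),
                          F[i - 1][j] - gap_extend(a[i - 1]))
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0.0, diag, E[i][j], F[i][j])  # 0.0 restarts the local alignment
            best = max(best, H[i][j])
    return best


# Toy example: b equals a with "GG" inserted after the fifth residue.
a, ss_a = "ACDEFHIKLM", "H" * 10
b = "ACDEFGGHIKLM"
in_loop = local_score(a, ss_a, b, "HHHHHCCHHHHH")   # insertion annotated as coil
in_helix = local_score(a, ss_a, b, "H" * 12)        # insertion annotated as helix
```

With these toy penalties, the insertion annotated as coil is cheap enough that the alignment spans it, whereas the same insertion annotated as helix is so costly that the best local alignment stops at one flank. This is the qualitative behavior a structurally defined gap function is meant to produce; the penalty values the work derives from STRIDE-annotated structures are not given in the abstract.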
