Generalized affine gap costs for protein sequence alignment
Author(s) - Stephen F. Altschul
Publication year - 1998
Publication title - Proteins: Structure, Function, and Bioinformatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.699
H-Index - 191
eISSN - 1097-0134
pISSN - 0887-3585
DOI - 10.1002/(sici)1097-0134(19980701)32:1<88::aid-prot10>3.0.co;2-j
Subject(s) - affine transformation , generalization , sequence (biology) , computer science , multiple sequence alignment , mathematics , algorithm , distribution (mathematics) , sequence alignment , mathematical optimization , peptide sequence , biology , geometry , mathematical analysis , genetics , gene
Based on the observation that a single mutational event can delete or insert multiple residues, affine gap costs for sequence alignment charge a penalty for the existence of a gap, and a further length‐dependent penalty. From structural or multiple alignments of distantly related proteins, it has been observed that conserved residues frequently fall into ungapped blocks separated by relatively nonconserved regions. To take advantage of this structure, a simple generalization of affine gap costs is proposed that allows nonconserved regions to be effectively ignored. The distribution of scores from local alignments using these generalized gap costs is shown empirically to follow an extreme value distribution. Examples are presented for which generalized affine gap costs yield superior alignments from the standpoints both of statistical significance and of alignment accuracy. Guidelines for selecting generalized affine gap costs are discussed, as is their possible application to multiple alignment. Proteins 32:88–96, 1998. Published 1998 Wiley‐Liss, Inc. This article is a US government work and, as such, is in the public domain in the United States of America.
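The abstract describes standard affine gap costs: a penalty for a gap's existence plus a length-dependent penalty. For a gap of length k, this is a cost of a + b·k. The sketch below shows how that baseline is commonly used in local alignment, through Gotoh-style recurrences with separate "gap open" and "gap extend" transitions. It does not implement the paper's generalized gap costs. The names `affine_gap_cost`, `local_align_score` and `toy_score`, and the toy match/mismatch scores, are hypothetical and chosen only for illustration. Real protein alignment would use a substitution matrix instead.

```python
from typing import Callable


def affine_gap_cost(k: int, a: float, b: float) -> float:
    """Affine cost of a gap of length k >= 1: existence penalty a plus b per residue."""
    if k < 1:
        raise ValueError("gap length must be at least 1")
    return a + b * k


def local_align_score(s: str, t: str, score: Callable[[str, str], float],
                      a: float, b: float) -> float:
    """Best local alignment score of s and t with gap cost a + b*k.

    Illustrative Smith-Waterman/Gotoh sketch, not the paper's generalized costs.
    """
    NEG = float("-inf")
    n, m = len(s), len(t)
    # H[i][j]: best score of a local alignment ending at s[i-1], t[j-1] (or empty)
    # E[i][j]: best score ending with t[j-1] aligned to a gap in s
    # F[i][j]: best score ending with s[i-1] aligned to a gap in t
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    E = [[NEG] * (m + 1) for _ in range(n + 1)]
    F = [[NEG] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Opening a gap costs a + b; extending an open gap costs b more.
            E[i][j] = max(H[i][j - 1] - (a + b), E[i][j - 1] - b)
            F[i][j] = max(H[i - 1][j] - (a + b), F[i - 1][j] - b)
            H[i][j] = max(0.0,
                          H[i - 1][j - 1] + score(s[i - 1], t[j - 1]),
                          E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


def toy_score(x: str, y: str) -> float:
    """Hypothetical substitution scores: +1 for identity, -3 otherwise."""
    return 1.0 if x == y else -3.0


if __name__ == "__main__":
    # t has one extra residue (G) relative to s.
    # With a cheap gap (a=2, b=1) the best alignment bridges it: 10 matches - 3 = 7.
    print(local_align_score("AAAAATTTTT", "AAAAAGTTTTT", toy_score, 2, 1))
    # With a costly gap (a=10, b=1) an ungapped alignment with one mismatch wins.
    print(local_align_score("AAAAATTTTT", "AAAAAGTTTTT", toy_score, 10, 1))
```

Because each gap pays the existence penalty a only once, a single long gap is cheaper than several short ones of the same total length. This is the property the abstract motivates by noting that one mutational event can insert or delete several residues.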