From analysis of protein structural alignments toward a novel approach to align protein sequences
Author(s) - Sunyaev Shamil R., Bogopolsky Gennady A., Oleynikova Natalia V., Vlasov Peter K., Finkelstein Alexei V., Roytberg Mikhail A.
Publication year - 2003
Publication title - Proteins: Structure, Function, and Bioinformatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.699
H-Index - 191
eISSN - 1097-0134
pISSN - 0887-3585
DOI - 10.1002/prot.10503
Subject(s) - smith–waterman algorithm, multiple sequence alignment, structural alignment, protein structure prediction, sequence alignment, computer science, gold standard (test), algorithm, homology modeling, casp, protein function prediction, similarity (geometry), alignment free sequence analysis, protein structure, protein function, pattern recognition (psychology), artificial intelligence, mathematics, biology, peptide sequence, genetics, statistics, biochemistry, gene, image (mathematics), enzyme
Alignment of protein sequences is a key step in most computational methods for prediction of protein function and homology‐based modeling of three‐dimensional (3D) structure. We investigated the correspondence between “gold standard” alignments of 3D protein structures and the sequence alignments produced by the Smith–Waterman algorithm, currently the most sensitive method for pair‐wise alignment of sequences. The results of this analysis enabled development of a novel method to align a pair of protein sequences. The comparison of the Smith–Waterman and structure alignments focused on their inner structure and especially on the continuous ungapped alignment segments, “islands” between gaps. Approximately one third of the islands in the gold standard alignments have negative or low positive scores, and their recognition is below the sensitivity limit of the Smith–Waterman algorithm. From the alignment accuracy perspective, the time spent by the algorithm while working in these unalignable regions is unnecessary. We considered features of the standard similarity scoring function responsible for this phenomenon and suggested an alternative hierarchical algorithm, which explicitly addresses high scoring regions. This algorithm is considerably faster than the Smith–Waterman algorithm, whereas the resulting alignments are on average of the same quality with respect to the gold standard. This finding shows that a decrease in alignment accuracy is not necessarily the price of computational efficiency. Proteins 2003;9999:000–000. © 2003 Wiley‐Liss, Inc.
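
To make the notion of “islands” concrete, the following minimal Python sketch splits a gapped pairwise alignment into its continuous ungapped segments and scores each one. It is an illustration only, not the authors' implementation: the function names, the toy sequences, and the simple match/mismatch scores (standing in for a real substitution matrix) are all assumptions introduced here.

```python
def islands(aligned_a, aligned_b, gap="-"):
    """Split a gapped pairwise alignment into ungapped segments ("islands").

    Both inputs are alignment rows of equal length; a gap character in
    either row ends the current island.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    result, current = [], []
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            if current:               # a gap column closes the open island
                result.append(current)
                current = []
        else:
            current.append((x, y))    # aligned residue pair extends the island
    if current:
        result.append(current)
    return result


def island_score(island, match=2, mismatch=-1):
    """Score one island with a toy match/mismatch scheme (an assumption;
    a substitution matrix would normally be used instead)."""
    return sum(match if x == y else mismatch for x, y in island)


# Hypothetical toy alignment, not taken from the paper.
row_a = "ACDE-FGHIK"
row_b = "ACDEWF-HLK"
segments = islands(row_a, row_b)
scores = [island_score(s) for s in segments]
print(scores)  # [8, 2, 3]
```

Under this toy scheme, islands with a negative or low positive score would correspond to the segments that the abstract describes as lying below the sensitivity limit of the Smith–Waterman algorithm.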
