Alignments of DNA and protein sequences containing frameshift errors
Author(s) - Xiaojun Guan, Edward C. Uberbacher
Publication year - 1996
Publication title - Computer Applications in the Biosciences
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 3.599
H-Index - 390
eISSN - 1460-2059
pISSN - 0266-7061
DOI - 10.1093/bioinformatics/12.1.31
Subject(s) - frameshift mutation, computer science, algorithm, translation (biology), indel, alignment free sequence analysis, sequence (biology), frame (networking), error detection and correction, dna sequencing, dynamic programming, word error rate, genetics, computational biology, dna, sequence alignment, biology, artificial intelligence, peptide sequence, mutation, gene, telecommunications, messenger rna, single nucleotide polymorphism, genotype
Molecular sequences, like all experimental data, are subject to error. Many current DNA sequencing protocols have very significant error rates and often generate artefactual insertions and deletions of bases (indels), which corrupt the translation of sequences and compromise the detection of protein homologies. The impact of these errors on the utility of molecular sequence data depends on the analytic technique used to interpret the data. In the presence of frameshift errors, standard algorithms using six-frame translation can miss important homologies because only subfragments of the correct translation are available in any given frame. We present a new algorithm that detects and corrects frameshift errors in DNA sequences while comparing their translations with protein sequences in the databases. This algorithm can recognize homologous proteins sharing 30% identity even in the presence of a 7% frameshift error rate. Our algorithm uses dynamic programming, producing a guaranteed optimal alignment in the presence of frameshifts, and has a sensitivity equivalent to Smith-Waterman. The time complexity of the algorithm is O(nm), where n and m are the lengths of the two sequences being compared. The algorithm does not rely on prior knowledge or heuristic rules and performs significantly better than any previously reported method.
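The abstract does not give the recurrence itself, so the following is only a minimal sketch of the general idea: a Smith-Waterman-style local alignment of a DNA sequence against a protein in which the DNA index may advance by one or two bases (a frameshift) as well as by a full codon. The function name `frameshift_local_score`, the identity scoring, and all penalty values are assumptions made for illustration and are not taken from the paper.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def frameshift_local_score(dna, protein, match=2, mismatch=-1, gap=-2, shift=-3):
    """Best local alignment score of `dna` against `protein`.

    H[i][j] is the best score of an alignment that ends after i nucleotides
    and j amino acids. All scoring parameters are hypothetical defaults.
    """
    n, m = len(dna), len(protein)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cands = [
                0,                    # start a new local alignment
                H[i][j - 1] + gap,    # amino acid with no codon
                H[i - 1][j] + shift,  # skip one base (frameshift)
            ]
            if i >= 2:
                cands.append(H[i - 2][j] + shift)  # skip two bases (frameshift)
            if i >= 3:
                aa = CODON_TABLE.get(dna[i - 3:i], "X")  # "X" for ambiguous codons
                s = match if aa == protein[j - 1] else mismatch
                cands.append(H[i - 3][j - 1] + s)  # codon aligned to amino acid
                cands.append(H[i - 3][j] + gap)    # codon with no amino acid
            H[i][j] = max(cands)
            best = max(best, H[i][j])
    return best


protein = "MKVLAAGIV"
clean = "ATGAAAGTTCTGGCTGCAGGTATTGTG"   # encodes the protein exactly
shifted = clean[:12] + "C" + clean[12:]  # one artefactual inserted base

print(frameshift_local_score(clean, protein))
print(frameshift_local_score(shifted, protein))
```

Each cell looks at a constant number of predecessors, so the table is filled in O(nm) time, matching the complexity stated in the abstract. In the example, the inserted base costs one `shift` penalty but the alignment continues across it, whereas a fixed-frame comparison would see only the fragment on one side of the error.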