Alignments of DNA and protein sequences containing frameshift errors

Open Access

Author(s) - Xiaojun Guan, Edward C. Uberbacher

Publication year - 1996

Publication title - Computer Applications in the Biosciences

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 3.599

H-Index - 390

eISSN - 1460-2059

pISSN - 0266-7061

DOI - 10.1093/bioinformatics/12.1.31

Subject(s) - frameshift mutation, computer science, algorithm, translation (biology), indel, alignment free sequence analysis, sequence (biology), frame (networking), error detection and correction, dna sequencing, dynamic programming, word error rate, genetics, computational biology, dna, sequence alignment, biology, artificial intelligence, peptide sequence, mutation, gene, telecommunications, messenger rna, single nucleotide polymorphism, genotype

Molecular sequences, like all experimental data, are subject to error. Many current DNA sequencing protocols have very significant error rates and often generate artefactual insertions and deletions of bases (indels), which corrupt the translation of sequences and compromise the detection of protein homologies. The impact of these errors on the utility of molecular sequence data depends on the analytic technique used to interpret the data. In the presence of frameshift errors, standard algorithms using six-frame translation can miss important homologies because only subfragments of the correct translation are available in any given frame. We present a new algorithm which can detect and correct frameshift errors in DNA sequences during comparison of translated sequences with protein sequences in the databases. This algorithm can recognize homologous proteins sharing 30% identity even in the presence of a 7% frameshift error rate. Our algorithm uses dynamic programming, producing a guaranteed optimal alignment in the presence of frameshifts, and has a sensitivity equivalent to Smith-Waterman. The computational complexity of the algorithm is O(nm), where n and m are the lengths of the two sequences being compared. The algorithm does not rely on prior knowledge or heuristic rules and performs significantly better than any previously reported method.
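The abstract does not give the recurrence itself, so the following is only a minimal sketch of how a frameshift-aware dynamic program of this general kind could look, not the authors' published method. It assumes a Smith-Waterman-style local alignment of forward-strand DNA against a protein, with two extra transitions that skip one or two nucleotides at a penalty. The function name `frameshift_local_score`, the flat match/mismatch scores (a real implementation would more likely use a substitution matrix such as PAM or BLOSUM), and all penalty values are assumptions made for illustration.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def frameshift_local_score(dna, protein, match=4, mismatch=-2,
                           gap=-6, frameshift=-8):
    """Best local alignment score of DNA (forward strand) against a protein.

    All scoring parameters are illustrative assumptions, not published values.
    Runs in O(n * m) time for len(dna) = n and len(protein) = m.
    """
    n, m = len(dna), len(protein)
    # S[i][j]: best score of a local alignment that ends exactly after
    # nucleotide i of the DNA and residue j of the protein.
    S = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(m + 1):
            score = 0  # local alignment: may start fresh anywhere
            if i >= 3:
                if j >= 1:
                    # Translate the codon ending at i and pair it with residue j.
                    aa = CODON_TABLE.get(dna[i - 3:i], "X")
                    sub = match if aa == protein[j - 1] else mismatch
                    score = max(score, S[i - 3][j - 1] + sub)
                # Codon aligned to a gap in the protein.
                score = max(score, S[i - 3][j] + gap)
            if j >= 1:
                # Protein residue aligned to a gap in the DNA.
                score = max(score, S[i][j - 1] + gap)
            # Frameshift: skip one nucleotide (suspected inserted base) ...
            score = max(score, S[i - 1][j] + frameshift)
            # ... or two nucleotides (codon left incomplete by a deleted base).
            if i >= 2:
                score = max(score, S[i - 2][j] + frameshift)
            S[i][j] = score
            best = max(best, score)
    return best


protein = "MKTAYIAK"
clean = "ATGAAAACCGCTTATATTGCCAAG"    # encodes MKTAYIAK in frame 0
shifted = clean[:12] + "G" + clean[12:]  # one spurious base inserted

print(frameshift_local_score(clean, protein))
print(frameshift_local_score(shifted, protein))
print(frameshift_local_score(shifted, protein, frameshift=-1000))
```

In this toy case the inserted base splits the correct translation across two reading frames. With frameshift transitions effectively disabled (the last call), only part of the protein can be matched in any single frame, which is the failure mode the abstract attributes to six-frame translation. With them enabled, a single penalized skip rejoins the two halves. The sketch returns only a score, omits traceback, and ignores the reverse strand.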
