z-logo
open-access-imgOpen Access
Transducers: an emerging probabilistic framework for modeling indels on trees
Author(s) -
Robert K. Bradley,
Ian Holmes
Publication year - 2007
Publication title -
bioinformatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 3.599
H-Index - 390
eISSN - 1367-4811
pISSN - 1367-4803
DOI - 10.1093/bioinformatics/btm402
Subject(s) - probabilistic logic , indel , computer science , artificial intelligence , biology , biochemistry , genotype , single nucleotide polymorphism , gene
When it comes to dealing with indels, molecular evolution lags heuristic bioinformatics by decades. Sophisticated alignment algorithms have been widely known since the 1960s (and in bioinformatics since 1970), but we are still struggling to understand the corresponding phylogenetic models. Big ideas drive change: as we dream of reconstructing ancestral genotypes, it is ever clearer that indels cannot be ignored. We need to develop a robust understanding of probabilistic indel analysis and its relationship to alignment. We believe that a suitable foundation for such analysis already exists, where evolutionary models meet automata theory: the framework of finite-state transducers. This framework links Hidden Markov Models (Brown et al., 1993; Churchill, 1992), sequence alignment algorithms (Gotoh, 1982; Miller andMyers, 1988; Needleman and Wunsch, 1970; Smith and Waterman, 1981), finite-state machines and Chomsky grammars (Durbin et al., 1998) and molecular phylogenetics (Miklos et al., 2004; Thorne et al., 1991). In this letter we outline this framework, also describing a preliminary analysis of one recent algorithm— Indelign—for reconstructing ancestral indel histories (Kim and Sinha, 2007). Below, we briefly review the theory of transducers, concentrating not on the details of individual algorithms but rather on their unifying qualitative character. We show that Indelign, which reconstructs maximum-likelihood indel histories, is implicitly based on a transducer model. Thus, we can compare the computational complexity of Indelign to other transducerframed algorithms, with reference to alignment data from recent comparative genomics projects in Drosophila and Eutheria (ENCODE). Finally, we discuss several programs, algorithms and resources available for working with transducers, offering an outlook on areas of bioinformatics that may benefit from this theory. 1.1 Theory of finite-state transducers

The content you want is available to Zendy users.

Already have an account? Click here to sign in.
Having issues? You can contact us here
Accelerating Research

Address

John Eccles House
Robert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom