On Counting Tandem Duplication Trees
Author(s) - Jialiang Yang
Publication year - 2004
Publication title - Molecular Biology and Evolution
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 6.637
H-Index - 218
eISSN - 1537-1719
pISSN - 0737-4038
DOI - 10.1093/molbev/msh115
Subject(s) - gene duplication , tandem exon duplication , biology , segmental duplication , locus (genetics) , genome , dna sequencing , genetics , tandem repeat , evolutionary biology , dna , gene , gene family
Large genomes are full of repeated DNA sequences. It was estimated that over half of the human DNA consists of repeated sequences (Baltimore 2001; Eichler 2001; Leem et al. 2002). Tandem duplication is one of the important evolutionary mechanisms for producing repeated DNA sequences, in which the copies, which may or may not contain genes, are adjacent along the genome. Fitch (1977) first observed that tandem duplication histories are much more constrained than speciation histories and proposed to model them assuming that unequal crossover is the biological mechanism from which they originate. The corresponding trees are now called tandem duplication trees, the term tandem sometimes being omitted for the sake of conciseness. With more and more genomic sequences becoming known, inferring tandem duplication history has again drawn researchers' attention (Benson and Dong 1999; Tang, Waterman, and Yooseph 2002; Elemento, Gascuel, and Lefranc 2002; Zhang et al. 2003).

The aim of this article is, first, to present a simple recurrence formula for the number of rooted duplication trees based on the recurrence formula proved in Gascuel et al. (2003). We also give a simple non-counting proof of the fact that the number of rooted duplication trees for n segments is exactly twice the number of unrooted duplication trees for n segments. Notice that this fact was proved based on a counting argument in Gascuel et al. (2003).

Duplication Tree Model

Assume n sequence segments {1, 2, ..., n} were formed from a locus by tandem duplication. Then, assume that the locus had grown from a single copy through a series of tandem duplications. Each duplication replaced a stretch of DNA sequences containing several repeats with two identical and adjacent copies of itself. If the stretch contains k repeats, the duplication is called a k-duplication.

A rooted duplication tree M for tandemly repeated segments {1, 2, ..., n} is a rooted binary tree that contains blocks as shown in figure 1. A node in M represents a repeat. Obviously, the root represents the original copy at the locus, and the leaves represent the given segments.

A block in M represents a duplication event. Each non-leaf node appears in a unique block; no node is an ancestor of another in a block. If the block corresponds to a k-duplication, it has k nodes u
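
The k-duplication step of the model above can be illustrated with a short Python sketch. It is not taken from the article: the function names `k_duplication` and `grow_locus`, the root label "r", and the "L"/"R" suffixes that record which copy a repeat descends from are all assumptions made for illustration. The sketch only simulates how a locus grows; it does not build the tree or count trees.

```python
from typing import List, Tuple


def k_duplication(locus: List[str], start: int, k: int) -> List[str]:
    """Replace the stretch locus[start:start+k] with two adjacent copies of itself.

    Labels are hypothetical: each copy keeps its parent's name plus "L"
    (first copy) or "R" (second copy), so ancestry can be read off the name.
    """
    if k < 1 or start < 0 or start + k > len(locus):
        raise ValueError("the duplicated stretch must lie inside the locus")
    stretch = locus[start:start + k]
    left = [name + "L" for name in stretch]
    right = [name + "R" for name in stretch]
    return locus[:start] + left + right + locus[start + k:]


def grow_locus(events: List[Tuple[int, int]]) -> List[str]:
    """Grow a locus from a single copy "r" through a series of (start, k) events."""
    locus = ["r"]
    for start, k in events:
        locus = k_duplication(locus, start, k)
    return locus


if __name__ == "__main__":
    # A 1-duplication of the original copy, then a 2-duplication of both repeats.
    print(grow_locus([(0, 1), (0, 2)]))
```

In this sketch, each call to `k_duplication` corresponds to one block of the duplication tree, and the k repeats in the duplicated stretch play the role of the k nodes of that block.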