Estimating the Repeat Structure and Length of DNA Sequences Using ℓ-Tuples
Author(s) - Xiaoman Li, Michael S. Waterman
Publication year - 2003
Publication title - Genome Research
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 9.556
H-Index - 297
eISSN - 1549-5469
pISSN - 1088-9051
DOI - 10.1101/gr.1251803
Subject(s) - shotgun sequencing, biology, genome, shotgun, hybrid genome assembly, k-mer, tuple, computational biology, genetics, sequence assembly, DNA sequencing, DNA, gene, mathematics, discrete mathematics, gene expression, transcriptome
In shotgun sequencing projects, the genome or BAC length is not always known. We approach estimating genome length by first estimating the repeat structure of the genome or BAC, which is sometimes of interest in its own right, on the basis of a set of random reads from a genome project. Moreover, we can find the consensus for repeat families before assembly. Our methods are based on the ℓ-tuple content of the reads.
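As a rough illustration of what "ℓ-tuple content of the reads" can mean in practice, the sketch below counts every ℓ-tuple in a set of reads and derives a crude length estimate from the counts. It is not the estimator developed in the paper: it uses a common heuristic (total ℓ-tuple occurrences divided by the most frequent multiplicity) and assumes error-free reads from a single strand with roughly uniform coverage. The function names `count_tuples` and `estimate_length` are hypothetical and chosen only for this example.

```python
from collections import Counter


def count_tuples(reads, ell):
    """Count occurrences of every ell-tuple (substring of length ell) in the reads."""
    counts = Counter()
    for read in reads:
        # Slide a window of width ell along the read.
        for i in range(len(read) - ell + 1):
            counts[read[i:i + ell]] += 1
    return counts


def estimate_length(counts):
    """Crude length estimate: total tuple occurrences / modal multiplicity.

    Assumption (not from the paper): the most common multiplicity among
    distinct tuples approximates the sampling depth of unique sequence.
    Ties are broken in favor of the larger multiplicity.
    """
    # Histogram: multiplicity -> number of distinct tuples with that multiplicity.
    histogram = Counter(counts.values())
    depth = max(histogram, key=lambda c: (histogram[c], c))
    total = sum(counts.values())
    return total / depth


if __name__ == "__main__":
    # Toy data: three identical error-free "reads" covering a 12-base sequence.
    genome = "ACGTAGCTTGCA"
    reads = [genome] * 3
    counts = count_tuples(reads, 4)
    print(estimate_length(counts))
```

Under the same assumptions, tuples whose multiplicity is close to an integer multiple of the modal depth are candidates for repeated sequence, which is the kind of signal a repeat-structure estimate can build on. With real reads, sequencing errors inflate the number of tuples seen once, so the modal multiplicity would need to be taken after discarding low-count tuples.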