Asymptotic behaviour and optimal word size for exact and approximate word matches between random sequences
Author(s) - Forêt Sylvain, Burden Conrad J., Wilson Susan R.
Publication year - 2007
Publication title - pamm
Language(s) - English
Resource type - Journals
ISSN - 1617-7061
DOI - 10.1002/pamm.200700202
Subject(s) - word (group theory) , statistic , sequence (biology) , word length , mathematics , characterization (materials science) , computer science , combinatorics , statistics , natural language processing , biology , physics , genetics , geometry , optics
The present work is concerned with a fast and accurate sequence comparison method: the number of words of length k shared by two sequences, also known as the D2 statistic. We link recent theoretical advances in the characterization of the asymptotic distributions of D2 with applications to biological sequences. (© 2008 WILEY‐VCH Verlag GmbH & Co. KGaA, Weinheim)
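
The abstract does not spell out a formula, so the following is only a minimal sketch of how the exact-match D2 count could be computed, assuming the usual convention that every pair of positions (one in each sequence) where the same k-letter word starts contributes one match. Under that assumption D2 equals the sum, over all words, of the product of the word's counts in the two sequences. The function names `kmer_counts` and `d2` are illustrative and do not come from the paper; approximate (mismatch-tolerant) word matches are not covered here.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping word of length k in seq."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    # range() is empty when the sequence is shorter than k.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(seq_a, seq_b, k):
    """Exact-match D2: number of (i, j) pairs whose k-words are identical.

    Assumed definition: sum over words w of count_a(w) * count_b(w).
    """
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    # Iterate over the smaller table; a Counter returns 0 for absent words.
    if len(counts_a) > len(counts_b):
        counts_a, counts_b = counts_b, counts_a
    return sum(n * counts_b[word] for word, n in counts_a.items())


if __name__ == "__main__":
    print(d2("ACGTACGT", "CGTA", 2))
```

Counting words with a hash table keeps the cost linear in the combined sequence length, which is what makes this kind of alignment-free comparison fast relative to alignment-based methods.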