A compound Poisson model for word occurrences in DNA sequences
Author(s) - Stéphane Robin
Publication year - 2002
Publication title - Journal of the Royal Statistical Society: Series C (Applied Statistics)
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.205
H-Index - 72
eISSN - 1467-9876
pISSN - 0035-9254
DOI - 10.1111/1467-9876.00279
Subject(s) - poisson distribution, markov chain, sequence (biology), word (group theory), set (abstract data type), poisson process, compound poisson process, computer science, markov model, hidden markov model, markov process, mathematics, algorithm, statistical physics, artificial intelligence, statistics, genetics, physics, biology, geometry, programming language
Summary. We present a compound Poisson model describing the occurrence process of a set of words in a random sequence of letters. The model takes into account the frequency of the words and their overlapping structure. The model is compared with a Markov chain model in terms of fit and parsimony. Special attention is given to the detection of regions that are poor or rich in word occurrences. Several applications of the model are presented and a combination of the Markov and compound Poisson models is proposed.
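
As an illustration of what "overlapping structure" means for word occurrences, the following minimal sketch counts the possibly overlapping occurrences of a single word and groups them into clumps of mutually overlapping occurrences. In a compound Poisson description, the total count is the sum of the clump sizes over the clumps. The function names, the toy sequence and the restriction to one word are assumptions made for this example; the sketch is not taken from the paper and does not reproduce its estimation procedure.

```python
def occurrences(seq, word):
    """Return start positions of all (possibly overlapping) occurrences of word."""
    positions = []
    start = seq.find(word)
    while start != -1:
        positions.append(start)
        # Advance by one letter only, so overlapping occurrences are found too.
        start = seq.find(word, start + 1)
    return positions


def clump_sizes(positions, word_length):
    """Group sorted occurrence positions into clumps and return their sizes.

    Two consecutive occurrences are put in the same clump when the second
    one starts before the first one ends (hypothetical helper for this sketch).
    """
    sizes = []
    previous = None
    for pos in positions:
        if previous is not None and pos - previous < word_length:
            sizes[-1] += 1      # overlaps the previous occurrence: same clump
        else:
            sizes.append(1)     # no overlap: a new clump starts here
        previous = pos
    return sizes


# Toy example (made-up sequence, not data from the paper).
seq = "ATATATGCATATCCATA"
word = "ATA"
positions = occurrences(seq, word)
sizes = clump_sizes(positions, len(word))
print(positions)  # start positions of each occurrence
print(sizes)      # number of occurrences in each clump
```

The total number of occurrences equals `sum(sizes)`, which is the decomposition a compound Poisson model exploits: a random number of clumps, each contributing a random number of occurrences.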