z-logo
open-access-imgOpen Access
Probabilistic Analysis of a Motif Discovery Algorithm for Multiple Sequences
Author(s) -
Bin Fu,
MingYang Kao,
Lusheng Wang
Publication year - 2009
Publication title -
siam journal on discrete mathematics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.843
H-Index - 66
eISSN - 1095-7146
pISSN - 0895-4801
DOI - 10.1137/080720401
Subject(s) - combinatorics , alphabet , probabilistic logic , mathematics , motif (music) , sigma , algorithm , probabilistic method , discrete mathematics , statistics , physics , philosophy , linguistics , quantum mechanics , acoustics
We study a natural probabilistic model for motif discovery that has been used to experimentally test the quality of motif discovery programs. In this model, there are $k$ background sequences, and each character in a background sequence is a random character from an alphabet $\Sigma$. A motif $G=g_1g_2\cdots g_m$ is a string of $m$ characters. Each background sequence is implanted into a probabilistically generated approximate copy of $G$. For an approximate copy $b_1b_2\cdots b_m$ of $G$, every character $b_i$ is probabilistically generated such that the probability for $b_i\neq g_i$ is at most $\alpha$. In this paper, we give the first analytical proof that multiple background sequences do help with finding subtle and faint motifs. This work is a theoretical approach with a rigorous probabilistic analysis. We develop an algorithm that under the probabilistic model can find the implanted motif with high probability when the number of background sequences is reasonably large. Specifically, we prove that for $\alpha0$ such that if the length of the motif is at least $\delta_0\log n$, the alphabet has at least $t_0$ characters, and there are at least $\delta_1\log n_0$ input sequences, then in $O(n^3)$ time our algorithm finds the motif with probability at least $1-\frac{1}{2^x}$, where $n$ is the longest length of any input sequence and $n_0\leq n$ is an upper bound for the length of the motif.

The content you want is available to Zendy users.

Already have an account? Click here to sign in.
Having issues? You can contact us here
Accelerating Research

Address

John Eccles House
Robert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom