Open Access
WORDUP: an efficient algorithm for discovering statistically significant patterns in DNA sequences
Author(s) - Graziano Pesole, Nicola Prunella, Sabino Liuni, Marcella Attimonelli, Cecilia Saccone
Publication year - 1992
Publication title - Nucleic Acids Research
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 9.008
H-Index - 537
eISSN - 1362-4954
pISSN - 0305-1048
DOI - 10.1093/nar/20.11.2871
Subject(s) - biology , tata box , genetics , sequence (biology) , computational biology , set (abstract data type) , sequence motif , dna , nucleic acid sequence , dna sequencing , consensus sequence , promoter , sequence analysis , caat box , base sequence , gene , computer science , gene expression , programming language
We present a fast and sensitive method for isolating short nucleotide sequences that have non-random statistical properties and may thus be biologically active. It is based on a first-order Markov analysis and detects sequence motifs six to ten nucleotides long that are shared (or avoided) to a statistically significant degree in the sequences under investigation. The method was tested on a set of 521 sequences extracted from the Eukaryotic Promoter Database (2). The results demonstrate its accuracy and efficiency: sequence motifs known to act as eukaryotic promoter elements, such as the TATA-box and the CAAT-box, were clearly identified. In addition, we found other statistically significant motifs whose biological roles are yet to be clarified.
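
To make the idea of a first-order Markov analysis concrete, the sketch below estimates how often a short word would be expected to occur if each nucleotide depended only on the one before it, and compares that with the observed count. It is an illustration only: the abstract does not give WORDUP's actual statistic or significance test, so the function names (markov1_expected, observed_count) and the simple observed/expected comparison are assumptions, not the authors' implementation.

```python
from collections import Counter


def markov1_expected(seqs, word):
    """Expected occurrences of `word` under a first-order Markov model.

    Assumption: base and transition probabilities are estimated from
    mono- and dinucleotide counts pooled over all input sequences.
    """
    mono = Counter()
    di = Counter()
    for s in seqs:
        mono.update(s)
        di.update(s[i:i + 2] for i in range(len(s) - 1))

    total = sum(mono.values())
    if total == 0 or not word:
        return 0.0

    # P(word) = P(w1) * product of P(w[i+1] | w[i])
    p = mono[word[0]] / total
    for a, b in zip(word, word[1:]):
        from_a = sum(n for pair, n in di.items() if pair[0] == a)
        if from_a == 0:
            return 0.0
        p *= di[a + b] / from_a

    # Number of positions at which the word could start.
    positions = sum(max(len(s) - len(word) + 1, 0) for s in seqs)
    return positions * p


def observed_count(seqs, word):
    """Observed occurrences of `word`, counting overlapping matches."""
    k = len(word)
    return sum(
        1
        for s in seqs
        for i in range(len(s) - k + 1)
        if s[i:i + k] == word
    )


if __name__ == "__main__":
    toy = ["ACGTACGTACGT"]  # hypothetical toy input, not data from the paper
    exp = markov1_expected(toy, "ACG")
    obs = observed_count(toy, "ACG")
    print(f"observed={obs} expected={exp:.2f}")
```

A word whose observed count is well above its expectation would be a candidate shared motif, and one well below it a candidate avoided motif. A real analysis would need a proper significance test across the sequence set, which this sketch leaves out.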
