Open Access
Final Report: Sequence Landscapes, March 1, 1998 - June 30, 1999
Author(s) - Gary D. Stormo, Samuel Lévy, Fugen Li
Publication year - 1999
Language(s) - English
Resource type - Reports
DOI - 10.2172/765523
Subject(s) - sequence (biology) , oligonucleotide , computer science , set (abstract data type) , genome , computational biology , class (philosophy) , exon , dna sequencing , gene , biology , algorithm , genetics , artificial intelligence , programming language
Sequence Landscapes are a graphical display of the word frequencies from a database (DB) for every word of every length in a target sequence (TS). If the TS and the DB are the same sequence, this is a convenient method for detecting all repeated sequences of any length.

However, we have been exploring the use of this approach to classify regions of DNA sequence into functional domains, such as exons, introns, and promoters. Using a DB for each class, the landscapes can be used to derive the likelihood that each region of the sequence belongs to each possible class. We think this information can be combined with other types of information to provide improved recognition algorithms. We are now especially interested in improving methods for identifying promoter regions and transcription initiation sites.

The information in the landscape can also be very useful for choosing the best oligos to use on DNA chips. One criterion for choosing oligos is specificity: the best oligos are those most specific for the gene being assayed. Therefore, for each gene, one would like to pick the oligo that has the most mismatches to the most similar other sites in the genome. This can be done easily and efficiently with the landscape information. We return a list of candidate oligos, which can then be ranked by other criteria, including hybridization energy and melting temperature (Tm). To this end, we have been developing an algorithm to find the optimal set of oligos to include on a hybridization chip. By optimal we mean the oligos that are most specific for each gene and thus minimize cross-hybridization to every other gene. The method selects oligos on two primary criteria: (1) for each gene, the oligos that maximize the minimum number of mismatches to every other gene; and (2) the oligos that maximize the difference in hybridization energy between the correct matching site and the next-best site.
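
The data behind a landscape can be made concrete with a small sketch. It assumes plain string sequences and a naive dictionary of substring counts; this is not necessarily the data structure the authors used (a suffix tree or suffix array would scale far better), and it computes only the numbers that would be plotted, not the graphical display itself. For each start position in TS it records the DB frequency of every word beginning there, up to a hypothetical length cap `max_len`. When TS and DB are the same sequence, the number of words with a count of at least 2 at a position gives the longest repeat starting there. All function names are illustrative.

```python
from collections import Counter


def word_counts(db: str, max_len: int) -> Counter:
    """Count every word (substring) of length 1..max_len in the DB sequence."""
    counts = Counter()
    for k in range(1, max_len + 1):
        for i in range(len(db) - k + 1):
            counts[db[i:i + k]] += 1
    return counts


def landscape(ts: str, db: str, max_len: int) -> list[list[int]]:
    """Row i holds the DB frequencies of ts[i:i+1], ts[i:i+2], ...,
    up to max_len or the end of ts (hypothetical naive sketch)."""
    counts = word_counts(db, max_len)
    rows = []
    for i in range(len(ts)):
        # Counter returns 0 for words absent from the DB.
        row = [counts[ts[i:i + k]] for k in range(1, min(max_len, len(ts) - i) + 1)]
        rows.append(row)
    return rows


def longest_repeat_lengths(seq: str) -> list[int]:
    """With TS == DB, the length of the longest repeated word starting at each position.
    Counts along a row never increase (every occurrence of a word is also an
    occurrence of its prefixes), so entries >= 2 form a prefix of the row."""
    rows = landscape(seq, seq, len(seq))
    return [sum(1 for c in row if c >= 2) for row in rows]
```

For example, `landscape("ACGTACGT", "ACGTACGT", 8)[0]` is `[2, 2, 2, 2, 1, 1, 1, 1]`, and `longest_repeat_lengths("ACGTACGT")` is `[4, 3, 2, 1, 4, 3, 2, 1]`, exposing the repeated word ACGT. Supplying a separate DB per class (exons, introns, promoters) gives per-class rows that could feed a likelihood model, which this sketch does not attempt.
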
The first criterion is applied first, using fast approximate matching algorithms, to obtain a candidate list for each gene; those candidates are then ranked by the second criterion. In addition, a web site is being designed to store this information so that anyone wishing to build an expression array chip for any of several organisms can access it. We have started with E. coli, yeast, and C. elegans, and will add more organisms as the need arises. The program for determining the optimal set of oligos is available to anyone for use on their own sequences of interest.
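
The two-stage selection can be sketched as follows. This is a brute-force illustration under stated assumptions, not the authors' program: genes are plain strings, mismatches are Hamming distances to every same-length window in the other genes (no reverse complements, no gapped matches, and no fast approximate matching), and `wallace_match_score` is a hypothetical, crude stand-in for hybridization energy that weights each matched G/C position 4 and each matched A/T position 2, borrowing the Wallace-rule weights. A real implementation would substitute a proper energy model. All function and parameter names are illustrative.

```python
from typing import Callable, Iterator


def windows(seq: str, k: int) -> Iterator[str]:
    """Yield every length-k window of seq, left to right."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def min_mismatches(oligo: str, others: list[str]) -> int:
    """Fewest mismatches between oligo and any same-length window in the other genes."""
    k = len(oligo)
    return min((hamming(oligo, w) for g in others for w in windows(g, k)), default=k)


def wallace_match_score(oligo: str, site: str) -> int:
    """HYPOTHETICAL duplex-stability proxy: 4 per matched G/C, 2 per matched A/T.
    Not a real hybridization energy model."""
    return sum(4 if x in "GC" else 2 for x, y in zip(oligo, site) if x == y)


def select_oligos(genes: dict[str, str], k: int, n_candidates: int = 5,
                  score: Callable[[str, str], int] = wallace_match_score) -> dict[str, list[str]]:
    """For each gene, return ranked candidate k-mer oligos (hypothetical two-stage scheme)."""
    chosen = {}
    for name, seq in genes.items():
        others = [s for n, s in genes.items() if n != name]
        unique = list(dict.fromkeys(windows(seq, k)))  # dedupe, keep first-seen order
        # Stage 1: keep the oligos whose minimum mismatch count to other genes is largest.
        cands = sorted(unique, key=lambda o: min_mismatches(o, others), reverse=True)[:n_candidates]

        # Stage 2: rank by the gap between the perfect-match score and the best off-target score.
        def gap(o: str) -> int:
            best_off = max((score(o, w) for g in others for w in windows(g, k)), default=0)
            return score(o, o) - best_off

        chosen[name] = sorted(cands, key=gap, reverse=True)
    return chosen
```

On the toy pair `{"g1": "AAAAGGGG", "g2": "AAAATTTT"}` with `k=4, n_candidates=2`, this returns `["GGGG", "AGGG"]` for g1 and `["TTTT", "ATTT"]` for g2; the shared word AAAA is rejected in stage 1 because it matches the other gene perfectly. Passing a different `score` function is the intended hook for a real energy model.
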
