EST_GENOME: a program to align spliced DNA sequences to unspliced genomic DNA | Zendy

Richard Mott | Zendy

AI Assistant Blog Pricing

Home ZAIA Blog

Open Access

EST_GENOME: a program to align spliced DNA sequences to unspliced genomic DNA

Author(s) -

Richard Mott

Publication year - 1997

Publication title -

bioinformatics

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 3.599

H-Index - 390

eISSN - 1367-4811

pISSN - 1367-4803

DOI - 10.1093/bioinformatics/13.4.477

Subject(s) - genetics , biology , genome , genomic dna , gene prediction , intron , computational biology , gene

This note describes the program EST_GENOME for aligning spliced DNA to unspliced genomic DNA. It is written in ANSI C and has been tested under Digital OSF3.2. The spurce code and documentation are available from ftp:// www.sanger.ac.uky ftp/pub/ badger/est_genome.2.tar.Z. The prediction of genes in uncharacterized genomic DNA sequence is currently one of the main problems facing sequence annotators. Methods based on de novo prediction, e.g. searching for motifs like the splice-site consensus, or on statistical properties such as biased codon usage, etc. (Solovyev et al., 1994; Hebsgaard et al., 1996) have been only partially successful, and investigators have often found that the surest way of predicting a gene is by alignment with a homologous protein sequence (Birney et al., 1996; Gelfand et al., 1996; Huang and Zhang, 1996), or a spliced gene product [an expressed sequence tag (EST), mRNA or cDNA], particularly now that a large number of ESTs are available (Hillier et al., 1996). Standard alignment tools are not ideal for finding the correct alignment of a spliced product to genomic DNA, because of the large introns which can occur in the genomic sequence and because the programs ignore the conserved sequences found at donor/acceptor splice sites (intron/exon boundaries). In addition, very large genomic DNA sequences can be hard to align using quadratic-space dynamic programming because they require too much memory. The program EST_GENOME addresses this problem. It allows large introns, can recognize splice sites and uses limited memory. This combination of features makes a powerful and useful tool. EST_GENOME is used routinely at the Sanger Centre to help annotate human genomic sequence. As it is slow compared with search methods like BLAST (Altschul et al., 1990), we first screen genomic DNA against dbEST using BLASTN. Any matching ESTs are realigned using EST_GENOME. The algorithm uses a modification of Smith and Waterman (1981). The penalty structure used to score an alignment is as follows (defaults are in parentheses). Aligned bases score +match (1) or cost —mismatch (1) as appropriate. An indel in

The content you want is available to Zendy users.

Already have an account? Click here to sign in.

Having issues? You can contact us here

Accelerating Research