Open Access
Computer analysis of nucleic acid regulatory sequences.
Author(s) - Laurence Jay Korn, Cary Queen, Mark N. Wegman
Publication year - 1977
Publication title - Proceedings of the National Academy of Sciences of the United States of America
Language(s) - English
Resource type - Journals
eISSN - 1091-6490
pISSN - 0027-8424
DOI - 10.1073/pnas.74.10.4401
Subject(s) - nucleic acid, biology, oligonucleotide, nucleotide, sequence (biology), computational biology, genetics, dna, nucleic acid sequence, rna, transcription (linguistics), consensus sequence, base sequence, gene, linguistics, philosophy
We describe a computer program designed to facilitate the analysis of nucleic acid sequences. The program can search several nucleic acid sequences for oligonucleotides common to all of them. It can examine a DNA or RNA sequence for two kinds of homologous regions: repetitions and dyad symmetries. The homologies need not be perfect; mismatches and "looping out" of nucleotides are allowed. The program also finds (A+T)- and (G+C)-rich regions, locates restriction enzyme recognition sites, determines the distribution of di- and trinucleotides, and performs various other functions. We include two representative applications of the program. First, all prokaryotic transcription termination sequences published as of June 1977 were found to share the following features: (i) a string of at least five T residues, (ii) the sequence CGGGC or a close analog immediately preceding the T cluster, and (iii) a region of strong dyad symmetry preceding the Ts and including the CGGGC sequence. Second, a sequence of 221 nucleotides consisting of the Escherichia coli trp promoter, operator, and leader was found to contain two strong dyad symmetries. These homologies both occur at known regulatory sites; no comparable homologies occur in regions without regulatory significance.
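Three of the searches named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' program: the function names, parameters, and test sequences are hypothetical, the input is assumed to be an uppercase DNA string over A, C, G, and T, and only perfect matches are handled. The mismatches and "looping out" within a homology that the original program tolerates are not implemented here; the `max_loop` parameter only bounds the unpaired spacer between the two arms of a dyad symmetry.

```python
from collections import Counter

# Assumption: DNA alphabet only; RNA input would need U in place of T.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def common_oligos(seqs, k):
    """Return the set of k-mers that occur in every sequence in seqs."""
    kmer_sets = [{s[i:i + k] for i in range(len(s) - k + 1)} for s in seqs]
    return set.intersection(*kmer_sets) if kmer_sets else set()


def dyad_symmetries(seq, arm, max_loop=0):
    """Find perfect dyad symmetries with arms of length `arm`.

    A hit is a left arm whose reverse complement begins at most
    `max_loop` bases after the left arm ends.
    Returns a list of (left_start, right_start, left_arm) tuples.
    """
    hits = []
    n = len(seq)
    for i in range(n - 2 * arm + 1):
        left = seq[i:i + arm]
        target = reverse_complement(left)
        for gap in range(max_loop + 1):
            j = i + arm + gap
            if j + arm > n:
                break  # right arm would run past the end of the sequence
            if seq[j:j + arm] == target:
                hits.append((i, j, left))
    return hits


def kmer_counts(seq, k):
    """Count overlapping k-mers (k=2 for dinucleotides, k=3 for trinucleotides)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


if __name__ == "__main__":
    # Made-up sequences, each containing the CGGGC motif mentioned in the abstract.
    print(common_oligos(["AACGGGCTTTTT", "GGCGGGCTTTTTA", "TCGGGCTT"], 5))
    # A made-up hairpin: GCCCG, a 4-base spacer, then CGGGC.
    print(dyad_symmetries("GCCCGAAAACGGGC", arm=5, max_loop=4))
    print(kmer_counts("ATATA", 2))
```

Extending the dyad search to allow mismatches would mean replacing the exact string comparison with a scored alignment of the two arms, which is a substantially larger change than this sketch attempts.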