Searching for RNA genes using base-composition statistics

Open Access

Author(s) - Peter Schattner

Publication year - 2002

Publication title - Nucleic Acids Research

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 9.008

H-Index - 537

eISSN - 1362-4954

pISSN - 0305-1048

DOI - 10.1093/nar/30.9.2076

Subject(s) - biology , rna , gene , base (topology) , composition (language) , genetics , computational biology , mathematical analysis , linguistics , philosophy , mathematics

The hypothesis that genomic regions rich in non-protein-coding RNAs (ncRNAs) can be identified using local variations in single-base and dinucleotide statistics has been investigated. (G+C)%, (G-C)% difference, (A-T)% difference and dinucleotide-frequency statistics were compared among seven classes of ncRNAs and three genomes. Significant variations were observed in (G+C)% and, in Methanococcus jannaschii, in the frequency of the dinucleotide 'CG'. Screening programs based on these two base-composition statistics were developed. With (G+C)% screening alone, a 1% fraction of the M. jannaschii genome containing all 44 known transfer RNAs, ribosomal RNAs and signal recognition particle RNAs could be identified. When (G+C)% screening was combined with CG dinucleotide-frequency screening, 43 of the 44 known M. jannaschii structural ncRNAs were again identified, while the number of presumably false hits overlapping a known or putative protein-coding gene was reduced from 15 to 6. In addition, 19 candidate ncRNAs were identified, including one with significant homology to several known archaeal RNaseP RNAs.
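The two screening statistics named in the abstract can be illustrated with a minimal sliding-window sketch. This is not the author's program: the function names, the window and step sizes, and both thresholds are hypothetical choices made for illustration, and the sketch assumes that candidate regions show a higher (G+C)% and a higher 'CG' dinucleotide frequency than the surrounding sequence.

```python
def gc_fraction(seq):
    """Fraction of bases in seq that are G or C (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cg_dinucleotide_fraction(seq):
    """Fraction of overlapping dinucleotides in seq that are exactly 'CG'."""
    seq = seq.upper()
    pairs = len(seq) - 1
    if pairs <= 0:
        return 0.0
    hits = sum(1 for i in range(pairs) if seq[i:i + 2] == "CG")
    return hits / pairs


def screen_windows(genome, window=50, step=25, min_gc=0.5, min_cg=0.0):
    """Return (start, end, gc, cg) for each window passing both thresholds.

    window, step, min_gc and min_cg are illustrative assumptions,
    not values taken from the paper.
    """
    hits = []
    last_start = max(len(genome) - window, 0)
    for start in range(0, last_start + 1, step):
        chunk = genome[start:start + window]
        gc = gc_fraction(chunk)
        cg = cg_dinucleotide_fraction(chunk)
        if gc >= min_gc and cg >= min_cg:
            hits.append((start, start + len(chunk), gc, cg))
    return hits


if __name__ == "__main__":
    # Toy sequence: an AT-only background with a GC-rich island in the middle.
    toy = "AT" * 50 + "GC" * 25 + "AT" * 50
    for start, end, gc, cg in screen_windows(toy, min_gc=0.6, min_cg=0.1):
        print(f"{start}-{end}: GC={gc:.2f} CG={cg:.2f}")
```

Setting `min_cg=0.0` reduces the screen to (G+C)% alone, which mirrors the comparison in the abstract between single-statistic and combined screening. A real screen would presumably also merge overlapping windows and compare hits against annotated protein-coding genes.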
