Computer Survey for Likely Genes in the One Megabase Contiguous Genomic Sequence Data of Synechocystis sp. Strain PCC6803
Author(s) -
Makoto Hirosawa,
Takakazu Kaneko,
Satoshi Tabata,
James D. McIninch,
William S. Hayes,
Mark Borodovsky,
Katsumi Isono
Publication year - 1995
Publication title -
DNA Research
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.647
H-Index - 98
eISSN - 1756-1663
pISSN - 1340-2838
DOI - 10.1093/dnares/2.6.239
Subject(s) - orfs , biology , gene , genetics , genome , open reading frame , phylogenetic tree , whole genome sequencing , coding region , computational biology , peptide sequence
The open reading frames (ORFs) previously assigned within the one-megabase sequence of the genome of the cyanobacterium Synechocystis sp. strain PCC6803 (Kaneko et al., DNA Res. 2: 153-166, 1995) were re-examined using the computer program GeneMark. The matrices that GeneMark requires for its statistical calculations were generated and refined with a script termed GeneMark-Genesis, which applied GeneMark recursively to the Synechocystis data and evaluated the probability scores for optimization. Based on the resulting matrices, GeneMark supported 752 of the 818 previously assigned ORFs (92%) as likely coding sequences; 26 of these were predicted to start at more internal positions than previously assigned. In addition, 50 ORFs were newly identified as likely coding sequences, most of them shorter than 300 bp. The procedure thus proved powerful for locating likely coding regions within the Synechocystis genomic sequence without prior information about their similarity to genes of other organisms. However, GeneMark did not predict 66 previously assigned ORFs as likely genes: 14 of them showed significant similarity to known genes, and 10 others were found within IS-like elements. These genes, many of which appear to be of exogenous origin, seem to have escaped detection by GeneMark, as do the "class 3 (horizontally transferred) genes" of E. coli. This in turn suggests that genes of different phylogenetic origins might also be detected as such by modifying the matrices.
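The counts quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is only a tally of the published figures. The variable names are illustrative, and it assumes that the 14 ORFs with similarity to known genes and the 10 ORFs within IS-like elements do not overlap, as the wording "10 others" suggests.

```python
# Counts taken directly from the abstract.
previously_assigned = 818   # ORFs assigned by Kaneko et al. (1995)
supported = 752             # previously assigned ORFs supported by GeneMark
internal_start = 26         # supported ORFs with a more internal predicted start
newly_identified = 50       # ORFs newly identified as likely coding
unsupported_similar = 14    # unsupported ORFs similar to known genes
unsupported_is_like = 10    # unsupported ORFs within IS-like elements

# Derived figures (the disjointness of the 14 and the 10 is an assumption).
unsupported = previously_assigned - supported
support_pct = round(100 * supported / previously_assigned)
unsupported_other = unsupported - unsupported_similar - unsupported_is_like
supported_same_start = supported - internal_start
likely_coding_total = supported + newly_identified

print(f"supported: {supported}/{previously_assigned} ({support_pct}%)")
print(f"not supported: {unsupported} (other than similar/IS-like: {unsupported_other})")
print(f"likely coding sequences after re-examination: {likely_coding_total}")
```

The derived value for unsupported ORFs matches the 66 stated in the abstract, and the rounded percentage matches the stated 92%.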