
Systematic prediction of functionally linked genes in bacterial and archaeal genomes
Author(s) -
Sergey Shmakov,
Guilhem Faure,
Kira S. Makarova,
Yuri I. Wolf,
Konstantin Severinov,
Eugene V. Koonin
Publication year - 2019
Publication title -
Nature Protocols
Language(s) - English
Resource type - Journals
eISSN - 1750-2799
pISSN - 1754-2189
DOI - 10.1038/s41596-019-0211-1
Subject(s) - genome, gene, biology, computational biology, genetics, operon, gene prediction, locus (genetics), genomics, Escherichia coli
Functionally linked genes in bacterial and archaeal genomes are often organized into operons. However, the composition and architecture of operons are highly variable and frequently differ even among closely related genomes. Therefore, to efficiently extract reliable functional predictions for uncharacterized genes from comparative analyses of the rapidly growing genomic databases, dedicated computational approaches are required. We developed a protocol to systematically and automatically identify genes that are likely to be functionally associated with a 'bait' gene or locus by using relevance metrics. Given a set of bait loci and a genomic database defined by the user, this protocol compares the genomic neighborhoods of the baits to identify genes that are likely to be functionally linked to the baits by calculating the abundance of a given gene within and outside the bait neighborhoods and the distance to the bait. We exemplify the performance of the protocol with three test cases, namely, genes linked to CRISPR-Cas systems using the 'CRISPRicity' metric, genes associated with archaeal proviruses and genes linked to Argonaute genes in halobacteria. The protocol can be run by users with basic computational skills. The computational cost depends on the sizes of the genomic dataset and the list of reference loci and can vary from one CPU-hour to hundreds of hours on a supercomputer.
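
The abstract describes scoring a gene by its abundance within and outside the bait neighborhoods and by its distance to the bait. A minimal sketch of that idea follows; it is not the protocol's own code. The function name `neighborhood_scores`, the input layout (tuples of contig, position and gene family), the fixed `window` cutoff and the toy data are all assumptions made for illustration, and the simple inside-to-total ratio only approximates the kind of relevance metric the abstract mentions.

```python
from collections import defaultdict
from statistics import median


def neighborhood_scores(genes, baits, window=10_000):
    """Score gene families by how often they occur near bait loci.

    genes  -- iterable of (contig, position, family) tuples (hypothetical layout)
    baits  -- iterable of (contig, position) tuples marking bait loci
    window -- maximum distance in bp for a gene to count as "near" a bait
              (an assumed cutoff, not a value taken from the protocol)
    """
    bait_positions = defaultdict(list)
    for contig, pos in baits:
        bait_positions[contig].append(pos)

    total = defaultdict(int)       # occurrences of each family anywhere
    near = defaultdict(int)        # occurrences inside a bait neighborhood
    distances = defaultdict(list)  # distances to the closest bait, when near

    for contig, pos, family in genes:
        total[family] += 1
        positions = bait_positions.get(contig)
        if not positions:
            continue  # no bait on this contig, so the gene is outside
        d = min(abs(pos - b) for b in positions)
        if d <= window:
            near[family] += 1
            distances[family].append(d)

    return {
        family: {
            # fraction of all occurrences that fall in a bait neighborhood
            "relevance": near[family] / total[family],
            "median_distance": median(distances[family]) if distances[family] else None,
        }
        for family in total
    }


# Toy data: famA and famC always sit next to a bait, famB only sometimes.
genes = [
    ("c1", 1000, "famA"), ("c1", 1500, "famB"), ("c1", 90000, "famB"),
    ("c2", 500, "famA"), ("c2", 700, "famC"), ("c3", 100, "famB"),
]
baits = [("c1", 1200), ("c2", 600)]
scores = neighborhood_scores(genes, baits, window=1000)
```

In this toy run, families found only near baits get a relevance of 1.0, while `famB`, which also occurs far from any bait, scores lower. A real analysis would group genes into families by sequence clustering and would need to correct for database redundancy, neither of which is attempted here.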