Bakta: rapid and standardized annotation of bacterial genomes via alignment-free sequence identification
Author(s) - Oliver Schwengers, Lukas Jelonek, Marius Alfred Dieckmann, Sebastian Beyvers, Jochen Blom, Alexander Goesmann
Publication year - 2021
Publication title - Microbial Genomics
Language(s) - English
Resource type - Journals
ISSN - 2057-5858
DOI - 10.1099/mgen.0.000685
Subject(s) - annotation, computer science, json, software, refseq, genome, identification (biology), genome project, metadata, python (programming language), metagenomics, workflow, ensembl, bacterial genome size, genbank, database, information retrieval, computational biology, biology, world wide web, genomics, programming language, artificial intelligence, genetics, botany, gene
Command-line annotation software tools have steadily gained popularity over centralized online services as the number of sequenced bacterial genomes has increased worldwide. However, the results of existing command-line software pipelines depend heavily on taxon-specific databases or sufficiently well-annotated reference genomes. Here, we introduce Bakta, a new command-line software tool for the robust, taxon-independent, thorough, yet fast annotation of bacterial genomes. Bakta conducts a comprehensive annotation workflow that includes the detection of small proteins and takes replicon metadata into account. The annotation of coding sequences is accelerated via an alignment-free sequence identification approach that also facilitates the precise assignment of public database cross-references. Annotation results are exported in GFF3 and International Nucleotide Sequence Database Collaboration (INSDC)-compliant flat files, as well as in comprehensive JSON files, facilitating automated downstream analysis. We compared Bakta to other rapid contemporary command-line annotation software tools in both targeted and taxonomically broad benchmarks including isolates and metagenome-assembled genomes. We demonstrated that Bakta outperforms other tools in terms of functional annotations and the assignment of functional categories and database cross-references, whilst providing comparable wall-clock runtimes. Bakta is implemented in Python 3 and runs on macOS and Linux systems. It is freely available under a GPLv3 license at https://github.com/oschwengers/bakta. An accompanying web version is available at https://bakta.computational.bio.
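
The abstract does not spell out how alignment-free identification works. One common way to identify a sequence without alignment is an exact-match lookup keyed by a hash digest of the sequence, and the minimal Python sketch below illustrates that general idea only. The hash function, the normalization step, the `REFERENCE` table and its record fields are all assumptions made for illustration, not Bakta's actual implementation or database schema.

```python
import hashlib
from typing import Optional


def digest(protein_seq: str) -> str:
    """Return a hex digest of a normalized amino acid sequence.

    Normalization (assumed here): trim whitespace, uppercase,
    and drop a trailing stop symbol.
    """
    normalized = protein_seq.strip().upper().rstrip("*")
    return hashlib.md5(normalized.encode("ascii")).hexdigest()


# Hypothetical reference table: digest -> annotation record.
# A real tool would hold this in an on-disk database, not a dict.
REFERENCE = {
    digest("MKTAYIAKQR"): {
        "product": "example protein A",   # made-up product name
        "xrefs": ["EXAMPLE:0001"],        # made-up cross-reference
    },
}


def identify(protein_seq: str) -> Optional[dict]:
    """Look up a protein by exact sequence identity; no alignment is run."""
    return REFERENCE.get(digest(protein_seq))


if __name__ == "__main__":
    print(identify("mktayiakqr*"))  # exact match after normalization
    print(identify("MKTAYIAKQQ"))   # one residue differs, so no hit
```

A lookup like this costs one hash computation and one key lookup per sequence, which is why it can be much faster than alignment. The trade-off is that it only finds identical sequences, so anything that misses would need a slower fallback search.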
