Large-scale k-mer-based analysis of the informational properties of genomes, comparative genomics and taxonomy

Open Access

Author(s) - Yuval Bussi, Ruti Kapon, Ziv Reich

Publication year - 2021

Publication title - PLOS ONE

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.99

H-Index - 332

ISSN - 1932-6203

DOI - 10.1371/journal.pone.0258693

Subject(s) - genome, Jaccard index, comparative genomics, phylogenetic tree, biology, genomics, computational biology, pairwise comparison, phylum, evolutionary biology, cluster analysis, genetics, computer science, gene, artificial intelligence

Information theoretic approaches are ubiquitous and effective in a wide variety of bioinformatics applications. In comparative genomics, alignment-free methods, based on short DNA words, or k-mers, are particularly powerful. We evaluated the utility of varying k-mer lengths for genome comparisons by analyzing their sequence space coverage of 5805 genomes in the KEGG GENOME database. In subsequent analyses on four k-mer lengths spanning the relevant range (11, 21, 31, 41), hierarchical clustering of 1634 genus-level representative genomes using pairwise 21- and 31-mer Jaccard similarities best recapitulated a phylogenetic/taxonomic tree of life with clear boundaries for superkingdom domains and high subtree similarity for named taxons at lower levels (family through phylum). By analyzing ~14.2M prokaryotic genome comparisons by their lowest-common-ancestor taxon levels, we detected many potential misclassification errors in a curated database, further demonstrating the need for wide-scale adoption of quantitative taxonomic classifications based on whole-genome similarity.
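The two quantities the abstract relies on, pairwise k-mer Jaccard similarity and sequence space coverage, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `kmer_set`, `jaccard` and `space_coverage` are hypothetical, only one strand is read (no reverse-complement canonicalization), words containing ambiguous bases are skipped, and coverage is taken here to mean the fraction of the 4^k possible words that a sequence contains. Each of these choices is an assumption made for illustration.

```python
VALID_BASES = frozenset("ACGT")


def kmer_set(seq, k):
    """Return the set of distinct k-mers in seq (single strand only).

    Assumption: words containing anything other than A, C, G, T are skipped.
    """
    seq = seq.upper()
    words = (seq[i:i + k] for i in range(len(seq) - k + 1))
    return {w for w in words if VALID_BASES.issuperset(w)}


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two k-mer sets."""
    union = a | b
    # Two empty sets are treated as having similarity 0.0 by convention here.
    return len(a & b) / len(union) if union else 0.0


def space_coverage(kmers, k):
    """Fraction of the 4**k possible DNA words of length k that were observed."""
    return len(kmers) / 4 ** k


if __name__ == "__main__":
    # Toy sequences; real genomes would be read from FASTA files.
    genome_a = kmer_set("ACGTACGTGGA", 3)
    genome_b = kmer_set("ACGTTTGGA", 3)
    print("Jaccard similarity:", jaccard(genome_a, genome_b))
    print("Coverage of genome A:", space_coverage(genome_a, 3))
```

At the lengths studied in the paper (11 to 41), in-memory Python sets of whole-genome k-mers become large, so a practical implementation would more likely use a dedicated k-mer counter or a sketching method. The sketch above is only meant to make the definitions concrete.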
