Open Access
MMseqs software suite for fast and deep clustering and searching of large protein sequence sets
Author(s) - Maria Hauser, Martin Steinegger, Johannes Söding
Publication year - 2016
Publication title - Bioinformatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 3.599
H-Index - 390
eISSN - 1367-4811
pISSN - 1367-4803
DOI - 10.1093/bioinformatics/btw006
Subject(s) - UniProt, computer science, cluster analysis, sequence database, software suite, data mining, software, metagenomics, Smith–Waterman algorithm, sequence alignment, database, artificial intelligence, peptide sequence, biology, biochemistry, gene, programming language
Sequence databases are growing fast, challenging existing analysis pipelines. Reducing the redundancy of sequence databases by similarity clustering improves the speed and sensitivity of iterative searches, but existing tools cannot efficiently cluster databases the size of UniProt to 50% maximum pairwise sequence identity or below. Furthermore, in metagenomics experiments, large fractions of reads typically cannot be matched to any known sequence, because searching comprehensive databases such as UniProt with sensitive but relatively slow tools (e.g. BLAST or HMMER3) is becoming too costly.
