CD-HIT: accelerated for clustering the next-generation sequencing data

LiMin Fu, Beifang Niu, Zhengwei Zhu, Sitao Wu, Weizhong Li

Open Access

Publication year - 2012

Publication title - Bioinformatics

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 3.599

H-Index - 390

eISSN - 1367-4811

pISSN - 1367-4803

DOI - 10.1093/bioinformatics/bts565

Subject(s) - speedup, computer science, cluster analysis, redundancy (engineering), data mining, sequence (biology), parallel computing, machine learning, biology, operating system, genetics

CD-HIT is a widely used program for clustering biological sequences to reduce sequence redundancy and improve the performance of other sequence analyses. In response to the rapid increase in the amount of sequencing data produced by the next-generation sequencing technologies, we have developed a new CD-HIT program accelerated with a novel parallelization strategy and some other techniques to allow efficient clustering of such datasets. Our tests demonstrated very good speedup derived from the parallelization for up to ∼24 cores and a quasi-linear speedup for up to ∼8 cores. The enhanced CD-HIT is capable of handling very large datasets in much shorter time than previous versions.
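
The abstract does not spell out the algorithm, so the following is only a minimal sketch of what clustering sequences to reduce redundancy can look like. It uses greedy incremental clustering: sequences are processed longest first, and each one either joins the first existing representative it is similar enough to or becomes a new representative. The function names `toy_identity` and `greedy_cluster`, the 0.8 threshold, and the position-by-position identity measure are all assumptions made for illustration. They are not the program's actual similarity computation, and the sketch does not attempt the parallelization strategy the abstract refers to.

```python
def toy_identity(a: str, b: str) -> float:
    """Toy similarity (an assumption, not a real alignment): fraction of
    matching positions over the length of the shorter sequence."""
    shorter = min(len(a), len(b))
    if shorter == 0:
        return 0.0
    # zip stops at the shorter sequence, so only overlapping positions count
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / shorter


def greedy_cluster(seqs, threshold=0.8):
    """Greedy incremental clustering.

    Returns a dict mapping each representative sequence to the list of
    sequences assigned to it (the representative itself comes first).
    """
    clusters = {}
    # Longest sequences first; sorted() is stable, so ties keep input order.
    for seq in sorted(seqs, key=len, reverse=True):
        for rep in clusters:
            if toy_identity(rep, seq) >= threshold:
                clusters[rep].append(seq)  # redundant: joins an existing cluster
                break
        else:
            clusters[seq] = [seq]  # no match: becomes a new representative
    return clusters


if __name__ == "__main__":
    reads = ["ACGTACGTAC", "ACGTACGTAA", "TTTTGGGGCC", "ACGTACGT"]
    for rep, members in greedy_cluster(reads).items():
        print(rep, members)
```

Keeping only the dictionary keys yields a non-redundant set of representatives. In this sketch every sequence is compared against the representatives one at a time, which is the kind of work that grows quickly with dataset size.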
