CD-HIT: accelerated for clustering the next-generation sequencing data

Overview
Title: CD-HIT: accelerated for clustering the next-generation sequencing data
Authors: Fu L, Niu B, Zhu Z, Wu S, Li W
Pubmed ID: 23060610
Journal Name: Bioinformatics (Oxford, England)
Volume: 28
Issue: 23
Year: 2012
Page(s): 3150-2
Citation: Fu L, Niu B, Zhu Z, Wu S, Li W. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics (Oxford, England). 2012 Dec 01; 28(23):3150-2.

Abstract

SUMMARY
CD-HIT is a widely used program for clustering biological sequences to reduce sequence redundancy and improve the performance of other sequence analyses. In response to the rapid increase in the amount of sequencing data produced by the next-generation sequencing technologies, we have developed a new CD-HIT program accelerated with a novel parallelization strategy and some other techniques to allow efficient clustering of such datasets. Our tests demonstrated very good speedup derived from the parallelization for up to ∼24 cores and a quasi-linear speedup for up to ∼8 cores. The enhanced CD-HIT is capable of handling very large datasets in much shorter time than previous versions.

AVAILABILITY
http://cd-hit.org.

CONTACT
liwz@sdsc.edu

SUPPLEMENTARY INFORMATION
Supplementary data are available at Bioinformatics online.

Properties
Additional details for this publication include:
Property Name: Value
eISSN: 1367-4811
Publication Date: 2012 Dec 01
Journal Abbreviation: Bioinformatics
DOI: 10.1093/bioinformatics/bts565
Elocation: 10.1093/bioinformatics/bts565
Publication Model: Print-Electronic
ISSN: 1367-4811
Language Abbr: eng
Publication Type: Journal Article
Journal Country: England
Language: English
Publication Type: Research Support, N.I.H., Extramural

Cross References
This publication is also available in the following databases:
Database: PMID: Pubmed
Accession: PMID:23060610