Single cell sequencing

Efficient de novo assembly of single-cell bacterial genomes from short-read data sets

Hamidreza Chitsaz, Joyclyn L. Yee-Greenbaum, Glenn Tesler, Mary-Jane Lombardo, Christopher L. Dupont, Jonathan H. Badger, Mark Novotny, Douglas B. Rusch, Louise J. Fraser, Niall A. Gormley, Ole Schulz-Trieglaff, Geoffrey P. Smith, Dirk J. Evers, Pavel A. Pevzner, Roger S. Lasken.
Efficient de novo assembly of single-cell bacterial genomes from short-read data sets.
Nature Biotechnology, vol. 29, no. 11, pp. 915-921 (2011), advance online publication, 18 Sep 2011 (doi:10.1038/nbt.1966).

Whole genome amplification by the multiple displacement amplification (MDA) method allows sequencing of DNA from single cells of bacteria that cannot be cultured. Assembling a genome is challenging, however, because MDA generates highly nonuniform coverage of the genome. Here we describe an algorithm tailored for short-read data from single cells that improves assembly through the use of a progressively increasing coverage cutoff. Assembly of reads from single Escherichia coli and Staphylococcus aureus cells captures >91% of genes within contigs, approaching the 95% captured from an assembly based on many E. coli cells. We apply this method to assemble a genome from a single cell of an uncultivated SAR324 clade of Deltaproteobacteria, a cosmopolitan bacterial lineage in the global ocean. Metabolic reconstruction suggests that SAR324 is aerobic, motile and chemotaxic. Our approach enables acquisition of genome assemblies for individual uncultivated bacteria using only short reads, providing cell-specific genetic information absent from metagenomic studies.
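The idea of a progressively increasing coverage cutoff can be illustrated with a toy example. The sketch below is not the authors' implementation. It assumes a tiny assembly graph of edges annotated with length and average coverage, and the names `Edge`, `merge_unbranched`, `prune_progressively`, `prune_fixed`, `toy_graph` and the `short_length` parameter are all hypothetical. It removes short, low-coverage edges at a cutoff that rises one step at a time, merging unbranched paths and re-averaging their coverage after each step, so that a genuine low-coverage stretch can be absorbed into a well-covered neighbor before the cutoff reaches it.

```python
from dataclasses import dataclass


@dataclass
class Edge:
    src: str
    dst: str
    length: int
    coverage: float  # average read coverage along the edge


def merge_unbranched(edges):
    """Merge a->b and b->c whenever b has exactly one incoming and one
    outgoing edge; the merged coverage is the length-weighted average."""
    merged = True
    while merged:
        merged = False
        for node in {e.dst for e in edges}:
            incoming = [e for e in edges if e.dst == node]
            outgoing = [e for e in edges if e.src == node]
            if (len(incoming) == 1 and len(outgoing) == 1
                    and incoming[0] is not outgoing[0]):
                a, b = incoming[0], outgoing[0]
                total = a.length + b.length
                cov = (a.length * a.coverage + b.length * b.coverage) / total
                edges = [e for e in edges if e is not a and e is not b]
                edges.append(Edge(a.src, b.dst, total, cov))
                merged = True
                break  # the graph changed, so rescan from scratch
    return edges


def drop_weak(edges, cutoff, short_length):
    """Remove edges that are both short and below the coverage cutoff."""
    return [e for e in edges
            if not (e.coverage < cutoff and e.length < short_length)]


def prune_progressively(edges, max_cutoff, short_length):
    """Raise the cutoff from 1 to max_cutoff, re-merging after each step."""
    for cutoff in range(1, max_cutoff + 1):
        edges = merge_unbranched(drop_weak(edges, cutoff, short_length))
    return edges


def prune_fixed(edges, cutoff, short_length):
    """Baseline for comparison: apply a single fixed cutoff once."""
    return merge_unbranched(drop_weak(edges, cutoff, short_length))


def toy_graph():
    """Hypothetical graph: a genuine path A->B->C->D whose middle edge has
    low coverage, plus two spurious low-coverage branches."""
    return [
        Edge("A", "B", 100, 50.0),
        Edge("B", "C", 40, 3.0),   # genuine but poorly amplified region
        Edge("C", "D", 100, 40.0),
        Edge("B", "X", 30, 1.0),   # spurious branch
        Edge("Y", "C", 30, 2.0),   # spurious branch
    ]


if __name__ == "__main__":
    print(prune_progressively(toy_graph(), max_cutoff=4, short_length=50))
    print(prune_fixed(toy_graph(), cutoff=4, short_length=50))
```

In this toy graph, a fixed cutoff of 4 discards the genuine low-coverage edge along with the spurious branches and leaves two separate contigs, whereas the progressive version removes the spurious branches first and retains the full path from A to D as a single contig.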

Article: Publisher, PMID 21926975
Press coverage: UCSD news release (and Spanish news release), North County Times, GenomeWeb, EurekAlert!, PhysOrg, Galileo (Italy/Italian), Veja (Brazil/Portuguese)
Nature briefs: Nature Biotechnology: “In this issue (Oct. 13, 2011)”, Nature Methods: “Picking up the pieces (Oct. 28, 2011)”

Software

Single cell data sets

The single cell data sets are available for download.

Warning: These Illumina read data sets are extremely large. Most lanes are approximately 2-3 GB each, compressed. Please use a very high speed connection, and verify that you have enough free disk space before downloading.
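One way to check free space before downloading is sketched below. It is only an illustration: the helper names `bytes_needed` and `has_space_for` are hypothetical, the default of 3 GB per lane is taken from the upper end of the estimate above, and 1 GB is treated as 10^9 bytes. It covers the compressed files only, so decompressed reads will need additional room.

```python
import shutil


def bytes_needed(n_lanes, gb_per_lane=3.0):
    """Estimated size of the compressed download, with 1 GB = 10**9 bytes."""
    return int(n_lanes * gb_per_lane * 10**9)


def has_space_for(directory, n_lanes, gb_per_lane=3.0):
    """Return True if the file system holding `directory` has enough free
    space for `n_lanes` compressed lanes."""
    free = shutil.disk_usage(directory).free
    return free >= bytes_needed(n_lanes, gb_per_lane)


if __name__ == "__main__":
    lanes = 4  # hypothetical number of lanes to download
    if has_space_for(".", lanes):
        print("Enough free space for the compressed files.")
    else:
        print("Not enough free space; free up room before downloading.")
```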

Support

This work was partially supported by grants to R.S.L. from the National Human Genome Research Institute (NIH-2 R01 HG003647) and the Alfred P. Sloan Foundation (Sloan Foundation-2007-10-19), and by a grant to P.A.P. and G.T. from the National Institutes of Health (NIH grant 3P41RR024851-02S1).

Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the organizations or agencies that provided support for the project.