bigSCale: An analytical framework for big-scale single-cell data

Giovanni Iacono, Elisabetta Mereu, Amy Guillaumet-Adkins, Roser Corominas, Ivon Cusco, Gustavo Rodríguez-Esteban, Marta Gut, Luis Perez-Jurado, Ivo Gut, Holger Heyn

Research output: Contribution to journal › Article


Abstract

Single-cell RNA sequencing (scRNA-seq) has significantly deepened our insights into complex tissues, with the latest techniques capable of processing tens of thousands of cells simultaneously. Analyzing increasing numbers of cells, however, generates extremely large data sets, extending processing time and challenging computing resources. Current scRNA-seq analysis tools are not designed to interrogate large data sets and often lack sensitivity to identify marker genes. With bigSCale, we provide a scalable analytical framework to analyze millions of cells, which addresses the challenges associated with large data sets. To handle the noise and sparsity of scRNA-seq data, bigSCale uses large sample sizes to estimate an accurate numerical model of noise. The framework further includes modules for differential expression analysis, cell clustering, and marker identification. A directed convolution strategy allows processing of extremely large data sets, while preserving transcript information from individual cells. We evaluated the performance of bigSCale using both a biological model of aberrant gene expression in patient-derived neuronal progenitor cells and simulated data sets, which underlines the speed and accuracy in differential expression analysis. To test its applicability for large data sets, we applied bigSCale to assess 1.3 million cells from the mouse developing forebrain. Its directed down-sampling strategy accumulates information from single cells into index cell transcriptomes, thereby defining cellular clusters with improved resolution. Accordingly, index cell clusters identified rare populations, such as reelin (Reln)-positive Cajal-Retzius neurons, for which we report previously unrecognized heterogeneity associated with distinct differentiation stages, spatial organization, and cellular function. Together, bigSCale presents a solution to address future challenges of large single-cell data sets.
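The idea of directed down-sampling, where information from single cells accumulates into index cell transcriptomes, can be illustrated with a minimal sketch. It assumes a simple greedy scheme chosen only for illustration: each unassigned cell is pooled with its nearest unassigned neighbours (Euclidean distance on log1p-transformed counts), and the raw counts of the group are summed into one index cell. The function `pool_index_cells` and its `group_size` parameter are hypothetical; this is not bigSCale's actual convolution algorithm, which is described in the article.

```python
import math
from typing import List, Sequence


def _log1p(profile: Sequence[int]) -> List[float]:
    """Log-transform raw counts so highly expressed genes do not dominate distances."""
    return [math.log1p(v) for v in profile]


def _distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two log-transformed expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pool_index_cells(counts: List[List[int]], group_size: int) -> List[List[int]]:
    """Hypothetical greedy pooling of similar cells into index cells.

    counts: one list of raw transcript counts per cell (same gene order).
    group_size: number of cells summed into each index cell (last group may be smaller).
    Counts are summed, not averaged, so no transcripts are discarded.
    """
    if group_size < 1:
        raise ValueError("group_size must be >= 1")
    logged = [_log1p(c) for c in counts]
    unassigned = list(range(len(counts)))
    index_cells: List[List[int]] = []
    while unassigned:
        seed = unassigned.pop(0)
        # Rank the remaining cells by similarity to the seed cell.
        unassigned.sort(key=lambda j: _distance(logged[seed], logged[j]))
        members = [seed] + unassigned[: group_size - 1]
        del unassigned[: group_size - 1]
        n_genes = len(counts[seed])
        # Sum counts gene by gene across all members of the group.
        pooled = [sum(counts[m][g] for m in members) for g in range(n_genes)]
        index_cells.append(pooled)
    return index_cells


if __name__ == "__main__":
    cells = [[10, 0], [9, 1], [0, 10], [1, 9]]
    print(pool_index_cells(cells, group_size=2))
```

With the four toy cells above, the two cells dominated by the first gene are pooled together, as are the two dominated by the second gene, giving two index cells `[[19, 1], [1, 19]]`. Summing rather than averaging keeps the total transcript count unchanged, which is the property that lets pooled profiles retain information from individual cells while shrinking the data set.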

Language: English
Pages: 878-890
Number of pages: 13
Journal: Genome Research
Volume: 28
Issue number: 6
DOIs: https://doi.org/10.1101/gr.230771.117
Publication status: Published - 1 Jun 2018

ASJC Scopus subject areas

  • Genetics
  • Genetics (clinical)

Cite this

Iacono, G., Mereu, E., Guillaumet-Adkins, A., Corominas, R., Cusco, I., Rodríguez-Esteban, G., ... Heyn, H. (2018). bigSCale: An analytical framework for big-scale single-cell data. Genome Research, 28(6), 878-890. https://doi.org/10.1101/gr.230771.117
Iacono, Giovanni ; Mereu, Elisabetta ; Guillaumet-Adkins, Amy ; Corominas, Roser ; Cusco, Ivon ; Rodríguez-Esteban, Gustavo ; Gut, Marta ; Perez-Jurado, Luis ; Gut, Ivo ; Heyn, Holger. / bigSCale: An analytical framework for big-scale single-cell data. In: Genome Research. 2018 ; Vol. 28, No. 6. pp. 878-890.
@article{fcb6ee54faa8411eaed227a4a5deb923,
title = "{bigSCale}: An analytical framework for big-scale single-cell data",
abstract = "Single-cell RNA sequencing (scRNA-seq) has significantly deepened our insights into complex tissues, with the latest techniques capable of processing tens of thousands of cells simultaneously. Analyzing increasing numbers of cells, however, generates extremely large data sets, extending processing time and challenging computing resources. Current scRNA-seq analysis tools are not designed to interrogate large data sets and often lack sensitivity to identify marker genes. With bigSCale, we provide a scalable analytical framework to analyze millions of cells, which addresses the challenges associated with large data sets. To handle the noise and sparsity of scRNA-seq data, bigSCale uses large sample sizes to estimate an accurate numerical model of noise. The framework further includes modules for differential expression analysis, cell clustering, and marker identification. A directed convolution strategy allows processing of extremely large data sets, while preserving transcript information from individual cells. We evaluated the performance of bigSCale using both a biological model of aberrant gene expression in patient-derived neuronal progenitor cells and simulated data sets, which underlines the speed and accuracy in differential expression analysis. To test its applicability for large data sets, we applied bigSCale to assess 1.3 million cells from the mouse developing forebrain. Its directed down-sampling strategy accumulates information from single cells into index cell transcriptomes, thereby defining cellular clusters with improved resolution. Accordingly, index cell clusters identified rare populations, such as reelin (Reln)-positive Cajal-Retzius neurons, for which we report previously unrecognized heterogeneity associated with distinct differentiation stages, spatial organization, and cellular function. Together, bigSCale presents a solution to address future challenges of large single-cell data sets.",
author = "Giovanni Iacono and Elisabetta Mereu and Amy Guillaumet-Adkins and Roser Corominas and Ivon Cusco and Gustavo Rodr{\'\i}guez-Esteban and Marta Gut and Luis Perez-Jurado and Ivo Gut and Holger Heyn",
year = "2018",
month = jun,
day = "1",
doi = "10.1101/gr.230771.117",
language = "English",
volume = "28",
pages = "878--890",
journal = "Genome Research",
issn = "1088-9051",
publisher = "Cold Spring Harbor Laboratory Press",
number = "6",

}

Iacono, G, Mereu, E, Guillaumet-Adkins, A, Corominas, R, Cusco, I, Rodríguez-Esteban, G, Gut, M, Perez-Jurado, L, Gut, I & Heyn, H 2018, 'bigSCale: An analytical framework for big-scale single-cell data', Genome Research, vol. 28, no. 6, pp. 878-890. https://doi.org/10.1101/gr.230771.117

bigSCale: An analytical framework for big-scale single-cell data. / Iacono, Giovanni; Mereu, Elisabetta; Guillaumet-Adkins, Amy; Corominas, Roser; Cusco, Ivon; Rodríguez-Esteban, Gustavo; Gut, Marta; Perez-Jurado, Luis; Gut, Ivo; Heyn, Holger.

In: Genome Research, Vol. 28, No. 6, 01.06.2018, pp. 878-890.


TY  - JOUR
T1  - bigSCale: An analytical framework for big-scale single-cell data
T2  - Genome Research
AU  - Iacono, Giovanni
AU  - Mereu, Elisabetta
AU  - Guillaumet-Adkins, Amy
AU  - Corominas, Roser
AU  - Cusco, Ivon
AU  - Rodríguez-Esteban, Gustavo
AU  - Gut, Marta
AU  - Perez-Jurado, Luis
AU  - Gut, Ivo
AU  - Heyn, Holger
PY  - 2018/6/1
Y1  - 2018/6/1
N2  - Single-cell RNA sequencing (scRNA-seq) has significantly deepened our insights into complex tissues, with the latest techniques capable of processing tens of thousands of cells simultaneously. Analyzing increasing numbers of cells, however, generates extremely large data sets, extending processing time and challenging computing resources. Current scRNA-seq analysis tools are not designed to interrogate large data sets and often lack sensitivity to identify marker genes. With bigSCale, we provide a scalable analytical framework to analyze millions of cells, which addresses the challenges associated with large data sets. To handle the noise and sparsity of scRNA-seq data, bigSCale uses large sample sizes to estimate an accurate numerical model of noise. The framework further includes modules for differential expression analysis, cell clustering, and marker identification. A directed convolution strategy allows processing of extremely large data sets, while preserving transcript information from individual cells. We evaluated the performance of bigSCale using both a biological model of aberrant gene expression in patient-derived neuronal progenitor cells and simulated data sets, which underlines the speed and accuracy in differential expression analysis. To test its applicability for large data sets, we applied bigSCale to assess 1.3 million cells from the mouse developing forebrain. Its directed down-sampling strategy accumulates information from single cells into index cell transcriptomes, thereby defining cellular clusters with improved resolution. Accordingly, index cell clusters identified rare populations, such as reelin (Reln)-positive Cajal-Retzius neurons, for which we report previously unrecognized heterogeneity associated with distinct differentiation stages, spatial organization, and cellular function. Together, bigSCale presents a solution to address future challenges of large single-cell data sets.
AB  - Single-cell RNA sequencing (scRNA-seq) has significantly deepened our insights into complex tissues, with the latest techniques capable of processing tens of thousands of cells simultaneously. Analyzing increasing numbers of cells, however, generates extremely large data sets, extending processing time and challenging computing resources. Current scRNA-seq analysis tools are not designed to interrogate large data sets and often lack sensitivity to identify marker genes. With bigSCale, we provide a scalable analytical framework to analyze millions of cells, which addresses the challenges associated with large data sets. To handle the noise and sparsity of scRNA-seq data, bigSCale uses large sample sizes to estimate an accurate numerical model of noise. The framework further includes modules for differential expression analysis, cell clustering, and marker identification. A directed convolution strategy allows processing of extremely large data sets, while preserving transcript information from individual cells. We evaluated the performance of bigSCale using both a biological model of aberrant gene expression in patient-derived neuronal progenitor cells and simulated data sets, which underlines the speed and accuracy in differential expression analysis. To test its applicability for large data sets, we applied bigSCale to assess 1.3 million cells from the mouse developing forebrain. Its directed down-sampling strategy accumulates information from single cells into index cell transcriptomes, thereby defining cellular clusters with improved resolution. Accordingly, index cell clusters identified rare populations, such as reelin (Reln)-positive Cajal-Retzius neurons, for which we report previously unrecognized heterogeneity associated with distinct differentiation stages, spatial organization, and cellular function. Together, bigSCale presents a solution to address future challenges of large single-cell data sets.
UR  - http://www.scopus.com/inward/record.url?scp=85048129620&partnerID=8YFLogxK
U2  - 10.1101/gr.230771.117
DO  - 10.1101/gr.230771.117
M3  - Article
VL  - 28
SP  - 878
EP  - 890
JO  - Genome Research
JF  - Genome Research
SN  - 1088-9051
IS  - 6
ER  - 

Iacono G, Mereu E, Guillaumet-Adkins A, Corominas R, Cusco I, Rodríguez-Esteban G et al. bigSCale: An analytical framework for big-scale single-cell data. Genome Research. 2018 Jun 1;28(6):878-890. https://doi.org/10.1101/gr.230771.117