Denoising
Removing noise in sparse single-cell RNA-sequencing count data
17 datasets · 4 methods · 2 control methods · 2 metrics
Single-cell RNA-Seq protocols detect only a fraction of the mRNA molecules present in each cell. As a result, the measurements (UMI counts) observed for each gene in each cell are subject to high levels of technical noise (Grün et al., 2014). Denoising is the task of estimating the true expression level of each gene in each cell. In the single-cell literature, this task is also referred to as imputation, a term typically used for missing-data problems in statistics. As with the terms “dropout”, “missing data”, and “technical zeros”, this terminology can create confusion about the underlying measurement process (Sarkar and Stephens, 2020).
A key challenge in evaluating denoising methods is the general lack of ground truth. A recent benchmark study (Hou et al., 2020) relied on flow-sorted datasets, mixture control experiments (Tian et al., 2019), and comparisons with bulk RNA-Seq data. Because each of these approaches has specific limitations, it is difficult to combine them into a single quantitative measure of denoising accuracy. Here, we instead rely on molecular cross-validation (MCV), an approach developed specifically to quantify denoising accuracy in the absence of ground truth (Batson et al., 2019). In MCV, the observed molecules in a given scRNA-Seq dataset are first partitioned into a training and a test dataset. Next, a denoising method is applied to the training dataset. Finally, denoising accuracy is measured by comparing the result to the test dataset. The authors show, both in theory and in practice, that the measured denoising accuracy is representative of the accuracy that would be obtained on a ground-truth dataset.
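The molecule-splitting step of MCV can be sketched as below. This is a minimal illustration, assuming each observed molecule is assigned to the training split independently with probability `train_frac` (binomial thinning); the function name `split_molecules` and the default of 0.9 are illustrative choices, not taken from the benchmark's code.

```python
import random


def split_molecules(counts, train_frac=0.9, seed=0):
    """Split a cells x genes UMI count matrix into train and test matrices.

    Each observed molecule goes to the training split independently with
    probability ``train_frac`` (binomial thinning); the rest form the test split.
    """
    rng = random.Random(seed)
    train, test = [], []
    for row in counts:  # one row per cell
        train_row, test_row = [], []
        for n in row:  # n = UMI count of one gene in this cell
            k = sum(1 for _ in range(n) if rng.random() < train_frac)
            train_row.append(k)
            test_row.append(n - k)
        train.append(train_row)
        test.append(test_row)
    return train, test


counts = [[5, 0, 2], [1, 3, 0]]  # toy count matrix: 2 cells x 3 genes
train, test = split_molecules(counts, train_frac=0.9)
```

For Poisson-distributed counts, binomial thinning yields two independent Poisson samples of the same underlying expression profile, which is what lets the held-out molecules stand in for ground truth.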
Results
Results table of the scores per method, dataset, and metric (after scaling). The “Overall mean” dataset is the mean value across all datasets.
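The text does not spell out how scores are scaled. A common scheme, and an assumption here, is to rescale each raw metric value against the two control methods on the same dataset, so that `no_denoising` scores 0 and `perfect_denoising` scores 1. A sketch with hypothetical numbers and a hypothetical `scale_score` helper:

```python
from statistics import mean


def scale_score(score, negative, positive):
    """Rescale a raw metric value so the negative control maps to 0 and the
    positive control maps to 1 (assumed scaling scheme).

    Works for both 'higher is better' and 'lower is better' metrics, because
    the direction is carried by the order of the two control values.
    """
    if positive == negative:
        raise ValueError("controls must differ to define a scale")
    return (score - negative) / (positive - negative)


# Hypothetical raw MSE values (lower is better) for one method on two datasets,
# next to the no_denoising (negative) and perfect_denoising (positive) controls.
raw = {
    "dataset_a": {"method": 0.30, "no_denoising": 0.50, "perfect_denoising": 0.10},
    "dataset_b": {"method": 0.60, "no_denoising": 0.40, "perfect_denoising": 0.20},
}

scaled = {
    name: scale_score(r["method"], r["no_denoising"], r["perfect_denoising"])
    for name, r in raw.items()
}
overall_mean = mean(list(scaled.values()))  # analogue of the "Overall mean" row
```

Under this assumption, a negative scaled score means a method did worse than not denoising at all, as `dataset_b` illustrates here.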
Dataset info
Human Lung Cell Atlas
An integrated cell atlas of the human lung in health and disease (core) (Sikkema et al. 2023).
Mouse Brain Atlas
Adult mouse primary visual cortex (Tasic et al. 2016).
Mouse Pancreatic Islet Atlas
Mouse pancreatic islet scRNA-seq atlas across sexes, ages, and stress conditions including diabetes (Hrovatin et al. 2023).
CeNGEN
The complete gene expression map of an entire nervous system (Hammarlund et al. 2018).
Triple-Negative Breast Cancer
1535 cells from six fresh triple-negative breast cancer tumors (Wu et al. 2021).
Human immune
Human immune cells dataset from the scIB benchmarks (Luecken et al. 2021).
GTEx v9
Single-nucleus cross-tissue molecular reference maps to decipher disease gene function (Eraslan et al. 2022).
Diabetic Kidney Disease
Multimodal single cell sequencing implicates chromatin accessibility and genetic background in diabetic kidney disease progression (Wilson et al. 2022).
Tabula Sapiens
A multiple-organ, single-cell transcriptomic atlas of humans (Jones et al. 2022).
Immune Cell Atlas
Cross-tissue immune cell analysis reveals tissue-specific features in humans (Domínguez Conde et al. 2022).
Zebrafish embryonic cells
Single-cell mRNA sequencing of zebrafish embryonic cells (D. E. Wagner et al. 2018).
5k PBMCs
5k peripheral blood mononuclear cells from a healthy donor (10x Genomics 2019).
Mouse HSPC
Haematopoietic stem and progenitor cells from mouse bone marrow (Nestorowa et al. 2016).
HypoMap
A unified single cell gene expression atlas of the murine hypothalamus (Steuernagel et al. 2022).
1k PBMCs
1k peripheral blood mononuclear cells from a healthy donor (10x Genomics 2018).
Mouse myeloid
Myeloid lineage differentiation from mouse blood (Olsson et al. 2016).
Human pancreas
Human pancreas cells dataset from the scIB benchmarks (Luecken et al. 2021).
Method info
ALRA
ALRA imputes missing values in scRNA-seq data by computing a rank-k approximation of the expression matrix, thresholding it gene by gene, and rescaling the result (Linderman, Zhao, and Kluger 2018).
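The thresholding and rescaling steps can be sketched as follows, taking the rank-k approximation `A_k` as given; computing it needs an SVD routine (for example `numpy.linalg.svd`), left out here to keep the block dependency-free. This is a simplified reading of the description above rather than the reference implementation: it uses the magnitude of each gene's smallest low-rank value as the threshold and skips any further post-processing the original method may apply. The function name is hypothetical.

```python
from statistics import mean, stdev


def alra_threshold_and_rescale(A, A_k):
    """Per-gene thresholding and rescaling in the spirit of ALRA.

    A   : normalised expression matrix (cells x genes), lists of floats
    A_k : rank-k approximation of A with the same shape (assumed precomputed)
    """
    n_cells, n_genes = len(A), len(A[0])
    out = [list(row) for row in A_k]
    for j in range(n_genes):
        # Threshold: magnitude of this gene's smallest low-rank value
        # (a simplification of ALRA's per-gene threshold rule).
        thr = abs(min(A_k[i][j] for i in range(n_cells)))
        for i in range(n_cells):
            if out[i][j] <= thr:
                out[i][j] = 0.0
        kept = [out[i][j] for i in range(n_cells) if out[i][j] != 0.0]
        orig = [A[i][j] for i in range(n_cells) if A[i][j] != 0.0]
        if len(kept) < 2 or len(orig) < 2 or stdev(kept) == 0:
            continue  # too few non-zeros to match mean and spread
        scale = stdev(orig) / stdev(kept)
        shift = mean(orig) - mean(kept) * scale
        for i in range(n_cells):
            if out[i][j] != 0.0:
                # Affine map giving surviving values the mean and standard
                # deviation of the gene's original non-zeros; clip at 0.
                out[i][j] = max(0.0, out[i][j] * scale + shift)
    return out


A = [[0.0, 2.0], [1.0, 0.0], [3.0, 4.0]]  # toy normalised data
A_k = [[-0.2, 1.8], [1.1, 0.3], [2.9, 3.7]]  # toy low-rank approximation
restored = alra_threshold_and_rescale(A, A_k)
```

In this toy input the low-rank entries -0.2 and 0.3 fall at or below their genes' thresholds and are set to zero, and the surviving values are mapped back onto the scale of the original non-zeros, recovering `A` up to floating-point error.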
DCA
DCA is a deep count autoencoder trained with a zero-inflated negative binomial (ZINB) loss function to address the dropout effect in count data (Eraslan et al. 2019).
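To make the ZINB loss concrete, the sketch below computes the negative log-likelihood of a single count under a common ZINB parameterisation with mean `mu`, inverse dispersion `theta`, and zero-inflation probability `pi`. In DCA these parameters come from the autoencoder's output layers; the network itself is omitted, and the function names are hypothetical.

```python
import math


def nb_log_pmf(x, mu, theta):
    """Log-probability of count x under a negative binomial with mean mu > 0
    and inverse dispersion theta > 0 (variance = mu + mu**2 / theta)."""
    return (
        math.lgamma(x + theta)
        - math.lgamma(theta)
        - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )


def zinb_nll(x, mu, theta, pi):
    """Negative log-likelihood of count x under a zero-inflated NB,
    with zero-inflation probability 0 <= pi < 1."""
    log_nb = nb_log_pmf(x, mu, theta)
    if x == 0:
        # A zero can come from the inflation component or from the NB itself.
        return -math.log(pi + (1.0 - pi) * math.exp(log_nb))
    # Non-zero counts can only come from the NB component.
    return -(math.log(1.0 - pi) + log_nb)


# Toy values: with pi = 0.5, a zero is more probable than under the NB alone.
loss_zero = zinb_nll(0, mu=1.0, theta=1.0, pi=0.5)
loss_two = zinb_nll(2, mu=1.0, theta=1.0, pi=0.5)
```

Averaging `zinb_nll` over all entries of a count matrix gives a training loss of the kind such an autoencoder minimises.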
KNN Smoothing
Iterative kNN smoothing denoises scRNA-seq data by pooling the counts of each cell with those of its nearest neighbours, increasing the neighbourhood size at each iteration until a maximum k is reached (F. Wagner, Yan, and Yanai 2018).
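A simplified, dependency-free sketch of iterative kNN smoothing follows. It assumes the neighbourhood schedule 1, 3, 7, 15, ... (capped at `k`), median library-size scaling with a Freeman-Tukey transform before measuring distances, and Euclidean distance on the full transformed profiles; the reference implementation may differ in these details, for instance in the space where distances are computed. Function names are hypothetical.

```python
import math
from statistics import median


def _freeman_tukey(counts):
    """Scale each cell (row) to the median total count, then apply the
    Freeman-Tukey transform sqrt(x) + sqrt(x + 1)."""
    totals = [sum(row) for row in counts]
    target = median(totals)
    profiles = []
    for row, total in zip(counts, totals):
        factor = target / total if total > 0 else 0.0
        profiles.append([math.sqrt(x * factor) + math.sqrt(x * factor + 1) for x in row])
    return profiles


def knn_smooth(counts, k):
    """Iterative kNN smoothing of a cells x genes matrix of raw counts.

    Neighbourhood sizes follow 1, 3, 7, 15, ... capped at k. At each step the
    neighbours are found on the current smoothed profiles, but the *raw*
    counts of a cell and its neighbours are what gets summed.
    """
    n_cells, n_genes = len(counts), len(counts[0])
    if not 0 <= k < n_cells:
        raise ValueError("k must satisfy 0 <= k < number of cells")
    smoothed = [list(row) for row in counts]
    num_steps = math.ceil(math.log2(k + 1))
    for step in range(1, num_steps + 1):
        k_step = min(2**step - 1, k)
        profiles = _freeman_tukey(smoothed)
        new = []
        for i in range(n_cells):
            # Distances from cell i to every other cell, nearest first.
            dists = sorted(
                (math.dist(profiles[i], profiles[j]), j) for j in range(n_cells) if j != i
            )
            members = [i] + [j for _, j in dists[:k_step]]
            new.append([sum(counts[j][g] for j in members) for g in range(n_genes)])
        smoothed = new
    return smoothed


counts = [[10, 0], [9, 1], [0, 10], [1, 9]]  # toy data: two groups of two cells
smoothed = knn_smooth(counts, k=1)
```

Neighbours are recomputed from the smoothed profiles at every step, while the summed values always come from the raw counts, so smoothing does not compound across iterations.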
MAGIC
MAGIC imputes and denoises noisy, dropout-prone scRNA-seq data by diffusing expression values across a cell-cell affinity graph (Dijk et al. 2018).
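The diffusion idea behind MAGIC can be illustrated with the sketch below: build a cell-cell affinity matrix, turn it into a Markov transition matrix `M`, and replace the data by `M^t X`. It is a simplification under stated assumptions: a per-cell Gaussian kernel whose bandwidth is the distance to the `knn`-th neighbour stands in for MAGIC's adaptive kernel, distances are computed directly on the input values, and the diffusion time `t` is fixed by hand. The function name is hypothetical.

```python
import math


def magic_diffuse(X, t=3, knn=1):
    """Diffuse a cells x genes matrix over a cell-cell affinity graph.

    Simplified MAGIC-style steps: Gaussian kernel with a per-cell bandwidth,
    symmetrisation, row-normalisation into a Markov matrix M, then M^t X.
    """
    n_cells, n_genes = len(X), len(X[0])
    if not 1 <= knn < n_cells:
        raise ValueError("knn must satisfy 1 <= knn < number of cells")
    D = [[math.dist(a, b) for b in X] for a in X]
    # Bandwidth = distance to the knn-th nearest other cell
    # (position 0 of each sorted row is the cell's zero distance to itself).
    sigma = [max(sorted(row)[knn], 1e-12) for row in D]
    K = [
        [math.exp(-((D[i][j] / sigma[i]) ** 2)) for j in range(n_cells)]
        for i in range(n_cells)
    ]
    A = [[K[i][j] + K[j][i] for j in range(n_cells)] for i in range(n_cells)]
    M = []
    for row in A:
        total = sum(row)
        M.append([a / total for a in row])  # each row now sums to 1
    out = [list(row) for row in X]
    for _ in range(t):  # apply M t times, i.e. compute M^t X
        out = [
            [sum(M[i][c] * out[c][g] for c in range(n_cells)) for g in range(n_genes)]
            for i in range(n_cells)
        ]
    return out


X = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]  # two toy clusters
Y = magic_diffuse(X, t=3, knn=1)
```

Because each row of `M` sums to one, every diffused value is a weighted average of observed values, so the two well-separated toy clusters stay apart while cells within a cluster are pulled towards each other.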
Control method info
No Denoising
Negative control that returns the training counts unchanged.
Perfect Denoising
Positive control that returns the test counts.
Metric info
Mean-squared error
The mean squared error between the denoised training counts and the test counts, after reweighting by the train/test ratio (Batson, Royer, and Webber 2019).
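As a sketch of the MSE metric, assume the train/test reweighting is a single factor `(1 - train_frac) / train_frac`, the expected ratio of test to training molecules under a binomial split, and that values are compared directly on the count scale; an actual implementation may normalise or transform the counts first. The function name `mcv_mse` is hypothetical.

```python
def mcv_mse(denoised_train, test, train_frac):
    """Mean squared error between reweighted denoised training counts and
    test counts (cells x genes lists of numbers).

    Assumption: reweighting multiplies the denoised values by
    (1 - train_frac) / train_frac, the expected test/train molecule ratio.
    """
    ratio = (1.0 - train_frac) / train_frac
    errors = [
        (d * ratio - y) ** 2
        for d_row, y_row in zip(denoised_train, test)
        for d, y in zip(d_row, y_row)
    ]
    return sum(errors) / len(errors)


# Toy data: denoised training values that are exactly 9x the test counts,
# matching the 0.9 / 0.1 split, give (near) zero error.
mse_perfect = mcv_mse([[9.0, 0.0], [18.0, 9.0]], [[1, 0], [2, 1]], train_frac=0.9)
```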
Poisson Loss
The Poisson negative log-likelihood of the test counts, taking the denoised training counts as the Poisson means (Batson, Royer, and Webber 2019).
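A comparable sketch for the Poisson loss, assuming the denoised training values are rescaled by the ratio of total test counts to total training counts before being used as Poisson means. The constant `log(y!)` term is dropped because it does not depend on the prediction, and the small `eps` guarding the logarithm is an assumption of this sketch; `poisson_loss` is a hypothetical name.

```python
import math


def poisson_loss(denoised_train, test, train_total, eps=1e-6):
    """Mean Poisson negative log-likelihood of the test counts, up to the
    constant log(y!) term, using rescaled denoised values as Poisson means.

    train_total: total number of molecules in the raw training split.
    """
    scale = sum(sum(row) for row in test) / train_total
    losses = []
    for d_row, y_row in zip(denoised_train, test):
        for d, y in zip(d_row, y_row):
            lam = d * scale  # predicted Poisson mean for this test entry
            losses.append(lam - y * math.log(lam + eps))
    return sum(losses) / len(losses)


test_counts = [[1, 2]]  # toy test split
good = poisson_loss([[2.0, 4.0]], test_counts, train_total=6)
bad = poisson_loss([[4.0, 2.0]], test_counts, train_total=6)
```

Each term `lam - y * log(lam)` is minimised at `lam = y`, so `good`, whose rescaled predictions match the test counts, scores lower than `bad`.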
Quality control results
Category | Name | Value | Condition | Severity |
---|---|---|---|---|
Scaling | Worst score alra poisson | -17.3505000 | worst_score >= -1 | ✗✗✗ |
Scaling | Worst score knn_smoothing poisson | -13.4420000 | worst_score >= -1 | ✗✗✗ |
Raw results | Dataset 'cellxgene_census/hcla' %missing | 1.0000000 | pct_missing <= .1 | ✗✗✗ |
Raw results | Dataset 'cellxgene_census/hypomap' %missing | 1.0000000 | pct_missing <= .1 | ✗✗✗ |
Raw results | Dataset 'cellxgene_census/tabula_sapiens' %missing | 1.0000000 | pct_missing <= .1 | ✗✗✗ |
Scaling | Worst score alra mse | -9.9708000 | worst_score >= -1 | ✗✗✗ |
Scaling | Worst score dca mse | -8.5238000 | worst_score >= -1 | ✗✗✗ |
Scaling | Worst score magic mse | -7.6749000 | worst_score >= -1 | ✗✗✗ |
Scaling | Worst score knn_smoothing mse | -7.5261000 | worst_score >= -1 | ✗✗✗ |
Raw results | Dataset 'cellxgene_census/immune_cell_atlas' %missing | 0.6666667 | pct_missing <= .1 | ✗✗✗ |
Raw results | Dataset 'cellxgene_census/mouse_pancreas_atlas' %missing | 0.6666667 | pct_missing <= .1 | ✗✗✗ |
Raw results | Dataset 'cellxgene_census/gtex_v9' %missing | 0.5000000 | pct_missing <= .1 | ✗✗✗ |
Raw results | Method 'alra' %missing | 0.3529412 | pct_missing <= .1 | ✗✗✗ |
Raw results | Method 'knn_smoothing' %missing | 0.3529412 | pct_missing <= .1 | ✗✗✗ |
Raw results | Method 'magic' %missing | 0.3529412 | pct_missing <= .1 | ✗✗✗ |
Scaling | Best score knn_smoothing poisson | 6.2850000 | best_score <= 2 | ✗✗✗ |
Raw results | Method 'dca' %missing | 0.2941176 | pct_missing <= .1 | ✗✗ |
Raw results | Metric 'mse' %missing | 0.2843137 | pct_missing <= .1 | ✗✗ |
Raw results | Metric 'poisson' %missing | 0.2843137 | pct_missing <= .1 | ✗✗ |
Raw results | Method 'no_denoising' %missing | 0.1764706 | pct_missing <= .1 | ✗ |
Raw results | Method 'perfect_denoising' %missing | 0.1764706 | pct_missing <= .1 | ✗ |
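The `%missing` values above are consistent with a plain ratio of missing results; for example, 6 missing out of 17 datasets gives 0.3529412, the value listed for alra. A sketch of such a check, with hypothetical function names and treating any value above 0.1 as a failure of `pct_missing <= .1`:

```python
def pct_missing_by_method(observed, datasets, methods):
    """Fraction of datasets for which each method has no result.

    observed: iterable of (dataset_id, method_id) pairs that produced results.
    """
    done = set(observed)
    return {
        m: sum((d, m) not in done for d in datasets) / len(datasets)
        for m in methods
    }


def failing_checks(pct_missing, limit=0.1):
    """Methods violating the QC condition pct_missing <= limit."""
    return sorted(m for m, p in pct_missing.items() if p > limit)


# Hypothetical example: 17 datasets; method_a has results on only 11 of them.
datasets = [f"dataset_{i}" for i in range(17)]
observed = [(d, "method_a") for d in datasets[:11]] + [(d, "method_b") for d in datasets]
pct = pct_missing_by_method(observed, datasets, ["method_a", "method_b"])
```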
References
10x Genomics. 2018. “1k PBMCs from a Healthy Donor (V3 Chemistry).” https://www.10xgenomics.com/resources/datasets/1-k-pbm-cs-from-a-healthy-donor-v-3-chemistry-3-standard-3-0-0.
10x Genomics. 2019. “5k Peripheral Blood Mononuclear Cells (PBMCs) from a Healthy Donor with a Panel of TotalSeq-b Antibodies (V3 Chemistry).” https://www.10xgenomics.com/resources/datasets/5-k-peripheral-blood-mononuclear-cells-pbm-cs-from-a-healthy-donor-with-cell-surface-proteins-v-3-chemistry-3-1-standard-3-1-0.
Batson, Joshua, Loïc Royer, and James Webber. 2019. “Molecular Cross-Validation for Single-Cell RNA-Seq.” bioRxiv. https://doi.org/10.1101/786269.
Dijk, David van, Roshan Sharma, Juozas Nainys, Kristina Yim, Pooja Kathail, Ambrose J. Carr, Cassandra Burdziak, et al. 2018. “Recovering Gene Interactions from Single-Cell Data Using Data Diffusion.” Cell 174 (3): 716–729.e27. https://doi.org/10.1016/j.cell.2018.05.061.
Domínguez Conde, C., C. Xu, L. B. Jarvis, D. B. Rainbow, S. B. Wells, T. Gomes, S. K. Howlett, et al. 2022. “Cross-Tissue Immune Cell Analysis Reveals Tissue-Specific Features in Humans.” Science 376 (6594). https://doi.org/10.1126/science.abl5197.
Eraslan, Gökcen, Eugene Drokhlyansky, Shankara Anand, Evgenij Fiskin, Ayshwarya Subramanian, Michal Slyper, Jiali Wang, et al. 2022. “Single-Nucleus Cross-Tissue Molecular Reference Maps Toward Understanding Disease Gene Function.” Science 376 (6594). https://doi.org/10.1126/science.abl4290.
Eraslan, Gökcen, Lukas M. Simon, Maria Mircea, Nikola S. Mueller, and Fabian J. Theis. 2019. “Single-Cell RNA-Seq Denoising Using a Deep Count Autoencoder.” Nature Communications 10 (1). https://doi.org/10.1038/s41467-018-07931-2.
Hammarlund, Marc, Oliver Hobert, David M. Miller, and Nenad Sestan. 2018. “The CeNGEN Project: The Complete Gene Expression Map of an Entire Nervous System.” Neuron 99 (3): 430–33. https://doi.org/10.1016/j.neuron.2018.07.042.
Hrovatin, Karin, Aimée Bastidas-Ponce, Mostafa Bakhti, Luke Zappia, Maren Büttner, Ciro Sallino, Michael Sterr, et al. 2023. “Delineating Mouse β-Cell Identity During Lifetime and in Diabetes with a Single Cell Atlas.” bioRxiv. https://doi.org/10.1101/2022.12.22.521557.
Jones, Robert C., Jim Karkanias, Mark A. Krasnow, Angela Oliveira Pisco, Stephen R. Quake, Julia Salzman, Nir Yosef, et al. 2022. “The Tabula Sapiens: A Multiple-Organ, Single-Cell Transcriptomic Atlas of Humans.” Science 376 (6594). https://doi.org/10.1126/science.abl4896.
Linderman, George C., Jun Zhao, and Yuval Kluger. 2018. “Zero-Preserving Imputation of scRNA-Seq Data Using Low-Rank Approximation.” bioRxiv. https://doi.org/10.1101/397588.
Luecken, Malte D., M. Büttner, K. Chaichoompu, A. Danese, M. Interlandi, M. F. Mueller, D. C. Strobl, et al. 2021. “Benchmarking Atlas-Level Data Integration in Single-Cell Genomics.” Nature Methods 19 (1): 41–50. https://doi.org/10.1038/s41592-021-01336-8.
Nestorowa, Sonia, Fiona K. Hamey, Blanca Pijuan Sala, Evangelia Diamanti, Mairi Shepherd, Elisa Laurenti, Nicola K. Wilson, David G. Kent, and Berthold Göttgens. 2016. “A Single-Cell Resolution Map of Mouse Hematopoietic Stem and Progenitor Cell Differentiation.” Blood 128 (8): e20–31. https://doi.org/10.1182/blood-2016-05-716480.
Olsson, Andre, Meenakshi Venkatasubramanian, Viren K. Chaudhri, Bruce J. Aronow, Nathan Salomonis, Harinder Singh, and H. Leighton Grimes. 2016. “Single-Cell Analysis of Mixed-Lineage States Leading to a Binary Cell Fate Choice.” Nature 537 (7622): 698–702. https://doi.org/10.1038/nature19348.
Sikkema, Lisa, Ciro Ramírez-Suástegui, Daniel C. Strobl, Tessa E. Gillett, Luke Zappia, Elo Madissoon, Nikolay S. Markov, et al. 2023. “An Integrated Cell Atlas of the Lung in Health and Disease.” Nature Medicine 29 (6): 1563–77. https://doi.org/10.1038/s41591-023-02327-2.
Steuernagel, Lukas, Brian Y. H. Lam, Paul Klemm, Georgina K. C. Dowsett, Corinna A. Bauder, John A. Tadross, Tamara Sotelo Hitschfeld, et al. 2022. “HypoMap—a Unified Single-Cell Gene Expression Atlas of the Murine Hypothalamus.” Nature Metabolism 4 (10): 1402–19. https://doi.org/10.1038/s42255-022-00657-y.
Tasic, Bosiljka, Vilas Menon, Thuc Nghi Nguyen, Tae Kyung Kim, Tim Jarsky, Zizhen Yao, Boaz Levi, et al. 2016. “Adult Mouse Cortical Cell Taxonomy Revealed by Single Cell Transcriptomics.” Nature Neuroscience 19 (2): 335–46. https://doi.org/10.1038/nn.4216.
Wagner, Daniel E., Caleb Weinreb, Zach M. Collins, James A. Briggs, Sean G. Megason, and Allon M. Klein. 2018. “Single-Cell Mapping of Gene Expression Landscapes and Lineage in the Zebrafish Embryo.” Science 360 (6392): 981–87. https://doi.org/10.1126/science.aar4362.
Wagner, Florian, Yun Yan, and Itai Yanai. 2018. “K-Nearest Neighbor Smoothing for High-Throughput Single-Cell RNA-Seq Data.” bioRxiv. https://doi.org/10.1101/217737.
Wilson, Parker C., Yoshiharu Muto, Haojia Wu, Anil Karihaloo, Sushrut S. Waikar, and Benjamin D. Humphreys. 2022. “Multimodal Single Cell Sequencing Implicates Chromatin Accessibility and Genetic Background in Diabetic Kidney Disease Progression.” Nature Communications 13 (1). https://doi.org/10.1038/s41467-022-32972-z.
Wu, Sunny Z., Ghamdan Al-Eryani, Daniel Lee Roden, Simon Junankar, Kate Harvey, Alma Andersson, Aatish Thennavan, et al. 2021. “A Single-Cell and Spatially Resolved Atlas of Human Breast Cancers.” Nature Genetics 53 (9): 1334–47. https://doi.org/10.1038/s41588-021-00911-1.