Challenges in Single-Cell Data Analysis

The article “Challenges in Single-Cell Data Analysis” was published in Biocompare. Written by Caitlin Smith, it explains best practices for working with large, noisy datasets such as those produced by single-cell experiments.

Vilija Lomeikaitė Džikė, Lead Bioinformatician at VUGENE

In this article, Vilija Lomeikaitė Džikė (Lead Bioinformatician at VUGENE), Eric Hobbs (Executive VP of R&D and Operations at Bruker), Vicky Morrison (Senior Product Manager of Software at Parse Biosciences), and Peter Smibert (VP of Biology at 10x Genomics) discuss the main challenge in single-cell data analysis: preserving the variance that originates from biological differences while sharpening those signals by reducing variance from non-biological sources.

Strategies for success in single-cell data analysis

Some might argue that almost everything about single-cell experiments is difficult, but certain points in the single-cell analysis workflow seem like open doors to chaos. Anything that obscures biological signals is a problem, and such interference is common because the signals are already very small. Batch effects, background noise, dropout events, and doublets are well-known offenders, and you can prepare for each of them.

Batch effects

Batch effects can easily introduce variance that masks true biological signals, especially when samples are processed at different times, in different runs, or by different researchers. The key to preventing batch effects is consistency. This holds for any experiment, but it is especially true for experiments whose signals sit close to background noise. “Standardized protocols for sample preparation, along with sample multiplexing and randomizing samples across sequencing runs, help ensure a balanced experimental design and reduce batch effects,” says Lomeikaitė Džikė.
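To make “randomizing samples across sequencing runs” concrete, here is a minimal sketch, assuming each sample carries a single condition label. The function name `assign_to_runs` and the (sample_id, condition) layout are hypothetical. The idea is to shuffle samples within each condition and then deal them out round-robin, so that no run is dominated by one condition and condition is not confounded with run.

```python
import random
from collections import defaultdict


def assign_to_runs(samples, n_runs, seed=0):
    """Spread samples from each condition evenly across sequencing runs.

    samples: list of (sample_id, condition) tuples (hypothetical layout).
    Returns a dict mapping run index -> list of sample IDs.
    """
    rng = random.Random(seed)  # fixed seed so the layout can be reproduced
    by_condition = defaultdict(list)
    for sample_id, condition in samples:
        by_condition[condition].append(sample_id)

    runs = {r: [] for r in range(n_runs)}
    offset = 0
    for condition in sorted(by_condition):
        ids = list(by_condition[condition])
        rng.shuffle(ids)  # randomize order within each condition
        for i, sample_id in enumerate(ids):
            # Round-robin: each run receives a similar number of samples per condition
            runs[(offset + i) % n_runs].append(sample_id)
        offset += len(ids)  # continue the rotation where the previous condition stopped
    return runs


if __name__ == "__main__":
    samples = [(f"ctrl_{i}", "control") for i in range(4)] + [
        (f"trt_{i}", "treated") for i in range(4)
    ]
    for run, ids in assign_to_runs(samples, n_runs=2).items():
        print(run, ids)
```

With four control and four treated samples split over two runs, each run ends up with two samples of each condition, in a shuffled order.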

After taking steps to prevent batch effects, you can apply batch correction during analysis to remove the remaining differences between samples processed under different conditions. “If uncorrected, batch effects can drive artificial clustering and mask true biological relationships,” says Morrison. Lomeikaitė Džikė notes that “data integration using tools like Harmony removes technical variation while preserving biological differences.” [1]
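Harmony itself works on a reduced-dimensional embedding and corrects cells cluster by cluster, which is beyond a short example. As a much simpler illustration of what removing remaining differences between batches means, the sketch below shifts one gene's values so that every batch has the same mean. The function `center_by_batch` is hypothetical and this is not Harmony's algorithm. It also assumes the batches contain comparable cell populations: if a biological condition is confounded with batch, a global shift like this erases real biology along with the technical offset, which is why the balanced design described above matters.

```python
from statistics import mean


def center_by_batch(values, batches):
    """Naive batch correction for one gene: give every batch the same mean.

    values: per-cell expression values (e.g., log-normalized counts).
    batches: batch label for each cell, in the same order as values.
    """
    global_mean = mean(values)
    batch_means = {
        b: mean(v for v, label in zip(values, batches) if label == b)
        for b in set(batches)
    }
    # Remove each batch's offset from the global mean, keeping within-batch differences
    return [v - batch_means[label] + global_mean for v, label in zip(values, batches)]


if __name__ == "__main__":
    values = [1.0, 2.0, 3.0, 5.0, 6.0, 7.0]
    batches = ["A", "A", "A", "B", "B", "B"]
    print(center_by_batch(values, batches))  # [3.0, 4.0, 5.0, 3.0, 4.0, 5.0]
```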

Background noise

Background noise can have several origins, and filtering can reduce it. For example, in single-cell RNA sequencing (scRNA-seq), background noise can arise from free-floating RNA released by lysed (burst) and apoptotic cells. In droplet-based methods, therefore, “quality control involves removing ambient RNA using tools like SoupX, which estimates background RNA in empty droplets and subtracts it from real cell profiles,” says Lomeikaitė Džikė [2].
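The core idea in that description can be sketched in a few lines: pool counts from empty droplets to obtain an ambient expression profile, then subtract each gene's expected share of contamination from a cell's counts. This is a simplified illustration, not SoupX's implementation. SoupX estimates the contamination fraction from the data, whereas here `contamination` is an assumed value supplied by hand, and the function names and dictionary-of-counts layout are hypothetical.

```python
def ambient_profile(empty_droplets):
    """Estimate the ambient ("soup") expression profile from empty droplets.

    empty_droplets: list of dicts mapping gene -> UMI count.
    Returns a dict mapping gene -> fraction of all ambient counts.
    """
    totals = {}
    for droplet in empty_droplets:
        for gene, count in droplet.items():
            totals[gene] = totals.get(gene, 0) + count
    grand_total = sum(totals.values())
    return {gene: c / grand_total for gene, c in totals.items()}


def subtract_ambient(cell_counts, profile, contamination):
    """Remove the expected ambient contribution from one cell's counts.

    contamination: assumed fraction of this cell's UMIs that come from the soup.
    """
    n_umis = sum(cell_counts.values())
    expected_ambient = contamination * n_umis  # total UMIs attributed to ambient RNA
    corrected = {}
    for gene, count in cell_counts.items():
        # Subtract this gene's share of the ambient counts, never going below zero
        corrected[gene] = max(0.0, count - expected_ambient * profile.get(gene, 0.0))
    return corrected


if __name__ == "__main__":
    profile = ambient_profile([{"HBB": 8, "ACTB": 2}, {"HBB": 10}])
    cell = {"HBB": 10, "ACTB": 40, "CD3E": 50}
    print(subtract_ambient(cell, profile, contamination=0.1))
```

In this toy example, a hemoglobin gene that dominates the empty droplets loses most of its counts in the cell, while a gene absent from the soup is left untouched.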

In scRNA-seq, it is especially important to filter cells during analysis based on the proportion of reads from mitochondrial genes and from ribosomal protein-coding genes. This step removes dead or stressed cells (high mitochondrial content) and metabolically inactive cells (low ribosomal content). “Filtering is the first key step and focuses on removing low-quality data points and technical noise,” says Morrison. “This includes excluding background, including empty droplets (common when using droplet-based approaches), and removing dead or dying cells that typically show elevated mitochondrial gene expression.” She adds that biological context matters when setting filtering thresholds for each of these factors, because some systems naturally have a higher or lower proportion of mitochondrial and/or ribosomal protein-coding transcripts.
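Filtering on mitochondrial and ribosomal content comes down to computing two percentages per cell and comparing them with thresholds. In this sketch, the gene-name prefixes assume human gene symbols (`MT-` for mitochondrial genes, `RPS`/`RPL` for ribosomal proteins), and the cutoffs of 20% mitochondrial and 5% ribosomal reads are placeholder values only. As Morrison notes, real thresholds should be chosen for each biological system. The helper names are hypothetical.

```python
def qc_metrics(cell_counts, mito_prefix="MT-", ribo_prefixes=("RPS", "RPL")):
    """Return (percent mitochondrial, percent ribosomal) counts for one cell.

    cell_counts: dict mapping gene symbol -> count.
    Prefix matching is a rough heuristic (it assumes human gene symbols).
    """
    total = sum(cell_counts.values())
    if total == 0:
        return 0.0, 0.0
    mito = sum(c for g, c in cell_counts.items() if g.startswith(mito_prefix))
    ribo = sum(c for g, c in cell_counts.items() if g.startswith(ribo_prefixes))
    return 100.0 * mito / total, 100.0 * ribo / total


def passes_qc(cell_counts, max_pct_mito=20.0, min_pct_ribo=5.0):
    """Keep cells that are not mitochondria-dominated and not ribosome-depleted."""
    pct_mito, pct_ribo = qc_metrics(cell_counts)
    return pct_mito <= max_pct_mito and pct_ribo >= min_pct_ribo


if __name__ == "__main__":
    healthy = {"MT-CO1": 5, "RPS6": 20, "RPL13": 10, "ACTB": 65}
    dying = {"MT-CO1": 40, "MT-ND1": 20, "RPS6": 10, "ACTB": 30}
    print(passes_qc(healthy), passes_qc(dying))  # True False
```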

Dropouts and doublets

Another source of unwanted, non-biological variance is dropout events. These occur when transcripts are so low in abundance that the assay fails to detect them, producing a zero count and leaving it unclear whether the gene is truly not expressed. “Ensuring sufficient sequencing depth and isolating healthy cells via MACS or FACS can mitigate data sparsity and minimize the impacts of dropout events,” says Lomeikaitė Džikė, referring to magnetic-activated and fluorescence-activated cell sorting.
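A back-of-the-envelope model shows why sequencing depth matters for dropouts. Assuming a gene's molecules are sampled from a cell according to a Poisson distribution, with an expected count equal to the gene's share of the cell's transcripts times the number of molecules captured per cell, the chance of observing zero is exp(-expected count). Real single-cell counts are more variable than a Poisson model and also depend on capture efficiency, so treat this only as intuition. The helper `dropout_probability` is hypothetical.

```python
import math


def dropout_probability(transcript_fraction, molecules_per_cell):
    """Probability that a gene yields zero counts in a cell, under Poisson sampling.

    transcript_fraction: the gene's share of the cell's transcripts (assumed known).
    molecules_per_cell: number of transcript molecules captured and sequenced per cell.
    """
    expected_count = transcript_fraction * molecules_per_cell
    return math.exp(-expected_count)  # P(X = 0) for X ~ Poisson(expected_count)


if __name__ == "__main__":
    for depth in (5_000, 20_000, 50_000):
        print(depth, round(dropout_probability(1e-4, depth), 3))
```

For a gene making up 0.01% of a cell's transcripts, this model gives a dropout probability of about 0.61 at 5,000 captured molecules per cell and about 0.14 at 20,000. Deeper sequencing lowers the chance of a spurious zero but never eliminates it for genes that are genuinely rare.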

In droplet-based scRNA-seq platforms, doublets and multiplets occur when two or more cells are captured in the same droplet and tagged with the same barcode, creating misleading hybrid expression profiles that distort downstream analyses. Morrison notes that optimizing cell concentrations helps limit the occurrence of doublets and multiplets. Bioinformatics tools such as DoubletFinder can help identify and remove doublets (also called false or hybrid cells) from scRNA-seq data.
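The link between cell concentration and doublets follows from how cells are distributed into droplets. Assuming cell loading follows a Poisson distribution with a mean of λ cells per droplet (a common simplification; real platforms may deviate), the sketch below computes the fraction of cell-containing droplets that actually hold two or more cells. The helper name is hypothetical, and it only estimates an expected multiplet rate. It does not detect doublets in data the way DoubletFinder does.

```python
import math


def multiplet_fraction(cells_per_droplet):
    """Fraction of cell-containing droplets that hold two or more cells.

    Assumes the number of cells per droplet is Poisson-distributed with
    mean `cells_per_droplet` (lambda).
    """
    lam = cells_per_droplet
    if lam <= 0:
        raise ValueError("cells_per_droplet must be positive")
    p_empty = math.exp(-lam)          # P(0 cells)
    p_single = lam * math.exp(-lam)   # P(exactly 1 cell)
    p_occupied = 1.0 - p_empty        # P(at least 1 cell)
    return (p_occupied - p_single) / p_occupied


if __name__ == "__main__":
    for lam in (0.05, 0.1, 0.2):
        print(lam, round(multiplet_fraction(lam), 3))
```

Under this model, a mean loading of 0.1 cells per droplet leaves roughly 5% of occupied droplets with more than one cell, and halving the loading to 0.05 cuts that to roughly 2.5%. The trade-off is that lower concentrations also capture fewer cells per run.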

Published by: Biocompare® on April 28, 2026
Cover photo credits: Andrej Vasilenko

Similar Resources

Data analysis

July 2, 2026

BIO 2026: Beyond the DNA Blueprint

The interview titled “BIO 2026: Beyond the DNA Blueprint,” published in The Pharma Navigator. Karolina Žukauskienė, Scientific Project Manager at VUGENE — a specialist in multi-omics data analysis and interpretation, ...

Data analysis

June 18, 2026

Choosing Your Omics

Choosing the right omics type begins with a question: Which molecular layer or layers hold the answer to the biological phenomenon I am observing? The conventional layers remain the backbone ...