What are batch effects?
With recent technological advancements, we are generating high-throughput experimental data like never before. When this data comes from different experiments – with variations in timing, personnel, reagents, equipment, and platforms – non-biological sources of variation, known as batch effects, can arise. These create artificial differences between features of interest, making purely technical variation look like true biological variation.
How to proceed?
Batch effects can be highly nonlinear, making it challenging to align datasets while preserving genuine biological variation. To ensure reliable and reproducible results, it is crucial to understand the experimental design and to apply batch effect detection methods and statistical correction techniques. By effectively managing batch effects, we not only improve data quality but also strengthen the validity of research findings, paving the way for more accurate biological insights.
At VUGENE, we employ the following process to detect batch effects and then remove them or account for their presence. First, to visually detect possible batch effects, we use dimensionality reduction methods – Principal Component Analysis (PCA) and the non-linear Uniform Manifold Approximation and Projection (UMAP). Second, to quantitatively assess batch effects, we fit linear regression models to the data projections.
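The two-step detection idea above can be sketched as follows. This is an illustrative toy example on simulated data (the matrix `expr` and the `batch` labels are made up for demonstration), not our production pipeline: we project the samples onto principal components, then regress a component on the batch label; a high R² indicates that the component is dominated by batch rather than biology.

```python
# Sketch of batch-effect detection on simulated data; all names and
# numbers here are illustrative assumptions, not real experimental data.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Simulated expression matrix: 40 samples x 100 genes, two batches,
# with batch 1 shifted to mimic an additive technical offset.
expr = rng.normal(size=(40, 100))
batch = np.repeat([0, 1], 20)
expr[batch == 1] += 1.5  # artificial batch shift

# Step 1: project onto principal components for visual inspection.
pcs = PCA(n_components=2).fit_transform(expr)

# Step 2: regress the first component on the batch label; a high R^2
# means the leading axis of variation tracks batch, not biology.
X = batch.reshape(-1, 1)
r2 = LinearRegression().fit(X, pcs[:, 0]).score(X, pcs[:, 0])
print(f"PC1 ~ batch R^2: {r2:.2f}")  # high here, by construction
```

In practice the same regression is run against each retained component, and components with strong batch association flag the dataset for correction.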
When substantial batch effects are detected, we correct them to ensure the reliability of biological conclusions using one of the following approaches:
- ComBat: an empirical Bayes method to adjust for known batch effects, estimating and correcting batch-driven variability while preserving the biological signal (Leek et al., 2012).
- removeBatchEffect: using linear regression, this algorithm isolates and subtracts known batch effects, ensuring that only the biological signals of interest remain (Ritchie et al., 2015).
- Surrogate variable analysis (SVA): this method detects unknown sources of variation in the data and corrects them by regressing out these latent variables, further enhancing the reliability of the results (Leek et al., 2012).
- Remove unwanted variation (RUV): detects and removes unwanted sources of variation by leveraging negative control features to estimate and adjust for them, thereby improving the accuracy of downstream analyses (Risso et al., 2014).

Each approach has its pros and cons. Contact us to learn which batch effect correction method suits your data best.
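To make the regression-based family of corrections concrete, here is a minimal sketch of the underlying idea – fit a per-gene linear model on a known batch design and subtract the estimated batch term. This is a simplified illustration of the principle behind tools like limma's removeBatchEffect, not the package implementation itself, and the simulated data and variable names are assumptions for demonstration.

```python
# Minimal sketch of regression-based batch correction on simulated data;
# this illustrates the principle, not a production implementation.
import numpy as np

rng = np.random.default_rng(1)

# 30 samples x 50 genes, two batches; batch 1 carries an additive offset.
expr = rng.normal(size=(30, 50))
batch = np.repeat([0, 1], 15)
expr[batch == 1] += 2.0

# Design matrix: intercept column plus a 0/1 batch indicator.
X = np.column_stack([np.ones(30), batch])

# Fit per-gene linear models, then subtract the estimated batch term
# while keeping the intercept (overall expression level) intact.
beta, *_ = np.linalg.lstsq(X, expr, rcond=None)
corrected = expr - np.outer(batch, beta[1])

# After correction, the per-gene batch means should be nearly equal.
gap = np.abs(corrected[batch == 0].mean(0) - corrected[batch == 1].mean(0))
print(f"max per-gene batch-mean gap: {gap.max():.3f}")
```

Real tools add important refinements on top of this idea – ComBat shrinks the per-gene batch estimates with empirical Bayes, and SVA/RUV first estimate the batch variables when they are unknown – which is why the choice of method depends on the dataset.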
Bibliography
Leek et al., 2012, read full article: Link
Risso et al., 2014, read full article: Link
Ritchie et al., 2015, read full article: Link
Written by: Karolis Krinickis, Miglė Gabrielaitė, PhD, Ingrida Olendraitė, PhD
Cover image credits: Olena / Adobe Stock