When to perform batch correction?

Dear all,

I am a beginner in scRNA-seq and am researching batch correction methods. I have found and set up Combat, Scanorama, BBKNN and MNN (even though MNN is awfully slow to run on a home computer).
I am wondering specifically WHEN it is best to run this correction. Is there a universally accepted step at which batch correction should be applied, or is it method-dependent? I have read the best practices guide, which uses Combat and runs it after normalization and log transformation. After batch correction, it recommends selecting highly variable genes (HVG) and then running PCA.
However, other methods like BBKNN require PCA to be computed in advance. In that case, when do you select HVGs? After BBKNN would be my guess, but then won't the PCA look very different from the Combat one? Unless you run PCA again after BBKNN as well?
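A minimal sketch of the best-practices order for a Combat-style method (normalize, log-transform, correct, then HVG, then PCA), to show why the PCA itself changes: the correction rewrites the expression matrix, so every downstream step sees corrected values. This is an assumption-laden stand-in, not Combat: `location_scale_correct` is a hypothetical helper that only matches each batch's per-gene mean and spread (no empirical Bayes shrinkage, no covariates), and `top_variable_genes` ranks genes by plain variance rather than scanpy's dispersion-based HVG method. In scanpy the real call would be `sc.pp.combat(adata, key="batch")` on the log-normalized data.

```python
import numpy as np


def location_scale_correct(X, batches, eps=1e-8):
    """Hypothetical, simplified Combat-like correction (no empirical Bayes,
    no covariates): rescale each batch so every gene has the overall mean
    and the pooled within-batch standard deviation."""
    X = np.asarray(X, dtype=float)
    batches = np.asarray(batches)
    labels = np.unique(batches)
    grand_mean = X.mean(axis=0)
    # Residuals after removing each batch's per-gene mean
    resid = np.empty_like(X)
    for b in labels:
        mask = batches == b
        resid[mask] = X[mask] - X[mask].mean(axis=0)
    pooled_sd = np.sqrt((resid ** 2).mean(axis=0)) + eps
    corrected = np.empty_like(X)
    for b in labels:
        mask = batches == b
        batch_sd = X[mask].std(axis=0) + eps
        corrected[mask] = resid[mask] / batch_sd * pooled_sd + grand_mean
    return corrected


def top_variable_genes(X, n_top):
    """Indices of the n_top highest-variance genes (crude HVG stand-in)."""
    return np.argsort(X.var(axis=0))[::-1][:n_top]


def pca_scores(X, n_comps):
    """PCA via SVD of the column-centered matrix; returns per-cell scores."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_comps] * S[:n_comps]


# Synthetic stand-in for normalized, log-transformed data:
# two batches of 50 cells, batch "b" shifted on every gene (pure batch effect).
rng = np.random.default_rng(0)
batches = np.repeat(["a", "b"], 50)
X = rng.normal(size=(100, 30))
X[batches == "b"] += 3.0

# Order from the best practices guide: correct -> HVG -> PCA
X_corr = location_scale_correct(X, batches)
hvg = top_variable_genes(X_corr, n_top=10)
pcs = pca_scores(X_corr[:, hvg], n_comps=2)

# Same HVG -> PCA steps without correction, for comparison
hvg_raw = top_variable_genes(X, n_top=10)
pcs_raw = pca_scores(X[:, hvg_raw], n_comps=2)
```

In the uncorrected run, PC1 mostly separates the two batches; after the correction it no longer does, because the batch offset was removed from the matrix before HVG selection and PCA.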

As you can see, I am a bit confused, so if anyone could shed some light on this it would be appreciated!


Check this out: https://nbisweden.github.io/workshop-scRNAseq/labs/compiled/scanpy/scanpy_03_integration.html

Dear @jayypaul, thank you for your reply.
I had already followed the notebook you linked. However, it does not resolve my doubts: the Combat and BBKNN results still look different.

If you’re following the BBKNN example at https://scanpy-tutorials.readthedocs.io/en/latest/integrating-data-using-ingest.html together with the previous link or the best practices guide for Combat, then it’s expected that the results differ. The algorithms are ultimately data-dependent, and each works better for different kinds of data (cell types, amount of heterogeneity present, etc.). As explained in the best practices guide, when to apply batch correction depends on the algorithm being used.
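To illustrate the "depends on the algorithm" point for BBKNN: it is typically run after HVG selection and PCA, in place of the usual neighbour-graph step (in scanpy, `sc.external.pp.bbknn(adata, batch_key="batch")` instead of `sc.pp.neighbors(adata)`), so HVGs are selected before PCA just as in an uncorrected workflow. Because it only rebuilds the kNN graph, `X_pca` stays identical to the uncorrected PCA; the correction shows up in UMAP and clustering computed from that graph, so a PCA plot after BBKNN is not directly comparable to one after Combat. The sketch below is a simplified, hypothetical version of the core idea (each cell takes its k nearest neighbours within every batch separately); it omits the real package's trimming and connectivity computation.

```python
import numpy as np


def knn(pcs, k):
    """Plain kNN in PCA space (self excluded), as a baseline."""
    d2 = ((pcs[:, None, :] - pcs[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)
    return np.argsort(d2, axis=1)[:, :k]


def batch_balanced_knn(pcs, batches, k_per_batch):
    """Simplified batch-balanced kNN: for every cell, take the k_per_batch
    nearest cells from EACH batch and concatenate them. PCs are not modified."""
    pcs = np.asarray(pcs, dtype=float)
    batches = np.asarray(batches)
    d2 = ((pcs[:, None, :] - pcs[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)  # a cell is not its own neighbour here
    blocks = []
    for b in np.unique(batches):
        cand = np.flatnonzero(batches == b)
        # Nearest k_per_batch cells from batch b, for every cell
        order = np.argsort(d2[:, cand], axis=1)[:, :k_per_batch]
        blocks.append(cand[order])
    return np.hstack(blocks)


# Synthetic PCA scores: two batches, batch "b" offset along PC1.
rng = np.random.default_rng(1)
batches = np.repeat(["a", "b"], 40)
pcs = rng.normal(size=(80, 5))
pcs[batches == "b", 0] += 10.0
pcs_before = pcs.copy()

plain = knn(pcs, k=6)
balanced = batch_balanced_knn(pcs, batches, k_per_batch=3)

# Fraction of each cell's neighbours that come from the other batch
cross_plain = (batches[plain] != batches[:, None]).mean()
cross_balanced = (batches[balanced] != batches[:, None]).mean()
print(cross_plain, cross_balanced)
```

With plain kNN almost every neighbour comes from the same batch, while the balanced graph draws half of each cell's neighbours from the other batch; the PCA scores themselves are left untouched.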

Late thanks, but thanks @jayypaul