I am analyzing snRNA-seq data and have come across a fairly general question to which I can't seem to find a clear, categorical answer:
What is the advantage of using “correcting” versus “integrating” methods for batch-effect removal?
If I understand correctly, the end goal is to cluster the data in a physiologically relevant way (not biased by experimental errors) so that we can identify potential cell populations/states. Once the data are clustered, the recommendation I find almost everywhere is to use the log-normalised (not batch-corrected) data for differential expression. In this scenario, why would you choose a “correcting” method (such as ComBat), which alters your expression matrix (even if you keep your “raw” data in a safe place), when you could use an “integrating” method (Scanorama integrate, BBKNN, etc.), which does not touch your expression matrix and is faster to compute?
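To make sure I am describing the distinction correctly, here is a toy sketch of how I understand it, using only NumPy on simulated data. The helper `center_per_batch` is hypothetical and is just a crude stand-in (per-batch mean centering); it is not what ComBat, Scanorama or BBKNN actually do. The point is only to show where the change happens: in the expression matrix for “correcting”, and in the low-dimensional embedding for “integrating”.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "log-normalised" matrix: 200 cells x 50 genes, two batches (simulated).
n_cells, n_genes = 200, 50
batch = np.repeat([0, 1], n_cells // 2)
X = rng.normal(size=(n_cells, n_genes))
X[batch == 1] += 2.0  # artificial additive batch shift on every gene


def center_per_batch(M, batch):
    """Hypothetical helper: subtract each batch's column means.

    A crude stand-in for a real method, used only for illustration.
    Returns a new array; the input is not modified.
    """
    out = M.copy()
    for b in np.unique(batch):
        out[batch == b] -= out[batch == b].mean(axis=0)
    return out


# "Correcting": the output is a new, altered expression matrix.
X_corrected = center_per_batch(X, batch)

# "Integrating": only a low-dimensional embedding is adjusted; X is untouched.
Xc = X - X.mean(axis=0)                      # center genes before PCA
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
pcs = U[:, :10] * S[:10]                     # first 10 principal components
pcs_integrated = center_per_batch(pcs, batch)
```

In this picture, clustering would run on `pcs_integrated` (or on a neighbour graph built from it), while differential expression would still use the unaltered `X`.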
What am I missing?
Looking forward to your advice,