HVGs in mnn_correct

This tutorial (https://nbisweden.github.io/workshop-scRNAseq/labs/compiled/scanpy/scanpy_03_integration.html) selects the HVGs that are highly variable in at least 2 batches, then uses those HVGs later on for dimensionality reduction.

Would it be a better idea to use the HVGs that are found in at least 2 batches for the correction, and then find the HVGs once again on the batch-corrected data, as suggested here: Highly variable genes - best practice? Also, should the "at least 2 batches" threshold change when integrating a larger number of batches?

In practice, HVG selection is done before batch correction, as batch correction methods work better with HVGs. That being said, ideally you would do it afterwards to ensure batch effects don’t affect the selection. As you suggest, a way to avoid this is to select HVGs per batch and then use an aggregation strategy across batches. This is currently done in Scanpy via the batch_key parameter and also in some of our other resources, such as the benchmarking data integration package scIB.
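To make the "select per batch, then aggregate" idea concrete, here is a minimal sketch in plain Python. It is not Scanpy's or scIB's implementation: it ranks genes by raw variance within each batch (an assumption made for brevity; Scanpy ranks by normalized dispersion by default), and the function names, the `min_batches` parameter, and the toy data are all hypothetical. In Scanpy itself, calling `sc.pp.highly_variable_genes` with `batch_key` set records a per-gene `highly_variable_nbatches` count that can be thresholded in the same spirit.

```python
from collections import Counter
from statistics import pvariance


def top_variable_genes(expr, n_top):
    """Return the n_top genes with the largest variance in one batch.

    expr maps gene name -> list of expression values for that batch.
    Raw variance is an illustrative stand-in for a proper HVG statistic.
    """
    ranked = sorted(expr, key=lambda gene: pvariance(expr[gene]), reverse=True)
    return set(ranked[:n_top])


def aggregate_hvgs(batches, n_top, min_batches=2):
    """Keep genes that are among the top HVGs in at least min_batches batches."""
    counts = Counter()
    for expr in batches.values():
        # Each batch contributes one vote per gene it considers highly variable.
        counts.update(top_variable_genes(expr, n_top))
    return {gene for gene, n in counts.items() if n >= min_batches}


# Toy data: three batches, four genes, three cells per batch.
batches = {
    "b1": {"A": [0, 5, 10], "B": [1, 1, 1], "C": [0, 4, 8], "D": [2, 2, 3]},
    "b2": {"A": [0, 6, 12], "B": [0, 9, 0], "C": [3, 3, 3], "D": [1, 1, 2]},
    "b3": {"A": [1, 1, 1], "B": [0, 8, 1], "C": [0, 5, 9], "D": [2, 2, 2]},
}

selected = aggregate_hvgs(batches, n_top=2, min_batches=2)
print(sorted(selected))
```

Raising `min_batches` makes the selection stricter: in this toy data no gene is in the top 2 of all three batches, so `min_batches=3` returns an empty set. That is the trade-off behind the question of whether the threshold should grow with the number of batches.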

Thanks a lot for the explanation!