Hi everyone! Are there any plans to implement a function in scanpy that would filter out empty droplets and keep only those containing cells, like EmptyDrops (https://www.biorxiv.org/content/10.1101/234872v2) or SoupX (https://www.biorxiv.org/content/10.1101/303727v1)? Or really anything more advanced than just setting a fixed cutoff for counts/genes per cell?
I guess empty-droplet filtering is now pretty much incorporated into Cell Ranger v3, so this would hopefully happen at an earlier stage. You are completely right that SoupX, on the other hand, would be a great addition to scanpy external. We would have to reach out to the authors to see if they are interested in integrating it into scanpy. I will put it on the list.
If you’re using CellRanger, this is fair - but we currently are using kallisto | bustools which doesn’t include any empty drops filtering. (For some obscure genomes, STAR doesn’t play particularly nicely, so using CellRanger isn’t an option).
Picking a cutoff by eye from knee plots tends to do an okay job, but I was also searching for something more objective for this process. I am currently working with dropkick, and first attempts seem to be reasonably successful. However, I have found it to be resource hungry, and I could only get it to work when running with lots of memory and cores via the command line (the maximum resources for an interactive/Jupyter notebook job at our institute were insufficient). While the package is written in a way that suggests the authors may plan to incorporate it into scanpy external in the future, I'm not sure how practical it would be for most people.
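For anyone who wants to make the knee plot step a bit less subjective in the meantime, here is a minimal sketch of one common heuristic: rank barcodes by total counts, move to log-log space, and take the point farthest above the straight line joining the two ends of the curve. The function name `knee_cutoff` is hypothetical (it is not part of scanpy, dropkick, or EmptyDrops), and this is a much cruder approach than what those tools do, so treat it as a starting point only.

```python
import math


def knee_cutoff(total_counts):
    """Estimate a count threshold at the knee of the barcode-rank curve.

    Hypothetical helper: picks the point on the log-log barcode-rank curve
    that lies farthest above the chord joining the first and last points.
    """
    # Sort barcodes from highest to lowest total count; zeros cannot be logged.
    counts = sorted((c for c in total_counts if c > 0), reverse=True)
    if len(counts) < 3:
        raise ValueError("need at least three barcodes with non-zero counts")

    # Log-transform rank (1-based) and counts, as in a standard knee plot.
    xs = [math.log10(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log10(c) for c in counts]

    # Chord from the top-left point to the bottom-right point of the curve.
    x0, y0 = xs[0], ys[0]
    dx, dy = xs[-1] - x0, ys[-1] - y0
    norm = math.hypot(dx, dy)

    def height_above_chord(i):
        # Signed perpendicular distance; positive means above the chord.
        return (dx * (ys[i] - y0) - dy * (xs[i] - x0)) / norm

    knee_index = max(range(len(counts)), key=height_above_chord)
    return counts[knee_index]
```

Barcodes with a total count at or above the returned value would be kept. With an AnnData object, the per-barcode totals could come from something like `adata.X.sum(axis=1)` (assuming raw counts are stored in `.X`). On real data the knee can land inside the cell population rather than exactly at its edge, so it is still worth checking the result against the plot.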