I’m pretty new to scanpy and anndata objects.
I have counts for 3 species (human, mouse and dog). I determined the species for each gene based on the gene ID (which is the “var” I loaded):
```python
adata.var["Species"] = adata.var_names.str[0:4]  # ENSG / ENSM / ENSC
```
Then I’d like to determine the species of each cell from the percentage of its reads that map to each species’ genes: a cell with > 70% of its reads assigned to one species would be labelled with that species.
My ugly solution for now is:
```python
for c in adata.obs.index:
    total = adata[c, :].X.sum()  # total reads in this cell
    for s in adata.var["Species"].unique():
        is_species = (adata.var["Species"] == s).to_numpy()  # boolean gene mask for species s
        adata.obs.loc[c, s + "_count"] = adata[c, is_species].X.sum()
        adata.obs.loc[c, s + "_perc"] = adata.obs.loc[c, s + "_count"] / total * 100
        if adata.obs.loc[c, s + "_perc"] > 70:  # 70% threshold, as described above
            adata.obs.loc[c, "Species"] = s  # ENSG / ENSM / ENSC
```
I’ve tried different things, but I feel really far from the AnnData philosophy and I don’t have a clear picture of all this yet.
Can someone help me please?
I also have a couple of side questions:
- I’m not sure I understand the difference between “var” and “vars”. Is the former some kind of index for the latter?
- Is there an easy, hands-on tutorial you could recommend?