Subset out Ribosomal Genes for Tracksplot

Hi Everyone,

I am trying to remove all ribosomal genes from my AnnData object so that my tracksplot and heatmap show more informative immunological readouts.

However, I think I am not removing them everywhere, as they still show up in my tracksplot of the rank_genes_groups results.

Is there another slot of the AnnData object I need to filter when subsetting genes out for analysis?

import numpy as np

# Flag ribosomal protein genes: any name containing "RPS" or "RPL" followed by a digit.
# (The longer variants I also tried, e.g. '[R][P][S][\d+]+[A-Z]' or '[R][P][L][0-9][0-9]',
# only match names these two patterns already catch for real gene symbols, so OR-ing
# them into the mask changes nothing.)
rps = adata.var_names.str.contains(r'RPS\d')
rpl = adata.var_names.str.contains(r'RPL\d')
remove = rps | rpl  # element-wise OR of the two boolean masks

keep = ~remove
adata_subset_rp = adata[:, keep]

print(adata.n_vars, adata_subset_rp.n_vars)

They are no longer in adata.var, but I still see them in the output of:

sc.pl.rank_genes_groups_tracksplot(adata_subset_rp, n_genes=5)
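A side note on the patterns themselves, sketched with hypothetical gene names standing in for adata.var_names: an unanchored pattern such as RP[SL]\d also matches mitochondrial ribosomal genes (MRPS*/MRPL*), while requiring a digit right after RPS/RPL misses genes like RPLP0 and RPSA. Anchoring with ^ and deciding whether to require the digit makes the intent explicit. Which genes you actually want to drop is a judgment call to check against your own data.

```python
import pandas as pd

# Hypothetical gene names standing in for adata.var_names
var_names = pd.Index(
    ["CD3E", "RPS27A", "RPL13", "RPLP0", "RPSA", "MRPS12", "MRPL33", "GZMB"]
)

# Unanchored, digit required: also catches mitochondrial MRPS*/MRPL* genes,
# but misses RPLP0 and RPSA (no digit right after RPS/RPL)
unanchored = var_names.str.contains(r"RP[SL]\d")

# Anchored at the start of the name, digit required: RPS<n>/RPL<n> only
anchored = var_names.str.contains(r"^RP[SL]\d")

# Anchored prefix without the digit: RPLP0 and RPSA are now included too
prefix = var_names.str.contains(r"^RP[SL]")

for label, mask in [("unanchored", unanchored), ("anchored", anchored), ("prefix", prefix)]:
    print(label, list(var_names[mask]))
```

The filtered object would then be adata[:, ~mask] for whichever mask fits. For mouse data, where symbols are written Rps/Rpl, passing case=False to str.contains ignores capitalization.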

Here is how I remove ribo genes in my data sets:

import numpy as np
import pandas as pd

# read in the list of ribosomal genes from the KEGG_RIBOSOME gene set (MSigDB)
ribo_url = "http://software.broadinstitute.org/gsea/msigdb/download_geneset.jsp?geneSetName=KEGG_RIBOSOME&fileType=txt"
ribo_genes = pd.read_table(ribo_url, header=[1])
# create a boolean mask for all ribosomal genes
ribo_mk = np.in1d(adata.var_names.values.astype(str), ribo_genes)
adata = adata[:, ~ribo_mk]

After removing them, you will have to rerun sc.tl.rank_genes_groups().

Hope this helps,
Chuck
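A minimal sketch of the membership-mask step, using hypothetical toy values instead of the downloaded KEGG list: comparing against one explicit column of the table (rather than the whole DataFrame returned by pd.read_table) makes it clear what is being matched. np.isin is used here because NumPy 2.0 deprecates np.in1d; for a 1-D array of names the two return the same mask.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins: gene names as in adata.var_names.values.astype(str),
# and a one-column table like one read from a gene-set text file
var_names = np.array(["CD3E", "RPS27A", "RPL13", "GZMB", "RPLP0"])
ribo_genes = pd.DataFrame({"gene": ["RPS27A", "RPL13", "RPLP0", "RPS3"]})

# Match against the gene column explicitly; np.isin returns one boolean per name
ribo_mask = np.isin(var_names, ribo_genes.iloc[:, 0].to_numpy())

# Keep everything that is not in the ribosomal list
kept = var_names[~ribo_mask]
print(int(ribo_mask.sum()), list(kept))
```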

Hi @herrinca,

Thank you for the insightful response.
However, the filtering method you suggested still does not remove the RPS/RPL genes from the tracksplot. I can see that the list does contain matches for ribosomal genes in my adata, but for some reason they are not being removed. Is there a data type casting step I may be missing?

ribo_url = "http://software.broadinstitute.org/gsea/msigdb/download_geneset.jsp?geneSetName=KEGG_RIBOSOME&fileType=txt"
ribo_genes = pd.read_table(ribo_url, header=[1])
# create mask for all ribo genes
ribo_mk = np.in1d(adata.var_names.values.astype(str), ribo_genes)
adata_rb_subset = adata[:, ~ribo_mk]

sc.tl.rank_genes_groups(adata_rb_subset, groupby='new_cluster', method='wilcoxon')
sc.pl.rank_genes_groups_tracksplot(adata_rb_subset, n_genes=5, save='tracksplot_rbsubset.png')

Thank you,

Ed

Hi Ed,

That's a bit strange. Which version of scanpy are you using?

The only other thing I can think to try is deleting .uns['rank_genes_groups'] before rerunning:

del adata_rb_subset.uns['rank_genes_groups']
sc.tl.rank_genes_groups(adata_rb_subset, groupby='new_cluster', method='wilcoxon')
sc.pl.rank_genes_groups_tracksplot(adata_rb_subset, n_genes=5, save='tracksplot_rbsubset.png')

Best,
Chuck

Hi @herrinca,

It is a bit strange, right?
I am using scanpy version 1.7.1, do you happen to be using a different version?

Thank you again for the suggestions. I really appreciate the ideas, but deleting .uns['rank_genes_groups'] didn't work either.

adata_rb_subset = adata

ribo_url = "http://software.broadinstitute.org/gsea/msigdb/download_geneset.jsp?geneSetName=KEGG_RIBOSOME&fileType=txt"
ribo_genes = pd.read_table(ribo_url, header=[1])
# create mask for all ribo genes
ribo_mk = np.in1d(adata_rb_subset.var_names.values.astype(str), ribo_genes)
adata_rb_subset = adata_rb_subset[:, ~ribo_mk]
del adata_rb_subset.uns['rank_genes_groups']

sc.tl.rank_genes_groups(adata_rb_subset, groupby='new_cluster', method='wilcoxon')
sc.pl.rank_genes_groups_tracksplot(adata_rb_subset, n_genes=5, save='tracksplot_rbsubset.png')

Best,

Ed