How to calculate the average gene expression within each cluster?

Hello everyone.

I’m a new in Scanpy and impressed by its speed and user-friendly. Since I used to be a big fan of Seurat, the most popular R package for snRNA-seq analysis, I don’t know how to do some operations I often do in Seurat with Scanpy. The next is an example.

In Seurat, I could get the average gene expression of each cluster easily by the code showed in the picture.

I think Scanpy can do the same thing as well, but I don’t know how to do right now. I would be grateful if anyone could offer me some help.

Hey!

If you just want the average expression within each cluster, you could do:

import pandas as pd                                                                                                                                                                         
import scanpy as sc

adata = sc.datasets.pbmc68k_reduced()

res = pd.DataFrame(columns=adata.var_names, index=adata.obs['bulk_labels'].cat.categories)                                                                                                 

for clust in adata.obs.bulk_labels.cat.categories: 
    res.loc[clust] = adata[adata.obs['bulk_labels'].isin([clust]),:].X.mean(0)

The average expression values are then stored in res. Just replace 'bulk_labels' with your cluster label.

If you want the average z-scores of the expression values you can use sc.pp.scale(adata) before running this.

1 Like

Got it! Thank you for your help!

Hi @LuckyMD, how would you get the number of cells that have expression for a gene per cluster, rather than the average?

Hi @sgmccalla,

You should be able to do that via:
(adata[adata.obs['bulk_labels'].isin([clust]),:][:,'GeneA'].X > 0).mean(0)

However the dotplot already gives you that value. So you if you want to be sure you could check the code for that plotting function.