# How to calculate the average gene expression within each cluster?

Hello everyone.

I’m a new in Scanpy and impressed by its speed and user-friendly. Since I used to be a big fan of Seurat, the most popular R package for snRNA-seq analysis, I don’t know how to do some operations I often do in Seurat with Scanpy. The next is an example.

In Seurat, I could get the average gene expression of each cluster easily by the code showed in the picture.

I think Scanpy can do the same thing as well, but I don’t know how to do right now. I would be grateful if anyone could offer me some help.

Hey!

If you just want the average expression within each cluster, you could do:

``````import pandas as pd
import scanpy as sc

``````

The average expression values are then stored in `res`. Just replace `'bulk_labels'` with your cluster label.

If you want the average z-scores of the expression values you can use `sc.pp.scale(adata)` before running this.

1 Like

Got it! Thank you for your help!

Hi @LuckyMD, how would you get the number of cells that have expression for a gene per cluster, rather than the average?

Hi @sgmccalla,

You should be able to do that via:
`(adata[adata.obs['bulk_labels'].isin([clust]),:][:,'GeneA'].X > 0).mean(0) `

However the dotplot already gives you that value. So you if you want to be sure you could check the code for that plotting function.

Hi @LuckyMD,

Thank you for posting, this is very helpful.
Is it possible to calculate p value instead of z score for the average expression of genes within a cluster?

Hey @Edd

You can get p-values with the tl.rank_genes_groups function. Docs on that are here:

e.g.:

``````sc.tl.rank_genes_groups(adata, 'cluster_label', method='t-test')
pval_table = pd.DataFrame(
{group + '_' + key[:2]: result[key][group]
for group in groups for key in ['names', 'pvals_adj']})
``````

Hi @Edd,
A p-value necessitates a hypothesis test. What hypothesis would you be interested in testing? the `rank_genes_groups` test that @alexcwsmith pointed out would test whether the gene is expressed higher in a particular cluster than in all other cells. If you want to test a different hypothesis, you’ll need a different test.

Thank you both for getting back to me !
I am familiar with retrieving logfoldchanges and pvals from sc.tl.rank_genes_groups().
I may be trying to do something different, that may not be possible.

I was wondering is it possible to test for average gene expression for genes within a cluster with an associated p value. Or is this only possible with logfoldchange?

Thanks again for all your time and attention.

Ed