I’m clustering my single-cell data with the Scanpy package, and I use `rank_genes_groups` to rank the genes that characterize each group. It stores `names`, `scores`, `logfoldchanges`, and `pvals_adj` in `adata.uns['rank_genes_groups']` of my AnnData object. I’m trying to interpret the scores, but I couldn’t find any explanation apart from the source code. My code:
import scanpy as sc
sc.tl.rank_genes_groups(adata, 'leiden', method='t-test', n_genes=adata.n_vars)  # rank every gene
The scores for the whole list of marker genes (not just the first 100) all lie between about 1.2 and 39. This range doesn’t seem consistent with the definition in the documentation:
scores: structured `np.ndarray` (`.uns['rank_genes_groups']`) Structured array to be indexed by group id storing the **z-score** underlying the computation of a p-value for each gene for each group. Ordered according to scores.
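My understanding from reading the source (an assumption, not something the documentation states) is that with `method='t-test'` the score is Welch’s t-statistic comparing each cluster against all remaining cells, which is only approximately a z-score. Here is a minimal pure-Python sketch of that computation for a single gene. `welch_t` and `two_sided_p_normal` are my own hypothetical helpers, the normal approximation for the p-value is my simplification (I believe Scanpy uses the t distribution itself), and the expression values are made up.

```python
import math
from statistics import mean, variance

def welch_t(group, rest):
    """Welch's t-statistic for one gene: expression in one cluster vs. all other cells.
    Assumption: this mirrors what method='t-test' computes per gene."""
    n1, n2 = len(group), len(rest)
    m1, m2 = mean(group), mean(rest)
    v1, v2 = variance(group), variance(rest)  # sample variances (n - 1 denominator)
    return (m1 - m2) / math.sqrt(v1 / n1 + v2 / n2)

def two_sided_p_normal(t):
    """Two-sided p-value from a standard-normal approximation to the t distribution."""
    return math.erfc(abs(t) / math.sqrt(2))

# Made-up expression values for one gene (not real data).
group = [2.0, 2.5, 3.0, 2.8]
rest = [0.1, 0.0, 0.3, 0.2, 0.1]

t_small = welch_t(group, rest)
# Same means, ten times as many cells: the statistic grows roughly like sqrt(n).
t_large = welch_t(group * 10, rest * 10)
p_small = two_sided_p_normal(t_small)
```

If this is right, the score is driven by the number of cells as well as by the effect size, so clusters with many cells could produce large values such as the ones I see.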
A couple of rows from my result are shown below. 0-n is the `names` column for cluster 0, followed by 0_s for `scores`, 0-p for `pvals_adj`, and 0_l for `logfoldchanges`. The same four columns then repeat for each of the other clusters. (I have 5 clusters, but not all of them are shown in the image.)
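For reference, this side-by-side table can be built from the structured arrays themselves. Below is a minimal numpy sketch, assuming the layout described in the documentation quoted above (each key holds a structured array with one field per group id). `flatten_rank_genes_groups` and `toy_field` are hypothetical helpers of mine, and the toy gene names and numbers are made up.

```python
import numpy as np

def flatten_rank_genes_groups(result, keys=("names", "scores", "pvals_adj", "logfoldchanges")):
    """Lay out per-key structured arrays (one field per group id) as flat columns
    named '<group>_<key>', in the same side-by-side order as my table."""
    groups = result["names"].dtype.names  # group ids, e.g. ('0', '1', ...)
    columns = {}
    for g in groups:
        for key in keys:
            columns[f"{g}_{key}"] = result[key][g]
    return columns

def toy_field(values_by_group):
    """Hypothetical helper: build a structured array with one field per group id."""
    arrays = [np.asarray(v) for v in values_by_group.values()]
    return np.rec.fromarrays(arrays, names=list(values_by_group.keys()))

# Made-up toy result with two clusters and two genes each (not real data).
toy = {
    "names": toy_field({"0": ["gA", "gB"], "1": ["gC", "gD"]}),
    "scores": toy_field({"0": [9.0, 1.5], "1": [7.0, 2.0]}),
    "pvals_adj": toy_field({"0": [1e-10, 0.2], "1": [1e-8, 0.1]}),
    "logfoldchanges": toy_field({"0": [2.1, 0.3], "1": [1.8, 0.5]}),
}
cols = flatten_rank_genes_groups(toy)
```

With real data the call would be `flatten_rank_genes_groups(adata.uns['rank_genes_groups'])`. Scanpy also provides `sc.get.rank_genes_groups_df(adata, group='0')`, which returns a single cluster as a long-format DataFrame.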
Could you please tell me what “scores” actually means in this result? Any help would be much appreciated. Thanks a lot in advance!