I would like to use rank_genes_groups() to find the differentially expressed genes between two time points. I have 200 cells and 10000 genes at each time point. Suppose the gene expression matrix for time point 1 is A0, a 200 × 10000 matrix (cells × genes), and the matrix for time point 2 is A1. First, I convert the data to AnnData as follows:

n_obs = A0.shape[0]  # number of cells (rows of A0)

obs = pd.DataFrame()

obs['day0'] = [f'C_{i}' for i in range(1, n_obs+1)]

var_names = geneNames

var = pd.DataFrame(index=var_names)

adata1 = ad.AnnData(A0, obs=obs, var=var, dtype='int32')

n_obs1 = A1.shape[0]  # number of cells (rows of A1)

obs1 = pd.DataFrame()

obs1['day1'] = [f'C_{i}' for i in range(1, n_obs1+1)]

var_names = geneNames

var = pd.DataFrame(index=var_names)

adata2 = ad.AnnData(A1, obs=obs1, var=var, dtype='int32')

adata = adata1.concatenate(adata2)

adata

AnnData object with n_obs × n_vars = 400 × 10000

obs: 'day0', 'day10', 'batch'

My first question is why, in order to use sc.pl.rank_genes_groups(), we first need to run the following:

sc.pp.neighbors(adata, n_neighbors=10, n_pcs=40)

sc.tl.leiden(adata)

sc.tl.rank_genes_groups(adata, 'leiden', method='wilcoxon')

sc.pl.rank_genes_groups(adata, n_genes=25, sharey=False)

As I understand it, this first finds clusters in the data set and then compares those clusters.

What I get is a figure with 9 subfigures, so I do not know how to interpret the results for these two time points. I expected to get one set of genes. Is there an explanation for this?

Second, is there a way to treat each time point as a single group (cluster), so that I get the differentially expressed genes between these two time points? Any help is appreciated. I am new to this topic.