Distances between different cell types

Dear Scanpy team,

I would like to ask a question about computing distances between different cell types. I would like to get a numerical estimate of how close cells of a particular type are to each other. For example, I have cells that come from different labs and would like to estimate the gene expression similarity between them across all genes, or at least across the highly variable ones.

One solution could be to compute a dimensionality-reduced representation of the data and then compute the distances in that space.

import scanpy as sc

# normalize the data
adata.layers["counts"] = adata.X.copy()  # preserve the counts
sc.pp.normalize_total(adata, target_sum=1e4, exclude_highly_expressed=True)

# scale the data
sc.pp.scale(adata, max_value=10)

# freeze raw counts
adata.raw = adata  # freeze the state in .raw

# highly variable genes selection
sc.pp.highly_variable_genes(adata, n_top_genes=2000, subset=False, layer="counts", flavor="seurat_v3")

sc.tl.pca(adata, svd_solver="arpack", n_comps=100)

# neighbours graph
sc.pp.neighbors(adata, n_pcs=100)

# compute UMAP
sc.tl.umap(adata)


My question is: how could I compute the median distance between cells that carry different annotations in .obs? For example, I have cells with obs['cell_type'] == 'type_1' and obs['cell_type'] == 'type_2'. How could I compare the distance between type_1 and type_2 cells?
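For reference, the median distance described here could be sketched roughly as below. This is only an illustration under some assumptions: the helper name `median_between_group_distance` is made up for this post (it is not a Scanpy function), Euclidean distance is assumed, and the toy arrays stand in for something like `adata.obsm["X_pca"]` and `adata.obs["cell_type"]`.

```python
import numpy as np


def median_between_group_distance(embedding, labels, group_a, group_b):
    """Median Euclidean distance over all (cell in group_a, cell in group_b) pairs.

    Hypothetical helper for illustration; not part of Scanpy.
    """
    embedding = np.asarray(embedding, dtype=float)
    labels = np.asarray(labels)

    cells_a = embedding[labels == group_a]
    cells_b = embedding[labels == group_b]
    if len(cells_a) == 0 or len(cells_b) == 0:
        raise ValueError("both groups must contain at least one cell")

    # Pairwise differences via broadcasting: shape (n_a, n_b, n_dims).
    diff = cells_a[:, None, :] - cells_b[None, :, :]
    dists = np.sqrt((diff ** 2).sum(axis=-1))
    return float(np.median(dists))


# Toy stand-in for an embedding and a label column.
toy_embedding = np.array([[0.0, 0.0], [0.0, 1.0], [3.0, 0.0], [3.0, 1.0]])
toy_labels = np.array(["type_1", "type_1", "type_2", "type_2"])

d_12 = median_between_group_distance(toy_embedding, toy_labels, "type_1", "type_2")
print(d_12)
```

With a real object, the call would presumably look like `median_between_group_distance(adata.obsm["X_pca"], adata.obs["cell_type"].to_numpy(), "type_1", "type_2")`. The broadcasting step allocates an array of size n_a x n_b x n_dims, so for large groups it may be better to subsample cells or compute the distances in chunks.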


You could consider using PAGA connectivities for this analysis. PAGA evaluates how much more connected two clusters are than expected at random. Distances in low-dimensional embeddings can be difficult to interpret; because PAGA uses the kNN graph directly, it might be a better representation.