Components/Neighbors to use to denoise the graph


I thought I would open a new thread for this question. I went through the PAGA tutorial (paul-15) and tried it on my data, with and without the denoising step. Adding that step seems to have a profound effect on how the data looks, and I am not sure 1) whether it makes sense and 2) whether the parameters I am using are appropriate.

In the (paul-15) tutorial, you are using 4 neighbors at first:
sc.pp.neighbors(adata, n_neighbors=4, n_pcs=20)
This number appears rather small compared to the sort of default values usually used?

Then in the denoising step, the number of neighbors increases to 10 for the graph built on the diffusion map:
sc.pp.neighbors(adata, n_neighbors=10, use_rep='X_diffmap')

I am not very familiar with the concepts underlying these decisions, which makes the choice somewhat difficult. Of course I could just try every possible combination and pick the one that looks best, but I’d prefer to understand the logic behind it a bit better. Would it be possible to get more details about what should motivate this choice?

Many thanks!

I have the same question! Following the thread.

Hi @Mevelo, @andattardi,

I will try to give some brief insight into this. In this tutorial, the denoising does not come from the number of neighbors in the sc.pp.neighbors() call; it comes from using the diffusion map as the basis for the graph. Diffusion maps typically give a smoother representation of trajectories in a low-dimensional embedding. That is because they are based on diffusion processes along the KNN graph (which are also used by other denoising approaches, as mentioned in the tutorial).
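To make "diffusion along the KNN graph" a bit more concrete, here is a rough NumPy sketch. It is a simplified illustration, not scanpy's implementation: it uses an unweighted KNN graph, and the helper names knn_affinity and diffusion_map as well as the toy data are made up for this example.

```python
import numpy as np

def knn_affinity(X, k):
    """Symmetric, unweighted KNN adjacency matrix (hypothetical helper)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # squared distances
    np.fill_diagonal(d2, np.inf)                         # exclude self-matches
    idx = np.argsort(d2, axis=1)[:, :k]                  # k nearest per point
    A = np.zeros_like(d2)
    rows = np.repeat(np.arange(X.shape[0]), k)
    A[rows, idx.ravel()] = 1.0
    return np.maximum(A, A.T)                            # symmetrise

def diffusion_map(A, n_comps):
    """Leading eigenpairs of the random walk on the graph (simplified)."""
    deg = A.sum(axis=1)
    S = A / np.sqrt(np.outer(deg, deg))      # D^-1/2 A D^-1/2, symmetric
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1][:n_comps]
    # Rescale to right eigenvectors of the Markov matrix D^-1 A.
    emb = vecs[:, order] / np.sqrt(deg)[:, None]
    return vals[order], emb

# Toy data: a noisy 1-D trajectory embedded in 10 dimensions.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 200)
X = np.outer(t, np.ones(10)) + 0.02 * rng.normal(size=(200, 10))

A_first = knn_affinity(X, k=4)               # sparse first graph
vals, emb = diffusion_map(A_first, n_comps=5)
A_second = knn_affinity(emb, k=10)           # graph rebuilt on the embedding
```

The second graph is built from distances in the diffusion coordinates, which summarise how the random walk moves over the first graph rather than the raw measurements. That is the sense in which rebuilding the graph on the diffusion map acts as a smoothing step.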

The diffusion map is computed from the first KNN graph with 4 neighbors, which limits the smoothing somewhat. The second graph, with k=10 and built on the diffusion map coordinates, is then used for PAGA. There is not really a good heuristic for choosing k, or for deciding whether to base PAGA on diffmap outputs. I have not based PAGA on diffmap outputs previously, but you might prefer to do it.

Hope that helps a bit!