I can answer the first two questions about UMAP initialization, and have pinged a
dynverse author about the third.
- What is meant by PAGA initialization of UMAP? Is it done at the level of the similarity matrix, or further along in the optimization process and the cost function of UMAP?
It’s just the initial positions in the embedding space. If you’ve built the coarse-grained PAGA graph from the neighbors network, you can then compute a force-directed layout of that PAGA graph. Calling
sc.tl.umap(adata, init_pos="paga", ...) will just use the coordinates from that layout as the starting point: each point starts from the layout position of the reference cluster it belongs to.
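To make that concrete, here is a minimal NumPy sketch of the idea, not scanpy's actual code. The helper name `init_from_cluster_layout`, the toy labels and the coordinates are all made up for illustration, and the optional jitter is my own assumption for keeping points in one cluster from starting exactly on top of each other.

```python
import numpy as np

def init_from_cluster_layout(labels, cluster_pos, jitter=0.0, seed=0):
    """Start every point at the layout coordinate of its cluster node.

    Hypothetical helper for illustration only, not part of scanpy.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    cluster_pos = np.asarray(cluster_pos, dtype=float)
    # Look up the 2D coordinate of each point's cluster: shape (n_points, 2).
    init = cluster_pos[labels]
    if jitter > 0:
        # Assumed: a little Gaussian noise so points do not coincide exactly.
        init = init + rng.normal(scale=jitter, size=init.shape)
    return init

# Toy data: 5 points in 3 clusters, with made-up layout coordinates.
labels = np.array([0, 0, 1, 2, 1])
cluster_pos = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
init = init_from_cluster_layout(labels, cluster_pos)
```

UMAP's optimization would then start from `init` instead of from a spectral or random layout; nothing about the graph or the cost function changes.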
- What is the difference between Spectral initialization and PAGA initialization in UMAP?
Just the initial positions of the points before the layout is optimized. If you use
spectral, a spectral decomposition of the nearest-neighbor graph is used to get the initial positions before UMAP optimizes the 2D embedding.
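As a rough illustration of what a spectral initialization looks like, here is a small NumPy sketch that takes the eigenvectors of the symmetric normalized graph Laplacian with the smallest nonzero eigenvalues as starting coordinates. The function name `spectral_init` and the toy graph are hypothetical, and I am assuming a connected graph with no isolated nodes; UMAP's own implementation handles more cases than this.

```python
import numpy as np

def spectral_init(adjacency, n_components=2):
    """Initial coordinates from a spectral decomposition of a graph.

    Illustrative sketch only; assumes a connected, symmetric graph.
    """
    A = np.asarray(adjacency, dtype=float)
    degree = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    # Symmetric normalized Laplacian: L = I - D^{-1/2} A D^{-1/2}.
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order; skip the trivial first one.
    eigvals, eigvecs = np.linalg.eigh(L)
    return eigvecs[:, 1:n_components + 1]

# Toy graph: two triangles (nodes 0-2 and 3-5) joined by one edge.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

init = spectral_init(A)
```

On this toy graph the first coordinate already puts the two triangles on opposite sides of zero, which is the kind of global structure a spectral start hands to the optimizer.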
To put it simply I don’t understand the links between Leiden, PAGA and UMAP except maybe the fact that they use the same KNN graph in the first place…
I have looked at the UMAP and PAGA papers, but I honestly have a hard time understanding everything.
These can be quite dense, especially the UMAP publication, but all you should need is an intuition. I highly recommend this description from the UMAP documentation for a high-level understanding. It should give you a sense of how UMAP weights the KNN graph and uses it to generate a 2D layout.
Leiden is a clustering algorithm which partitions the nodes in the KNN graph into communities (clusters). PAGA then builds a “coarse-grained” representation of the full KNN graph by collapsing each cluster into a single node. Edges between PAGA nodes are summarized from the connectivities between the corresponding groups of nodes on the full graph.
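The coarse-graining step can be sketched in a few lines of NumPy. Note the assumptions: this simply sums the edge weights between clusters, whereas PAGA itself scores inter-cluster connectivity relative to what a random graph would give, so treat this as the intuition rather than the actual statistic. The function name `coarse_grain` and the toy graph are made up for illustration.

```python
import numpy as np

def coarse_grain(connectivities, labels):
    """Collapse each cluster into one node and sum the edges between clusters.

    Simplified stand-in for the PAGA idea, not PAGA's connectivity measure.
    """
    A = np.asarray(connectivities, dtype=float)
    labels = np.asarray(labels)
    n_clusters = labels.max() + 1
    # One-hot membership matrix: M[i, c] = 1 if point i is in cluster c.
    M = np.zeros((len(labels), n_clusters))
    M[np.arange(len(labels)), labels] = 1.0
    # C[a, b] = total edge weight between cluster a and cluster b.
    C = M.T @ A @ M
    # Drop within-cluster weight; only links between clusters are kept.
    np.fill_diagonal(C, 0.0)
    return C

# Toy KNN-like graph: two triangles joined by a single edge.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

labels = np.array([0, 0, 0, 1, 1, 1])  # e.g. the output of a clustering step
coarse = coarse_grain(A, labels)
```

So all three methods share the same KNN graph: the clustering labels its nodes, the coarse graph summarizes it per cluster, and UMAP lays out the full graph, optionally starting from the coarse layout.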
Does this help?