Hi,

I can answer the first two questions about UMAP initialization, and have pinged a `dynverse` author about the third.

- What is meant by PAGA initialization of UMAP? Is it done at the level of the similarity matrix or further in the optimization process and the cost function of UMAP?

It’s just the initial positions in the embedding space. Once you’ve built the coarse-grained PAGA graph from the neighbors network, you can compute a force-directed layout of that PAGA graph (scanpy computes it when you plot the graph with `sc.pl.paga`). Calling `sc.tl.umap(adata, init_pos="paga", ...)` then uses the coordinates of each cluster’s node in that layout as the initial position for all points in that cluster. The similarity graph and the UMAP cost function are unchanged.
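To make that concrete, here is a minimal NumPy sketch of the idea, not scanpy's actual implementation. The function name `init_from_cluster_layout`, the toy labels and coordinates, and the optional `jitter` parameter are all made up for illustration.

```python
import numpy as np

def init_from_cluster_layout(labels, cluster_pos, jitter=0.0, seed=0):
    """Give every cell the layout coordinates of its cluster's node.

    Hypothetical helper for illustration only. `jitter` is an assumed
    option that spreads cells of one cluster slightly apart.
    """
    labels = np.asarray(labels)
    # Index the per-cluster coordinates by each cell's cluster label.
    init = np.asarray(cluster_pos, dtype=float)[labels]
    if jitter > 0:
        rng = np.random.default_rng(seed)
        init = init + rng.normal(scale=jitter, size=init.shape)
    return init

# Toy data: 5 cells in 3 clusters, plus a 2d position per cluster node
# (standing in for the force-directed layout of the coarse graph).
labels = np.array([0, 0, 1, 2, 1])
cluster_pos = np.array([[0.0, 0.0], [1.0, 0.5], [2.0, -1.0]])

init = init_from_cluster_layout(labels, cluster_pos)
print(init)  # one (x, y) starting position per cell
```

The optimizer then starts from `init` instead of from a spectral or random layout; everything after that point is the usual UMAP optimization.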

- What is the difference between Spectral initialization and PAGA initialization in UMAP?

Just the initial position of the points before optimizing the layout. If you use `spectral`, a spectral decomposition of the nearest neighbor graph is used to get the initial positions before UMAP optimizes the 2d embedding.
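For intuition, a spectral initialization can be sketched in a few lines of NumPy. This is a conceptual sketch rather than UMAP's own code: it assumes a small, connected, undirected graph with no isolated nodes, and the function name `spectral_init` and the toy graph are made up for illustration.

```python
import numpy as np

def spectral_init(adjacency, n_components=2):
    """Initial coordinates from eigenvectors of the normalized graph Laplacian.

    Illustrative only; assumes a connected graph with no isolated nodes.
    """
    A = np.asarray(adjacency, dtype=float)
    degree = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    # Symmetric normalized Laplacian: I - D^{-1/2} A D^{-1/2}
    laplacian = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    eigenvalues, eigenvectors = np.linalg.eigh(laplacian)
    # Skip the first (trivial) eigenvector, keep the next n_components.
    return eigenvectors[:, 1:n_components + 1]

# Toy graph: two triangles (nodes 0-2 and 3-5) joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

init = spectral_init(A)
print(init)  # one 2d starting position per node
```

In this toy graph the first coordinate already puts the two triangles on opposite sides, which is why a spectral start tends to reflect the large-scale structure of the graph before any optimization happens.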

- To put it simply I don’t understand the links between Leiden, PAGA and UMAP except maybe the fact that they use the same KNN graph in the first place… I have looked at the UMAP and PAGA papers, but I honestly have a hard time understanding everything.

These can be quite dense, especially the UMAP publication, but all you should need is an intuition. For a high-level understanding, I highly recommend the description in the UMAP documentation. It should give you an understanding of how UMAP weights the KNN graph and uses it to generate a 2d layout.

Leiden is a clustering algorithm that partitions the nodes of the KNN graph into communities (clusters). PAGA then builds a “coarse-grained” representation of the full KNN graph by collapsing each cluster into a single node. Edges between PAGA nodes summarize the connectivities between the corresponding groups of nodes in the full graph. So all three start from the same KNN graph: Leiden partitions it, PAGA summarizes it per cluster, and UMAP lays out its individual nodes in 2d.
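The coarse-graining step can be sketched like this. It is a deliberate simplification: the sketch just sums the connectivities between clusters, whereas PAGA's real edge weights come from a statistical model of how connected two groups are. The function name `coarse_grain` and the toy graph are made up for illustration.

```python
import numpy as np

def coarse_grain(connectivities, labels):
    """Collapse a cell-level graph into a cluster-level graph.

    Simplified illustration: edge weight = summed connectivity between
    two clusters. This is not PAGA's actual connectivity measure.
    """
    C = np.asarray(connectivities, dtype=float)
    labels = np.asarray(labels)
    n_clusters = labels.max() + 1
    # One-hot membership matrix of shape (n_cells, n_clusters).
    membership = np.eye(n_clusters)[labels]
    coarse = membership.T @ C @ membership
    # Drop within-cluster weights; only between-cluster edges remain.
    np.fill_diagonal(coarse, 0.0)
    return coarse

# Toy graph: two triangles (nodes 0-2 and 3-5) joined by the edge 2-3,
# with cluster labels as a clustering such as Leiden might assign them.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
C = np.zeros((6, 6))
for i, j in edges:
    C[i, j] = C[j, i] = 1.0
labels = np.array([0, 0, 0, 1, 1, 1])

coarse = coarse_grain(C, labels)
print(coarse)  # 2 x 2 matrix with a single edge between the two clusters
```

Six cells become two nodes joined by one edge. A layout of that small graph is what the PAGA initialization hands to UMAP.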

Does this help?