# Adding cell number information in dotplot

Hi all!

My use case for `scanpy` is analysis of whole-body data from a weird marine annelid. We sort of have an idea of what to expect, but a lot of the analysis is exploratory, and my main job is helping canalize the knowledge that is available in the lab into making sense of the data.

In this context, dotplots are our best friend, as they provide a very nice summary of gene expression over the whole (clustered) dataset. However, yesterday we noticed a confusing edge case: let’s say gene g is expressed in the same number of cells in two clusters, 4 and 23. Cluster 4 has many, many more cells than cluster 23, so on the dotplot it will look like g is barely expressed in 4 but is a great marker for 23. Of course, combining the dotplot with a feature plot helps you see that, but you still get no sense of how many cells are involved (more, fewer, or the same).
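To put numbers on this edge case, here is a minimal sketch. The cell counts are made up for illustration and are not from the actual dataset. The dot size in a dotplot encodes the fraction of cells in a cluster that express the gene, so equal absolute counts can produce very different dots:

```python
# Hypothetical counts: gene g is detected in 50 cells in each cluster,
# but the clusters differ greatly in size.
expressing = {"4": 50, "23": 50}
cluster_size = {"4": 5000, "23": 60}

# The dot size reflects the fraction of expressing cells, not the absolute count.
fraction = {c: expressing[c] / cluster_size[c] for c in expressing}

for c, f in fraction.items():
    print(f"cluster {c}: {expressing[c]} expressing cells, fraction = {f:.1%}")
```

With these numbers cluster 4 gets a tiny dot (1% of its cells) and cluster 23 a large one (about 83%), even though both contain the same number of expressing cells.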

To alleviate this I am proposing an extension of dotplots: instead of circles, draw boxes whose height is proportional to $\log(\#\text{cells}_{\text{cluster}})$, that are filled in proportion to how many cells express gene g, and that are colored according to the average expression.
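The box design described above could be sketched roughly as follows. This is not the proof-of-principle implementation and not part of `scanpy`: the function names `domino_stats` and `domino_plot` are hypothetical, the inputs are plain arrays rather than an AnnData object, and several details are assumptions (a cell "expresses" the gene if its value is above zero, the average is taken over all cells in the cluster, and heights use log2(1 + n) so that empty fills stay at zero).

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.cm import ScalarMappable
from matplotlib.colors import Normalize
from matplotlib.patches import Rectangle


def domino_stats(expr, clusters):
    """Per cluster: total cells, cells with expr > 0, mean over all cells."""
    expr = np.asarray(expr, dtype=float)
    clusters = np.asarray(clusters)
    stats = {}
    for c in np.unique(clusters).tolist():
        values = expr[clusters == c]
        stats[c] = {
            "n_cells": int(values.size),
            "n_expressing": int(np.count_nonzero(values > 0)),
            "mean_expr": float(values.mean()),
        }
    return stats


def domino_plot(stats, log_scale=True, ax=None):
    """One box per cluster: outline = cluster size, fill = expressing cells."""
    if ax is None:
        _, ax = plt.subplots()

    def scale(n):
        # Assumption: log2(1 + n) keeps a count of zero at height zero.
        return float(np.log2(1 + n)) if log_scale else float(n)

    vmax = max(s["mean_expr"] for s in stats.values()) or 1.0
    norm = Normalize(vmin=0.0, vmax=vmax)
    cmap = plt.get_cmap("Reds")

    for i, s in enumerate(stats.values()):
        # Filled part: number of expressing cells, colored by mean expression.
        ax.add_patch(Rectangle((i - 0.4, 0), 0.8, scale(s["n_expressing"]),
                               facecolor=cmap(norm(s["mean_expr"])),
                               edgecolor="none"))
        # Outline: total number of cells in the cluster.
        ax.add_patch(Rectangle((i - 0.4, 0), 0.8, scale(s["n_cells"]),
                               fill=False, edgecolor="black"))

    ax.set_xlim(-0.5, len(stats) - 0.5)
    ax.set_ylim(0, 1.05 * max(scale(s["n_cells"]) for s in stats.values()))
    ax.set_xticks(list(range(len(stats))))
    ax.set_xticklabels([str(c) for c in stats])
    ax.set_xlabel("cluster")
    ax.set_ylabel("log2(1 + #cells)" if log_scale else "#cells")
    ax.figure.colorbar(ScalarMappable(norm=norm, cmap=cmap), ax=ax,
                       label="mean expression")
    return ax


# Synthetic example: 20 expressing cells in each of two clusters of very
# different size (all values are made up).
clusters = np.array(["4"] * 1000 + ["23"] * 25)
expr = np.concatenate([np.ones(20), np.zeros(980), np.ones(20), np.zeros(5)])
stats = domino_stats(expr, clusters)
ax = domino_plot(stats, log_scale=True)
```

Calling `plt.show()` afterwards displays the figure. Passing `log_scale=False` gives the natural-scale variant, where the filled share of each box can be read directly as the percentage of expressing cells.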

I think this works better than violinplots. Sadly I see no good way to multiplex this and plot multiple genes at once. I will make a proof-of-principle and post it here; I am really interested in feedback (maybe I am overlooking something super simple/basic?).

Case in point: compare this “domino plot” (y-axis in log2 scale) to the corresponding dotplot. The dotplot suggests that Hb9 is a great marker for cluster 18, while in fact the bulk of the cells expressing the gene are in clusters 0 and 2.

(post broken up since my account is 2h old and I’m not allowed to embed multiple images yet)

the corresponding dotplot:

An immediate issue that jumps out is that the log scale makes it hard to intuitively understand what percentage of cells actually express the gene. For comparison’s sake, here is a version with the y-axis in natural scale:

Thoughts? Ideas?

Hi,

I think that this proposal warrants a GitHub issue. You will likely get more developer feedback this way.