Heatmap, max number of cells per cluster

I would like to cap the number of cells shown per cluster so that the heatmap looks more balanced and is easier to read. Thoughts?

I think this has come up before in the issues.

Does that sound similar to what you’re asking?

I don’t understand your statement: “If you want the area of the plot for each sample to take the same amount of space, why not just plot the mean for each group?” from that post you mentioned. Could you explain that further please?

I’m looking for a “sample”-like approach, as in R, where I could take, e.g., 100 cells from each cluster and plot their expression for the top 10 DE genes. Something similar is implemented in the hm_gpa_sel function here, which returns a pheatmap: https://rdrr.io/github/pcahan1/singleCellNet/src/R/plots.R

This function will do what you want:

import scanpy as sc
from scanpy import logging as logg


def downsample_to_smallest_category(
        adata,
        column="sample_short",
        random_state=None,
        min_cells=15,
        keep_small_categories=False
) -> sc.AnnData:
    """
    Returns an AnnData object in which all categories in `column` have
    the same size.

    column
        Column with the categories to down-sample.
    min_cells
        Minimum number of cells a category needs in order to be down-sampled.
        Categories with fewer than `min_cells` cells are discarded unless
        `keep_small_categories` is True.
    keep_small_categories
        By default, categories with fewer than `min_cells` cells are discarded.
        Set to True to keep them (at their original size).
    """
    counts = adata.obs[column].value_counts(sort=False)
    if len(counts[counts < min_cells]) > 0 and keep_small_categories is False:
        logg.warning(
            "The following categories have fewer than {} cells and will be "
            "ignored: {}".format(min_cells, dict(counts[counts < min_cells]))
        )
    min_size = min(counts[counts >= min_cells])
    sample_selection = None
    for sample, num_cells in counts.items():
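        # Strictly "<": a category with exactly `min_cells` cells counts as
        # large enough, consistent with the warning and `min_size` above.
        if num_cells < min_cells: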
            if keep_small_categories:
                sel = adata.obs.index.isin(
                    adata.obs[adata.obs[column] == sample].index)
            else:
                continue
        else:
            sel = adata.obs.index.isin(
                adata.obs[adata.obs[column] == sample]
                .sample(min_size, random_state=random_state)
                .index
            )
        if sample_selection is None:
            sample_selection = sel
        else:
            sample_selection |= sel
    logg.info(
        "The categories in column {!r} were down-sampled to {} cells each. "
        "The original counts were {}".format(column, min_size, dict(counts))
    )
    return adata[sample_selection].copy()

You can use it as follows:

adata = sc.datasets.pbmc68k_reduced()
# genes_list is assumed to be your own list of gene names from adata.var_names
sc.pl.heatmap(
    downsample_to_smallest_category(
        adata, 'bulk_labels', min_cells=50, keep_small_categories=True),
    genes_list,
    groupby='bulk_labels'
)
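
If you would rather cap every cluster at a fixed number of cells (for example the 100 cells per cluster mentioned earlier in the thread) instead of shrinking everything to the smallest cluster, something like the following might work. This is only a sketch under some assumptions: `cap_cells_per_category` is a hypothetical helper name, not part of scanpy, and the toy `obs` table stands in for `adata.obs` so the example runs with just pandas and NumPy.

```python
import numpy as np
import pandas as pd


def cap_cells_per_category(obs, column, max_cells=100, random_state=0):
    """Return the index labels of `obs`, keeping at most `max_cells` rows per category."""
    rng = np.random.default_rng(random_state)
    keep = []
    # groupby(...).indices maps each category to the integer positions of its rows
    for _, positions in obs.groupby(column, observed=True).indices.items():
        if len(positions) > max_cells:
            # Sample without replacement only when the category is too large
            positions = rng.choice(positions, size=max_cells, replace=False)
        keep.extend(positions)
    # Sort the positions so the original cell order is preserved
    return obs.index[np.sort(np.asarray(keep, dtype=int))]


# Toy stand-in for adata.obs: clusters with 250, 40 and 120 cells
obs = pd.DataFrame(
    {"cluster": ["a"] * 250 + ["b"] * 40 + ["c"] * 120},
    index=[f"cell{i}" for i in range(410)],
)
selected = cap_cells_per_category(obs, "cluster", max_cells=100)
capped_counts = obs.loc[selected, "cluster"].value_counts().to_dict()
```

With a real dataset, the returned labels should be usable to subset the object before plotting, e.g. `adata[selected].copy()`. Clusters that are already smaller than `max_cells` are kept whole rather than dropped.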

Thank you!! I appreciate the function.