I would like to cap the maximum number of cells shown per cluster so that the heatmap looks more balanced and is easier to read. Thoughts?

I don’t understand your statement: “If you want the area of the plot for each sample to take the same amount of space, why not just plot the mean for each group?” from that post you mentioned. Could you explain that further please?

I’m looking for something like the `sample` approach in R, where I could just pull out, say, 100 cells from each cluster and plot their expression for the top 10 DE genes. Something similar is implemented in the hm_gpa_sel function here, which returns a pheatmap: https://rdrr.io/github/pcahan1/singleCellNet/src/R/plots.R

This function will do what you want:

```
import scanpy as sc
from scanpy import logging as logg


def downsample_to_smallest_category(
    adata,
    column="sample_short",
    random_state=None,
    min_cells=15,
    keep_small_categories=False,
) -> sc.AnnData:
    """
    Return an AnnData object in which all categories in `column` have
    the same size.

    column
        Column with the categories to downsample.
    min_cells
        Minimum number of cells to downsample.
        Categories with fewer than `min_cells` cells are discarded unless
        `keep_small_categories` is True.
    keep_small_categories
        By default, categories with fewer than `min_cells` cells are
        discarded. Set to True to keep them (without downsampling).
    """
    counts = adata.obs[column].value_counts(sort=False)
    if len(counts[counts < min_cells]) > 0 and keep_small_categories is False:
        logg.warning(
            "The following categories have less than {} cells and will be "
            "ignored: {}".format(min_cells, dict(counts[counts < min_cells]))
        )
    # Target size: the smallest category that still has at least `min_cells` cells.
    min_size = min(counts[counts >= min_cells])
    sample_selection = None
    for sample, num_cells in counts.items():
        # Strict `<` so a category with exactly `min_cells` cells is treated
        # the same way here as in the warning and in `min_size` above.
        if num_cells < min_cells:
            if keep_small_categories:
                # Keep every cell of the small category.
                sel = adata.obs.index.isin(
                    adata.obs[adata.obs[column] == sample].index
                )
            else:
                continue
        else:
            # Randomly draw `min_size` cells from this category.
            sel = adata.obs.index.isin(
                adata.obs[adata.obs[column] == sample]
                .sample(min_size, random_state=random_state)
                .index
            )
        if sample_selection is None:
            sample_selection = sel
        else:
            sample_selection |= sel
    logg.info(
        "The categories in column {!r} have been down-sampled to {} cells each. "
        "The original counts were {}".format(column, min_size, dict(counts))
    )
    return adata[sample_selection].copy()
```

You can use it as follows:

```
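import scanpy as sc

adata = sc.datasets.pbmc68k_reduced()
# Placeholder gene list; substitute e.g. your top DE genes.
genes_list = list(adata.var_names[:10])
sc.pl.heatmap(
    downsample_to_smallest_category(
        adata, 'bulk_labels', min_cells=50, keep_small_categories=True
    ),
    genes_list,
    groupby='bulk_labels',
)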
```
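
If you would rather cap every cluster at a fixed number of cells (for example the 100 cells per cluster mentioned in the question) instead of matching the smallest category, something along these lines might work. This is only a minimal sketch: `cap_cells_per_group`, `max_cells` and `seed` are names made up for illustration and are not part of scanpy. It only works on the cluster labels and returns the positions of the cells to keep.

```python
import random
from collections import defaultdict


def cap_cells_per_group(labels, max_cells=100, seed=0):
    """Return sorted positions that keep at most `max_cells` cells per label.

    Hypothetical helper for illustration; not a scanpy function.
    """
    rng = random.Random(seed)
    # Collect the positions of the cells belonging to each label.
    positions = defaultdict(list)
    for pos, label in enumerate(labels):
        positions[label].append(pos)
    keep = []
    for pos_list in positions.values():
        # Only groups larger than the cap are randomly subsampled.
        if len(pos_list) > max_cells:
            pos_list = rng.sample(pos_list, max_cells)
        keep.extend(pos_list)
    return sorted(keep)


# Toy example: cluster "A" has 5 cells, cluster "B" has 2, cap at 3 per cluster.
labels = ["A", "A", "B", "A", "A", "B", "A"]
idx = cap_cells_per_group(labels, max_cells=3)
print(len(idx))  # 5: three cells from "A" plus both cells from "B"
```

With an AnnData object, the returned positions could then be used as `adata[cap_cells_per_group(adata.obs['bulk_labels'], max_cells=100)]` before passing the result to `sc.pl.heatmap`, assuming the cluster labels live in `adata.obs['bulk_labels']`.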

Thank you!! I appreciate the function.