Why is max_out_group_fraction in sc.tl.filter_rank_genes_groups set to 0.5?

I am trying to determine the best value for max_out_group_fraction in sc.tl.filter_rank_genes_groups.
In the source code, it is set to 0.5 by default.
What exactly does it measure?

Here is my understanding based on the code.
For example, for cluster 1 vs. the rest:
min_in_group_fraction is the fraction of cells in cluster 1 that express the gene.
max_out_group_fraction is the fraction of cells in the rest that do NOT express the gene?
If so, should it be set to 0.75 to match the min_in_group_fraction default?

Or is it the fraction of cells in the rest that express the gene?
Then the following code doesn’t look right.

gene_names = gene_names[
    (fraction_in_cluster_matrix > min_in_group_fraction) &
    (fraction_out_cluster_matrix < max_out_group_fraction) &
    (fold_change_matrix > min_fold_change)
]

Which one is correct?

Thank you.

max_out_group_fraction is the maximum fraction of cells that express a gene in the other groups (your second guess).
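
To make the two fractions concrete, here is a minimal pure-Python sketch on a toy matrix. It is not scanpy's implementation: the matrix, the gene names, and the helper fraction_expressing are made up for illustration, "expressed" is assumed to mean a nonzero value, the fold-change condition is left out, and 0.25 is assumed as the min_in_group_fraction threshold.

```python
# Toy expression matrix: rows are cells, columns are genes (values are counts).
expression = [
    [3, 0, 1],  # cell 0, cluster 1
    [2, 0, 4],  # cell 1, cluster 1
    [0, 1, 2],  # cell 2, rest
    [0, 0, 5],  # cell 3, rest
    [1, 0, 3],  # cell 4, rest
    [0, 0, 1],  # cell 5, rest
]
in_cluster = [True, True, False, False, False, False]
gene_names = ["geneA", "geneB", "geneC"]  # hypothetical gene names


def fraction_expressing(rows, gene_idx):
    """Fraction of the given cells with a nonzero value for one gene."""
    return sum(1 for row in rows if row[gene_idx] > 0) / len(rows)


cells_in = [row for row, flag in zip(expression, in_cluster) if flag]
cells_out = [row for row, flag in zip(expression, in_cluster) if not flag]

fraction_in = {g: fraction_expressing(cells_in, i) for i, g in enumerate(gene_names)}
fraction_out = {g: fraction_expressing(cells_out, i) for i, g in enumerate(gene_names)}

# Assumed thresholds; the fold-change condition is omitted in this sketch.
min_in_group_fraction = 0.25
max_out_group_fraction = 0.5

kept = [
    g for g in gene_names
    if fraction_in[g] > min_in_group_fraction and fraction_out[g] < max_out_group_fraction
]
print(kept)  # ['geneA']
```

geneA is kept because it is detected in all cluster 1 cells and in only 25% of the remaining cells. geneC is detected in every cell, so its out-group fraction of 1.0 fails the `< max_out_group_fraction` comparison.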

Can you point out what you think is wrong with the code?

I think the code itself is right.
But I am not sure that the logic of excluding genes expressed in more than 50% (default) of the cells outside the group is sound.
For example, if my gene is expressed in 45% of the cells in the group and 45% of the cells out of the group (given balanced group sizes), it should be included. I set max_out_group_fraction=1.01 to override the default.
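
A small sketch of why 1.01 rather than 1.0 turns the out-group filter off, assuming the strict `<` comparison from the snippet quoted earlier. The helper passes_fraction_filters is hypothetical and only mirrors the two fraction comparisons; 0.25 is assumed for min_in_group_fraction, and the fold-change condition is ignored.

```python
def passes_fraction_filters(fraction_in, fraction_out,
                            min_in_group_fraction=0.25,
                            max_out_group_fraction=0.5):
    """Hypothetical helper mirroring the two fraction comparisons in the quoted code."""
    return (fraction_in > min_in_group_fraction) and (fraction_out < max_out_group_fraction)


# A gene detected in 90% of in-group cells and 60% of out-group cells.
default_result = passes_fraction_filters(0.9, 0.6)
relaxed_result = passes_fraction_filters(0.9, 0.6, max_out_group_fraction=1.01)

# A gene detected in every out-group cell: 1.0 < 1.0 is False, 1.0 < 1.01 is True.
with_one = passes_fraction_filters(0.9, 1.0, max_out_group_fraction=1.0)
with_one_oh_one = passes_fraction_filters(0.9, 1.0, max_out_group_fraction=1.01)

print(default_result, relaxed_result, with_one, with_one_oh_one)
# False True False True
```

Because the comparison is strict, a threshold of exactly 1.0 would still drop genes detected in all out-group cells, so any value above 1 is needed to disable the filter completely.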

I think a similar parameter (like what Seurat does), such as max_difference_fraction=0.25, is probably a better choice if the purpose is to exclude comparisons with imbalanced group sizes.

Also, what about adding a min_out_group_fraction to match min_in_group_fraction, to make sure that we don’t include genes that are not expressed well out of the group?
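
A rough sketch of what such a lower bound could look like. min_out_group_fraction is only the proposed parameter, not something scanpy provides, and the helper filter_with_min_out, the default of 0.1, and the gene tuples are all made up for illustration.

```python
def filter_with_min_out(genes, min_in_group_fraction=0.25, min_out_group_fraction=0.1):
    """Keep genes detected in enough cells both inside and outside the group.

    min_out_group_fraction is a hypothetical parameter; 0.1 is an arbitrary value.
    Each entry of `genes` is (name, fraction_in_group, fraction_out_group).
    """
    return [
        name for name, frac_in, frac_out in genes
        if frac_in > min_in_group_fraction and frac_out > min_out_group_fraction
    ]


genes = [
    ("geneA", 0.80, 0.30),  # well detected in and out of the group
    ("geneB", 0.60, 0.02),  # barely detected out of the group
    ("geneC", 0.10, 0.40),  # barely detected in the group
]
kept = filter_with_min_out(genes)
print(kept)  # ['geneA']
```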

Thanks for your suggestion. Indeed, I think that the defaults are very stringent. I will make a PR to relax the conditions.