Multiple gene_ids upon 10x file concatenation

Hi all, I am new to coding and single-cell analysis.

I have a few single-cell samples that I want to concatenate and analyze with scanpy in Jupyter. I'm not sure why I am getting three different gene_ids columns after concatenation. The integration is also way off, which I think is partly because of this. Is there any way to standardize the list? I wasn't having this problem in R, so I think I am doing something wrong in Python.

Here is the code:

import numpy as np
import pandas as pd
import scanpy as sc

####Read in 10x files
CRC1 = sc.read_10x_mtx(

CRC2 = sc.read_10x_mtx(

Ctrl = sc.read_10x_mtx(

####Check Var

CRC1, CRC2, Ctrl

(AnnData object with n_obs × n_vars = 1286 × 33694
var: ‘gene_ids’, ‘feature_types’,
AnnData object with n_obs × n_vars = 1205 × 33538
var: ‘gene_ids’, ‘feature_types’,
AnnData object with n_obs × n_vars = 5037 × 33694
var: ‘gene_ids’, ‘feature_types’)

adata = CRC1.concatenate(CRC2, Ctrl)


AnnData object with n_obs × n_vars = 7528 × 22164
obs: ‘batch’
var: ‘feature_types’, ‘gene_ids-0’, ‘gene_ids-1’, ‘gene_ids-2’

Hopefully those samples were not processed with different reference genomes; if they all use the same reference, the gene_ids should be identical across samples.
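The differing n_vars above (33694 vs 33538) hint that the feature lists are not identical, so it may be worth checking before concatenating. Here is a minimal sketch of such a check. It uses two toy DataFrames as stand-ins for CRC1.var and CRC2.var; the gene names and IDs are made up for illustration.

```python
import pandas as pd

# Toy stand-ins for CRC1.var and CRC2.var (hypothetical values, not real annotation)
var_a = pd.DataFrame({"gene_ids": ["ENSG01", "ENSG02", "ENSG03"]},
                     index=["GeneA", "GeneB", "GeneC"])
var_b = pd.DataFrame({"gene_ids": ["ENSG01", "ENSG99", "ENSG04"]},
                     index=["GeneA", "GeneB", "GeneD"])

# Genes present in both samples (what an inner join would keep)
shared = var_a.index.intersection(var_b.index)

# Shared genes whose IDs disagree between the two samples
differs = var_a.loc[shared, "gene_ids"].values != var_b.loc[shared, "gene_ids"].values
mismatch = shared[differs]

print(len(shared), "shared genes;", list(mismatch), "have different gene_ids")
```

With the real objects you would pass CRC1.var and CRC2.var instead of the toy frames. An empty mismatch list would suggest the gene_ids columns can safely be collapsed into one.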

I just ran into this and wrote this bit of code to set gene_ids to the first non-null value among those fields, and then drop the gene_ids-XX columns.

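id_cols = adata.var.columns[adata.var.columns.str.match(r"gene_ids-\d+")]
x = adata.var.loc[:, id_cols]
# For each gene, find the first gene_ids-N column that holds a non-null value
cols = x.T.notna().idxmax()
x = x.reset_index().melt("index")
# Look up that value per gene and store it in a single gene_ids column
adata.var["gene_ids"] = x.set_index(["index", "variable"]).loc[list(zip(cols.index, cols.values)), "value"].droplevel("variable")
# Drop the per-batch columns
adata.var.drop(columns=id_cols, inplace=True)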
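A shorter way to get the same "first non-null value" result might be a row-wise backfill in pandas. This is only a sketch on a toy table that mimics adata.var after concatenation; the gene names and IDs are invented.

```python
import numpy as np
import pandas as pd

# Toy stand-in for adata.var after concatenation (hypothetical values)
var = pd.DataFrame(
    {
        "feature_types": ["Gene Expression"] * 3,
        "gene_ids-0": ["ENSG01", np.nan, "ENSG03"],
        "gene_ids-1": ["ENSG01", "ENSG02", np.nan],
        "gene_ids-2": [np.nan, "ENSG02", "ENSG03"],
    },
    index=["GeneA", "GeneB", "GeneC"],
)

id_cols = var.columns[var.columns.str.match(r"gene_ids-\d+")]

# Backfill along each row so the first column ends up holding
# the first non-null ID, then keep only that column
var["gene_ids"] = var[id_cols].bfill(axis=1).iloc[:, 0]
var = var.drop(columns=id_cols)

print(var)
```

On a real object the same lines would operate on adata.var. Note that this silently picks the first ID when samples disagree, so it is worth confirming the IDs match beforehand.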

Also, AnnData’s docs are quite extensive on different options for concatenation of datasets: Concatenation — anndata 0.7.7.dev3+gc42e80e documentation
I haven’t tried many combinations, but maybe one option of the merge keyword argument would do that automatically.

Thank you, this helped a lot!