Error while using read_10x_mtx

Not sure if I should post this here or on GitHub. I am running into an error when trying to read 10x Cell Ranger output files.

Here is my command:
s = sc.read_10x_mtx('./HNSCC_1_PBL/', var_names='gene_symbols')

Here is the error message:

--> This might be very slow. Consider passing cache=True, which enables much faster reading from a cache file.

KeyError Traceback (most recent call last)
/Applications/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2896 try:
-> 2897 return self._engine.get_loc(key)
2898 except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

KeyError: 2

During handling of the above exception, another exception occurred:

KeyError Traceback (most recent call last)
in
----> 1 s = sc.read_10x_mtx('/Users/kulkarnia2/Box/scRNASeq/HNSCC/Combined/HNSCC_1_PBL/', var_names='gene_symbols')

~/.local/lib/python3.7/site-packages/scanpy/readwrite.py in read_10x_mtx(path, var_names, make_unique, cache, cache_compression, gex_only)
302 make_unique=make_unique,
303 cache=cache,
--> 304 cache_compression=cache_compression,
305 )
306 if genefile_exists or not gex_only:

~/.local/lib/python3.7/site-packages/scanpy/readwrite.py in _read_v3_10x_mtx(path, var_names, make_unique, cache, cache_compression)
371 else:
372 raise ValueError("var_names needs to be 'gene_symbols' or 'gene_ids'")
--> 373 adata.var['feature_types'] = genes[2].values
374 adata.obs_names = pd.read_csv(path / 'barcodes.tsv.gz', header=None)[0]
375 return adata

/Applications/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py in __getitem__(self, key)
2978 if self.columns.nlevels > 1:
2979 return self._getitem_multilevel(key)
-> 2980 indexer = self.columns.get_loc(key)
2981 if is_integer(indexer):
2982 indexer = [indexer]

/Applications/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2897 return self._engine.get_loc(key)
2898 except KeyError:
-> 2899 return self._engine.get_loc(self._maybe_cast_indexer(key))
2900 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
2901 if indexer.ndim > 1 or indexer.size > 1:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

KeyError: 2

Hello there,

I’m facing the exact same problem and I tried every possible way, but couldn’t fix this. Would you please share your solution if you could solve it?

Thanks a lot!

Is the file you’re reading the direct output from cellranger?

@ivirshup Hello, I am having the same problem. I downloaded matrix/barcodes/features from GSE157990. I believe it is direct Cell Ranger output. I keep getting a KeyError: 2. Other matrix files from GEO work. I don't know what the problem is.

I figured it out, in case anyone is wondering. You have to unzip the files because they are in the format of a different Cell Ranger version. Copying the solution below:

So the problem actually comes from GEO. When people submit files processed by Cell Ranger version 2, they gzip them. However, when Scanpy sees .gz files, it assumes Cell Ranger version 3 by default, whose format is slightly different from version 2. The version 3 reader looks for a third column (the feature types) in the gene file, which version 2 files do not have, hence the KeyError: 2 at genes[2] in the traceback.
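To check which layout a downloaded gene file actually has, regardless of its name, one option is to count the columns in its first row. This is only a rough sketch using the standard library: the helper names count_feature_columns and guess_cellranger_layout are made up for illustration and are not part of Scanpy. It assumes the usual layouts, where version 2 gene files have two columns (gene ID, gene symbol) and version 3 feature files add a third (feature type).

```python
import csv
import gzip
from pathlib import Path


def count_feature_columns(tsv_path):
    """Return the number of tab-separated columns in the first row of a genes/features file."""
    tsv_path = Path(tsv_path)
    # Open the file the same way whether or not it is gzip-compressed.
    opener = gzip.open if tsv_path.suffix == ".gz" else open
    with opener(tsv_path, "rt", newline="") as handle:
        first_row = next(csv.reader(handle, delimiter="\t"))
    return len(first_row)


def guess_cellranger_layout(tsv_path):
    """Hypothetical helper: map the column count to the likely Cell Ranger layout."""
    n_columns = count_feature_columns(tsv_path)
    if n_columns == 2:
        return "v2"  # gene ID, gene symbol
    if n_columns >= 3:
        return "v3"  # gene ID, gene symbol, feature type
    return "unknown"
```

If a gzipped gene file turns out to have only two columns, it is most likely version 2 output that was compressed before upload, which matches the situation described above.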

All you need to do is gunzip the matrix.mtx.gz, barcodes.tsv.gz, and genes.tsv.gz files. You can tell whether the data was processed with version 2 or 3 from the description in GEO, or from the name of the uploaded gene file: genes.tsv (version 2, possibly uploaded as genes.tsv.gz) or features.tsv.gz (version 3).
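If you would rather do the gunzip step from Python than from the shell, something like the following should work. It is a minimal sketch with the standard library only; gunzip_10x_files is a made-up helper name, and the default file names are the three mentioned above. It writes the decompressed copies next to the .gz originals and leaves the originals in place.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_10x_files(directory, names=("matrix.mtx.gz", "barcodes.tsv.gz", "genes.tsv.gz")):
    """Decompress the listed .gz files inside `directory`; return the paths written."""
    directory = Path(directory)
    written = []
    for name in names:
        source = directory / name
        if not source.is_file():
            # Skip files that are not present (for example, already unzipped).
            continue
        target = source.with_suffix("")  # matrix.mtx.gz -> matrix.mtx
        with gzip.open(source, "rb") as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        written.append(target)
    return written
```

After running gunzip_10x_files('./HNSCC_1_PBL/'), the sc.read_10x_mtx call from the first post can be retried on the same directory.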