Hi everybody,

I am trying to read a TSV file into Scanpy, but for some reason I could not get it to load. The file is about 2 GB, and when I try to read it using

sc.read_csv() or (what I am currently testing)

sc.read('./data/Smajic_2020/IPDCO_hg_midbrain_UMI.tsv', delimiter='\t', cache=True)

the Jupyter notebook freezes and I cannot do anything. I am currently running the script in Amazon SageMaker, but loading the large TSV does not seem to work even on my local computer. Is there a particular way to solve this problem?


I am not sure what the problem is, but you should be able to read the file. Do you by any chance have a small subset of the data that you can try to load, to test that everything is working OK? Some other things you can try: read your file directly with AnnData and then just use Scanpy, or load your file directly with Python to make sure it has the correct format, or even reformat your file and try again with Scanpy.
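A minimal sketch of the "test a small subset first" idea, using pandas. The tiny in-memory table here is a made-up stand-in for the real TSV; on the actual file you would pass the path and keep `nrows` small:

```python
import io
import pandas as pd

# Stand-in for the real 2 GB TSV: a tiny gene-by-cell count table.
# (The gene/cell names here are invented for illustration.)
tsv = "gene\tcell1\tcell2\nGENE1\t0\t3\nGENE2\t5\t0\n"

# Read only the first rows to confirm the format parses correctly;
# on the real file, replace the StringIO with the file path.
subset = pd.read_csv(io.StringIO(tsv), sep="\t", index_col=0, nrows=100)

print(subset.shape)          # dimensions of the parsed subset
print(list(subset.columns))  # eyeball the cell labels

# Once the format checks out, anndata.AnnData(subset.T) gives an
# object Scanpy can work with (transpose if genes are rows).
```

If the subset parses fine but the full file still hangs, that points at memory rather than formatting.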

I think I found the solution: it was the memory allocation. When I added more memory to the SageMaker instance, it worked well and I was able to load the file.