Saving Specific Cells Based on Barcodes to One .csv File for Downstream Analysis in Partek Flow

Dear Scanpy,

My question is:

Is it possible to save only the cells with certain barcodes (barcodes I selected with my own code, using NumPy and other standard packages), such as ACTGAGTAGACGACGT-1, AAAGCAAGTCACAAGG-1 and approximately 200 others, in ONE .csv file?

The output file I need should contain the expression level of each gene in each cell.

Specifically, across the columns and/or rows I need:

a) cell ID (barcode)
b) gene
c) gene ID
d) expression levels.
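One way to get all four items into a single file is a "long" table with one row per cell-gene pair. The sketch below assumes the expression matrix is available as a cells × genes array (for example `adata.X` after subsetting), together with lists of barcodes, gene names and gene IDs. The helper name `to_long_table`, the gene IDs and the expression values are all made up for illustration, and whether Partek Flow accepts this exact layout is not verified here.

```python
import numpy as np
import pandas as pd


def to_long_table(X, barcodes, genes, gene_ids):
    """One row per (cell, gene) pair with columns barcode, gene, gene_id, expression.

    X is a cells x genes matrix. SciPy sparse input is densified, which is
    fine for ~200 cells but memory-heavy for very large selections.
    """
    if hasattr(X, "toarray"):  # SciPy sparse matrices provide .toarray()
        X = X.toarray()
    X = np.asarray(X)
    n_cells, n_genes = X.shape
    return pd.DataFrame({
        # each barcode repeated once per gene, matching X.ravel()'s row-major order
        "barcode": np.repeat(np.asarray(barcodes), n_genes),
        # gene names and IDs cycle through all genes once per cell
        "gene": np.tile(np.asarray(genes), n_cells),
        "gene_id": np.tile(np.asarray(gene_ids), n_cells),
        "expression": X.ravel(),
    })


# Made-up example: 2 cells x 3 genes, with hypothetical gene IDs
X = np.array([[0.0, 1.5, 2.0],
              [3.0, 0.0, 0.5]])
long_df = to_long_table(
    X,
    barcodes=["ACTGAGTAGACGACGT-1", "AAAGCAAGTCACAAGG-1"],
    genes=["CD3E", "CD8A", "GZMB"],
    gene_ids=["ID1", "ID2", "ID3"],
)
# With no path, to_csv returns the CSV text; pass a filename to write a file
csv_text = long_df.to_csv(index=False)
print(csv_text)
```

With an AnnData subset this might be called as `to_long_table(adata_sub.X, adata_sub.obs_names, adata_sub.var_names, adata_sub.var["gene_ids"])`, assuming the gene IDs sit in a `gene_ids` column of `adata.var` (Scanpy's 10x readers usually put them there; check `adata.var.columns`). Note that a long table has cells × genes rows, so roughly 200 cells × 30,000 genes already gives about 6 million rows; a wide cells × genes matrix is more compact.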

I have tried the approach explained here (https://adata.readthedocs.io/en/latest/adata.adata.write_csvs.html), but that gives me separate obs.csv, obsm.csv, var.csv and varm.csv files. I think together they contain all the information, but I need only the items described above, in one file (ultimate goal: to load it into Partek Flow for subsequent trajectory analyses).

I also tried a rather roundabout approach: selecting cells by clonotype (from the Scirpy package) with the following code, where the numbers refer to clonotypes defined by shared nucleotide sequences:

adata = adata[adata.obs["clonotype"].isin(["1527","1611","5502","1494","7112","296","3399","9309","8951","863","1991","3461","6214","5086","5864","3605","6286","3892","996","2488","100","6123","7317","5417","7279","2596","6527","501","1688","287","2984","3033","1116","2574","76","5250","1506","1209","1445","848","42"]), :]

adata.write_csvs('name', sep=',')

This writes the files from within Jupyter Notebook, but I would rather select cells directly by their barcodes in the original AnnData object, because that is much more efficient in this case.

I very much appreciate your help. (I am quite new to Python, so please forgive me if these are very basic questions.) I think many people use Partek Flow and similar programs, so I hope this question is relevant to others as well.

Thanks in advance,

Josine

PS This somewhat relates to an older post with questions I asked on October 20, which never received a reply.

You can do:

adata_sub = adata[adata.obs.index.isin(barcodes)]  # barcodes: your list or array of selected barcodes
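The one-liner above is the core of the selection. A common snag is that some requested barcodes silently match nothing (a typo, a missing "-1" suffix, a cell removed during QC, or a batch suffix that may have been appended when samples were concatenated), so the sketch below also reports which barcodes were not found. It uses plain pandas on a made-up list standing in for `adata.obs_names`; the helper `select_barcodes` is hypothetical.

```python
import numpy as np
import pandas as pd


def select_barcodes(obs_names, barcodes):
    """Boolean mask over obs_names, plus requested barcodes that were not found."""
    obs_names = pd.Index(obs_names)
    wanted = pd.Index(barcodes).unique()
    mask = obs_names.isin(wanted)            # True where the cell was requested
    missing = wanted.difference(obs_names)   # e.g. typos, or cells removed during QC
    return np.asarray(mask), list(missing)


# Made-up stand-in for adata.obs_names
obs_names = ["ACTGAGTAGACGACGT-1", "TTTGCATGAGAGGTAC-1", "AAAGCAAGTCACAAGG-1"]
mask, missing = select_barcodes(
    obs_names,
    ["AAAGCAAGTCACAAGG-1", "ACTGAGTAGACGACGT-1", "GGGGGGGGGGGGGGGG-1"],
)
print(mask.tolist(), missing)
```

With real data, `obs_names` would be `adata.obs_names`, and the barcode list could be read from a one-column text file, e.g. `pd.read_csv("my_barcodes.txt", header=None)[0]` (hypothetical filename). Then `adata_sub = adata[mask].copy()`; the `.copy()` turns the subset view into an independent AnnData object.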

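Then adata_sub.X contains the cell-by-gene expression matrix, so you can turn it into a pandas DataFrame, labelling the rows with adata_sub.obs.index (barcodes) and the columns with adata_sub.var.index (genes). If X is stored as a sparse matrix, convert it first with adata_sub.X.toarray().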
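Part of the reason the write_csvs attempt in the question produced no expression file is that `AnnData.write_csvs` skips `X` by default (`skip_data=True`). The conversion just described can be sketched with NumPy and pandas alone; `to_wide_table` is a hypothetical helper and the values are invented.

```python
import numpy as np
import pandas as pd


def to_wide_table(X, barcodes, genes):
    """Cells x genes DataFrame: one row per barcode, one column per gene."""
    if hasattr(X, "toarray"):  # AnnData often stores X as a SciPy sparse matrix
        X = X.toarray()
    return pd.DataFrame(
        np.asarray(X),
        index=pd.Index(barcodes, name="barcode"),
        columns=list(genes),
    )


# Made-up example: 2 cells x 2 genes
X = np.array([[0.0, 1.5],
              [3.0, 0.0]])
wide = to_wide_table(X, ["ACTGAGTAGACGACGT-1", "AAAGCAAGTCACAAGG-1"], ["CD3E", "CD8A"])
csv_text = wide.to_csv()  # header row: barcode,CD3E,CD8A
print(csv_text)
```

With real data this would be `to_wide_table(adata_sub.X, adata_sub.obs_names, adata_sub.var_names).to_csv("selected_cells.csv")`; AnnData's own `adata_sub.to_df()` performs essentially the same conversion. Use `.T` on the DataFrame if the downstream tool expects genes as rows (which orientation Partek Flow expects is not verified here), and if gene symbols are duplicated, calling `adata.var_names_make_unique()` beforehand avoids ambiguous column names.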

Check the “subsampleAdata” function here: