Scanpy --> Monocle Trajectory

Hi Everyone,

I completed a semi-automatic pipeline for scRNA in Scanpy, however, my team is really interested in monocle linear trajectories. My main question is how feasible is it to convert an anndata object into a cell dataset for monocle analysis? Is it easy to transfer the data up until from Scanpy DGE to R-monocle?

I have tried using the Theis Lab Tutorial (Current Best Practices, single-cell-tutorial/latest_notebook at master · theislab/single-cell-tutorial · GitHub), which is beautifully done, but I always receive errors when I cross over to R with the anndata object.

SlingShot:
NotImplementedError: Conversion ‘py2rpy’ not defined for objects of type ‘<class ‘anndata._core.anndata.AnnData’>’

Monocle:
R error message: ‘Error in reduceDimension(ie_regions_cds, norm_method = “vstExprs”, reduction_method = “DDRTree”, : \n Error: all rows have standard deviation zero’

TLDR:
What is the best way to get a monocle trajectory from Scanpy anndata?

I have done the same analysis before.
I run both python and R in jupyter-lab through “rpy2”

%load_ext rpy2.ipython
%matplotlib inline
# Automatically convert rpy2 outputs to pandas dataframes
pandas2ri.activate()
anndata2ri.activate()

Then split the anndata and prepare to import them into R:

#Preprocessing for monocle
data_mat_mon = adata.layers['counts'].T
var_mon=adata.var.copy()
obs_mon=adata.obs.copy()

Then import the “data”, “var”, “obs” files into R and set up the CellDataSet data structure:

%%R -i data_mat_mon -i obs_mon -i var_mon

#Set up the CellDataSet data structure
# cell barcodes
pd <- AnnotatedDataFrame(data = obs_mon)
# genes
fd <- AnnotatedDataFrame(data = var_mon)
# cell barcodes as columns of matrix
colnames(data_mat_mon) <- rownames(pd)
# genes as rows of matrix
rownames(data_mat_mon) <- rownames(fd)
ie_regions_cds <- newCellDataSet(cellData=data_mat_mon, phenoData=pd, featureData=fd, expressionFamily=negbinomial.size())

#Normalize the count data
ie_regions_cds <- estimateSizeFactors(ie_regions_cds)

#Calculate dispersions to filter for highly variable genes
ie_regions_cds <- estimateDispersions(ie_regions_cds)

Plot:

%%R
#Filter highly variable genes and clusters from our analysis
hvg_mask = fData(ie_regions_cds)$highly_variable
cell_mask = rep(FALSE, length(as.character(pData(ie_regions_cds)$louvain_r0.5)))
ie_regions_cds <- ie_regions_cds[hvg_mask, cell_mask]

#Do dimensionality reduction
ie_regions_cds <- reduceDimension(ie_regions_cds, norm_method = 'vstExprs', reduction_method='DDRTree', verbose = F, max_components = 7)

#Run for the first time to get the ordering
ie_regions_cds <- orderCells(ie_regions_cds)

#Get a nice colour map
custom_colour_map = brewer.pal(length(unique(pData(ie_regions_cds_luminal_2)$louvain_r0.5)),'Paired')

#Find the correct root state that coresponds to the 'Stem' cluster
tab1 <- table(pData(ie_regions_cds_luminal_2)$State, pData(ie_regions_cds_luminal_2)$louvain_r0.5)
id = which(colnames(tab1) == 'luminal_2')
root_name = names(which.max(tab1[,id]))

# Visualize with our cluster labels
options(repr.plot.width=5, repr.plot.height=4)

cell_trajectory <- plot_complex_cell_trajectory(ie_regions_cds_luminal_2, color_by = 'louvain_r0.5', show_branch_points = T, 
                             cell_size = 2, cell_link_size = 1, root_states = c(root_name)) +
scale_size(range = c(0.2, 0.2)) +
theme(legend.position="left", legend.title=element_blank(), legend.text=element_text(size=rel(1.5))) +
guides(colour = guide_legend(override.aes = list(size=6))) + 
scale_color_manual(values = custom_colour_map)
cell_trajectory

Hi @danli349,

Thank you for your response. I read over your code and integrated your choices into my program but I still ran into an error. Would you please check the code and error message to see if anything sticks out to you? I will post the code, error, and session info below.

#Libraries Standard 
import numpy as np
import os
import pandas as pd
import scanpy as sc
import gseapy
import matplotlib.pyplot as plt
import re
import shutil

#settings
sc.settings.set_figure_params(dpi=200, facecolor='white')
sc.settings.verbosity = 3



#Data Preparation

adata.layers["counts"] = adata.X.copy()
data_mat_mon = adata.layers['counts'].T
var_mon=adata.var.copy()
obs_mon=adata.obs.copy()



#R Libraries
import rpy2.rinterface_lib.callbacks
import logging

from rpy2.robjects import pandas2ri
import anndata2ri
from gprofiler import gprofiler

%load_ext rpy2.ipython
#%reload_ext rpy2.ipython
%matplotlib inline
# Automatically convert rpy2 outputs to pandas dataframes
pandas2ri.activate()
anndata2ri.activate()



#R - Monocle Script 

%%R -i data_mat_mon -i obs_mon -i var_mon


library(scran)
library(RColorBrewer)
library(slingshot)
library(monocle)
library(gam)
library(clusterExperiment)
library(ggplot2)
library(plyr)
library(MAST)



## Monocle Code

#Set up the CellDataSet data structure
# cell barcodes
pd <- AnnotatedDataFrame(data = obs_mon)
# genes
fd <- AnnotatedDataFrame(data = var_mon)
# cell barcodes as columns of matrix
colnames(data_mat_mon) <- rownames(pd)
# genes as rows of matrix
rownames(data_mat_mon) <- rownames(fd)
ie_regions_cds <- newCellDataSet(cellData=data_mat_mon, phenoData=pd, featureData=fd, expressionFamily=negbinomial.size())

#Normalize the count data
ie_regions_cds <- estimateSizeFactors(ie_regions_cds)

#Calculate dispersions to filter for highly variable genes
ie_regions_cds <- estimateDispersions(ie_regions_cds)


#Do dimensionality reduction
ie_regions_cds <- reduceDimension(ie_regions_cds, norm_method = 'vstExprs', reduction_method='DDRTree', verbose = T)

#Run for the first time to get the ordering
ie_regions_cds <- orderCells(ie_regions_cds)

#Get a nice colour map
custom_colour_map = brewer.pal(length(unique(pData(ie_regions_cds)$new_cluster)),'Paired')

#Find the correct root state that coresponds to the 'Stem' cluster
tab1 <- table(pData(ie_regions_cds)$State, pData(ie_regions_cds)$new_cluster)
id = which(colnames(tab1) == 'MSC')
root_name = names(which.max(tab1[,id]))

# Visualize with our cluster labels
options(repr.plot.width=5, repr.plot.height=4)

cell_trajectory <- plot_complex_cell_trajectory(ie_regions_cds, color_by = new_cluster, show_branch_points = T, 
                             cell_size = 2, cell_link_size = 1, root_states = c(root_name)) +
scale_size(range = c(0.2, 0.2)) +
theme(legend.position="left", legend.title=element_blank(), legend.text=element_text(size=rel(1.5))) +
guides(colour = guide_legend(override.aes = list(size=6))) + 
scale_color_manual(values = custom_colour_map)
cell_trajectory
Error in reduceDimension(ie_regions_cds, norm_method = "vstExprs", reduction_method = "DDRTree",  : 
  Error: all rows have standard deviation zero
Click to view session information
-----
anndata             0.7.5
anndata2ri          1.0.6
gprofiler           NA
gseapy              0.10.4
matplotlib          3.3.4
numpy               1.19.5
pandas              1.2.3
rpy2                3.4.5
scanpy              1.7.1
session_info        1.0.0
-----
Click to view modules imported as dependencies
-----
IPython             7.21.0
jupyter_client      6.1.11
jupyter_core        4.7.1
notebook            6.2.0
-----
Python 3.8.8 (v3.8.8:024d8058b0, Feb 19 2021, 08:48:17) [Clang 6.0 (clang-600.0.57)]
macOS-10.15.7-x86_64-i386-64bit
-----
Session information updated at 2021-07-16 10:39

Can you please check your data_mat_mon variable?

Hi @danli349,

My data_mat_mon returns an array, and the numbers do look similar in sequence.

display(type(data_mat_mon))
print(data_mat_mon)

numpy.ndarray
[[ 0.01175058  0.01175058  0.01175058 ... -0.00847682 -0.00847682
  -0.00847682]
 [-0.00325744 -0.00325744 -0.00325744 ...  0.00301002  0.00301002
   0.00301002]
 [-0.00160255 -0.00160255 -0.00160255 ... -0.04450267 -0.04450267
  -0.04450267]
 ...
 [ 0.00191478  0.00191478  0.00191478 ... -0.00604746 -0.00604746
  -0.00604746]
 [ 0.00014363  0.00014363  0.00014363 ... -0.01587545 -0.01587545
  -0.01587545]
 [ 0.02431658  0.02431658  0.02431658 ...  0.02026596  0.02026596
   0.02026596]]