The code you posted just takes the mean of the selected
gene_ids, in whatever form the data is stored in
adata.raw.X. In Scanpy, you control what is stored there by freezing a version of your data with
adata.raw = adata. You could do the same for the data stored in adata.X or in
adata.layers['counts'] (if you chose to store something in that layer) by running either of:
obs = adata[:, gene_ids].X.toarray()
obs = adata[:, gene_ids].layers['counts'].toarray()
instead of the obs assignment you posted. It's important to keep track of what you do to
adata.X when you are running a Scanpy pipeline (and I cannot tell you what you decided to store in
adata.raw.X, for that matter).
With regard to your other question: there is no standard that determines whether the mean expression should be calculated on log-normalized or just normalized data. We typically work with log-normalized gene expression; however, because the log is non-linear, even a linear function
f(), like the mean, will give you different results depending on whether you compute f(log(Data)) or
log(f(Data)). In most cases that I can recall, people choose one unit of expression to work with and then compute means or other functions on that unit. If you decide to work on log-normalized data, then you would run
mean(log(Data)) in that case. One might regard log-normalized expression as the more meaningful biological unit for expression data (given that differences are fold changes on this scale) and therefore work directly on this scale. It might be technically better to take the mean of the normalized values and then logarithmize that mean, i.e. log(mean(Data)), although I have not come across this.
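To make the difference between the two orders of operation concrete, here is a small sketch in plain Python. The expression values are made up, and log1p (log(1 + x)) is used as the log-transform on the assumption that it matches what your pipeline applied.

```python
import math
from statistics import mean

# Made-up normalized expression values for one gene across four cells.
data = [0.0, 1.0, 3.0, 7.0]

# Option 1: log-transform each value first, then average (mean on the log scale).
mean_of_log = mean(math.log1p(x) for x in data)

# Option 2: average on the normalized scale, then log-transform the result.
log_of_mean = math.log1p(mean(data))

print(f"mean(log(Data)) = {mean_of_log:.4f}")
print(f"log(mean(Data)) = {log_of_mean:.4f}")
```

Because the logarithm is concave, log(mean(Data)) is never smaller than mean(log(Data)), and the gap grows with the spread of the values. Whichever you pick, use the same one throughout an analysis.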