Label transfer using customized heart cell reference

In this vignette, we will explore how to use hECA to build a customized reference dataset for a given organ and perform label transfer to annotate other cells from the same organ.

Step 1: load packages

library(Seurat)
library(SingleR)
library(scran)

Step 2: build reference dataset from hECA

In this experiment we will build a reference from two studies of heart cells, in the following 5 steps:
2.1 - Open hECA website, click “Cell Sorting” in the menu.
2.2 - “Add Filters” - “Organ” - type in “Heart”, click to select it, and keep the default option to include subtypes.
2.3 - Click “Apply”. After a few seconds, click “Download Data” to download the keys of the sorted cells; a csv file named “keys.csv” will be downloaded.

2.4 - Download cells following the tutorials of ECAUGT and save the results to csv file (in Python):

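import multiprocessing

import pandas as pd
import ECAUGT

# Assumes the ECAUGT client is already set up and that gene_condition is
# already defined, as described in the ECAUGT tutorials.

# Read the cell keys downloaded in step 2.3 and build the primary-key list
rows_to_get = pd.read_csv('keys.csv')
rows_to_get = [[('cid',i)] for i in rows_to_get['cid']]

# Download the selected cells in parallel, leaving one CPU core free
result = ECAUGT.get_columnsbycell_para(rows_to_get = rows_to_get, cols_to_get=None,
                                       col_filter=gene_condition, do_transfer = True,
                                       thread_num = multiprocessing.cpu_count()-1)

# Split the result by column position: genes first, then the metadata columns
genes = result.columns[:43878]
metaCols = result.columns[43878:43878+18]
expr = result.loc[:,genes]
meta = result.loc[:,metaCols]

# Save both tables, keeping the cell ids as the row index
expr.to_csv("hECA_exprs.csv", index=True)
meta.to_csv("hECA_metadata.csv", index=True)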
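
Because the labels used in Step 4 come from the metadata file and are paired with the expression matrix by position, it may be worth confirming that both saved files list the same cells in the same order. The following is a minimal sketch using only the Python standard library, not part of ECAUGT. The helper names `read_row_ids` and `same_cells` are hypothetical, the toy files merely stand in for “hECA_exprs.csv” and “hECA_metadata.csv”, and the sketch assumes the first column of each file holds the cell id, as written by `to_csv(index=True)`.

```python
import csv
import os
import tempfile


def read_row_ids(path):
    """Return the first-column values (the row index) of a CSV file, skipping the header."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        next(reader)  # skip the header row
        return [row[0] for row in reader if row]


def same_cells(expr_path, meta_path):
    """True when both files list identical cell ids in identical order."""
    return read_row_ids(expr_path) == read_row_ids(meta_path)


# Toy files standing in for hECA_exprs.csv and hECA_metadata.csv (made-up values)
tmp_dir = tempfile.mkdtemp()
toy_expr = os.path.join(tmp_dir, "toy_exprs.csv")
toy_meta = os.path.join(tmp_dir, "toy_metadata.csv")
with open(toy_expr, "w", newline="") as handle:
    csv.writer(handle).writerows([["", "GENE_A", "GENE_B"], ["c1", 0, 3], ["c2", 5, 0]])
with open(toy_meta, "w", newline="") as handle:
    csv.writer(handle).writerows([["", "cell_type"], ["c1", "Cardiomyocyte"], ["c2", "Fibroblast"]])

cells_aligned = same_cells(toy_expr, toy_meta)
```

On the real files, the call would be `same_cells("hECA_exprs.csv", "hECA_metadata.csv")`. If it returns False, the metadata should be reordered before its `cell_type` column is used as labels.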

2.5 - Load downloaded expression matrix and metadata as customized reference dataset (continue in R).

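# The csv stores cells in rows; SingleR expects genes in rows and cells in columns, so transpose
expr <- t(as.matrix(read.csv("hECA_exprs.csv", header=T, row.names=1)))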
meta <- read.csv("hECA_metadata.csv", header=T, row.names=1)

Step 3: load query data

Please replace “query_data.csv” with the path to your query dataset.

query_path <- "query_data.csv"
query_data <- read.csv(query_path, header=T, row.names=1)

Step 4: perform label transfer with SingleR

Now we will use SingleR to transfer labels from reference data to query data.

Step 4.1: Train SingleR model

# get labels
ct.ref <- as.character(meta$cell_type)

# train model
trainedR <- trainSingleR(expr, ct.ref, de.method = "wilcox")

## (optional) save trained model
# save(trainedR,file = "trainedModel.Rdata")

Step 4.2: Predict labels of query data

predict <- classifySingleR(query_data,trainedR)

Step 4.3: Check prediction results

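# get true labels
# NOTE: obj.query is not created in this vignette; it is assumed to be an object
# that holds the known annotations of the query cells in its cell_type field
truth <- obj.query$cell_type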

# construct result dataframe
df.result <- predict[,c("pruned.labels","labels")]
df.result$truth <- truth
df.result <- data.frame(df.result)
## (optional) save results
# write.csv(df.result,"result1.csv")

# draw confusion matrix and accuracy scores (requires the caret package)
lv <- unique(df.result$truth)
caret::confusionMatrix(factor(df.result$pruned.labels, levels=lv),
                       factor(df.result$truth, levels=lv))
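
For readers unfamiliar with this output: a confusion matrix counts (predicted, true) label pairs, and accuracy is the share of counted cells whose two labels match. The following minimal Python sketch illustrates the idea on made-up labels, not on results from this vignette. The function names are hypothetical. Cells with a missing prediction, such as a pruned label, are represented as `None` and skipped; that is a choice made in this sketch, so check how your own evaluation treats them.

```python
from collections import Counter


def confusion_counts(predicted, truth):
    """Count (predicted, true) label pairs, skipping cells whose prediction is None."""
    return Counter((p, t) for p, t in zip(predicted, truth) if p is not None)


def accuracy(counts):
    """Fraction of counted cells whose predicted label equals the true label."""
    total = sum(counts.values())
    correct = sum(n for (p, t), n in counts.items() if p == t)
    return correct / total if total else float("nan")


# Made-up labels for illustration only
toy_truth = ["Cardiomyocyte", "Cardiomyocyte", "Fibroblast", "Endothelial cell", "Fibroblast"]
toy_pred = ["Cardiomyocyte", "Fibroblast", "Fibroblast", "Endothelial cell", None]

toy_counts = confusion_counts(toy_pred, toy_truth)
toy_accuracy = accuracy(toy_counts)  # 3 correct out of 4 counted cells
```

Off-diagonal entries, such as `toy_counts[("Fibroblast", "Cardiomyocyte")]`, show which cell types the classifier confuses with each other.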