Label transfer using customized heart cell reference
In this vignette, we show how to use hECA to build a customized reference dataset from a single organ and perform label transfer to annotate other cells from the same organ.
Step 1: load packages
library(Seurat)
library(SingleR)
library(scran)
Step 2: build reference dataset from hECA
In this experiment, we build the reference from two studies of heart
cells, in the following five steps:
2.1 - Open the hECA website and click “Cell
Sorting” in the menu.
2.2 - Click “Add Filters”, then “Organ”, type in “Heart”, click to select it, and
keep the default of including subtypes.
2.3 - Click “Apply”. After a few seconds, click “Download Data” to download
the keys of the sorted cells; a CSV file named “keys.csv” will be downloaded.
2.4 - Download the cells following the ECAUGT tutorials and save the results to CSV files (in Python):
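import multiprocessing
import pandas as pd
import ECAUGT

# Assumes the ECAUGT client is already connected as described in the ECAUGT tutorials,
# and that gene_condition (the column filter) was built beforehand as shown there.
# Build one primary key per cell from the downloaded keys file
rows_to_get = pd.read_csv('keys.csv')
rows_to_get = [[('cid', i)] for i in rows_to_get['cid']]
# Fetch the selected cells in parallel
result = ECAUGT.get_columnsbycell_para(rows_to_get=rows_to_get, cols_to_get=None,
                                       col_filter=gene_condition, do_transfer=True,
                                       thread_num=multiprocessing.cpu_count() - 1)
# The first 43878 columns are genes; the next 18 are metadata fields
genes = result.columns[:43878]
metaCols = result.columns[43878:43878 + 18]
expr = result.loc[:, genes]
meta = result.loc[:, metaCols]
expr.to_csv("hECA_exprs.csv", index=True)
meta.to_csv("hECA_metadata.csv", index=True)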
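The positional split in step 2.4 relies on the column counts (43878 genes followed by 18 metadata fields) matching the downloaded table. A minimal sketch of a sanity check that could be run before saving, assuming the column names are available as a plain list; the helper `split_columns` and the toy column names other than `cid` and `cell_type` are hypothetical and not part of ECAUGT:

```python
def split_columns(columns, n_genes, n_meta, required_meta=("cell_type",)):
    """Split column names by position into gene names and metadata names."""
    if len(columns) < n_genes + n_meta:
        raise ValueError(
            f"expected at least {n_genes + n_meta} columns, got {len(columns)}"
        )
    genes = list(columns[:n_genes])
    meta = list(columns[n_genes:n_genes + n_meta])
    # The later R steps read meta$cell_type, so make sure it landed in the metadata part
    missing = [name for name in required_meta if name not in meta]
    if missing:
        raise ValueError(f"metadata fields not found where expected: {missing}")
    return genes, meta


# Toy table with 3 genes and 3 metadata fields (hypothetical names)
toy_columns = ["GeneA", "GeneB", "GeneC", "cid", "cell_type", "organ"]
toy_genes, toy_meta = split_columns(toy_columns, n_genes=3, n_meta=3)
print(toy_genes)
print(toy_meta)
```

With the real table, the call would be `split_columns(list(result.columns), 43878, 18)`; a wrong offset then fails early instead of silently writing gene columns into the metadata file.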
2.5 - Load the downloaded expression matrix and metadata as the customized reference dataset (continuing in R).
expr <- read.csv("hECA_exprs.csv", header = TRUE, row.names = 1)
meta <- read.csv("hECA_metadata.csv", header = TRUE, row.names = 1)
Step 3: load query data
Please replace “query_data.csv” with the path to your query dataset.
query_path <- "query_data.csv"
query_data <- read.csv(query_path, header = TRUE, row.names = 1)
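Label transfer can only rely on genes that are present in both the reference and the query, so it is worth checking the overlap before training. A minimal sketch in Python, assuming the gene names of both datasets have been collected into plain lists; the helper `shared_genes` and the example gene lists are illustrative only:

```python
def shared_genes(ref_genes, query_genes):
    """Return genes found in both lists (in reference order) and the
    fraction of reference genes that are also in the query."""
    query_set = set(query_genes)
    common = [g for g in ref_genes if g in query_set]
    fraction = len(common) / len(ref_genes) if ref_genes else 0.0
    return common, fraction


# Illustrative gene lists, not taken from hECA
ref_gene_names = ["MYH6", "MYH7", "TNNT2", "COL1A1"]
query_gene_names = ["TNNT2", "MYH7", "PECAM1", "COL1A1"]
common_genes, kept = shared_genes(ref_gene_names, query_gene_names)
print(common_genes, kept)
```

A low fraction often points to a naming mismatch (for example gene symbols versus Ensembl IDs) rather than to genes that are truly missing.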
Step 4: perform label transfer with SingleR
Now we will use SingleR to transfer labels from reference data to
query data.
Step 4.1: Train SingleR model
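# get labels
ct.ref <- as.character(meta$cell_type)
# train model; SingleR expects genes in rows and cells in columns,
# while expr as exported above has cells in rows, so transpose it
trainedR <- trainSingleR(t(as.matrix(expr)), ct.ref, de.method = "wilcox")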
## (optional) save trained model
# save(trainedR,file = "trainedModel.Rdata")
Step 4.2: Predict labels of query data
# query_data must have genes in rows and cells in columns
predict <- classifySingleR(as.matrix(query_data), trainedR)
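To make the idea behind steps 4.1 and 4.2 concrete, the sketch below assigns a label to one query cell by correlating it with labelled reference cells. It is a deliberately simplified stand-in written in Python, not SingleR's implementation: it uses all genes instead of selected markers and skips fine-tuning and pruning. The function names, the 0.8 quantile and the toy expression values are assumptions made for illustration.

```python
from collections import defaultdict
from math import ceil, sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy) if sx and sy else 0.0


def transfer_label(query_cell, ref_cells, ref_labels, quantile=0.8):
    """Score each label by a high quantile of its per-cell correlations
    with the query cell and return the best label plus all scores."""
    per_label = defaultdict(list)
    for cell, label in zip(ref_cells, ref_labels):
        per_label[label].append(spearman(query_cell, cell))
    scores = {}
    for label, cors in per_label.items():
        cors.sort()
        idx = max(0, ceil(quantile * len(cors)) - 1)  # nearest-rank quantile
        scores[label] = cors[idx]
    return max(scores, key=scores.get), scores


# Toy reference: 4 genes per cell, two cells per label (made-up values)
ref_cells = [[9, 8, 1, 0], [8, 9, 0, 1], [0, 1, 9, 8], [1, 0, 8, 9]]
ref_labels = ["cardiomyocyte", "cardiomyocyte", "fibroblast", "fibroblast"]
best_label, label_scores = transfer_label([7, 9, 1, 0], ref_cells, ref_labels)
print(best_label, label_scores)
```

Using a high quantile of the per-cell correlations, rather than the mean, keeps a label competitive even when only part of its reference cells resemble the query cell.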
Step 4.3: Check prediction results
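# get true labels; obj.query is not created in this vignette and is assumed to be
# an object (e.g. a Seurat object) holding the known cell_type of each query cell
truth <- obj.query$cell_type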
# construct result dataframe
df.result <- predict[,c("pruned.labels","labels")]
df.result$truth <- truth
df.result <- data.frame(df.result)
## (optional) save results
# write.csv(df.result,"result1.csv")
# draw confusion matrix and accuracy scores
lvls <- unique(df.result$truth)
caret::confusionMatrix(factor(df.result$pruned.labels, levels = lvls),
                       factor(df.result$truth, levels = lvls))
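The evaluation in step 4.3 boils down to counting how often each predicted label co-occurs with each true label. A minimal Python sketch of that bookkeeping, assuming cells whose label was pruned are represented as `None` and are left out of the counts; the function names and toy labels are hypothetical:

```python
from collections import Counter


def confusion_counts(truth, predicted):
    """Count (predicted, truth) pairs, skipping cells without a prediction."""
    return Counter((p, t) for p, t in zip(predicted, truth) if p is not None)


def accuracy(truth, predicted):
    """Fraction of cells with a prediction whose label matches the truth."""
    counts = confusion_counts(truth, predicted)
    total = sum(counts.values())
    correct = sum(n for (p, t), n in counts.items() if p == t)
    return correct / total if total else float("nan")


# Toy labels: CM = cardiomyocyte, FB = fibroblast, EC = endothelial cell
toy_truth = ["CM", "CM", "FB", "FB", "EC"]
toy_pred = ["CM", "FB", "FB", None, "EC"]
toy_counts = confusion_counts(toy_truth, toy_pred)
toy_accuracy = accuracy(toy_truth, toy_pred)
print(toy_counts)
print(toy_accuracy)
```

Because unassigned cells are excluded, the accuracy describes only the cells that received a label; reporting the number of pruned cells alongside it gives a fuller picture.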