SingleR Automated Annotation - Advanced Single-Cell Tutorial

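🎯 The cell annotation pain point: once clustering is done, how do you decide which cell type each cluster represents? The traditional approach is to look up marker genes by hand, which is slow, laborious, and subjective. SingleR is a reference-based automatic annotation tool that assigns a cell type label to every cell without manual marker selection. Its output is only as good as the reference, so a quick check against known markers is still worthwhile.

1. What is SingleR?

🔍 Core principle: SingleR compares the expression profile of each cell with the profiles of the labeled samples in a reference dataset and assigns the best-matching label. Similarity is measured as the Spearman rank correlation over marker genes that distinguish the reference labels, followed by a fine-tuning step that re-scores the closest candidate labels.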
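To make the correlation-based matching concrete, here is a minimal base-R sketch. It is a toy illustration, not SingleR's actual implementation: the gene names, expression values, and the variables `ref_profiles` and `query_cell` are invented for the example, and the real tool adds marker detection, per-label score aggregation, and fine-tuning on top of this idea.

```r
# Toy illustration only: this is NOT SingleR's implementation.
# Rows are genes, columns are reference cell types (made-up values).
ref_profiles <- cbind(
  T_cells  = c(9, 8, 1, 0, 2),
  B_cell   = c(1, 2, 9, 8, 1),
  Monocyte = c(0, 1, 2, 1, 9)
)
rownames(ref_profiles) <- c("CD3E", "CD3D", "CD79A", "MS4A1", "LYZ")

# One query cell measured on the same genes (made-up values).
query_cell <- c(CD3E = 7, CD3D = 6, CD79A = 0, MS4A1 = 1, LYZ = 2)

# Spearman correlation between the query cell and each reference profile.
scores <- apply(ref_profiles, 2, function(profile) {
  cor(query_cell, profile, method = "spearman")
})

# The label with the highest correlation is the toy "annotation".
best_label <- names(which.max(scores))
print(round(scores, 2))
print(best_label)
```

Because Spearman correlation works on ranks, it is fairly insensitive to differences in scale between the query and the reference, which is one reason a rank-based measure suits comparisons across platforms.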

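✨ Key advantages
• Fully automatic: no manual marker gene selection
• Objective: driven by data rather than subjective judgment
• Easy to visualize: scores display clearly as heatmaps
• Multiple reference sets: Human Primary Cell Atlas, Blueprint/ENCODE, and others via celldex

⚠️ Note: SingleR's accuracy depends heavily on the quality and relevance of the reference dataset. Choose a reference that matches the tissue and species you are studying!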

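2. Installation and loading

    # Install BiocManager first if it is missing, then install from Bioconductor
    if (!requireNamespace("BiocManager", quietly = TRUE)) {
      install.packages("BiocManager")
    }
    BiocManager::install(c("SingleR", "celldex"))

    # Load the packages
    library(SingleR)
    library(Seurat)
    library(ggplot2)
    library(celldex)  # reference dataset package

💡 The celldex package bundles several commonly used reference datasets, such as Human Primary Cell Atlas and Blueprint/ENCODE. It is a separate package and must be installed before use.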
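Since the workflow depends on several separately installed packages, a quick pre-flight check can save a failed run later. The sketch below uses only base R; the helper name `missing_pkgs` is hypothetical and not part of any package.

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed.
missing_pkgs <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Packages used in this tutorial.
needed <- c("SingleR", "celldex", "Seurat", "ggplot2", "pheatmap")
to_install <- missing_pkgs(needed)

if (length(to_install) > 0) {
  message("Not installed yet: ", paste(to_install, collapse = ", "))
} else {
  message("All required packages are available.")
}
```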

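3. Preparing the reference dataset

    # Load a built-in reference dataset (Human Primary Cell Atlas)
    hpca.ref <- HumanPrimaryCellAtlasData()

    # Inspect the reference
    hpca.ref
    # class: SummarizedExperiment
    # dim: 19363 713
    # colData names(3): label.main label.fine label.ont
    # (exact output may differ between celldex versions)

    # List the cell types
    table(hpca.ref$label.main)
    # Includes, for example: T_cells, B_cell, NK_cell, Monocyte, Macrophage...

📌 Other commonly used reference sets:
• BlueprintEncodeData() - Blueprint & ENCODE reference
• DatabaseImmuneCellExpressionData() - Database of Immune Cell Expression
• NovershternHematopoieticData() - hematopoietic cell reference
• MouseRNAseqData() - mouse reference

4. Running SingleR annotation

    # Assume your Seurat object is seurat_obj
    # Extract the log-normalized expression matrix for SingleR
    # (Seurat v5 uses layer = "data" in place of slot = "data")
    test.data <- GetAssayData(seurat_obj, slot = "data")

    # Run SingleR (may take a few minutes)
    pred.hpca <- SingleR(
      test = test.data,
      ref = hpca.ref,
      labels = hpca.ref$label.main
      # For finer-grained labels, use instead:
      # labels = hpca.ref$label.fine
    )

    # Inspect the result
    head(pred.hpca)
    # Each cell has: labels (cell type), scores (score per reference label), pruned.labels

💡 pruned.labels: SingleR automatically prunes low-confidence assignments. If a cell's assigned label does not score clearly better than the other labels, its entry in pruned.labels is set to NA, while the labels column keeps the original best match.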
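One practical consequence of pruning is that downstream code has to deal with NA values. The following base-R sketch shows one way to count pruned cells and give them an explicit placeholder. The data frame `pred` is a mock stand-in for the per-cell columns of a SingleR result, with invented values, and the "Unassigned" placeholder is an arbitrary choice.

```r
# Mock stand-in for the per-cell columns of a SingleR result (values invented).
pred <- data.frame(
  labels        = c("T_cells", "B_cell", "T_cells", "Monocyte", "B_cell"),
  pruned.labels = c("T_cells", NA,       "T_cells", "Monocyte", NA),
  stringsAsFactors = FALSE
)

# How many cells were pruned, and what fraction of the dataset is that?
n_pruned <- sum(is.na(pred$pruned.labels))
frac_pruned <- n_pruned / nrow(pred)

# Replace NA with an explicit placeholder so plots and tables show these cells.
final_label <- ifelse(is.na(pred$pruned.labels), "Unassigned", pred$pruned.labels)
print(table(final_label))
```

A high pruned fraction is often a hint that the reference does not cover the cell types present in the query data.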

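5. Integrating the annotations into Seurat

    # Add the SingleR annotations to the Seurat object
    # (rows of pred.hpca follow the column order of test.data)
    seurat_obj$SingleR_label <- pred.hpca$labels
    seurat_obj$SingleR_pruned <- pred.hpca$pruned.labels

    # Visualize the SingleR annotations
    DimPlot(seurat_obj, group.by = "SingleR_label", reduction = "umap", label = TRUE)

📌 Comparison: cross-tabulate the SingleR annotations against the Seurat clusters to assess how consistent they are:

    # Agreement between clusters and annotations
    table(seurat_obj$seurat_clusters, seurat_obj$SingleR_label)

    # Visualize the cluster distribution within each annotated cell type
    DimPlot(seurat_obj, group.by = "seurat_clusters", split.by = "SingleR_label")

6. Visualizing annotation scores

    # Per-cell scores: a matrix with cells in rows and reference labels in columns
    scores <- pred.hpca$scores

    # Heatmap of the mean score of each cluster for each cell type
    library(pheatmap)

    # Mean score per cluster (average over the rows, i.e. cells, of each cluster)
    cluster.scores <- tapply(seq_len(nrow(scores)), seurat_obj$seurat_clusters, function(x) {
      colMeans(scores[x, , drop = FALSE])
    })

    # Bind into a label x cluster matrix and draw the heatmap
    score.matrix <- do.call(cbind, cluster.scores)
    pheatmap(score.matrix, cluster_rows = TRUE, cluster_cols = TRUE,
             color = colorRampPalette(c("blue", "white", "red"))(50))

💡 Reading the heatmap: each row is a reference cell type and each column is one of your clusters. The redder a tile, the more similar that cluster is to that cell type. A cluster with one clearly dominant cell type has a reliable annotation; a cluster with several similarly scored types needs a closer look.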
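The orientation of the score matrix (cells in rows, labels in columns) is easy to get wrong, so here is a self-contained base-R sketch of the per-cluster averaging on a tiny invented matrix. The values and cluster assignments are made up; `scores` and `clusters` only mimic the shape of the real objects.

```r
# Toy score matrix: rows are cells, columns are reference labels (values invented).
scores <- rbind(
  c(0.8, 0.2, 0.1),
  c(0.7, 0.3, 0.2),
  c(0.2, 0.9, 0.1),
  c(0.1, 0.8, 0.3)
)
colnames(scores) <- c("T_cells", "B_cell", "Monocyte")
clusters <- factor(c("0", "0", "1", "1"))

# Average the cells within each cluster.
# The result has labels in rows and clusters in columns.
cluster_means <- sapply(split(seq_len(nrow(scores)), clusters), function(idx) {
  colMeans(scores[idx, , drop = FALSE])
})

# Top-scoring label per cluster.
top_label <- rownames(cluster_means)[apply(cluster_means, 2, which.max)]
names(top_label) <- colnames(cluster_means)
print(cluster_means)
print(top_label)
```

For a per-cell view rather than a per-cluster summary, SingleR also ships its own `plotScoreHeatmap()` function, which takes the prediction result directly.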

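7. Using a custom reference

    # If you have your own high-quality annotated data, it can serve as the reference
    # Assume ref.seurat is an already annotated Seurat object

    # Extract the reference data
    ref.data <- GetAssayData(ref.seurat, slot = "data")
    ref.labels <- ref.seurat$cell_type  # existing cell type annotations

    # Run SingleR with the custom reference
    pred.custom <- SingleR(
      test = test.data,
      ref = ref.data,
      labels = ref.labels
      # For single-cell references, the SingleR documentation suggests
      # also passing de.method = "wilcox"
    )

⚠️ Requirements for a custom reference:
• The reference must already be rigorously annotated
• Cell type coverage should be comprehensive
• Species and tissue should match the target data
• Published datasets are recommended as references

📝 SingleR annotation workflow summary
1) Choose a reference: pick a suitable reference dataset from celldex
2) Run SingleR: automatically compute the best match for each cell
3) Integrate the results: add the annotations to the Seurat object
4) Visualize and validate: check annotation accuracy with heatmaps and UMAP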