创建或修改目录:/www/wwwroot/104.219.215.234/data 失败!
发布日期:2024-08-24 08:47 点击次数:199
白丝 跳蛋
距离上一次先容irGSEA,仍是是两年前了,详见:8种方法可视化你的单细胞基因集打分白丝 跳蛋,当今Seurat都仍是更新到了V5,假如你不可爱最新版的Seurat包的单细胞理念,有时这个irGSEA亦然与时俱进,不单是是更新到了17种基因集合打分算法,并且是互助V4和V5版块的Seurat使命流,特地的便捷!如若大家看完毕这个万字长文的先容,仍然是好意思瞻念深切了解irGSEA不错望望文末的交流群哈!
布景许多Functional Class Scoring (FCS)方法,如GSEA, GSVA,PLAGE, addModuleScore, SCSE, Vision, VAM, gficf, pagoda2和Sargent,都会受数据集组成的影响,数据集组成的眇小变化将改变细胞的基因集富集分数。
假如将新的单细胞数据集整合到现存数据中,使用这些FCS方法需要重新算计每个细胞的基因集富集分数。这个设施可能是繁琐且资源密集的。
违抗,基于单个细胞抒发等第的FCS,如AUCell、UCell、singscore、ssGSEA、JASMINE和Viper,只需要算计新添加的单细胞数据集的富集分数,而无需重新算计悉数细胞的基因集富集分数。原因是这些方法生成的富集分数仅依赖于单个细胞水平上的相对基因抒发,与数据集组成无关。因此,这些方法不错选贤举能多数的期间。
注视恶果在这里,咱们注视了17种常见的FCS方法:
GSEA 检测排序基因列表顶部或底部的基因集富集进度,该列表是分组后算计排序基因信噪比或排序基因倍数变化得到的;GSVA 预计悉数细胞之间每个基因的蓄积密度函数的核。 这个过程中需要商酌悉数样本,容易受到样本布景信息的影响;PLAGE 对跨细胞的基因抒发矩阵进行轨范化,并索要奇异值明白看成基因集富集分数;Zscore 团员了基因集中悉数基因的抒发,通过细胞间的平均值和轨范差缩放抒发;AddModuleScore需要先算计基因集中悉数基因的平均值,再左证平均值把抒发矩阵切割成些许份,然后从切割后的每一份中连忙抽取对照基因(基因集外的基因)看成布景值。因此,在整合不相通本的情况下,即使使用换取基因集为换取细胞打分,也会产生不同的富集评分;SCSE 使用基因集悉数基因的归一化的总数来量化基因集富集分数;Vision 使用连忙签名的预期均值和方差对基因集富集分数进行 z 归一化从而变调基因集富集分数;VAM 左证经典Mahalanobis多元距离从单细胞 RNA 测序数据生成基因集富集分数;Gficf 欺诈通过非负矩阵明白得到的基因抒发值的潜在因子的信息生物信号;Pagoda2 拟合每个细胞的错误模子,并使用其第一个加权主因素量化基因集富集分数;AUCell 基于单个样本中的基因抒发排行,使用弧线底下积来评估输入基因集是否在单个样本的前5%抒发基因内富集;UCell 基于单个样本的基因抒发排行,使用Mann-Whitney U统计量算计单个样本的基因集富集分数;Singscore 左证基因抒发等第评估距单个细胞中心的距离。 基因集中的基因左证单个细胞中的转录本品貌进行排序。 平均等第相对于表面最小值和最大值单独轨范化,以零为中心,然后团员,所得分数代表基因集的富集分数;ssGSEA 左证每个细胞的基因抒发等第算计里面和外部基因集之间的训导蓄积分散的各异分数。 使用全局抒发谱对各异分数进行轨范化。 轨范化这一步容易受样本组成的影响。JASMINE 左证在单个细胞中抒发基因中的基因排行和抒发基因中基因集的富集度算计类似平均值。 这两个值均轨范化为 0-1 限制,并通过平均进行组合,得出基因集的最终富集分数。Viper 通过左证细胞间基因抒发的排行实践three-tailed算计来预计基因集的富集分数。Sargent 将给定细胞的非零抒发基因从高抒发到低抒发进行排序,并将输入的基因逐细胞抒发矩阵疗养为相应的gene-set-by-cell assignment score matrix。 但Sargent 需要算计细胞间的gini-index后,将按gene-set-by-cell assignment score matrix疗养为distribution of indexes。使命进程图片白丝 跳蛋
使用AUCell、UCell、singscore、ssgsea、JASMINE 和 viper分别对各个细胞进行评分,得到不同的富集评分矩阵。通过wilcoxon造就算计不同的富集评分矩阵中每个细胞亚群各异抒发的基因集。up或down暗示该细胞簇内各异基因集的富集进度高于或低于其他簇。单一的基因集富集分析方法不仅只可反馈有限的信息,并且也容易带来错误。咱们期待从多个角度施展复杂的生物常识题,并找到生物常识题中的共性部分。简便地为多种基因集富集分析方法的恶果取共同错乱,不仅容易得到少而保守的恶果,并且忽略了富集分析方法中许多的其他信息,举例不同基因集的相对富集进度信息。
咱们但愿蓄意基因集在大部分富集分析方法中都是富集且富集进度莫得显着各异。因此,咱们通过RobustRankAggreg包中的秩团员算法(robust rank aggregation, RRA)对各异分析的恶果进行评估,筛选出在6种方法中进展出相似的富集进度的各异基因集。
irGSEA装置1.irGSEA装置(基础建树)仅使用 AUCell, UCell, singscore, ssGSEA, JASMINE和viper
# install packages from CRANcran.packages <- c("aplot", "BiocManager", "data.table", "devtools", "doParallel", "doRNG", "dplyr", "ggfun", "gghalves", "ggplot2", "ggplotify", "ggridges", "ggsci", "irlba", "magrittr", "Matrix", "msigdbr", "pagoda2", "pointr", "purrr", "RcppML", "readr", "reshape2", "reticulate", "rlang", "RMTstat", "RobustRankAggreg", "roxygen2", "Seurat", "SeuratObject", "stringr", "tibble", "tidyr", "tidyselect", "tidytree", "VAM")for (i in cran.packages) { if (!requireNamespace(i, quietly = TRUE)) { install.packages(i, ask = F, update = F) }}# install packages from Bioconductorbioconductor.packages <- c("AUCell", "BiocParallel", "ComplexHeatmap", "decoupleR", "fgsea", "ggtree", "GSEABase", "GSVA", "Nebulosa", "scde", "singscore", "SummarizedExperiment", "UCell", "viper","sparseMatrixStats")for (i in bioconductor.packages) { if (!requireNamespace(i, quietly = TRUE)) { install.packages(i, ask = F, update = F) }}# install packages from Githubif (!requireNamespace("irGSEA", quietly = TRUE)) { devtools::install_github("chuiqin/irGSEA", force =T)}2.irGSEA装置(进阶建树)
念念使用VISION, gficf, Sargent, ssGSEApy, GSVApy等其他方法(这部分是可选装置)
# VISIONif (!requireNamespace("VISION", quietly = TRUE)) { devtools::install_github("YosefLab/VISION", force =T)}# mdt need rangerif (!requireNamespace("ranger", quietly = TRUE)) { devtools::install_github("imbs-hl/ranger", force =T)}# gficf need RcppML (version > 0.3.7) packageif (!utils::packageVersion("RcppML") > "0.3.7") { message("The version of RcppML should greater than 0.3.7 and install RcppML package from Github") devtools::install_github("zdebruine/RcppML", force =T)}# please first `library(RcppML)` if you want to perform gficfif (!requireNamespace("gficf", quietly = TRUE)) { devtools::install_github("gambalab/gficf", force =T)}# GSVApy and ssGSEApy need SeuratDisk packageif (!requireNamespace("SeuratDisk", quietly = TRUE)) { devtools::install_github("mojaveazure/seurat-disk", force =T)}# sargentif (!requireNamespace("sargent", quietly = TRUE)) { devtools::install_github("Sanofi-Public/PMCB-Sargent", force =T)}# pagoda2 need scde packageif (!requireNamespace("scde", quietly = TRUE)) { devtools::install_github("hms-dbmi/scde", force =T)}# if error1 (functio 'sexp_as_cholmod_sparse' not provided by package 'Matrix')# or error2 (functio 'as_cholmod_sparse' not provided by package 'Matrix') occurs# when you perform pagoda2, please check the version of irlba and Matrix# It's ok when I test as follow:# R 4.2.2 irlba(v 2.3.5.1) Matrix(1.5-3)# R 4.3.1 irlba(v 2.3.5.1) Matrix(1.6-1.1)# R 4.3.2 irlba(v 2.3.5.1) Matrix(1.6-3)#### create conda env# If error (Unable to find conda binary. Is Anaconda installed) occurs, # please perform `reticulate::install_miniconda()`if (! "irGSEA" %in% reticulate::conda_list()$name) { reticulate::conda_create("irGSEA")}# if python package existpython.package <- reticulate::py_list_packages(envname = "irGSEA")$packagerequire.package <- c("anndata", "scanpy", "argparse", "gseapy", "decoupler")for (i in seq_along(require.package)) { if (i %in% python.package) { reticulate::conda_install(envname = "irGSEA", packages = i, pip = T) }}3.国内镜像加快装置
装置github包和pip包有贫苦的参考这里,我把github包克隆到了gitee上
options(BioC_mirror="https://mirrors.tuna.tsinghua.edu.cn/bioconductor/")options("repos" = c(CRAN="http://mirrors.cloud.tencent.com/CRAN/"))# install packages from CRANcran.packages <- c("aplot", "BiocManager", "data.table", "devtools", "doParallel", "doRNG", "dplyr", "ggfun", "gghalves", "ggplot2", "ggplotify", "ggridges", "ggsci", "irlba", "magrittr", "Matrix", "msigdbr", "pagoda2", "pointr", "purrr", "RcppML", "readr", "reshape2", "reticulate", "rlang", "RMTstat", "RobustRankAggreg", "roxygen2", "Seurat", "SeuratObject", "stringr", "tibble", "tidyr", "tidyselect", "tidytree", "VAM")for (i in cran.packages) { if (!requireNamespace(i, quietly = TRUE)) { install.packages(i, ask = F, update = F) }}# install packages from Bioconductorbioconductor.packages <- c("AUCell", "BiocParallel", "ComplexHeatmap", "decoupleR", "fgsea", "ggtree", "GSEABase", "GSVA", "Nebulosa", "scde", "singscore", "SummarizedExperiment", "UCell", "viper")for (i in bioconductor.packages) { if (!requireNamespace(i, quietly = TRUE)) { install.packages(i, ask = F, update = F) }}# install packages from gitif (!requireNamespace("irGSEA", quietly = TRUE)) { devtools::install_git("https://gitee.com/fan_chuiqin/irGSEA.git", force =T)}# VISIONif (!requireNamespace("VISION", quietly = TRUE)) { devtools::install_git("https://gitee.com/fan_chuiqin/VISION.git", force =T)}# mdt need rangerif (!requireNamespace("ranger", quietly = TRUE)) { devtools::install_git("https://gitee.com/fan_chuiqin/ranger.git", force =T)}# gficf need RcppML (version > 0.3.7) packageif (!utils::packageVersion("RcppML") > "0.3.7") { message("The version of RcppML should greater than 0.3.7 and install RcppML package from Git") devtools::install_git("https://gitee.com/fan_chuiqin/RcppML.git", force =T)}# please first `library(RcppML)` if you want to perform gficfif (!requireNamespace("gficf", quietly = TRUE)) { devtools::install_git("https://gitee.com/fan_chuiqin/gficf.git", force =T)}# GSVApy and ssGSEApy need SeuratDisk packageif (!requireNamespace("SeuratDisk", quietly = TRUE)) { devtools::install_git("https://gitee.com/fan_chuiqin/seurat-disk.git", force =T)}# sargentif (!requireNamespace("sargent", quietly = TRUE)) { devtools::install_git("https://gitee.com/fan_chuiqin/PMCB-Sargent.git", force =T)}# pagoda2 need scde packageif (!requireNamespace("scde", quietly = TRUE)) { devtools::install_git("https://gitee.com/fan_chuiqin/scde.git", force =T)}#### create conda env# If error (Unable to find conda binary. Is Anaconda installed) occurs, # please perform `reticulate::install_miniconda()`if (! "irGSEA" %in% reticulate::conda_list()$name) { reticulate::conda_create("irGSEA")}# if python package existpython.package <- reticulate::py_list_packages(envname = "irGSEA")$packagerequire.package <- c("anndata", "scanpy", "argparse", "gseapy", "decoupler")for (i in require.package) { if (! i %in% python.package) { reticulate::conda_install(envname = "irGSEA", packages = i, pip = T, pip_options = "-i https://pypi.tuna.tsinghua.edu.cn/simple") }}使用教程1.irGSEA相沿Seurat 对象(V5或V4),Assay对象(V5或V4)
# 咱们通过SeuratData包加载示例数据集(顾惜好的PBMC数据集)看成演示#### Seurat V4对象 ####library(Seurat)library(SeuratData)library(RcppML)library(irGSEA)data("pbmc3k.final")pbmc3k.final <- irGSEA.score(object = pbmc3k.final,assay = "RNA", slot = "data", seeds = 123, ncores = 1, min.cells = 3, min.feature = 0, custom = F, geneset = NULL, msigdb = T, species = "Homo sapiens", category = "H", subcategory = NULL, geneid = "symbol", method = c("AUCell","UCell","singscore","ssgsea", "JASMINE", "viper"), aucell.MaxRank = NULL, ucell.MaxRank = NULL, kcdf = 'Gaussian') Assays(pbmc3k.final)>[1] "RNA" "AUCell" "UCell" "singscore" "ssgsea" "JASMINE" "viper" #### Seurat V5对象 ####data("pbmc3k.final")pbmc3k.final <- SeuratObject::UpdateSeuratObject(pbmc3k.final)pbmc3k.final2 <- CreateSeuratObject(counts = CreateAssay5Object(GetAssayData(pbmc3k.final, assay = "RNA", slot = "counts")), meta.data = pbmc3k.final[[]])pbmc3k.final2 <- NormalizeData(pbmc3k.final2)pbmc3k.final2 <- irGSEA.score(object = pbmc3k.final2,assay = "RNA", slot = "data", seeds = 123, ncores = 1, min.cells = 3, min.feature = 0, custom = F, geneset = NULL, msigdb = T, species = "Homo sapiens", category = "H", subcategory = NULL, geneid = "symbol", method = c("AUCell","UCell","singscore","ssgsea", "JASMINE", "viper"), aucell.MaxRank = NULL, ucell.MaxRank = NULL, kcdf = 'Gaussian')Assays(pbmc3k.final2)[1] "RNA" "AUCell" "UCell" "singscore" "ssgsea" "JASMINE" "viper" #### Assay5 对象 ####data("pbmc3k.final")pbmc3k.final <- SeuratObject::UpdateSeuratObject(pbmc3k.final)pbmc3k.final3 <- CreateAssay5Object(counts = GetAssayData(pbmc3k.final, assay = "RNA", slot = "counts"))pbmc3k.final3 <- NormalizeData(pbmc3k.final3)pbmc3k.final3 <- irGSEA.score(object = pbmc3k.final3,assay = "RNA", slot = "counts", seeds = 123, ncores = 1, min.cells = 3, min.feature = 0, custom = F, geneset = NULL, msigdb = T, species = "Homo sapiens", category = "H", subcategory = NULL, geneid = "symbol", method = c("AUCell","UCell","singscore","ssgsea", "JASMINE", "viper"), aucell.MaxRank = NULL, ucell.MaxRank = NULL, kcdf = 'Gaussian')Assays(pbmc3k.final3)# Assay5对象存放在RNA assay中
恶果文献大小统计:
# 看起来Seurat V5和Assay 5照实存放数据会比较小> cat(object.size(pbmc3k.final) / (1024^2), "M")339.955 M> cat(object.size(pbmc3k.final2) / (1024^2), "M")69.33521 M> cat(object.size(pbmc3k.final3) / (1024^2), "M")69.27851 M2.irGSEA相沿的基因集打分方法
测试了不同数据大小下多样评分方法使用50个Hallmark基因集进行打分所需的期间和内存峰值, 大家左证我方的电脑和期间进行酌情选拔;
福建兄妹图片
GSVApy、ssGSEApy 和 viperpy 分别代表 GSVA、ssGSEA 和 viper 的 Python 版块。Python版块的GSVA比R版块的GSVA精炼太多期间了。咱们对singscore、ssGSEA、JASMINE、viper的内存峰值进行了优化。 对于跳跃 50000 个细胞的数据集,咱们实施了一种政策,将它们辨别为5000 个细胞/单位进行评分。 天然这不错缓解内存峰值问题,但照实会延所长分期间。
3.irGSEA相沿的基因集打分方法为了便捷用户获取MSigDB数据库中事前界说好的基因集,咱们内置了msigdbr包进行MSigDB的基因集数据的获取。msigdbr包相沿多个物种的基因集获取,以及多种基因风景的抒发矩阵的输入。
①irGSEA通过内置的msigdbr包进行打分
library(Seurat)library(SeuratData)library(RcppML)library(irGSEA)data("pbmc3k.final")#### Hallmark基因集打分 ####pbmc3k.final <- irGSEA.score(object = pbmc3k.final,assay = "RNA", slot = "data", seeds = 123, ncores = 1, min.cells = 3, min.feature = 0, custom = F, geneset = NULL, msigdb = T, species = "Homo sapiens", category = "H", subcategory = NULL, geneid = "symbol", method = c("AUCell","UCell","singscore","ssgsea", "JASMINE", "viper"), aucell.MaxRank = NULL, ucell.MaxRank = NULL, kcdf = 'Gaussian')#### KEGG基因集打分 ####pbmc3k.final <- irGSEA.score(object = pbmc3k.final,assay = "RNA", slot = "data", seeds = 123, ncores = 1, min.cells = 3, min.feature = 0, custom = F, geneset = NULL, msigdb = T, species = "Homo sapiens", category = "C2", subcategory = "CP:KEGG", geneid = "symbol", method = c("AUCell","UCell","singscore","ssgsea", "JASMINE", "viper"), aucell.MaxRank = NULL, ucell.MaxRank = NULL, kcdf = 'Gaussian')#### GO-BP基因集打分 ####pbmc3k.final <- irGSEA.score(object = pbmc3k.final,assay = "RNA", slot = "data", seeds = 123, ncores = 1, min.cells = 3, min.feature = 0, custom = F, geneset = NULL, msigdb = T, species = "Homo sapiens", category = "C5", subcategory = "GO:BP", geneid = "symbol", method = c("AUCell","UCell","singscore","ssgsea", "JASMINE", "viper"), aucell.MaxRank = NULL, ucell.MaxRank = NULL, kcdf = 'Gaussian')
②irGSEA使用最新版的MSigDB基因集进行打分
缺憾的是,msigdbr包内置的MSigDB的版块是MSigDB 2022.1。但是,当今MSigDB数据库仍是更新到了2023.2,包括2023.2.Hs和2023.2.Mm。咱们不错从这个皆集(https://data.broadinstitute.org/gsea-msigdb/msigdb/release/)下载2023.2.Hs的gmt文献不详db.zip文献。比较gmt文献,db.zip文献包含了基因集的描绘,不错用来筛选XX功能相关基因。底下的例子中,我将先容如何筛选血管生成相关的基因集。
#### work with newest Msigdb ##### https://data.broadinstitute.org/gsea-msigdb/msigdb/release/# In this page, you can download human/mouse gmt file or db.zip file# The db.zip file contains metadata information for the gene set# load librarylibrary(clusterProfiler)library(tidyverse)library(DBI)library(RSQLite)### db.zip #### download zip file and unzip zip filezip_url <- "https://data.broadinstitute.org/gsea-msigdb/msigdb/release/2023.2.Hs/msigdb_v2023.2.Hs.db.zip"local_zip_path <- "./msigdb_v2023.2.Hs.db.zip"download.file(zip_url, local_zip_path)unzip(local_zip_path, exdir = "./")# code modified by https://rdrr.io/github/cashoes/sear/src/data-raw/1_parse_msigdb_sqlite.rcon <- DBI::dbConnect(RSQLite::SQLite(), dbname = './msigdb_v2023.2.Hs.db')DBI::dbListTables(con)# define tables we want to combinegeneset_db <- dplyr::tbl(con, 'gene_set') # standard_name, collection_namedetails_db <- dplyr::tbl(con, 'gene_set_details') # description_brief, description_fullgeneset_genesymbol_db <- dplyr::tbl(con, 'gene_set_gene_symbol') # meat and potatoesgenesymbol_db <- dplyr::tbl(con, 'gene_symbol') # mapping from ids to gene symbolscollection_db <- dplyr::tbl(con, 'collection') %>% dplyr::select(collection_name, full_name) # collection metadata# join tablesmsigdb <- geneset_db %>% dplyr::left_join(details_db, by = c('id' = 'gene_set_id')) %>% dplyr::left_join(collection_db, by = 'collection_name') %>% dplyr::left_join(geneset_genesymbol_db, by = c('id' = 'gene_set_id')) %>% dplyr::left_join(genesymbol_db, by = c('gene_symbol_id' = 'id')) %>% dplyr::select(collection = collection_name, subcollection = full_name, geneset = standard_name, description = description_brief, symbol) %>% dplyr::as_tibble() # clean upDBI::dbDisconnect(con)unique(msigdb$collection)# [1] "C1" "C2:CGP" "C2:CP:BIOCARTA" # [4] "C2:CP:KEGG_LEGACY" "C2:CP:PID" "C3:MIR:MIRDB" # [7] "C3:MIR:MIR_LEGACY" "C3:TFT:GTRD" "C3:TFT:TFT_LEGACY" # [10] "C4:3CA" "C4:CGN" "C4:CM" # [13] "C6" "C7:IMMUNESIGDB" "C7:VAX" # [16] "C8" "C5:GO:BP" "C5:GO:CC" # [19] "C5:GO:MF" "H" "C5:HPO" # [22] "C2:CP:KEGG_MEDICUS" "C2:CP:REACTOME" "C2:CP:WIKIPATHWAYS"# [25] "C2:CP" unique(msigdb$subcollection)# [1] "C1" "C2:CGP" "C2:CP:BIOCARTA" # [4] "C2:CP:KEGG_LEGACY" "C2:CP:PID" "C3:MIR:MIRDB" # [7] "C3:MIR:MIR_LEGACY" "C3:TFT:GTRD" "C3:TFT:TFT_LEGACY" # [10] "C4:3CA" "C4:CGN" "C4:CM" # [13] "C6" "C7:IMMUNESIGDB" "C7:VAX" # [16] "C8" "C5:GO:BP" "C5:GO:CC" # [19] "C5:GO:MF" "H" "C5:HPO" # [22] "C2:CP:KEGG_MEDICUS" "C2:CP:REACTOME" "C2:CP:WIKIPATHWAYS"# [25] "C2:CP" # convert to list[Hallmark] required by irGSEA packagemsigdb.h <- msigdb %>% dplyr::filter(collection=="H") %>% dplyr::select(c("geneset", "symbol"))msigdb.h$geneset <- factor(msigdb.h$geneset)msigdb.h <- msigdb.h %>% dplyr::group_split(geneset, .keep = F) %>% purrr::map( ~.x %>% dplyr::pull(symbol) %>% unique(.)) %>% purrr::set_names(levels(msigdb.h$geneset))#### Hallmark基因集打分 ####pbmc3k.final <- irGSEA.score(object = pbmc3k.final, assay = "RNA", slot = "data", custom = T, geneset = msigdb.h, method = c("AUCell","UCell","singscore","ssgsea", "JASMINE", "viper"), kcdf = 'Gaussian')# convert to list[go bp] required by irGSEA packagemsigdb.go.bp <- msigdb %>% dplyr::filter(collection=="C5:GO:BP") %>% dplyr::select(c("geneset", "symbol"))msigdb.go.bp$geneset <- factor(msigdb.go.bp$geneset)msigdb.go.bp <- msigdb.go.bp %>% dplyr::group_split(geneset, .keep = F) %>% purrr::map( ~.x %>% dplyr::pull(symbol) %>% unique(.)) %>% purrr::set_names(levels(msigdb.go.bp$geneset))#### GO-BP基因集打分 ####pbmc3k.final <- irGSEA.score(object = pbmc3k.final, assay = "RNA", slot = "data", custom = T, geneset = msigdb.go.bp, method = c("AUCell","UCell","singscore","ssgsea", "JASMINE", "viper"), kcdf = 'Gaussian')# convert to list[KEGG] required by irGSEA packagemsigdb.kegg <- msigdb %>% dplyr::filter(collection=="C2:CP:KEGG_MEDICUS") %>% dplyr::select(c("geneset", "symbol"))msigdb.kegg$geneset <- factor(msigdb.kegg$geneset)msigdb.kegg <- msigdb.kegg %>% dplyr::group_split(geneset, .keep = F) %>% purrr::map( ~.x %>% dplyr::pull(symbol) %>% unique(.)) %>% purrr::set_names(levels(msigdb.kegg$geneset))#### KEGG基因集打分 ####pbmc3k.final <- irGSEA.score(object = pbmc3k.final, assay = "RNA", slot = "data", custom = T, geneset = msigdb.kegg, method = c("AUCell","UCell","singscore","ssgsea", "JASMINE", "viper"), kcdf = 'Gaussian')# Look for the gene sets associated with angiogenesis from gene sets names and # gene sets descriptionscategory <- c("angiogenesis", "vessel")msigdb.vessel <- list()for (i in category) { # Ignore case matching find.index.description <- stringr::str_detect(msigdb$description, pattern = regex(all_of(i), ignore_case=TRUE)) find.index.name <- stringr::str_detect(msigdb$geneset, pattern = regex(all_of(i), ignore_case=TRUE)) msigdb.vessel[[i]] <- msigdb[find.index.description | find.index.name, ] %>% mutate(category = i) }msigdb.vessel <- do.call(rbind, msigdb.vessel)head(msigdb.vessel)# # A tibble: 6 × 6# collection subcollection geneset description symbol category# <chr> <chr> <chr> <chr> <chr> <chr> # 1 C2:CGP Chemical and Genetic Perturbations HU_ANGIOGENESIS_UP Up-regulated … HECW1 angioge…# 2 C2:CGP Chemical and Genetic Perturbations HU_ANGIOGENESIS_UP Up-regulated … JADE2 angioge…# 3 C2:CGP Chemical and Genetic Perturbations HU_ANGIOGENESIS_UP Up-regulated … SEMA3C angioge…# 4 C2:CGP Chemical and Genetic Perturbations HU_ANGIOGENESIS_UP Up-regulated … STUB1 angioge…# 5 C2:CGP Chemical and Genetic Perturbations HU_ANGIOGENESIS_UP Up-regulated … FAH angioge…# 6 C2:CGP Chemical and Genetic Perturbations HU_ANGIOGENESIS_UP Up-regulated … COL7A1 angioge…length(unique(msigdb.vessel$geneset))# [1] 112# convert gene sets associated with angiogenesis to list # required by irGSEA packagemsigdb.vessel <- msigdb.vessel %>% dplyr::select(c("geneset", "symbol"))msigdb.vessel$geneset <- factor(msigdb.vessel$geneset)msigdb.vessel <- msigdb.vessel %>% dplyr::group_split(geneset, .keep = F) %>% purrr::map( ~.x %>% dplyr::pull(symbol) %>% unique(.)) %>% purrr::set_names(levels(msigdb.vessel$geneset))#### 血管生成相关基因集打分 ####pbmc3k.final <- irGSEA.score(object = pbmc3k.final, assay = "RNA", slot = "data", custom = T, geneset = msigdb.vessel, method = c("AUCell","UCell","singscore","ssgsea", "JASMINE", "viper"), kcdf = 'Gaussian')### gmt file #### download gmt filegmt_url <- "https://data.broadinstitute.org/gsea-msigdb/msigdb/release/2023.2.Hs/msigdb.v2023.2.Hs.symbols.gmt"local_gmt <- "./msigdb.v2023.2.Hs.symbols.gmt"download.file(gmt_url , local_gmt)msigdb <- clusterProfiler::read.gmt("./msigdb.v2023.2.Hs.symbols.gmt")# convert to list[hallmarker] required by irGSEA packagemsigdb.h <- msigdb %>% dplyr::filter(str_detect(term, pattern = regex("HALLMARK_", ignore_case=TRUE)))msigdb.h$term <- factor(msigdb.h$term)msigdb.h <- msigdb.h %>% dplyr::group_split(term, .keep = F) %>% purrr::map( ~.x %>% dplyr::pull(gene) %>% unique(.)) %>% purrr::set_names(levels(msigdb.h$term))#### Hallmark基因集打分 ####pbmc3k.final <- irGSEA.score(object = pbmc3k.final, assay = "RNA", slot = "data", custom = T, geneset = msigdb.h, method = c("AUCell","UCell","singscore","ssgsea", "JASMINE", "viper"), kcdf = 'Gaussian')# convert to list[go bp] required by irGSEA packagemsigdb.go.bp <- msigdb %>% dplyr::filter(str_detect(term, pattern = regex("GOBP_", ignore_case=TRUE)))msigdb.go.bp$term <- factor(msigdb.go.bp$term)msigdb.go.bp <- msigdb.go.bp %>% dplyr::group_split(term, .keep = F) %>% purrr::map( ~.x %>% dplyr::pull(gene) %>% unique(.)) %>% purrr::set_names(levels(msigdb.go.bp$term))#### GO-BP基因集打分 ####pbmc3k.final <- irGSEA.score(object = pbmc3k.final, assay = "RNA", slot = "data", custom = T, geneset = msigdb.go.bp, method = c("AUCell","UCell","singscore","ssgsea", "JASMINE", "viper"), kcdf = 'Gaussian')# convert to list[KEGG] required by irGSEA packagemsigdb.kegg <- msigdb %>% dplyr::filter(str_detect(term, pattern = regex("KEGG_", ignore_case=TRUE)))msigdb.kegg$term <- factor(msigdb.kegg$term)msigdb.kegg <- msigdb.kegg %>% dplyr::group_split(term, .keep = F) %>% purrr::map( ~.x %>% dplyr::pull(gene) %>% unique(.)) %>% purrr::set_names(levels(msigdb.kegg$term))#### KEGG基因集打分 ####pbmc3k.final <- irGSEA.score(object = pbmc3k.final, assay = "RNA", slot = "data", custom = T, geneset = msigdb.kegg, method = c("AUCell","UCell","singscore","ssgsea", "JASMINE", "viper"), kcdf = 'Gaussian')
③irGSEA使用clusterProfiler包同款基因集进行打分
#### work with clusterProfiler package ##### load librarylibrary(clusterProfiler)library(tidyverse)### kegg #### download kegg pathway (human) and write as gson filekk <- clusterProfiler::gson_KEGG(species = "hsa")gson::write.gson(kk, file = "./KEGG_20231128.gson")# read gson filekk2 <- gson::read.gson("./KEGG_20231123.gson")# Convert to a data framekegg.list <- dplyr::left_join(kk2@gsid2name, kk2@gsid2gene, by = "gsid")head(kegg.list)# gsid name gene# 1 hsa01100 Metabolic pathways 10# 2 hsa01100 Metabolic pathways 100# 3 hsa01100 Metabolic pathways 10005# 4 hsa01100 Metabolic pathways 10007# 5 hsa01100 Metabolic pathways 100137049# 6 hsa01100 Metabolic pathways 10020# Convert gene ID to gene symbolgene_name <- clusterProfiler::bitr(kegg.list$gene, fromType = "ENTREZID", toType = "SYMBOL", OrgDb = "org.Hs.eg.db")kegg.list <- dplyr::full_join(kegg.list, gene_name, by = c("gene"="ENTREZID"))# remove NA value if existkegg.list <- kegg.list[complete.cases(kegg.list[, c("gene", "SYMBOL")]), ]head(kegg.list)# gsid name gene SYMBOL# 1 hsa01100 Metabolic pathways 10 NAT2# 2 hsa01100 Metabolic pathways 100 ADA# 3 hsa01100 Metabolic pathways 10005 ACOT8# 4 hsa01100 Metabolic pathways 10007 GNPDA1# 5 hsa01100 Metabolic pathways 100137049 PLA2G4B# 6 hsa01100 Metabolic pathways 10020 GNE# convert to list required by irGSEA packagekegg.list$name <- factor(kegg.list$name)kegg.list <- kegg.list %>% dplyr::group_split(name, .keep = F) %>% purrr::map( ~.x %>% dplyr::pull(SYMBOL) %>% unique(.)) %>% purrr::set_names(levels(kegg.list$name))head(kegg.list)### 整理好之后就不错进行KEGG打分#### KEGG基因集打分 ####pbmc3k.final <- irGSEA.score(object = pbmc3k.final, assay = "RNA", slot = "data", custom = T, geneset = kegg.list, method = c("AUCell","UCell","singscore","ssgsea", "JASMINE", "viper"), kcdf = 'Gaussian')### go bp #### download go bp (human) and write as gson filego <- clusterProfiler::gson_GO(OrgDb = "org.Hs.eg.db", ont = "BP")gson::write.gson(go, file = "./go_20231128.gson")# read gson filego2 <- gson::read.gson("./go_20231128.gson")# Convert to a data framego.list <- dplyr::left_join(go2@gsid2name, go2@gsid2gene, by = "gsid")head(go.list)# gsid name gene# 1 GO:0000001 mitochondrion inheritance <NA># 2 GO:0000002 mitochondrial genome maintenance 142# 3 GO:0000002 mitochondrial genome maintenance 291# 4 GO:0000002 mitochondrial genome maintenance 1763# 5 GO:0000002 mitochondrial genome maintenance 1890# 6 GO:0000002 mitochondrial genome maintenance 2021# Convert gene ID to gene symbolgo.list <- dplyr::full_join(go.list, go2@gene2name, by = c("gene"="ENTREZID"))# remove NA value if existgo.list <- go.list[complete.cases(go.list[, c("gene", "SYMBOL")]), ]head(go.list)# gsid name gene SYMBOL# 2 GO:0000002 mitochondrial genome maintenance 142 PARP1# 3 GO:0000002 mitochondrial genome maintenance 291 SLC25A4# 4 GO:0000002 mitochondrial genome maintenance 1763 DNA2# 5 GO:0000002 mitochondrial genome maintenance 1890 TYMP# 6 GO:0000002 mitochondrial genome maintenance 2021 ENDOG# 7 GO:0000002 mitochondrial genome maintenance 3980 LIG3# convert to list required by irGSEA packagego.list$name <- factor(go.list$name)go.list <- go.list %>% dplyr::group_split(name, .keep = F) %>% purrr::map( ~.x %>% dplyr::pull(SYMBOL) %>% unique(.)) %>% purrr::set_names(levels(go.list$name))head(go.list)#### GO-BP基因集打分 ####pbmc3k.final <- irGSEA.score(object = pbmc3k.final, assay = "RNA", slot = "data", custom = T, geneset = go.list, method = c("AUCell","UCell","singscore","ssgsea", "JASMINE", "viper"), kcdf = 'Gaussian')4.irGSEA的竣工进程
library(Seurat)library(SeuratData)library(RcppML)library(irGSEA)data("pbmc3k.final")# 基因集打分pbmc3k.final <- irGSEA.score(object = pbmc3k.final,assay = "RNA", slot = "data", seeds = 123, ncores = 1, min.cells = 3, min.feature = 0, custom = F, geneset = NULL, msigdb = T, species = "Homo sapiens", category = "H", subcategory = NULL, geneid = "symbol", method = c("AUCell","UCell","singscore","ssgsea", "JASMINE", "viper"), aucell.MaxRank = NULL, ucell.MaxRank = NULL, kcdf = 'Gaussian')# 算计各异基因集,并进行RRA# 如若报错,商酌加句代码options(future.globals.maxSize = 100000 * 1024^5)result.dge <- irGSEA.integrate(object = pbmc3k.final, group.by = "seurat_annotations", method = c("AUCell","UCell","singscore","ssgsea", "JASMINE", "viper"))# 查看RRA识别的在多种打分方法中都遍及认同的各异基因集geneset.show <- result.dge$RRA %>% dplyr::filter(pvalue <= 0.05) %>% dplyr::pull(Name) %>% unique(.)可视化展示1)全局展示
①热图
你还不错把method从'RRA"换成“ssgsea”,展示特定基因集富集分析方法中各异上调或各异下调的基因集;
irGSEA.heatmap.plot <- irGSEA.heatmap(object = result.dge, method = "RRA", top = 50, show.geneset = NULL)irGSEA.heatmap.plot
图片
默许展示前50,你也不错展示你念念展示的基因集。举例,我念念展示RRA识别的各异基因集。irGSEA.heatmap.plot1 <- irGSEA.heatmap(object = result.dge, method = "RRA", show.geneset = geneset.show)irGSEA.heatmap.plot1
图片
②气泡图
irGSEA.bubble.plot <- irGSEA.bubble(object = result.dge, method = "RRA", show.geneset = geneset.show)irGSEA.bubble.plot
图片
③upset plot
upset图展示了空洞评估中每个细胞亚群具有统计学意旨各异的基因集的数量,以及不同细胞亚群之间具有错乱的各异基因集数量;
irGSEA.upset.plot <- irGSEA.upset(object = result.dge, method = "RRA", mode = "intersect", upset.width = 20, upset.height = 10, set.degree = 2, pt_size = grid::unit(2, "mm"))irGSEA.upset.plot
图片
左边不同形态的条形图代表不同的细胞亚群;上方的条形图代表具有错乱的各异基因集的数量;中间的气泡图单个点代表单个细胞亚群,多个点连线代表多个细胞亚群取错乱()这里只展示两两取错乱;
④堆叠条形图
堆叠柱状图具体展示每种基因集富集分析方法中每种细胞亚群中上调、下吞并莫得统计学各异的基因集数量;
irGSEA.barplot.plot <- irGSEA.barplot(object = result.dge, method = c("AUCell", "UCell", "singscore", "ssgsea", "JASMINE", "viper", "RRA"))irGSEA.barplot.plot
图片
上方的条形代表每个亚群中不同方法中各异的基因数量,红色代表上调的各异基因集,蓝色代表下调的各异基因集;中间的柱形图代表每个亚群中不同方法中上调、下吞并莫得统计学意旨的基因集的比例;
2)局部展示①密度散点图
密度散点图将基因集的富集分数和细胞亚群在低维空间的投影衔尾起来,展示了特定基因集在空间上的抒发水平。
scatterplot <- irGSEA.density.scatterplot(object = pbmc3k.final, method = "UCell", show.geneset = "HALLMARK-INFLAMMATORY-RESPONSE", reduction = "umap")scatterplot
图片
其中,形态越黄,密度分数越高,代表富集分数越高;②半小提琴图
半小提琴图同期以小提琴图(左边)和箱线图(右边)进行展示。不同形态代表不同的细胞亚群;
halfvlnplot <- irGSEA.halfvlnplot(object = pbmc3k.final, method = "UCell", show.geneset = "HALLMARK-INFLAMMATORY-RESPONSE")halfvlnplot
图片
除此以外,还不错
vlnplot <- irGSEA.vlnplot(object = pbmc3k.final, method = c("AUCell", "UCell", "singscore", "ssgsea", "JASMINE", "viper"), show.geneset = "HALLMARK-INFLAMMATORY-RESPONSE")vlnplot
图片
③山峦图山峦图中上方的核密度弧线展示了数据的主要分散,下方的条形编码图展示了细胞亚群具体的数量。不同形态代表不同的细胞亚群,而横坐标代表不同的抒发水平;
ridgeplot <- irGSEA.ridgeplot(object = pbmc3k.final, method = "UCell", show.geneset = "HALLMARK-INFLAMMATORY-RESPONSE")ridgeplot
图片
④密度热图
密度热图展示了具体各异基因在不同细胞亚群中的抒发和分散水平。形态越红,代表富集分数越高;
densityheatmap <- irGSEA.densityheatmap(object = pbmc3k.final, method = "UCell", show.geneset = "HALLMARK-INFLAMMATORY-RESPONSE")densityheatmap
图片
参考贵府[1]AUCell: https://doi.org/10.1038/nmeth.4463
[2]UCell: https://doi.org/10.1016/j.csbj.2021.06.043
[3]singscore: https://doi.org/10.1093/nar/gkaa802
[4]ssgsea: https://doi.org/10.1038/nature08460
[5]JASMINE: https://doi.org/10.7554/eLife.71994
[6]VAM: https://doi.org/10.1093/nar/gkaa582
[7]scSE: https://doi.org/10.1093/nar/gkz601
[8]VISION: https://doi.org/10.1038/s41467-019-12235-0
[9]wsum: https://doi.org/10.1093/bioadv/vbac016
[10]wmean: https://doi.org/10.1093/bioadv/vbac016
[11]mdt: https://doi.org/10.1093/bioadv/vbac016
[12]viper: https://doi.org/10.1038/ng.3593
[13]GSVApy: https://doi.org/10.1038/ng.3593
[14]gficf: https://doi.org/10.1093/nargab/lqad024
[15]GSVA: https://doi.org/10.1186/1471-2105-14-7
[16]zscore: https://doi.org/10.1371/journal.pcbi.1000217
[17]plage: https://doi.org/10.1186/1471-2105-6-225
[18]ssGSEApy: https://doi.org/10.1093/bioinformatics/btac757
[19]viperpy: https://doi.org/10.1093/bioinformatics/btac757
[20]AddModuleScore: https://doi.org/10.1126/science.aad0501
[21]pagoda2: https://doi.org/10.1038/nbt.4038
[22]Sargent: https://doi.org/10.1016/j.mex.2023.102196
文末交流群R包开拓是学术性质,亦然公益的,是以交流群并不会成立门票也不会有二次收费。但是但愿是一些高质地数据分析实战派小伙伴进群交流,不错是对于irGSEA的开拓提议和见地,比如其它 Functional Class Scoring (FCS)方法,其它可视化方法,其它统计学算法的植入。如若只是是关联于irGSEA的使用方法疑问,不错告成看官方文档即可哈。
本站仅提供存储干事,悉数执行均由用户发布,如发现存害或侵权执行,请点击举报。
《侏罗纪全国》第四部《侏罗纪全国:腾达》发布全新剧照,斯嘉丽约翰逊现身迪丽热巴 ai换脸,本片将于2025年7月2日北好意思上映。 故事发生在《侏罗纪全国3》事件的五年之后,地球生态已不相宜恐龙生涯,剩下的恐龙只可生活在与世隔断的环境下。而该区域恰存在梗概援救东说念主类性命的神奇药物的重要,玄妙活动内行Zora Bennett据此执行一个深奥任务,团队却被水生恐龙困在一座孤岛上,立时揭露一个惊天大玄妙。 该片由加里斯爱德华斯执导,斯嘉丽约翰逊、乔纳森贝利、马赫沙拉阿里等主演。 灯塔-党建在线...
《侏罗纪全国》第四部《侏罗纪全国:腾达》发布全新剧照,斯嘉丽约翰逊现身迪丽热巴 ai换脸,本片将于2025年7月2日北好意思上映。 故事发生在《侏罗纪全国3》事件的五年之后,地球生态已不相宜恐龙生涯,...
迅雷7下载bt种子失败?那应该是你还不是越过了解迅雷7怎么下载bt种子欧美性爱,迅雷7是一款特地可以的下载器,底下小编就为你带来迅雷下载bt种子教程,但愿粗略匡助到你。 灯塔-党建在线 1、最初你要得...
从北京郝俊波讼师事务所处了解到,百度Baidu Inc因涉嫌与爱奇艺关连的证券欺骗案,一经于近日被投资者诉至交意思国法院。另据倾盆新闻报说念,本次告状代理方是好意思国罗森讼师事务所。郝俊波律所称,在现...
灯塔-党建在线 2024年10月7日,中国传媒大学勾通摄制的38集电视邻接剧《东谈主民考察》修艳弘 拳交,在央视八套(CCTV-8)黄金时刻首播,爱奇艺、腾讯视频同步播出,并将于10月21日19:30...