Retrieve a data frame of gene sets and their member genes.
The available species and collections can be checked with msigdbr_species() and msigdbr_collections().
Arguments
- species
Species name, such as Homo sapiens or Mus musculus.
- category
MSigDB collection abbreviation, such as H or C1.
- subcategory
MSigDB sub-collection abbreviation, such as CGP or BP.
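The valid values for these arguments can be listed before calling msigdbr(). The following is a minimal sketch, assuming the msigdbr package is installed in a version matching this page; the requireNamespace() guard and the variable names species_tbl and collections_tbl are illustrative additions, not part of the package.

```r
# Guarded so the snippet still runs where msigdbr is not installed.
if (requireNamespace("msigdbr", quietly = TRUE)) {
  # Data frame of species names accepted by the `species` argument.
  species_tbl <- msigdbr::msigdbr_species()
  # Data frame of collection and sub-collection abbreviations
  # accepted by `category` and `subcategory`.
  collections_tbl <- msigdbr::msigdbr_collections()
  print(species_tbl)
  print(collections_tbl)
} else {
  species_tbl <- NULL
  collections_tbl <- NULL
  message("msigdbr is not installed; skipping the lookup.")
}
```

Inspecting both tables first helps avoid passing an abbreviation that the installed database version does not contain.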
Examples
# get all human gene sets
# \donttest{
msigdbr(species = "Homo sapiens")
#> # A tibble: 4,440,672 × 15
#> gs_cat gs_sub…¹ gs_name gene_…² entre…³ ensem…⁴ human…⁵ human…⁶ human…⁷ gs_id
#> <chr> <chr> <chr> <chr> <int> <chr> <chr> <int> <chr> <chr>
#> 1 C3 MIR:MIR… AAACCA… ABCC4 10257 ENSG00… ABCC4 10257 ENSG00… M126…
#> 2 C3 MIR:MIR… AAACCA… ABRAXA… 23172 ENSG00… ABRAXA… 23172 ENSG00… M126…
#> 3 C3 MIR:MIR… AAACCA… ACTN4 81 ENSG00… ACTN4 81 ENSG00… M126…
#> 4 C3 MIR:MIR… AAACCA… ACTN4 81 ENSG00… ACTN4 81 ENSG00… M126…
#> 5 C3 MIR:MIR… AAACCA… ACVR1 90 ENSG00… ACVR1 90 ENSG00… M126…
#> 6 C3 MIR:MIR… AAACCA… ADAM9 8754 ENSG00… ADAM9 8754 ENSG00… M126…
#> 7 C3 MIR:MIR… AAACCA… ADAM9 8754 ENSG00… ADAM9 8754 ENSG00… M126…
#> 8 C3 MIR:MIR… AAACCA… ADAMTS5 11096 ENSG00… ADAMTS5 11096 ENSG00… M126…
#> 9 C3 MIR:MIR… AAACCA… AMER2 219287 ENSG00… AMER2 219287 ENSG00… M126…
#> 10 C3 MIR:MIR… AAACCA… ANK2 287 ENSG00… ANK2 287 ENSG00… M126…
#> # … with 4,440,662 more rows, 5 more variables: gs_pmid <chr>, gs_geoid <chr>,
#> # gs_exact_source <chr>, gs_url <chr>, gs_description <chr>, and abbreviated
#> # variable names ¹gs_subcat, ²gene_symbol, ³entrez_gene, ⁴ensembl_gene,
#> # ⁵human_gene_symbol, ⁶human_entrez_gene, ⁷human_ensembl_gene
# }
# get mouse C2 (curated) CGP (chemical and genetic perturbations) gene sets
# \donttest{
msigdbr(species = "Mus musculus", category = "C2", subcategory = "CGP")
#> # A tibble: 378,810 × 18
#> gs_cat gs_sub…¹ gs_name gene_…² entre…³ ensem…⁴ human…⁵ human…⁶ human…⁷ gs_id
#> <chr> <chr> <chr> <chr> <int> <chr> <chr> <int> <chr> <chr>
#> 1 C2 CGP ABBUD_… Ahnak 66395 ENSMUS… AHNAK 7.90e4 ENSG00… M1423
#> 2 C2 CGP ABBUD_… Alcam 11658 ENSMUS… ALCAM 2.14e2 ENSG00… M1423
#> 3 C2 CGP ABBUD_… Ankrd40 71452 ENSMUS… ANKRD40 9.14e4 ENSG00… M1423
#> 4 C2 CGP ABBUD_… Arid1a 93760 ENSMUS… ARID1A 8.29e3 ENSG00… M1423
#> 5 C2 CGP ABBUD_… Bckdhb 12040 ENSMUS… BCKDHB 5.94e2 ENSG00… M1423
#> 6 C2 CGP ABBUD_… AU0210… 239691 ENSMUS… C16orf… 1.47e5 ENSG00… M1423
#> 7 C2 CGP ABBUD_… Capn9 73647 ENSMUS… CAPN9 1.08e4 ENSG00… M1423
#> 8 C2 CGP ABBUD_… Cd24a 12484 ENSMUS… CD24 1.00e8 ENSG00… M1423
#> 9 C2 CGP ABBUD_… Cyfip1 20430 ENSMUS… CYFIP1 2.32e4 ENSG00… M1423
#> 10 C2 CGP ABBUD_… Dcaf11 28199 ENSMUS… DCAF11 8.03e4 ENSG00… M1423
#> # … with 378,800 more rows, 8 more variables: gs_pmid <chr>, gs_geoid <chr>,
#> # gs_exact_source <chr>, gs_url <chr>, gs_description <chr>, taxon_id <int>,
#> # ortholog_sources <chr>, num_ortholog_sources <dbl>, and abbreviated
#> # variable names ¹gs_subcat, ²gene_symbol, ³entrez_gene, ⁴ensembl_gene,
#> # ⁵human_gene_symbol, ⁶human_entrez_gene, ⁷human_ensembl_gene
# }
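Many downstream tools expect gene sets as a named list rather than a long data frame. The sketch below shows one way to do that conversion with base R. It uses a small toy data frame in place of real msigdbr() output: the column names gs_name and gene_symbol come from the output above, while the gene set names TOY_SET_A and TOY_SET_B and the variable names are hypothetical.

```r
# Toy stand-in for msigdbr() output; only two of its columns are used here.
# The gene set names are hypothetical placeholders.
toy_sets <- data.frame(
  gs_name = c("TOY_SET_A", "TOY_SET_A", "TOY_SET_A", "TOY_SET_B", "TOY_SET_B"),
  gene_symbol = c("Ahnak", "Alcam", "Alcam", "Capn9", "Cd24a"),
  stringsAsFactors = FALSE
)

# One character vector of gene symbols per gene set.
sets_list <- split(toy_sets$gene_symbol, toy_sets$gs_name)

# Drop repeated symbols within a set; the human output above shows
# the same symbol on more than one row (for example ACTN4 and ADAM9).
sets_list <- lapply(sets_list, unique)

str(sets_list)
```

The same two lines (split() followed by unique()) should apply to a real result, since it carries the same gs_name and gene_symbol columns.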