
Retrieve a data frame of gene sets and their member genes. The available species and collections can be checked with msigdbr_species() and msigdbr_collections().

Usage

msigdbr(species = "Homo sapiens", category = NULL, subcategory = NULL)

Arguments

species

Species name, such as Homo sapiens or Mus musculus.

category

MSigDB collection abbreviation, such as H or C1.

subcategory

MSigDB sub-collection abbreviation, such as CGP or BP.

Value

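A data frame (tibble) of gene sets in long format: each row pairs one gene set with one member gene, so a gene set spans as many rows as it has gene entries.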
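
Many enrichment tools expect a named list of gene vectors rather than a long data frame. The following is a minimal sketch of one way to reshape such a result with base R. It uses a small hand-made data frame as a stand-in for real msigdbr() output: the set names SET_A and SET_B, the variable names, and the rows themselves are hypothetical, and only the gs_name and gene_symbol column names are taken from the output shown in the Examples. Because the same gene symbol can appear on more than one row for a set (as ACTN4 and ADAM9 do in the first example output), the sketch removes repeated pairs before grouping.

```r
# Toy stand-in for msigdbr() output (hypothetical rows; only two of the real columns)
gene_sets_df <- data.frame(
  gs_name = c("SET_A", "SET_A", "SET_A", "SET_B", "SET_B"),
  gene_symbol = c("ACTN4", "ACTN4", "ADAM9", "ANK2", "ABCC4"),
  stringsAsFactors = FALSE
)

# Drop repeated gene set / gene symbol pairs
pairs <- unique(gene_sets_df[, c("gs_name", "gene_symbol")])

# Group gene symbols by gene set name into a named list
gene_set_list <- split(pairs$gene_symbol, pairs$gs_name)

# Inspect the structure of the resulting list
str(gene_set_list)
```

With a real result, the same two steps should apply to the data frame returned by msigdbr(), provided it carries the gs_name and gene_symbol columns.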

Examples

# get all human gene sets
# \donttest{
msigdbr(species = "Homo sapiens")
#> # A tibble: 4,440,672 × 15
#>    gs_cat gs_sub…¹ gs_name gene_…² entre…³ ensem…⁴ human…⁵ human…⁶ human…⁷ gs_id
#>    <chr>  <chr>    <chr>   <chr>     <int> <chr>   <chr>     <int> <chr>   <chr>
#>  1 C3     MIR:MIR… AAACCA… ABCC4     10257 ENSG00… ABCC4     10257 ENSG00… M126…
#>  2 C3     MIR:MIR… AAACCA… ABRAXA…   23172 ENSG00… ABRAXA…   23172 ENSG00… M126…
#>  3 C3     MIR:MIR… AAACCA… ACTN4        81 ENSG00… ACTN4        81 ENSG00… M126…
#>  4 C3     MIR:MIR… AAACCA… ACTN4        81 ENSG00… ACTN4        81 ENSG00… M126…
#>  5 C3     MIR:MIR… AAACCA… ACVR1        90 ENSG00… ACVR1        90 ENSG00… M126…
#>  6 C3     MIR:MIR… AAACCA… ADAM9      8754 ENSG00… ADAM9      8754 ENSG00… M126…
#>  7 C3     MIR:MIR… AAACCA… ADAM9      8754 ENSG00… ADAM9      8754 ENSG00… M126…
#>  8 C3     MIR:MIR… AAACCA… ADAMTS5   11096 ENSG00… ADAMTS5   11096 ENSG00… M126…
#>  9 C3     MIR:MIR… AAACCA… AMER2    219287 ENSG00… AMER2    219287 ENSG00… M126…
#> 10 C3     MIR:MIR… AAACCA… ANK2        287 ENSG00… ANK2        287 ENSG00… M126…
#> # … with 4,440,662 more rows, 5 more variables: gs_pmid <chr>, gs_geoid <chr>,
#> #   gs_exact_source <chr>, gs_url <chr>, gs_description <chr>, and abbreviated
#> #   variable names ¹​gs_subcat, ²​gene_symbol, ³​entrez_gene, ⁴​ensembl_gene,
#> #   ⁵​human_gene_symbol, ⁶​human_entrez_gene, ⁷​human_ensembl_gene
# }

# get mouse C2 (curated) CGP (chemical and genetic perturbations) gene sets
# \donttest{
msigdbr(species = "Mus musculus", category = "C2", subcategory = "CGP")
#> # A tibble: 378,810 × 18
#>    gs_cat gs_sub…¹ gs_name gene_…² entre…³ ensem…⁴ human…⁵ human…⁶ human…⁷ gs_id
#>    <chr>  <chr>    <chr>   <chr>     <int> <chr>   <chr>     <int> <chr>   <chr>
#>  1 C2     CGP      ABBUD_… Ahnak     66395 ENSMUS… AHNAK    7.90e4 ENSG00… M1423
#>  2 C2     CGP      ABBUD_… Alcam     11658 ENSMUS… ALCAM    2.14e2 ENSG00… M1423
#>  3 C2     CGP      ABBUD_… Ankrd40   71452 ENSMUS… ANKRD40  9.14e4 ENSG00… M1423
#>  4 C2     CGP      ABBUD_… Arid1a    93760 ENSMUS… ARID1A   8.29e3 ENSG00… M1423
#>  5 C2     CGP      ABBUD_… Bckdhb    12040 ENSMUS… BCKDHB   5.94e2 ENSG00… M1423
#>  6 C2     CGP      ABBUD_… AU0210…  239691 ENSMUS… C16orf…  1.47e5 ENSG00… M1423
#>  7 C2     CGP      ABBUD_… Capn9     73647 ENSMUS… CAPN9    1.08e4 ENSG00… M1423
#>  8 C2     CGP      ABBUD_… Cd24a     12484 ENSMUS… CD24     1.00e8 ENSG00… M1423
#>  9 C2     CGP      ABBUD_… Cyfip1    20430 ENSMUS… CYFIP1   2.32e4 ENSG00… M1423
#> 10 C2     CGP      ABBUD_… Dcaf11    28199 ENSMUS… DCAF11   8.03e4 ENSG00… M1423
#> # … with 378,800 more rows, 8 more variables: gs_pmid <chr>, gs_geoid <chr>,
#> #   gs_exact_source <chr>, gs_url <chr>, gs_description <chr>, taxon_id <int>,
#> #   ortholog_sources <chr>, num_ortholog_sources <dbl>, and abbreviated
#> #   variable names ¹​gs_subcat, ²​gene_symbol, ³​entrez_gene, ⁴​ensembl_gene,
#> #   ⁵​human_gene_symbol, ⁶​human_entrez_gene, ⁷​human_ensembl_gene
# }
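
Before requesting gene sets, it can help to look up which values the species, category, and subcategory arguments accept. The sketch below calls the two helper functions named in the description. It assumes the msigdbr package is installed and matches the version documented here; the requireNamespace() guard and the variable names species_tbl and collections_tbl are illustrative additions, and argument names may differ in other releases.

```r
# Assumes the msigdbr package is installed; the guard skips the calls otherwise
if (requireNamespace("msigdbr", quietly = TRUE)) {
  # Species names that can be passed as the `species` argument
  species_tbl <- msigdbr::msigdbr_species()
  print(species_tbl)

  # Collection and sub-collection abbreviations for `category` and `subcategory`
  collections_tbl <- msigdbr::msigdbr_collections()
  print(collections_tbl)
}
```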