We will load clustermole along with dplyr, which provides the %>% pipe and the distinct() and count() verbs used to summarize the data.

library(clustermole)
library(dplyr)

You can use clustermole as a simple database and retrieve a table of all cell type markers for a given species (here human, species = "hs").

markers = clustermole_markers(species = "hs")
markers
#> # A tibble: 422,292 x 8
#>    celltype_full      db    species organ celltype   n_genes gene_original gene 
#>    <chr>              <chr> <chr>   <chr> <chr>        <int> <chr>         <chr>
#>  1 1-cell stage cell… Cell… Human   Embr… 1-cell st…      32 ACCSL         ACCSL
#>  2 1-cell stage cell… Cell… Human   Embr… 1-cell st…      32 ACVR1B        ACVR…
#>  3 1-cell stage cell… Cell… Human   Embr… 1-cell st…      32 ASF1B         ASF1B
#>  4 1-cell stage cell… Cell… Human   Embr… 1-cell st…      32 BCL2L10       BCL2…
#>  5 1-cell stage cell… Cell… Human   Embr… 1-cell st…      32 BLCAP         BLCAP
#>  6 1-cell stage cell… Cell… Human   Embr… 1-cell st…      32 CASC3         CASC3
#>  7 1-cell stage cell… Cell… Human   Embr… 1-cell st…      32 CLEC10A       CLEC…
#>  8 1-cell stage cell… Cell… Human   Embr… 1-cell st…      32 CNOT11        CNOT…
#>  9 1-cell stage cell… Cell… Human   Embr… 1-cell st…      32 DCLK2         DCLK2
#> 10 1-cell stage cell… Cell… Human   Embr… 1-cell st…      32 DHCR7         DHCR7
#> # … with 422,282 more rows

Each row pairs one gene with one cell type it marks, so every cell type spans multiple rows. The gene column holds the gene symbol (human or mouse), and the celltype_full column holds the detailed cell type string, including the species and the source database.
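Because the table is long (one row per cell type and gene pair), a common next step is to collapse it into one gene vector per cell type. The sketch below does this in base R with split(). The toy_markers data frame, its cell type names, and its genes are hypothetical stand-ins for the real output, which has more columns and far more rows.

```r
# Hypothetical toy subset in the same long layout as the markers table
# (one row per cell type/gene pair); names and genes are illustrative only.
toy_markers <- data.frame(
  celltype_full = c("T cell (toy)", "T cell (toy)", "T cell (toy)",
                    "B cell (toy)", "B cell (toy)"),
  gene = c("CD3E", "CD3D", "CD2", "CD19", "MS4A1"),
  stringsAsFactors = FALSE
)

# Collapse the long table into a named list with one character vector of
# genes per cell type (split() orders the list names alphabetically).
gene_sets <- split(toy_markers$gene, toy_markers$celltype_full)

# Marker genes for a single cell type, in their original row order
t_cell_genes <- gene_sets[["T cell (toy)"]]
print(t_cell_genes)
```

With dplyr, the single cell type lookup can instead be written as filter() on celltype_full followed by pull(gene).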

Check the total number of available cell types.

markers %>% distinct(celltype_full) %>% nrow()
#> [1] 3039
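The distinct() step matters because nrow() on the full table counts cell type and gene pairs, not cell types. A minimal base R illustration on a hypothetical five-row toy table:

```r
# Hypothetical toy table: 5 rows, but only 2 distinct cell types.
toy_markers <- data.frame(
  celltype_full = c("T cell (toy)", "T cell (toy)", "T cell (toy)",
                    "B cell (toy)", "B cell (toy)"),
  gene = c("CD3E", "CD3D", "CD2", "CD19", "MS4A1"),
  stringsAsFactors = FALSE
)

# nrow() on the full table counts cell type/gene pairs, not cell types.
n_rows <- nrow(toy_markers)

# Deduplicating first (what distinct(celltype_full) does) counts cell types.
n_celltypes <- length(unique(toy_markers$celltype_full))

cat("rows:", n_rows, " cell types:", n_celltypes, "\n")
```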

Check the source databases and the number of cell types from each.

markers %>% distinct(celltype_full, db) %>% count(db)
#> # A tibble: 7 x 2
#>   db             n
#> * <chr>      <int>
#> 1 ARCHS4       108
#> 2 CellMarker   692
#> 3 MSigDB       295
#> 4 PanglaoDB    322
#> 5 SaVanT       619
#> 6 TISSUES      537
#> 7 xCell        466
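The same deduplication applies when grouping: counting raw rows per database gives marker rows, while counting distinct (cell type, database) pairs gives cell types. A base R sketch on a hypothetical toy table, where "DB1" and "DB2" are placeholders rather than the real source databases:

```r
# Hypothetical toy table; "DB1"/"DB2" stand in for the real source databases.
toy_markers <- data.frame(
  celltype_full = c("T cell [DB1]", "T cell [DB1]", "B cell [DB1]",
                    "T cell [DB2]", "T cell [DB2]"),
  db = c("DB1", "DB1", "DB1", "DB2", "DB2"),
  gene = c("CD3E", "CD3D", "CD19", "CD3E", "CD2"),
  stringsAsFactors = FALSE
)

# Counting raw rows per database counts marker rows (genes), not cell types.
rows_per_db <- table(toy_markers$db)

# Keep one row per (cell type, database) pair first, as
# distinct(celltype_full, db) does, then tabulate by database.
pairs <- unique(toy_markers[, c("celltype_full", "db")])
celltypes_per_db <- table(pairs$db)

print(rows_per_db)
print(celltypes_per_db)
```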

Check the number of cell types per species. Species is not annotated for every cell type; unannotated cell types appear as an empty string ("").

markers %>% distinct(celltype_full, species) %>% count(species)
#> # A tibble: 3 x 2
#>   species     n
#> * <chr>   <int>
#> 1 ""        323
#> 2 "Human"  1866
#> 3 "Mouse"   850
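Since missing species are stored as empty strings rather than NA, a check such as is.na(species) will not find them. A minimal base R sketch, on a hypothetical one-row-per-cell-type toy table, that relabels them before counting:

```r
# Hypothetical toy table with one row per distinct cell type.
toy_celltypes <- data.frame(
  celltype_full = c("ct1", "ct2", "ct3", "ct4", "ct5"),
  species = c("Human", "Human", "", "Mouse", ""),
  stringsAsFactors = FALSE
)

# Missing species are empty strings, so is.na() finds none of them.
n_na <- sum(is.na(toy_celltypes$species))

# Relabel empty strings explicitly, then count cell types per species.
toy_celltypes$species[toy_celltypes$species == ""] <- "Unspecified"
species_counts <- table(toy_celltypes$species)
print(species_counts)
```

To keep only species-annotated cell types with dplyr, filter(species != "") works on the real table.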

Check the number of cell types per organ, sorted from most to least common. Organ is not annotated for every cell type either; unannotated cell types appear as an empty string ("").

markers %>% distinct(celltype_full, organ) %>% count(organ, sort = TRUE)
#> # A tibble: 93 x 2
#>    organ                  n
#>    <chr>              <int>
#>  1 ""                  2160
#>  2 "Brain"              122
#>  3 "Immune system"       50
#>  4 "Lung"                47
#>  5 "Kidney"              43
#>  6 "Bone marrow"         42
#>  7 "Liver"               38
#>  8 "Blood"               33
#>  9 "Embryo"              30
#> 10 "Peripheral blood"    29
#> # … with 83 more rows
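The empty-string group (2,160) dwarfs every named organ, so it is often useful to drop it before ranking. A base R sketch on a hypothetical toy organ vector (one entry per distinct cell type), roughly mirroring count(organ, sort = TRUE) without the unannotated group:

```r
# Hypothetical toy organ column, one entry per distinct cell type.
organ <- c("", "", "", "Brain", "Brain", "Lung", "")

# Drop unannotated entries, then sort counts from most to least common.
organ_counts <- sort(table(organ[organ != ""]), decreasing = TRUE)
print(organ_counts)
```

On the real table, the dplyr equivalent is to add filter(organ != "") before count(organ, sort = TRUE).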

Check the package version, since the database contents may change between releases.

packageVersion("clustermole")
#> [1] '1.1.0'
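To make an analysis fail loudly when run against an older release, the version can be checked at the top of a script. The sketch below uses the real utils::packageVersion() and requireNamespace(); require_min_version() itself is a hypothetical helper, not part of clustermole.

```r
# Hypothetical helper (not part of clustermole): stop if a package is missing
# or older than the version an analysis was written against.
require_min_version <- function(pkg, min_version) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop(sprintf("Package '%s' is not installed", pkg))
  }
  installed <- utils::packageVersion(pkg)
  # package_version objects compare correctly against version strings
  if (installed < min_version) {
    stop(sprintf("Package '%s' is %s; need >= %s",
                 pkg, as.character(installed), min_version))
  }
  invisible(installed)
}

# Example usage (assumes clustermole is installed):
# require_min_version("clustermole", "1.1.0")
```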