Gene identifier conversion and annotation is a common and critical task in bioinformatics research. Existing databases and tools use different naming conventions for genes or provide only partial annotations, making it challenging to integrate data from multiple sources. geneslator addresses this problem by providing a unified interface for genome annotation across different databases in several model organisms.
Key Features:
geneslator provides species-specific annotation
databases for several organisms. Annotation databases are stored as
SQLite files in different versions of a Zenodo record at https://doi.org/10.5281/zenodo.20448208. Each release
refers to a specific version of the databases. Versions are tagged as
year.month, where year and month
denote the year and the month of the publication of the release
(e.g. ‘2026.03’ for March 2026). Databases are updated on a monthly
basis.
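The year.month tagging scheme can be handled with a few lines of base R. The following is a minimal sketch, not part of geneslator: the helper name parse_release_tag is hypothetical, and it assumes that tags always use a four-digit year and a zero-padded two-digit month, as in the examples above.

```r
# Hypothetical helper (not part of geneslator): convert a "year.month"
# release tag into the Date of the first day of that month.
parse_release_tag <- function(tag) {
  # Assumption: four-digit year, dot, zero-padded two-digit month
  stopifnot(all(grepl("^[0-9]{4}\\.[0-9]{2}$", tag)))
  as.Date(paste0(sub(".", "-", tag, fixed = TRUE), "-01"))
}

tags <- c("2026.03", "2025.12", "2026.05")

# Sort the tags chronologically and pick the most recent one
sorted <- tags[order(parse_release_tag(tags))]
latest <- sorted[length(sorted)]
latest
```

Because the month is zero-padded, plain alphabetical sorting of the tags gives the same order; converting to dates simply makes the intent explicit.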
Type availableDatabases() to retrieve the list of
available databases and supported species in the most recent
release.
# List organisms annotated in geneslator
availableDatabases()
#> Name Organism TaxID
#> 8 org.Athaliana.db Arabidopsis thaliana 3702
#> 6 org.Celegans.db Caenorhabditis elegans 6239
#> 4 org.Drerio.db Danio rerio 7955
#> 5 org.Dmelanogaster.db Drosophila melanogaster 7227
#> 1 org.Hsapiens.db Homo sapiens 9606
#> 2 org.Mmusculus.db Mus musculus 10090
#> 3 org.Rnorvegicus.db Rattus norvegicus 10116
#> 7 org.Scerevisiae.db Saccharomyces cerevisiae 559292
#> MD5 Version DOI
#> 8 5161342725c3bc0f7ad5cbe32558f7d4 2026.05 10.5281/zenodo.20457977
#> 6 e862899bb1328407e4641f29f04f5ef3 2026.05 10.5281/zenodo.20457977
#> 4 51fd51a03511d84116436b78633b5eff 2026.05 10.5281/zenodo.20457977
#> 5 479daba8a3fbd4baaaa43224726b775d 2026.05 10.5281/zenodo.20457977
#> 1 ae0b03569e27aec470ed3bef8404238d 2026.05 10.5281/zenodo.20457977
#> 2 81f413bda4ff3ffab4b71d95300c53f9 2026.05 10.5281/zenodo.20457977
#> 3 0828846be2b79802aa70c87aabda15eb 2026.05 10.5281/zenodo.20457977
#> 7 788863602c38669e5b11038a362aab7e 2026.05 10.5281/zenodo.20457977
The parameter release.version can be used to retrieve
the list of all available databases in an older release.
# List organisms annotated in geneslator (release December 2025)
availableDatabases(release.version = "2025.12")
#> Name Organism TaxID
#> 1 org.Athaliana.db Arabidopsis thaliana 3702
#> 6 org.Celegans.db Caenorhabditis elegans 6239
#> 8 org.Drerio.db Danio rerio 7955
#> 2 org.Dmelanogaster.db Drosophila melanogaster 7227
#> 3 org.Hsapiens.db Homo sapiens 9606
#> 4 org.Mmusculus.db Mus musculus 10090
#> 5 org.Rnorvegicus.db Rattus norvegicus 10116
#> 7 org.Scerevisiae.db Saccharomyces cerevisiae 559292
#> MD5 Version DOI
#> 1 a292153eee87600c5d8c27977fe7ea45 2025.12 10.5281/zenodo.20448209
#> 6 fb4f03098e379712c17196a1f2b6c6a4 2025.12 10.5281/zenodo.20448209
#> 8 e292dcb2cca5c038d9c369b31ca16d8c 2025.12 10.5281/zenodo.20448209
#> 2 06031138af0a7e44af7d9f938f8f4239 2025.12 10.5281/zenodo.20448209
#> 3 6b6ffd437724b029e3ec5f24ab866d97 2025.12 10.5281/zenodo.20448209
#> 4 1f5af73caf5e89f65e7bcf31669f62d0 2025.12 10.5281/zenodo.20448209
#> 5 7cb6dbed9441b0b142032a5206b66126 2025.12 10.5281/zenodo.20448209
#> 7 7b36f0be0eecce6e12bf05f32d5d8779 2025.12 10.5281/zenodo.20448209
A complete list of all available release versions can be obtained
with availableVersions().
# List all available release versions
availableVersions()
#> [1] "2025.12" "2026.03" "2026.04" "2026.05"
To query the database of a specific organism org, you
first need to import it with the GeneslatorDb
function. org can be either the scientific name of the
organism (e.g. “Homo sapiens”) or its Taxonomy ID (e.g. “10090” for
Mouse). The function creates a new GeneslatorDb object for
the requested database and exports it to the user's global environment
as a variable with the same name as the SQLite annotation
database (e.g. org.Hsapiens.db for Human,
org.Mmusculus.db for Mouse).
# Import human annotation db (after downloading it from remote repository)
GeneslatorDb("Homo sapiens")
# Info about the imported human annotation database object
org.Hsapiens.db
#> An object of class "GeneslatorDb"
#> Slot "db":
#> OrgDb object:
#> | DBSCHEMAVERSION: 2.1
#> | DBSCHEMA: NOSCHEMA_DB
#> | ORGANISM: Homo sapiens
#> | SPECIES: Homo sapiens
#> | CENTRALID: GID
#> | Taxonomy ID: 9606
#> | Db type: OrgDb
#> | Supporting package: AnnotationDbi
# Import mouse annotation database using its Taxonomy ID
GeneslatorDb("10090")
# Info about the imported mouse annotation database object
org.Mmusculus.db
#> An object of class "GeneslatorDb"
#> Slot "db":
#> OrgDb object:
#> | DBSCHEMAVERSION: 2.1
#> | DBSCHEMA: NOSCHEMA_DB
#> | ORGANISM: Mus musculus
#> | SPECIES: Mus musculus
#> | CENTRALID: GID
#> | Taxonomy ID: 10090
#> | Db type: OrgDb
#> | Supporting package: AnnotationDbi
When called for the first time on a specific organism, the
GeneslatorDb function downloads the annotation database
from the remote repository, stores a local copy in your R cache folder
and finally imports the database. Future calls to
GeneslatorDb will simply import the database from
your cache, unless a new version of the database is present in the
remote repository. In the latter case, you will be notified
and can choose whether to update your local copy
in the R cache before importing the database.
By default, GeneslatorDb queries the latest release. To
retrieve an older version of the database, you can set the
release.version parameter to the desired release version.
Again, a local copy of the database (distinct from the latest release)
will be stored into your R cache folder, so that future calls to the
same database will simply import it from your cache.
# Import yeast annotation db from release 2025.12 (December 2025)
GeneslatorDb("Saccharomyces cerevisiae", release.version = "2025.12")
# Info about the imported yeast annotation database object
org.Scerevisiae.db
#> An object of class "GeneslatorDb"
#> Slot "db":
#> OrgDb object:
#> | DBSCHEMAVERSION: 2.1
#> | DBSCHEMA: NOSCHEMA_DB
#> | ORGANISM: Saccharomyces cerevisiae
#> | SPECIES: Saccharomyces cerevisiae
#> | CENTRALID: GID
#> | Taxonomy ID: 559292
#> | Db type: OrgDb
#> | Supporting package: AnnotationDbi
Annotation databases are internally represented as collections of R dataframes. Query functions map a set of values of an input column (the key) of a dataframe to the corresponding values of one or more output columns of the same or a different dataframe.
Function keytypes() lists all columns that can be used
as keys.
# Get all columns that can be used as keys in mouse annotation db
geneslator::keytypes(org.Mmusculus.db)
#> [1] "ALIAS" "ENSEMBL" "ENSEMBLOLD" "ENTREZID"
#> [5] "ENTREZIDOLD" "GENENAME" "GENETYPE" "GO"
#> [9] "KEGGPATH" "MGI" "ORTHOFLY" "ORTHOHUMAN"
#> [13] "ORTHORAT" "ORTHOWORM" "ORTHOYEAST" "ORTHOZEBRAFISH"
#> [17] "REACTOMEPATH" "SYMBOL" "UNIPROT" "WIKIPATH"
Similarly, function columns() lists all possible output
columns.
# Get all available types of output values in mouse annotation db
geneslator::columns(org.Mmusculus.db)
#> [1] "ALIAS" "ENSEMBL" "ENSEMBLOLD" "ENTREZID"
#> [5] "ENTREZIDOLD" "GENENAME" "GENETYPE" "GO"
#> [9] "GOEVIDENCE" "GONAME" "GOTYPE" "KEGGPATH"
#> [13] "KEGGPATHNAME" "MGI" "ORTHOFLY" "ORTHOHUMAN"
#> [17] "ORTHORAT" "ORTHOWORM" "ORTHOYEAST" "ORTHOZEBRAFISH"
#> [21] "REACTOMEPATH" "REACTOMEPATHNAME" "SYMBOL" "UNIPROT"
#> [25] "WIKIPATH" "WIKIPATHNAME"
Note that the outputs of the two functions differ, because only
identifier columns can be used as keys, while any column can be an
output column. Type help("columns", "geneslator") to see the
complete list of columns available in the annotation databases of
geneslator, together with their description.
Function keys() is used to retrieve all values of a
column in an annotation database.
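As an illustration, the sketch below retrieves all gene symbols stored in the mouse annotation database. It assumes that geneslator::keys() accepts a keytype argument in the same way as keys() in AnnotationDbi; this signature is an assumption and is not shown elsewhere in this document. Running it requires the geneslator package and, on first use, network access to download the database.

```r
library(geneslator)

# Import the mouse annotation database (downloaded on first use, then cached)
GeneslatorDb("Mus musculus")

# Retrieve all values of the SYMBOL column.
# Assumption: the keytype argument follows the AnnotationDbi convention.
mouse_symbols <- geneslator::keys(org.Mmusculus.db, keytype = "SYMBOL")

# Inspect the first few values
head(mouse_symbols)
```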
Columns of the annotation databases can be queried using
re-defined versions of the well-known query functions
select() and mapIds() of the
AnnotationDbi R package.
The select() function allows you to query an input key
column of the annotation database (keytype argument) and
retrieve related information across one or more other columns
(columns argument).
The output of select() is a dataframe with all columns
specified by keytype and columns arguments and
one row for each mapping found between input and output values.
# Map NCBI Gene IDs to gene symbols and Ensembl IDs in Human
genes <- c("1", "2", "9")
result <- geneslator::select(org.Hsapiens.db, keys = genes,
columns = c("SYMBOL", "ENSEMBL"), keytype = "ENTREZID")
result
#> ENTREZID SYMBOL ENSEMBL
#> 1 1 A1BG ENSG00000121410
#> 2 2 A2M ENSG00000175899
#> 3 9 NAT1 ENSG00000171428
Unlike select(), mapIds() maps an input key
column (argument keytype) to a single output column
(argument column).
# Convert gene symbols to ENTREZ IDs (first match only)
genes <- c("TP53", "BRCA1", "EGFR")
entrez_ids <- geneslator::mapIds(org.Hsapiens.db, keys = genes,
column = "ENTREZID", keytype = "SYMBOL")
entrez_ids
#> TP53 BRCA1 EGFR
#> "7157" "672" "1956"
By default, the return value is a named vector, where each value is
the first mapping found (if any) for a given key, even if multiple
output values map to that key. This behaviour can be changed
through the multiVals parameter, which also controls the
shape of the output. For example, multiVals="list"
produces a list with all matches found for each input.
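The following sketch repeats the symbol-to-ENTREZID query with multiVals="list", so that every match is kept for each key. It uses only the arguments described above; no output is shown because it depends on the database release. Running it requires the geneslator package and, on first use, network access.

```r
library(geneslator)

# Import the human annotation database (taken from the cache if already present)
GeneslatorDb("Homo sapiens")

genes <- c("TP53", "BRCA1", "EGFR")

# Keep all ENTREZ IDs found for each symbol instead of the first match only
all_ids <- geneslator::mapIds(org.Hsapiens.db, keys = genes,
                              column = "ENTREZID", keytype = "SYMBOL",
                              multiVals = "list")

# Compact view of the resulting list, one element per input key
str(all_ids)
```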
By default, select() and mapIds() resolve
queries involving gene symbols
by first looking at column “SYMBOL” and, if no mapping is
found there, by falling back to the “ALIAS” column.
This is helpful when users unknowingly start from a list of names that
is actually a mix of official gene symbols and aliases.
This behaviour of select() and mapIds() can
be controlled through the boolean parameter search.aliases,
whose default value is TRUE.
In the following example, “BRCAI” is actually an alias of the BRCA1 gene,
while “PTEN” is the official symbol of the PTEN gene. When mapping these
two keys (treated as SYMBOL) to ENTREZID by using select(),
BRCAI is correctly recognized as an alias of the BRCA1 gene and mapped to the
NCBI gene id of BRCA1.
# Map gene symbols to their NCBI gene ids, querying also the ALIAS column
# if needed
result <- geneslator::select(org.Hsapiens.db, keys = c("BRCAI","PTEN"),
columns = "ENTREZID", keytype = "SYMBOL")
result
#> SYMBOL ENTREZID
#> 1 BRCAI 672
#> 2 PTEN 5728
Whenever the ALIAS column is used in place of the SYMBOL column (as in this
example), a warning message is sent to the user. If we repeat the same
query with search.aliases=FALSE, select() is
unable to map BRCAI to the correct NCBI gene id.
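For comparison, here is the same query with the alias fallback disabled. This is a sketch that uses only the search.aliases parameter described above. The output is omitted because the exact form of the unmapped entry for BRCAI is not documented here.

```r
library(geneslator)

# Import the human annotation database (taken from the cache if already present)
GeneslatorDb("Homo sapiens")

# Same query as before, but restricted to official symbols only:
# the ALIAS column is not consulted when SYMBOL gives no match
result_no_alias <- geneslator::select(org.Hsapiens.db,
                                      keys = c("BRCAI", "PTEN"),
                                      columns = "ENTREZID",
                                      keytype = "SYMBOL",
                                      search.aliases = FALSE)
result_no_alias
```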
Gene identifiers and symbols can change over time or become
deprecated, as a result of periodic updates of databases such as NCBI or
Ensembl. This can be troublesome in annotation tasks, especially when
a user starts from an old set of identifiers or symbols. To overcome this,
annotation databases in geneslator contain columns
“ENTREZIDOLD” and “ENSEMBLOLD”, which collect old gene identifiers of
NCBI Gene and Ensembl databases. By default, these columns are queried
by select() and mapIds() methods whenever a
gene cannot be annotated using current identifiers. This behaviour can
be controlled through the boolean parameter
search.archives, whose default value is
TRUE.
For example, in the following query key “3” corresponds to the old
NCBI Gene identifier of gene “A2MP1”. By using archived data,
select() is able to correctly map NCBI Gene ID “3” to gene
symbol “A2MP1”.
# Map NCBI gene id 3 to gene symbol, using both current and old identifiers
result <- geneslator::select(org.Hsapiens.db, keys = "3", columns = "SYMBOL",
keytype = "ENTREZID")
result
#> ENTREZID SYMBOL
#> 1 3 PZP2P
Whenever archived identifiers are used to solve a query (as in this
example), a warning message is sent to the user. If we set
search.archives=FALSE, select() is unable to
map the identifier to the correct symbol.
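A sketch of the same lookup with archived identifiers disabled is given below. The current identifier “1” is added to the keys so that the query contains at least one identifier that is known to resolve; this choice is an assumption made for illustration. The output is omitted because how the unmapped key is reported is not documented here.

```r
library(geneslator)

# Import the human annotation database (taken from the cache if already present)
GeneslatorDb("Homo sapiens")

# Query one current identifier ("1") and one archived identifier ("3"),
# without falling back to the ENTREZIDOLD column
result_no_archive <- geneslator::select(org.Hsapiens.db,
                                        keys = c("1", "3"),
                                        columns = "SYMBOL",
                                        keytype = "ENTREZID",
                                        search.archives = FALSE)
result_no_archive
```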
In queries involving ortholog mapping, by default,
select() returns all possible ortholog mappings. This
behaviour is controlled by the parameter orthologs.mapping,
whose default value is “multiple”.
# Get worm and fly orthologs of human genes CHC1 and SCAMP5
result <- geneslator::select(org.Hsapiens.db, keys = c("CHC1","SCAMP5"),
columns = c("ORTHOWORM", "ORTHOFLY"), keytype = "SYMBOL")
result
#> SYMBOL ORTHOWORM ORTHOFLY
#> 1 CHC1 ran-3 CG33288
#> 2 CHC1 ran-3 CG7420
#> 3 CHC1 ran-3 Rcc1
#> 4 SCAMP5 scm-1 Scamp
To get only the first ortholog, set
orthologs.mapping="single":
result <- geneslator::select(org.Hsapiens.db, keys = c("CHC1","SCAMP5"),
columns = c("ORTHOWORM", "ORTHOFLY"), keytype = "SYMBOL",
orthologs.mapping = "single")
result
#> SYMBOL ORTHOWORM ORTHOFLY
#> 1 CHC1 ran-3 CG33288
#> 2 SCAMP5 scm-1 Scamp
For the mapIds() function, the option
orthologs.mapping is absent, because the number of mapped
orthologs can be directly controlled through the parameter
multiVals.
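As a sketch of this, the two calls below map the same human symbols to their fly orthologs with mapIds(), first with the default behaviour (first match only) and then with multiVals="list" (all matches). Only parameters described above are used; the output is omitted because it depends on the database release.

```r
library(geneslator)

# Import the human annotation database (taken from the cache if already present)
GeneslatorDb("Homo sapiens")

genes <- c("CHC1", "SCAMP5")

# Default behaviour: only the first fly ortholog found for each gene
first_fly <- geneslator::mapIds(org.Hsapiens.db, keys = genes,
                                column = "ORTHOFLY", keytype = "SYMBOL")

# All fly orthologs for each gene, returned as a list
all_fly <- geneslator::mapIds(org.Hsapiens.db, keys = genes,
                              column = "ORTHOFLY", keytype = "SYMBOL",
                              multiVals = "list")

first_fly
all_fly
```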
sessionInfo()
#> R version 4.6.0 (2026-04-24)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.4 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] AnnotationDbi_1.75.0 IRanges_2.47.2 S4Vectors_0.51.3
#> [4] Biobase_2.73.1 BiocGenerics_0.59.7 generics_0.1.4
#> [7] geneslator_0.99.2 BiocStyle_2.41.0
#>
#> loaded via a namespace (and not attached):
#> [1] utf8_1.2.6 sass_0.4.10 xml2_1.5.2
#> [4] RSQLite_3.53.1 zen4R_0.10.5 digest_0.6.39
#> [7] evaluate_1.0.5 fastmap_1.2.0 blob_1.3.0
#> [10] plyr_1.8.9 jsonlite_2.0.0 DBI_1.3.0
#> [13] BiocManager_1.30.27 httr_1.4.8 XML_3.99-0.23
#> [16] Biostrings_2.81.3 jquerylib_0.1.4 cli_3.6.6
#> [19] rlang_1.2.0 crayon_1.5.3 XVector_0.53.0
#> [22] bit64_4.8.2 cachem_1.1.0 yaml_2.3.12
#> [25] otel_0.2.0 tools_4.6.0 memoise_2.0.1
#> [28] curl_7.1.0 buildtools_1.0.0 vctrs_0.7.3
#> [31] R6_2.6.1 png_0.1-9 lifecycle_1.0.5
#> [34] KEGGREST_1.53.0 Seqinfo_1.3.0 bit_4.6.0
#> [37] pkgconfig_2.0.3 bslib_0.11.0 Rcpp_1.1.1-1.1
#> [40] xfun_0.58 keyring_1.4.1 sys_3.4.3
#> [43] knitr_1.51 htmltools_0.5.9 rmarkdown_2.31
#> [46] maketools_1.3.2 compiler_4.6.0
Micale G, Cavallaro G, Privitera GF (2026). geneslator: A Comprehensive Gene Identifier Conversion Tool. R package version 0.99.0.
Pages H, Carlson M, Falcon S, Li N (2024). AnnotationDbi: Manipulation of SQLite-based annotations in Bioconductor. R package.
NCBI Gene: https://www.ncbi.nlm.nih.gov/gene
Ensembl: https://www.ensembl.org
UniProt: https://www.uniprot.org
Gene Ontology: http://geneontology.org
KEGG: https://www.kegg.jp
Reactome: https://reactome.org
WikiPathways: https://www.wikipathways.org
Alliance of Genome Resources: https://www.alliancegenome.org