This document presents the functionality provided by the codon frequency-per-gene module.
The reference codon frequency-per-gene matrix represents the codon distribution of each protein-coding gene in a reference genome.
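To make the idea of a codon frequency-per-gene matrix concrete, the sketch below builds a toy version from two made-up coding sequences using Biostrings. It is only an illustration under stated assumptions: the gene names and sequences are hypothetical, it reports per-gene proportions (getCodonFreq() may report counts or use a different normalisation), and it does not reproduce how getCodonFreq() retrieves its data from Ensembl.

```r
library(Biostrings)

# Hypothetical coding sequences (CDS), each in frame from position 1
cds <- DNAStringSet(c(
  geneA = "ATGGCTGCTTAA",
  geneB = "ATGAAAAAGTGA"
))

# Count non-overlapping triplets (step = 3) and turn each row into proportions:
# rows are genes, columns are the 64 possible codons
codon_freq <- trinucleotideFrequency(cds, step = 3, as.prob = TRUE)
rownames(codon_freq) <- names(cds)

# Show only the codons observed in geneA (ATG, GCT twice, TAA)
codon_freq["geneA", codon_freq["geneA", ] > 0]
```

Reading triplets with `step = 3` keeps the count in frame; a step of 1 would also count out-of-frame trinucleotides, which are not codons.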
The getCodonFreq() function computes the reference codon frequency-per-gene matrix. It retrieves the codon composition table for any organism supported by Ensembl and returns a list with two elements: a codon frequency-per-gene matrix and a gene translator table that maps Ensembl IDs to gene names. In the examples below, the matrix is extracted from the second list element and the translator table from the first. This output can be used directly in downstream codon usage analyses.
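A gene translator table is what lets you switch between identifier types. The sketch below relabels the rows of a toy matrix from Ensembl gene IDs to gene names with `match()`. Both objects, their values and their column names are hypothetical stand-ins; inspect the actual tables returned by getCodonFreq() (for example with `str()`) before adapting it.

```r
# Toy codon frequency-per-gene matrix with Ensembl gene IDs as row names
# (hypothetical values; only two codon columns shown)
freq_mat <- matrix(
  c(0.02, 0.01,
    0.03, 0.02),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("ENSG00000141510", "ENSG00000012048"), c("GCT", "GCC"))
)

# Toy translator table; the column names are an assumption borrowed from
# the biomaRt attributes ensembl_gene_id and external_gene_name
translator <- data.frame(
  ensembl_gene_id    = c("ENSG00000012048", "ENSG00000141510"),
  external_gene_name = c("BRCA1", "TP53")
)

# Look up each row's gene name; match() does not rely on both tables
# sharing the same row order
freq_named <- freq_mat
rownames(freq_named) <- translator$external_gene_name[
  match(rownames(freq_mat), translator$ensembl_gene_id)
]
freq_named
```

Using `match()` rather than assigning names positionally avoids silently mislabelling genes when the two tables are sorted differently.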
One of the main advantages of getCodonFreq() is that it can be applied to any organism available in Ensembl. For example, use hsapiens_gene_ensembl for the human genome (hg38) and mmusculus_gene_ensembl for the mouse genome (mm39). Dataset names for other organisms can be extracted from datasets$dataset, the dataset column of the table of available Ensembl datasets.
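The text does not show where `datasets` comes from. One common way to obtain such a table, assuming the biomaRt package and internet access, is to query the Ensembl genes BioMart. Whether getCodonFreq() accepts every dataset listed there is an assumption worth checking.

```r
library(biomaRt)

# Connect to the Ensembl "genes" BioMart (requires internet access)
ensembl <- useEnsembl(biomart = "genes")

# Data frame of available datasets (columns: dataset, description, version)
datasets <- listDatasets(ensembl)

# Dataset names, in the form passed to dataset_name (e.g. "hsapiens_gene_ensembl")
head(datasets$dataset)

# Search by pattern, here for the zebrafish dataset
searchDatasets(mart = ensembl, pattern = "drerio")
```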
In the examples below, we use the human, mouse and zebrafish reference datasets with the canonical and length filters to illustrate the flexibility of getCodonFreq().
```r
# Human (hg38): "canonical" filter, gene versions retained, mitochondrial
# genes excluded, identifiers reported as external gene names
codon_freq_results_canonical <- getCodonFreq(
  dataset_name = "hsapiens_gene_ensembl", filter = "canonical",
  retain_geneversion = TRUE,
  retain_mitochondrial = FALSE, out_format = "external_gene_name"
)
# Second list element: codon frequency-per-gene matrix;
# first list element: gene translator table
hg38_codon_freq_table_canonical <- codon_freq_results_canonical[[2]]
hg38_gene_translator_table_canonical <- codon_freq_results_canonical[[1]]
```

```r
# Mouse (mm39): "length" filter, mitochondrial genes retained,
# identifiers reported as Ensembl gene IDs
codon_freq_results_length <- getCodonFreq(
  dataset_name = "mmusculus_gene_ensembl", filter = "length",
  retain_mitochondrial = TRUE, out_format = "ensembl_gene_id"
)
mm39_codon_freq_table_length <- codon_freq_results_length[[2]]
mm39_gene_translator_table_length <- codon_freq_results_length[[1]]
```

```r
# Zebrafish: "length" filter, mitochondrial genes retained,
# identifiers reported as Ensembl transcript IDs
codon_freq_results_transcript <- getCodonFreq(
  dataset_name = "drerio_gene_ensembl", filter = "length",
  retain_mitochondrial = TRUE, out_format = "ensembl_transcript_id"
)
drerio_codon_freq_table_length <- codon_freq_results_transcript[[2]]
drerio_gene_translator_table_length <- codon_freq_results_transcript[[1]]
```

#> R version 4.6.1 (2026-06-24)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 26.04 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.32.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] biomaRt_2.69.0 tTEscanR_0.99.0 BiocStyle_2.41.0
#>
#> loaded via a namespace (and not attached):
#> [1] rappdirs_0.3.4 sass_0.4.10 generics_0.1.4
#> [4] RSQLite_3.53.2 stringi_1.8.7 hms_1.1.4
#> [7] digest_0.6.39 magrittr_2.0.5 evaluate_1.0.5
#> [10] fastmap_1.2.0 blob_1.3.0 jsonlite_2.0.0
#> [13] progress_1.2.3 AnnotationDbi_1.75.0 DBI_1.3.0
#> [16] BiocManager_1.30.27 httr_1.4.8 Biostrings_2.81.3
#> [19] httr2_1.2.3 jquerylib_0.1.4 cli_3.6.6
#> [22] rlang_1.2.0 crayon_1.5.3 XVector_0.53.0
#> [25] dbplyr_2.6.0 Biobase_2.73.1 bit64_4.8.2
#> [28] cachem_1.1.0 yaml_2.3.12 otel_0.2.0
#> [31] tools_4.6.1 memoise_2.0.1 dplyr_1.2.1
#> [34] filelock_1.0.3 BiocGenerics_0.59.8 curl_7.1.0
#> [37] png_0.1-9 buildtools_1.0.0 vctrs_0.7.3
#> [40] R6_2.6.1 stats4_4.6.1 BiocFileCache_3.3.0
#> [43] lifecycle_1.0.5 Seqinfo_1.3.0 KEGGREST_1.53.4
#> [46] stringr_1.6.0 IRanges_2.47.2 S4Vectors_0.51.5
#> [49] bit_4.6.0 pkgconfig_2.0.3 pillar_1.11.1
#> [52] bslib_0.11.0 glue_1.8.1 xfun_0.59
#> [55] tibble_3.3.1 tidyselect_1.2.1 sys_3.4.3
#> [58] knitr_1.51 htmltools_0.5.9 rmarkdown_2.31
#> [61] maketools_1.3.2 compiler_4.6.1 prettyunits_1.2.0