tTEscanR Codon Frequency-per-Gene Matrix


Overview

This document describes the functionality provided by the codon frequency-per-gene module of tTEscanR.

# install.packages("/avarassanchez/tTEscanR")
library(tTEscanR)

The reference codon frequency-per-gene matrix represents the codon distribution of each protein-coding gene in a reference genome.
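
To make the idea of a codon frequency-per-gene matrix concrete, the sketch below counts codons in two made-up coding sequences using base R only. It does not reproduce how tTEscanR builds its matrix: the sequences, the helper count_codons(), and the choice of genes as rows and codons as columns are all assumptions made for illustration.

```r
# Toy coding sequences (hypothetical, not real genes); each length is a multiple of 3.
cds <- c(geneA = "ATGGCCGCCTAA", geneB = "ATGAAAAAATGA")

# All 64 codons in alphabetical order, used as the matrix columns.
bases <- c("A", "C", "G", "T")
codons <- sort(do.call(
    paste0,
    expand.grid(bases, bases, bases, stringsAsFactors = FALSE)
))

# Hypothetical helper: split one sequence into in-frame triplets and count each codon.
count_codons <- function(dna) {
    starts <- seq(1, nchar(dna) - 2, by = 3)
    triplets <- substring(dna, starts, starts + 2)
    counts <- tabulate(match(triplets, codons), nbins = length(codons))
    names(counts) <- codons
    counts
}

# One row per gene, one column per codon (orientation chosen for this sketch).
codon_matrix <- t(sapply(cds, count_codons))

# Show only the codons that occur in the toy sequences.
codon_matrix[, c("AAA", "ATG", "GCC", "TAA", "TGA")]
```

Each row sums to the number of codons in that sequence, so rows from genes of different lengths are not directly comparable until they are normalised.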

The getCodonFreq() function computes the reference codon frequency-per-gene matrix. It retrieves the codon composition table for any organism supported by Ensembl and returns a list with two elements: a codon frequency-per-gene matrix and a gene translator table that maps between Ensembl IDs and gene names. In the examples below, the translator table is extracted as the first list element and the matrix as the second. This output can be used directly in downstream codon usage analyses.
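
As one possible downstream step, the sketch below turns codon counts into within-gene proportions and relabels the rows using a translator table. It runs on small hand-made objects rather than real getCodonFreq() output. The matrix orientation (genes as rows), the gene IDs, and the translator column names ensembl_gene_id and external_gene_name are assumptions, so check them against the actual output before reusing this.

```r
# Toy stand-in for a codon count matrix (rows = genes, columns = codons).
# The orientation of the real getCodonFreq() output is an assumption here.
counts <- matrix(
    c(2, 1, 1,
      0, 3, 1),
    nrow = 2, byrow = TRUE,
    dimnames = list(c("ENSG_A", "ENSG_B"), c("AAA", "GCC", "TAA"))
)

# Toy translator table; IDs and column names are hypothetical.
translator <- data.frame(
    ensembl_gene_id = c("ENSG_B", "ENSG_A"),
    external_gene_name = c("GeneB", "GeneA"),
    stringsAsFactors = FALSE
)

# Convert counts to within-gene proportions so that each row sums to 1.
rel_freq <- prop.table(counts, margin = 1)

# Relabel rows with gene names by looking up each ID in the translator table.
idx <- match(rownames(rel_freq), translator$ensembl_gene_id)
rownames(rel_freq) <- translator$external_gene_name[idx]

rel_freq
```

Using match() instead of relying on row order keeps the relabelling correct even when the translator table is sorted differently from the matrix.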

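library(biomaRt)
# List every dataset (one per organism) in the Ensembl genes mart; requires network access
datasets <- biomaRt::listDatasets(biomaRt::useEnsembl(biomart = "ensembl"))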

A main advantage of getCodonFreq() is that it works with any organism available in Ensembl. For example, hsapiens_gene_ensembl corresponds to the hg38 human genome and mmusculus_gene_ensembl to the mm39 mouse genome. The dataset names of other organisms are listed in datasets$dataset.
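
Finding the right dataset name usually means searching the table returned by listDatasets(), which has the columns dataset, description, and version. The sketch below uses a hypothetical helper, find_dataset(), on a small hand-made table so that it runs offline. With a live connection, the same call should work on the datasets object created above.

```r
# Hand-made stand-in for the listDatasets() result (three rows only),
# so that the example runs without querying Ensembl.
toy_datasets <- data.frame(
    dataset = c("hsapiens_gene_ensembl", "mmusculus_gene_ensembl",
                "drerio_gene_ensembl"),
    description = c("Human genes (GRCh38.p14)", "Mouse genes (GRCm39)",
                    "Zebrafish genes (GRCz11)"),
    stringsAsFactors = FALSE
)

# Hypothetical helper: case-insensitive search in the description column.
find_dataset <- function(datasets, pattern) {
    hits <- grepl(pattern, datasets$description, ignore.case = TRUE)
    datasets[hits, c("dataset", "description")]
}

find_dataset(toy_datasets, "zebrafish")
```

The value in the dataset column of the matching row is what would be passed as dataset_name to getCodonFreq().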

The examples below use the human, mouse, and zebrafish (drerio_gene_ensembl) reference datasets together with the canonical and length filters to illustrate what getCodonFreq() can do.

Example 1:
  • Reference hg38 dataset
  • Canonical filter
  • Removing mitochondrial genes
  • External gene name as output

codon_freq_results_canonical <- getCodonFreq(
    dataset_name = "hsapiens_gene_ensembl",
    filter = "canonical",
    retain_geneversion = TRUE,
    retain_mitochondrial = FALSE,
    out_format = "external_gene_name"
)
# Element 2 is stored as the codon frequency table, element 1 as the gene translator table
hg38_codon_freq_table_canonical <- codon_freq_results_canonical[[2]]
hg38_gene_translator_table_canonical <- codon_freq_results_canonical[[1]]

Example 2:
  • Reference mm39 dataset
  • Length filter
  • Keeping mitochondrial genes
  • Ensembl gene id as output

codon_freq_results_length <- getCodonFreq(
    dataset_name = "mmusculus_gene_ensembl",
    filter = "length",
    retain_mitochondrial = TRUE,
    out_format = "ensembl_gene_id"
)
# Element 2 is stored as the codon frequency table, element 1 as the gene translator table
mm39_codon_freq_table_length <- codon_freq_results_length[[2]]
mm39_gene_translator_table_length <- codon_freq_results_length[[1]]

Example 3:
  • Using a
  • Length filter
  • Keeping mitochondrial genes
  • Ensembl transcript id as output

codon_freq_results_transcript <- getCodonFreq(
    dataset_name = "drerio_gene_ensembl",
    filter = "length",
    retain_mitochondrial = TRUE,
    out_format = "ensembl_transcript_id"
)
# Element 2 is stored as the codon frequency table, element 1 as the gene translator table
drerio_codon_freq_table_length <- codon_freq_results_transcript[[2]]
drerio_gene_translator_table_length <- codon_freq_results_transcript[[1]]

Session information

    #> R version 4.6.1 (2026-06-24)
    #> Platform: x86_64-pc-linux-gnu
    #> Running under: Ubuntu 26.04 LTS
    #> 
    #> Matrix products: default
    #> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
    #> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.32.so;  LAPACK version 3.12.0
    #> 
    #> locale:
    #>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
    #>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
    #>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
    #>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
    #>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
    #> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
    #> 
    #> time zone: Etc/UTC
    #> tzcode source: system (glibc)
    #> 
    #> attached base packages:
    #> [1] stats     graphics  grDevices utils     datasets  methods   base     
    #> 
    #> other attached packages:
    #> [1] biomaRt_2.69.0   tTEscanR_0.99.0  BiocStyle_2.41.0
    #> 
    #> loaded via a namespace (and not attached):
    #>  [1] rappdirs_0.3.4       sass_0.4.10          generics_0.1.4      
    #>  [4] RSQLite_3.53.2       stringi_1.8.7        hms_1.1.4           
    #>  [7] digest_0.6.39        magrittr_2.0.5       evaluate_1.0.5      
    #> [10] fastmap_1.2.0        blob_1.3.0           jsonlite_2.0.0      
    #> [13] progress_1.2.3       AnnotationDbi_1.75.0 DBI_1.3.0           
    #> [16] BiocManager_1.30.27  httr_1.4.8           Biostrings_2.81.3   
    #> [19] httr2_1.2.3          jquerylib_0.1.4      cli_3.6.6           
    #> [22] rlang_1.2.0          crayon_1.5.3         XVector_0.53.0      
    #> [25] dbplyr_2.6.0         Biobase_2.73.1       bit64_4.8.2         
    #> [28] cachem_1.1.0         yaml_2.3.12          otel_0.2.0          
    #> [31] tools_4.6.1          memoise_2.0.1        dplyr_1.2.1         
    #> [34] filelock_1.0.3       BiocGenerics_0.59.8  curl_7.1.0          
    #> [37] png_0.1-9            buildtools_1.0.0     vctrs_0.7.3         
    #> [40] R6_2.6.1             stats4_4.6.1         BiocFileCache_3.3.0 
    #> [43] lifecycle_1.0.5      Seqinfo_1.3.0        KEGGREST_1.53.4     
    #> [46] stringr_1.6.0        IRanges_2.47.2       S4Vectors_0.51.5    
    #> [49] bit_4.6.0            pkgconfig_2.0.3      pillar_1.11.1       
    #> [52] bslib_0.11.0         glue_1.8.1           xfun_0.59           
    #> [55] tibble_3.3.1         tidyselect_1.2.1     sys_3.4.3           
    #> [58] knitr_1.51           htmltools_0.5.9      rmarkdown_2.31      
    #> [61] maketools_1.3.2      compiler_4.6.1       prettyunits_1.2.0