---
title: "tTEscanR Codon Frequency-per-Gene Matrix"
output:
  BiocStyle::html_document:
    toc: true
    toc_float: true
    theme: default
    css: style.css
vignette: >
  %\VignetteIndexEntry{3. Codon Frequency-per-Gene Extraction}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
bibliography: references.bib
---

```{r file_settings, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r notes_format, echo = FALSE, results = 'asis'}
cat(" ")
```

# Overview

This document presents the functionality of the codon frequency-per-gene module.

```{r setup, message = FALSE, warning = FALSE}
# install.packages("/avarassanchez/tTEscanR")
library(tTEscanR)
```

::: {.note}
The reference **codon frequency-per-gene** matrix represents the codon distribution of each protein-coding gene in a reference genome.
:::

The **`getCodonFreq()`** function computes the reference codon frequency-per-gene matrix. It retrieves the codon composition table for any organism supported by **Ensembl**. The function returns a list containing: (i) a codon frequency-per-gene matrix, and (ii) a gene translator table mapping between Ensembl IDs and gene names. This output can be used directly in downstream codon usage analyses.

```{r load_biomart, message = FALSE, warning = FALSE}
library(biomaRt)
```

```{r datasets_ensembl, message = TRUE, warning = FALSE, eval = FALSE}
# List the Ensembl datasets (one per organism) available through biomaRt
datasets <- biomaRt::listDatasets(biomaRt::useEnsembl(biomart = "ensembl"))
```

One of the main advantages of the **`getCodonFreq()`** function is that it can be applied to any organism available in Ensembl. For example, `hsapiens_gene_ensembl` corresponds to the hg38 human genome and `mmusculus_gene_ensembl` to the mm39 mouse genome. Other dataset names are listed in `datasets$dataset`.

The examples below use the human, mouse, and zebrafish (`drerio_gene_ensembl`) reference datasets, together with the canonical and length filters, to illustrate what **`getCodonFreq()`** can do.

**Example 1:**

- Reference hg38 dataset
- Canonical filter
- Removing mitochondrial genes
- External gene name as output

```{r canonical_codon_freq, message = TRUE, warning = FALSE, eval = FALSE}
codon_freq_results_canonical <- getCodonFreq(
  dataset_name = "hsapiens_gene_ensembl",
  filter = "canonical",
  retain_geneversion = TRUE,
  retain_mitochondrial = FALSE,
  out_format = "external_gene_name"
)
```

```{r canonical_results, message = TRUE, warning = FALSE, eval = FALSE}
hg38_codon_freq_table_canonical <- codon_freq_results_canonical[[2]]
hg38_gene_translator_table_canonical <- codon_freq_results_canonical[[1]]
```

**Example 2:**

- Reference mm39 dataset
- Length filter
- Keeping mitochondrial genes
- Ensembl gene id as output

```{r length_codon_freq, message = TRUE, warning = FALSE, eval = FALSE}
codon_freq_results_length <- getCodonFreq(
  dataset_name = "mmusculus_gene_ensembl",
  filter = "length",
  retain_mitochondrial = TRUE,
  out_format = "ensembl_gene_id"
)
```

```{r length_results, message = TRUE, warning = FALSE, eval = FALSE}
mm39_codon_freq_table_length <- codon_freq_results_length[[2]]
mm39_gene_translator_table_length <- codon_freq_results_length[[1]]
```

**Example 3:**

- Zebrafish reference dataset (`drerio_gene_ensembl`)
- Length filter
- Keeping mitochondrial genes
- Ensembl transcript id as output

```{r transcript_codon_freq, message = TRUE, warning = FALSE, eval = FALSE}
codon_freq_results_transcript <- getCodonFreq(
  dataset_name = "drerio_gene_ensembl",
  filter = "length",
  retain_mitochondrial = TRUE,
  out_format = "ensembl_transcript_id"
)
```

```{r transcript_results, message = TRUE, warning = FALSE, eval = FALSE}
drerio_codon_freq_table_length <- codon_freq_results_transcript[[2]]
drerio_gene_translator_table_length <- codon_freq_results_transcript[[1]]
```

```{r session-info, echo=FALSE}
sessionInfo()
```
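
To make the idea of a codon frequency-per-gene matrix concrete, the base R sketch below counts codons in two toy coding sequences and arranges them as one row per gene and one column per codon. It is an illustration only and is not how `getCodonFreq()` is implemented: the sequences, the gene names `geneA` and `geneB`, and the helper functions `split_codons()` and `count_codons()` are hypothetical. This vignette does not state whether the package reports raw counts or relative frequencies, so both are shown.

```{r toy_codon_matrix}
# Toy coding sequences (hypothetical, for illustration only)
cds <- c(
  geneA = "ATGGCCGCCTAA",
  geneB = "ATGAAAGCCAAATGA"
)

# All 64 possible codons, used as the matrix columns
bases <- c("A", "C", "G", "T")
all_codons <- apply(
  expand.grid(bases, bases, bases, stringsAsFactors = FALSE),
  1, paste, collapse = ""
)

# Split a sequence into consecutive, non-overlapping triplets
split_codons <- function(dna) {
  starts <- seq(1, nchar(dna) - 2, by = 3)
  substring(dna, starts, starts + 2)
}

# Count how often each of the 64 codons occurs in one sequence
count_codons <- function(dna) {
  tabulate(match(split_codons(dna), all_codons), nbins = length(all_codons))
}

# One row per gene, one column per codon
codon_counts <- t(vapply(cds, count_codons, integer(length(all_codons))))
colnames(codon_counts) <- all_codons

# Relative frequencies: each row is divided by its total codon count
codon_freq <- codon_counts / rowSums(codon_counts)

# Inspect the codons that occur in the toy sequences
codon_counts[, c("ATG", "GCC", "AAA", "TAA", "TGA")]
```

Each row of `codon_freq` sums to 1, so genes of different lengths can be compared on the same scale.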