---
title: "tTEscanR Codon Frequency-per-Gene Matrix"
output:
  BiocStyle::html_document:
    toc: true
    toc_float: true
    theme: default
    css: style.css
vignette: >
  %\VignetteIndexEntry{3. Codon Frequency-per-Gene Extraction}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
bibliography: references.bib
---
```{r file_settings, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```
# Overview
This vignette presents the functionality of the codon frequency-per-gene
module of **tTEscanR**.
```{r setup, message = FALSE, warning = FALSE}
# Install from a local source copy of the package, if needed:
# install.packages("/avarassanchez/tTEscanR", repos = NULL, type = "source")
library(tTEscanR)
```
::: {.note}
The reference **codon frequency-per-gene** matrix represents the codon
distribution of each protein-coding gene in a reference genome.
:::
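
As an illustration of what one row of this matrix contains, the chunk below is a minimal base-R sketch, not the tTEscanR implementation, that counts the 64 codons of a single coding sequence read in frame from its first nucleotide. The helper `count_codons()` and the toy sequence are hypothetical.

```{r codon_count_sketch}
# Hypothetical helper (not part of tTEscanR): count the 64 codons of one
# in-frame coding sequence (CDS)
count_codons <- function(cds) {
  bases <- c("T", "C", "A", "G")
  # All 64 codons in a fixed order, so every gene gets the same columns
  all_codons <- as.vector(outer(outer(bases, bases, paste0), bases, paste0))
  cds <- toupper(cds)
  n_codons <- nchar(cds) %/% 3  # a trailing partial codon is ignored
  starts <- seq(1, by = 3, length.out = n_codons)
  codons <- substring(cds, starts, starts + 2)
  # factor() with fixed levels gives zero counts for absent codons
  counts <- table(factor(codons, levels = all_codons))
  setNames(as.integer(counts), all_codons)
}

toy_cds <- "ATGGCTGCTAAATAA"  # toy sequence: ATG GCT GCT AAA TAA
toy_counts <- count_codons(toy_cds)
toy_counts[toy_counts > 0]
```

Stacking such vectors for every protein-coding gene, one row per gene, yields a genes-by-codons matrix.
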
The **`getCodonFreq()`** function computes the reference codon
frequency-per-gene matrix for any organism available in **Ensembl**, retrieving
the codon composition of its protein-coding genes. It returns a list with two
elements: a gene translator table that maps Ensembl IDs to gene names, and the
codon frequency-per-gene matrix. Both can be used directly in downstream codon
usage analyses.
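
Codon usage analyses often compare genes of different lengths, so raw codon counts are commonly rescaled to per-gene relative frequencies. The chunk below sketches that rescaling on a toy matrix with genes in rows and codons in columns; the genes, counts, and the four codon columns are invented for brevity, and the sketch makes no assumption about which scale `getCodonFreq()` itself returns.

```{r relative_freq_sketch}
# Toy codon count matrix: rows are genes, columns are codons
# (gene names and counts are invented for illustration only)
toy_matrix <- matrix(
  c(2, 0, 1, 1,
    0, 3, 3, 0),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("geneA", "geneB"), c("GCT", "GCC", "AAA", "AAG"))
)

# Divide each row by its total so every gene's codon frequencies sum to 1
relative_freq <- sweep(toy_matrix, 1, rowSums(toy_matrix), "/")
relative_freq
rowSums(relative_freq)
```
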
```{r load_biomart, message = FALSE, warning = FALSE}
library(biomaRt)
```
```{r datasets_ensembl, message = TRUE, warning = FALSE, eval = FALSE}
# List all datasets (one per organism) available in the Ensembl BioMart
datasets <- biomaRt::listDatasets(biomaRt::useEnsembl(biomart = "ensembl"))
```
One of the main advantages of **`getCodonFreq()`** is that it can be applied to
any organism available in Ensembl. For example, use `hsapiens_gene_ensembl` for
the human genome (hg38) and `mmusculus_gene_ensembl` for the mouse genome
(mm39). The dataset names for other organisms are listed in
`datasets$dataset`.
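
Because the `datasets` table has one row per organism, searching its descriptions is a convenient way to find a dataset name. The chunk below is a minimal sketch with a hypothetical helper, `find_ensembl_dataset()`, applied to a small stand-in table with the `dataset` and `description` columns returned by `listDatasets()`; the rows are illustrative, not real Ensembl output.

```{r find_dataset_sketch}
# Hypothetical helper (not part of tTEscanR or biomaRt): return the dataset
# names whose description matches a pattern, ignoring case
find_ensembl_dataset <- function(datasets, pattern) {
  hits <- grepl(pattern, datasets$description, ignore.case = TRUE)
  datasets$dataset[hits]
}

# Small stand-in for the output of biomaRt::listDatasets() (illustrative rows)
datasets_example <- data.frame(
  dataset = c("hsapiens_gene_ensembl", "mmusculus_gene_ensembl",
              "drerio_gene_ensembl"),
  description = c("Human genes", "Mouse genes", "Zebrafish genes"),
  stringsAsFactors = FALSE
)

find_ensembl_dataset(datasets_example, "zebrafish")
```

With a live Ensembl connection, the same helper can be applied to the real table, for example `find_ensembl_dataset(datasets, "zebrafish")`.
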
The examples below use the human, mouse, and zebrafish reference datasets,
together with the canonical and length filters, to illustrate the options of
**`getCodonFreq()`**.
**Example 1:**

- Reference dataset: hg38 (`hsapiens_gene_ensembl`)
- Canonical filter
- Mitochondrial genes removed
- Output IDs: external gene name

```{r canonical_codon_freq, message = TRUE, warning = FALSE, eval = FALSE}
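# Human (hg38), canonical filter, mitochondrial genes removed,
# output identifiers: external gene names
codon_freq_results_canonical <- getCodonFreq(
  dataset_name = "hsapiens_gene_ensembl",
  filter = "canonical",
  retain_geneversion = TRUE,
  retain_mitochondrial = FALSE,
  out_format = "external_gene_name"
)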
```
```{r canonical_results, message = TRUE, warning = FALSE, eval = FALSE}
hg38_codon_freq_table_canonical <- codon_freq_results_canonical[[2]]
hg38_gene_translator_table_canonical <- codon_freq_results_canonical[[1]]
```
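
Mitochondrial protein-coding genes are translated with a different genetic code from nuclear genes (in vertebrate mitochondria, for example, TGA encodes tryptophan and ATA encodes methionine), which is one reason to exclude them from a nuclear codon usage reference. The chunk below sketches how such genes could be dropped from a gene table using the Ensembl chromosome name `MT`; the toy table and its column names are assumptions for illustration and do not describe how `retain_mitochondrial` is implemented.

```{r mito_filter_sketch}
# Toy gene annotation (hypothetical table, columns modelled on biomaRt
# attribute names); not the actual output of getCodonFreq()
toy_annotation <- data.frame(
  external_gene_name = c("MT-CO1", "ACTB", "MT-ND1"),
  chromosome_name = c("MT", "7", "MT"),
  stringsAsFactors = FALSE
)

# Ensembl names the mitochondrial chromosome "MT"; drop genes located on it
nuclear_genes <- toy_annotation[toy_annotation$chromosome_name != "MT", ]
nuclear_genes
```
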
**Example 2:**

- Reference dataset: mm39 (`mmusculus_gene_ensembl`)
- Length filter
- Mitochondrial genes kept
- Output IDs: Ensembl gene ID

```{r length_codon_freq, message = TRUE, warning = FALSE, eval = FALSE}
# Mouse (mm39), length filter, mitochondrial genes kept,
# output identifiers: Ensembl gene IDs
codon_freq_results_length <- getCodonFreq(
  dataset_name = "mmusculus_gene_ensembl",
  filter = "length",
  retain_mitochondrial = TRUE,
  out_format = "ensembl_gene_id"
)
```
```{r length_results, message = TRUE, warning = FALSE, eval = FALSE}
mm39_codon_freq_table_length <- codon_freq_results_length[[2]]
mm39_gene_translator_table_length <- codon_freq_results_length[[1]]
```
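
One common way to pick a single transcript per gene is to keep the one with the longest coding sequence. Assuming the length filter follows a rule of this kind, the chunk below is a minimal base-R sketch of that selection; the toy table, its column names, and the tie handling (the earlier transcript wins) are hypothetical.

```{r longest_cds_sketch}
# Toy transcript table (hypothetical IDs and CDS lengths)
toy_tx <- data.frame(
  gene_id = c("g1", "g1", "g2", "g2", "g2"),
  transcript_id = c("t1", "t2", "t3", "t4", "t5"),
  cds_length = c(900, 1200, 300, 300, 450),
  stringsAsFactors = FALSE
)

# Sort by gene, then by decreasing CDS length; order() is stable, so ties
# keep their original order. The first row of each gene is its longest
# transcript
ordered_tx <- toy_tx[order(toy_tx$gene_id, -toy_tx$cds_length), ]
longest_tx <- ordered_tx[!duplicated(ordered_tx$gene_id), ]
longest_tx
```
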
**Example 3:**

- Reference dataset: zebrafish (`drerio_gene_ensembl`)
- Length filter
- Mitochondrial genes kept
- Output IDs: Ensembl transcript ID

```{r transcript_codon_freq, message = TRUE, warning = FALSE, eval = FALSE}
# Zebrafish, length filter, mitochondrial genes kept,
# output identifiers: Ensembl transcript IDs
codon_freq_results_transcript <- getCodonFreq(
  dataset_name = "drerio_gene_ensembl",
  filter = "length",
  retain_mitochondrial = TRUE,
  out_format = "ensembl_transcript_id"
)
```
```{r transcript_results, message = TRUE, warning = FALSE, eval = FALSE}
drerio_codon_freq_table_length <- codon_freq_results_transcript[[2]]
drerio_gene_translator_table_length <- codon_freq_results_transcript[[1]]
```
```{r session-info, echo=FALSE}
sessionInfo()
```